# KG-TOSA: A Task-Oriented graph SAmpler for GNN
Fig. 1: The TOSG’s generic graph pattern is based on two parameters: (i) the direction (outgoing and incoming) of the predicates, and (ii) the number of hops.
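
The two parameters in the caption can be illustrated with a small in-memory sketch. This is not the repository's extraction code (which runs against a SPARQL endpoint or triple files). The function name `extract_tosg`, the toy triples, and the encoding `direction=1` for outgoing-only and `direction=2` for outgoing plus incoming are assumptions made for illustration, chosen to mirror labels such as `d1h1` and `d2h1`.

```python
from collections import defaultdict


def extract_tosg(triples, targets, direction=1, hops=1):
    """Collect triples around the target nodes (hypothetical helper).

    direction=1 follows outgoing predicates only; direction=2 follows
    outgoing and incoming predicates (assumed encoding).
    hops is the number of expansion steps from the target nodes.
    """
    out_edges = defaultdict(list)  # subject -> triples leaving it
    in_edges = defaultdict(list)   # object  -> triples entering it
    for s, p, o in triples:
        out_edges[s].append((s, p, o))
        in_edges[o].append((s, p, o))

    frontier = set(targets)
    visited = set(targets)
    selected = set()
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for s, p, o in out_edges.get(node, ()):
                selected.add((s, p, o))
                next_frontier.add(o)
            if direction == 2:
                for s, p, o in in_edges.get(node, ()):
                    selected.add((s, p, o))
                    next_frontier.add(s)
        # Expand only from nodes that have not been visited yet.
        frontier = next_frontier - visited
        visited |= frontier
    return selected


# Toy knowledge graph (made-up data, not from any KGTOSA dataset).
triples = [
    ("paper1", "publishedIn", "venue1"),
    ("paper1", "authoredBy", "author1"),
    ("author1", "affiliatedWith", "org1"),
    ("paper2", "cites", "paper1"),
]

d1h1 = extract_tosg(triples, {"paper1"}, direction=1, hops=1)
d2h1 = extract_tosg(triples, {"paper1"}, direction=2, hops=1)
d1h2 = extract_tosg(triples, {"paper1"}, direction=1, hops=2)
```

In this toy graph, `d1h1` keeps only the triples leaving `paper1`, `d2h1` additionally keeps the incoming `cites` triple, and `d1h2` reaches one step further along outgoing predicates to `org1`.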
## Installation
* Clone the `KGTOSA` repo
* Create `KGTOSA` Conda environment (Python 3.8) and install pip requirements.
* Activate the `KGTOSA` environment
```commandline
conda activate KGTOSA
```

Extract TOSG triples:
1. Node Classification
```python
python -u TOSG_Extraction/TOSG_Extraction_NC.py --sparql_endpoint http://206.12.98.118:8890/sparql --graph_uri http://dblp.org --target_rel_uri https://dblp.org/rdf/schema#publishedIn --TOSG d1h1 --batch_size 1000000 --out_file DBLP-15M_PV --threads_count 32
```
2. Link Prediction
```python
python -u TOSG_Extraction/TOSG_Extraction_LP.py --target_rel_uri=isConnectedTo --data_path= --dataset=YAGO3-10 --TOSG=d1h1 --file_sep=tab
```

Transform NC TOSG dataset into PYG dataset
```python
python -u DatasetTransformer/TSV_TO_PYG_dataset.py --traget_node_type=Paper --target_rel=publishedIn --csv_path= --dataset_name=DBLP-15M_PV_d1h1 --file_sep=tab --split_rel=publish_year
```

Download KGTOSA NC datasets
  • MAG_42M_PV_FG
  • MAG_42M_PV_d1h1
  • DBLP-15M_PV_FG
  • DBLP-15M_PV_d1h1
  • YAGO4-30M_PC_FG
  • YAGO4-30M_PC_d1h1

Download KGTOSA LP datasets
  • YAGO3-10_FG_d2h1
  • WikiKG2_FG_d2h1
  • DBLP2023-010305_FG_d2h1

Reproduce KGTOSA Results:
1. Node Classification
```python
# run RGCN
python rgcn-KGTOSA.py --Dataset
# run GraphSaint
python graph_saint_KGTOSA.py --Dataset
# run ShaDowSaint
python graph_saint_Shadow_KGTOSA.py --Dataset
# run SeHGNN
python SeHGNN/ogbn/main.py --Dataset
# run IBS
python IBS/run_ogbn_ppr.py --with config/
```
2. Link Prediction

    Extract the dataset folder under the data folder under each method path.
```python
# run RGCN
python RGCN/main.py --Dataset --TargetRel
# run MorsE
python Morse/main.py --dataset --TargetRel --TargetRel