
KG-TOSA: A Task-Oriented graph SAmpler for GNN

Fig. 1: The TOSG’s generic graph pattern is based on two parameters: (i) the direction of the predicates (outgoing and incoming), and (ii) the number of hops.
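The two parameters can be illustrated with a small in-memory sketch. It is not the repository's extraction code. It assumes that `d1` means outgoing predicates only, that `d2` means outgoing and incoming predicates, and that `hN` is the number of expansion rounds from the target nodes. The function name `extract_tosg` and the toy triples are hypothetical.

```python
from collections import defaultdict


def extract_tosg(triples, targets, direction=1, hops=1):
    """Return the triples reachable from the target nodes.

    Assumption: direction=1 follows outgoing edges only and
    direction=2 also follows incoming edges. hops is the number
    of expansion rounds starting from the target nodes.
    """
    out_edges = defaultdict(list)
    in_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((s, p, o))
        in_edges[o].append((s, p, o))

    frontier = set(targets)
    visited = set(targets)
    selected = set()
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            # Outgoing predicates: the node is the subject.
            for s, p, o in out_edges[node]:
                selected.add((s, p, o))
                next_frontier.add(o)
            # Incoming predicates: the node is the object.
            if direction == 2:
                for s, p, o in in_edges[node]:
                    selected.add((s, p, o))
                    next_frontier.add(s)
        # Expand only nodes that have not been expanded before.
        frontier = next_frontier - visited
        visited |= frontier
    return selected


# Hypothetical toy graph with "paper1" as the only target node.
triples = [
    ("paper1", "publishedIn", "venueA"),
    ("paper1", "authoredBy", "alice"),
    ("paper2", "cites", "paper1"),
    ("alice", "affiliatedWith", "uniX"),
]
d1h1 = extract_tosg(triples, {"paper1"}, direction=1, hops=1)
d2h1 = extract_tosg(triples, {"paper1"}, direction=2, hops=1)
d1h2 = extract_tosg(triples, {"paper1"}, direction=1, hops=2)
```

Under these assumptions, `d1h1` keeps only the two triples leaving `paper1`, `d2h1` additionally keeps the incoming `cites` triple, and `d1h2` reaches the `affiliatedWith` triple one hop further out.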

Installation

  • Clone the KGTOSA repo
  • Create KGTOSA Conda environment (Python 3.8) and install pip requirements.
  • Activate the KGTOSA environment
conda activate KGTOSA

Extract TOSG triples:

  1. Node Classification
python -u TOSG_Extraction/TOSG_Extraction_NC.py --sparql_endpoint http://206.12.98.118:8890/sparql --graph_uri http://dblp.org --target_rel_uri https://dblp.org/rdf/schema#publishedIn --TOSG d1h1 --batch_size 1000000 --out_file DBLP-15M_PV --threads_count 32  
  2. Link Prediction
python -u TOSG_Extraction/TOSG_Extraction_LP.py --target_rel_uri=isConnectedTo --data_path=<path> --dataset=YAGO3-10 --TOSG=d1h1 --file_sep=tab
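To show how the node classification arguments might relate to each other, the sketch below builds one paged SPARQL query for an outgoing, one-hop pattern. It is an assumption about what such a query could look like, not the query issued by `TOSG_Extraction_NC.py`. The function `build_d1h1_query`, the variable `?label`, and the `batch_index` parameter are hypothetical.

```python
def build_d1h1_query(graph_uri, target_rel_uri, batch_size, batch_index):
    """Build one paged SPARQL query for an outgoing, one-hop pattern.

    Assumption: target nodes are the subjects of target_rel_uri, and
    the sampled triples are all triples that have a target as subject.
    """
    offset = batch_size * batch_index
    # Note: paging with LIMIT/OFFSET is only stable if the endpoint
    # returns rows in a consistent order; this sketch does not sort.
    return (
        "SELECT ?s ?p ?o\n"
        f"FROM <{graph_uri}>\n"
        "WHERE {\n"
        f"  ?s <{target_rel_uri}> ?label .\n"
        "  ?s ?p ?o .\n"
        "}\n"
        f"LIMIT {batch_size} OFFSET {offset}"
    )


# Values taken from the node classification command above;
# batch_index=2 requests the third page of results.
query = build_d1h1_query(
    "http://dblp.org",
    "https://dblp.org/rdf/schema#publishedIn",
    1000000,
    2,
)
print(query)
```

Splitting the result set into pages of `batch_size` rows is one plausible reason for a batch size argument, since it lets several threads fetch different pages in parallel.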

Transform the NC TOSG dataset into a PyG dataset

python -u DatasetTransformer/TSV_TO_PYG_dataset.py --traget_node_type=Paper --target_rel=publishedIn --csv_path=<path> --dataset_name=DBLP-15M_PV_d1h1 --file_sep=tab --split_rel=publish_year 
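The `--split_rel` argument suggests that target nodes are divided into train, validation, and test sets by the value of a relation such as `publish_year`. The following is a minimal sketch of that idea, not the transformer's actual logic. The function `split_by_year`, the year thresholds, and the sample triples are hypothetical.

```python
import csv
import io


def split_by_year(tsv_text, target_rel, split_rel, valid_year, test_year):
    """Assign target nodes to train/valid/test by their split_rel value.

    Assumption: nodes with a year before valid_year go to train, nodes
    before test_year go to valid, and all later nodes go to test.
    """
    targets = set()
    years = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        s, p, o = row
        if p == target_rel:
            targets.add(s)
        elif p == split_rel:
            years[s] = int(o)

    splits = {"train": [], "valid": [], "test": []}
    for node in sorted(targets):
        year = years.get(node)
        if year is None:
            continue  # a node without a year cannot be assigned
        if year < valid_year:
            splits["train"].append(node)
        elif year < test_year:
            splits["valid"].append(node)
        else:
            splits["test"].append(node)
    return splits


# Hypothetical tab-separated triples (subject, predicate, object).
sample = "\n".join([
    "paper1\tpublishedIn\tvenueA",
    "paper1\tpublish_year\t2018",
    "paper2\tpublishedIn\tvenueB",
    "paper2\tpublish_year\t2020",
    "paper3\tpublishedIn\tvenueA",
    "paper3\tpublish_year\t2021",
    "paper3\tauthoredBy\talice",
])
splits = split_by_year(sample, "publishedIn", "publish_year", 2020, 2021)
```

A time-based split like this keeps later publications out of the training set, which is a common choice for citation-style graphs.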

Download KGTOSA NC datasets

  • MAG_42M_PV_FG
  • MAG_42M_PV_d1h1
  • DBLP-15M_PV_FG
  • DBLP-15M_PV_d1h1
  • YAGO4-30M_PC_FG
  • YAGO4-30M_PC_d1h1

Download KGTOSA LP datasets

  • YAGO3-10_FG_d2h1
  • WikiKG2_FG_d2h1
  • DBLP2023-010305_FG_d2h1

Reproduce KGTOSA Results:

    1. Node Classification
    # run RGCN  
    python rgcn-KGTOSA.py --Dataset <DatasetPath>
    # run GraphSaint  
    python graph_saint_KGTOSA.py --Dataset <DatasetPath>
    # run ShaDowSaint  
    python graph_saint_Shadow_KGTOSA.py --Dataset <DatasetPath>
    # run SeHGNN  
    python SeHGNN/ogbn/main.py --Dataset <DatasetPath>
    # run IBS
    python IBS/run_ogbn_ppr.py --with config/<Config_path>
    
    2. Link Prediction
      Extract the dataset folder into the data folder under each method's path.
    # run RGCN  
    python RGCN/main.py --Dataset <DatasetName> --TargetRel <target_rel>
    # run MorsE  
    python Morse/main.py --dataset <DatasetName> --TargetRel <target_rel>
    # run LHGNN  
    python LHGNN/main.py --dataset <DatasetName> --TargetRel <target_rel>