- Clone the kglids repo.
- Create a kglids Conda environment (Python 3.8) and install the pip requirements.
- Activate the environment: `conda activate kglids`
Try the Sample KGLiDS Colab notebook for a quick hands-on!
Generating the LiDS graph:
    # sample configuration
    # list of data sources to process
    data_sources = [DataSource(name='benchmark', path='/home/projects/sources/kaggle', file_type='csv')]
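For reference, a data source entry carries only the three fields shown above. The sketch below is a minimal illustration assuming a plain dataclass; the actual DataSource class is defined in the project's configuration code.

    # Minimal sketch of a data source entry; this dataclass only mirrors the
    # fields shown in the sample configuration above.
    from dataclasses import dataclass

    @dataclass
    class DataSource:
        name: str       # label for the data source (e.g. 'benchmark')
        path: str       # root directory containing the files to profile
        file_type: str  # file format to look for (e.g. 'csv')

    benchmark = DataSource(name='benchmark',
                           path='/home/projects/sources/kaggle',
                           file_type='csv')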
Run the data profiler:

    cd kg_governor/data_profiling/src/
    python main.py

Construct the knowledge graph (global schema):

    cd kg_governor/knowledge_graph_construction/src/
    python data_global_schema_builder.py

Run the pipeline abstraction:

    cd kg_governor/pipeline_abstraction/
    python pipelines_analysis.py
Uploading the LiDS graph to the graph engine (we recommend Stardog):

    stardog-admin db create -o edge.properties=true -n Database_name
    stardog data add --format turtle Database_name dataset_graph.ttl
    stardog data add --format turtle Database_name default.ttl library.ttl
Alternatively, load the generated .ttl graphs programmatically with pystardog:

    import os
    import stardog

    database_name = 'Database_name'
    connection_details = {
        'endpoint': 'http://localhost:5820',
        'username': 'admin',
        'password': 'admin'
    }

    conn = stardog.Connection(database_name, **connection_details)
    conn.begin()

    # directory containing the generated .ttl graph files (must end with '/')
    graphs_dir = 'path/to/generated/graphs/'

    ttl_files = [i for i in os.listdir(graphs_dir) if i.endswith('ttl')]
    for ttl in ttl_files:
        conn.add(stardog.content.File(graphs_dir + ttl),
                 graph_uri='http://kglids.org/pipelineResource/')

    conn.commit()
    conn.close()
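As a quick sanity check (not part of the KGLiDS code), the number of triples loaded into the named graph above can be counted with the same pystardog connection; the database name and credentials below simply reuse the values from the snippet above.

    # Minimal sketch: count the triples in the named graph used by the loop above.
    import stardog

    connection_details = {
        'endpoint': 'http://localhost:5820',
        'username': 'admin',
        'password': 'admin'
    }

    count_query = """
    SELECT (COUNT(*) AS ?triples)
    WHERE { GRAPH <http://kglids.org/pipelineResource/> { ?s ?p ?o } }
    """

    with stardog.Connection('Database_name', **connection_details) as conn:
        result = conn.select(count_query)
        print(result['results']['bindings'][0]['triples']['value'])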
Using the KGLiDS APIs:
KGLiDS provides predefined operations in the form of Python APIs that allow seamless integration with a conventional data science pipeline. Check out the full list of KGLiDS APIs.
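To illustrate the integration style, here is a hypothetical sketch of wrapping a SPARQL query over the LiDS graph as a helper that returns a pandas DataFrame. The helper name (run_lids_query) and the generic query are illustrative assumptions, not the documented KGLiDS API; the predefined operations linked above provide higher-level functions for such tasks.

    # Hypothetical sketch, not the KGLiDS API: run a SPARQL query against the
    # LiDS graph in Stardog and return the bindings as a pandas DataFrame.
    import pandas as pd
    import stardog

    connection_details = {
        'endpoint': 'http://localhost:5820',
        'username': 'admin',
        'password': 'admin'
    }

    def run_lids_query(query: str, database: str = 'Database_name') -> pd.DataFrame:
        """Run a SPARQL SELECT query and return the result bindings as a DataFrame."""
        with stardog.Connection(database, **connection_details) as conn:
            bindings = conn.select(query)['results']['bindings']
        return pd.DataFrame([{var: cell['value'] for var, cell in row.items()} for row in bindings])

    # Example: peek at a few triples to confirm the graph is queryable from a notebook.
    print(run_lids_query('SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5'))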
To store the created knowledge graph in a standardized and well-structured way, we developed an ontology for linked data science: the LiDS Ontology. Check out the LiDS Ontology!
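For a first look at the ontology terms actually used in a loaded graph, a generic SPARQL query over the Stardog database works without hard-coding any LiDS vocabulary; the connection details and database name below are the same assumptions as in the upload snippet.

    # Minimal sketch: list the classes used in the loaded graph (default graph),
    # with instance counts, to explore the LiDS Ontology terms in practice.
    import stardog

    connection_details = {
        'endpoint': 'http://localhost:5820',
        'username': 'admin',
        'password': 'admin'
    }

    query = """
    SELECT ?class (COUNT(?instance) AS ?instances)
    WHERE { ?instance a ?class }
    GROUP BY ?class
    ORDER BY DESC(?instances)
    """

    with stardog.Connection('Database_name', **connection_details) as conn:
        for row in conn.select(query)['results']['bindings']:
            print(row['class']['value'], row['instances']['value'])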
The following benchmark datasets were used to evaluate KGLiDS:
- Dataset Discovery in Data Lakes
- Kaggle
If you find our work useful, please cite it in your research.
This repository is part of our submission. We will make it available to the public research community upon acceptance.
For any questions, please contact us: