KGLiDS provides predefined operations in the form of Python APIs that integrate seamlessly with a conventional data science pipeline.
The available APIs are listed below:

| No. | API | Description |
|---|---|---|
| 1 | query() | Executes ad-hoc queries on the fly |
| 2 | show_graph_info() | Summarizes the information captured by KGLiDS: the total number of datasets, tables, columns, and pipelines abstracted |
| 3 | get_datasets_info() | Shows the number of tables and pipelines per dataset |
| 4 | get_tables_info() | Shows all tables alongside their physical file path and dataset |
| 5 | search_tables_on() | Searches for tables containing specific column names |
| 6 | recommend_k_unionable_tables() | Returns the top-k tables that are unionable |
| 7 | recommend_k_joinable_tables() | Returns the top-k tables that are joinable |
| 8 | get_path_between_tables() | Visualizes the paths between a starting table and a target table |
| 9 | get_pipelines_info() | Shows the following information for all pipelines: pipeline name, dataset, author, date written, number of votes, and score |
| 10 | get_most_recent_pipeline() | Returns the most recent pipeline |
| 11 | get_top_k_scoring_pipelines_for_dataset() | Returns the top-k pipelines with the highest score |
| 12 | search_classifier() | Shows all the classifiers used for a dataset |
| 13 | get_hyperparameters() | Returns the hyperparameter values that were used for a given classifier |
| 14 | get_top_k_library_used() | Visualizes the top-k libraries that were used overall or for a given dataset |
| 15 | get_top_used_libraries() | Retrieves the top-k libraries used for a particular task: classification, clustering, regression, or visualization |
| 16 | get_pipelines_calling_libraries() | Returns a list of pipelines matching the criteria, along with other important metadata such as author and language |
| 17 | recommend_transformations() | Returns the possible set of transformations for tables |

kglids.query()

    from api.api import KGLiDS
    import pandas as pd

    kglids = KGLiDS()

    # Ad-hoc SPARQL query: list the names of all data sources in the graph.
    my_custom_query = """
    SELECT ?Source {
        ?source_id rdf:type kglids:Source ;
                   schema:name ?Source .
    }
    """
    kglids.query(my_custom_query)

| | Source |
|---|---|
| 0 | kaggle |
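
The query above relies on the `rdf:`, `schema:`, and `kglids:` prefixes being resolved before it reaches the SPARQL endpoint. The source does not show where that happens, so the following is only a minimal sketch of how such prefixes could be prepended to an ad-hoc query. The helper `with_prefixes` is hypothetical and not part of the KGLiDS API, and the `kglids` namespace IRI is a placeholder rather than the project's real one.

```python
# Hypothetical helper, not part of the KGLiDS API.
# The rdf and schema IRIs are the standard ones; the kglids IRI is a placeholder.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "schema": "http://schema.org/",
    "kglids": "http://example.org/kglids/",  # placeholder namespace (assumption)
}


def with_prefixes(query: str, prefixes: dict = PREFIXES) -> str:
    """Prepend one PREFIX declaration per namespace to a SPARQL query body."""
    header = "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in prefixes.items())
    return f"{header}\n{query.strip()}"


full_query = with_prefixes("""
SELECT ?Source {
    ?source_id rdf:type kglids:Source ;
               schema:name ?Source .
}
""")
print(full_query)
```

Note that SPARQL variable names are case-sensitive, so the variable in the `SELECT` clause has to match the one bound in the graph pattern exactly.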

kglids.show_graph_info()

    kglids.show_graph_info()

| | Datasets | Tables | Columns | Pipelines |
|---|---|---|---|---|
| 0 | 101 | 969 | 418 | 9502 |

kglids.show_dataset_info()

    kglids.show_dataset_info()

| | Dataset | Number_of_tables |
|---|---|---|
| 0 | COVID-19 Corona Virus India Dataset | 8 |
| 1 | COVID-19 Dataset | 6 |
| 2 | COVID-19 Healthy Diet Dataset | 5 |
| 3 | COVID-19 Indonesia Dataset | 1 |
| 4 | COVID-19 World Vaccination Progress | 2 |
| ... | ... | ... |
| 96 | uciml.red-wine-quality-cortez-et-al-2009 | 22 |
| 97 | unitednations.international-greenhouse-gas-emi... | 3 |
| 98 | upadorprofzs.testes | 8 |
| 99 | vitaliymalcev.russian-passenger-air-service-20... | 14 |
| 100 | ylchang.coffee-shop-sample-data-1113 | 10 |

kglids.show_table_info()

    kglids.show_table_info()

Showing all available table(s):

| | Table | Dataset | Path_to_table |
|---|---|---|---|
| 0 | state_level_daily.csv | COVID-19 Corona Virus India Dataset | /data/datasets/data_lake/COVID-19 Coro... |
| 2 | patients_data.csv | COVID-19 Corona Virus India Dataset | /data/datasets/data_lake/COVID-19 Coro... |
| 3 | nation_level_daily.csv | COVID-19 Corona Virus India Dataset | /data/datasets/data_lake/COVID-19 Coro... |
| ... | ... | ... | ... |
| 414 | 201904 sales reciepts.csv | ylchang.coffee-shop-sample-data-1113 | /data/datasets/data_lake/ylchang.coffe... |
| 415 | sales_outlet.csv | ylchang.coffee-shop-sample-data-1113 | /data/datasets/data_lake/ylchang.coffe... |
| 416 | product.csv | ylchang.coffee-shop-sample-data-1113 | /data/datasets/data_lake/ylchang.coffe... |
| 417 | Dates.csv | ylchang.coffee-shop-sample-data-1113 | /data/datasets/data_lake/ylchang.coffe... |

    kglids.get_tables_info(dataset='UK COVID-19 Data')

Showing table(s) for 'UK COVID-19 Data' dataset:

| | Table | Dataset | Path_to_table |
|---|---|---|---|
| 0 | UK_Devolved_Nations_COVID_Dataset.csv | UK COVID-19 Data | /data/datasets/data_lake/UK COVID-19 D... |
| 1 | UK_Local_Authority_UTLA_COVID_Dataset.csv | UK COVID-19 Data | /data/datasets/data_lake/UK COVID-19 D... |
| 2 | England_Regions_COVID_Dataset.csv | UK COVID-19 Data | /data/datasets/data_lake/UK COVID-19 D... |
| 3 | UK_National_Total_COVID_Dataset.csv | UK COVID-19 Data | /data/datasets/data_lake/UK COVID-19 D... |
| 4 | NEW_Official_Population_Data_ONS_mid-2019.csv | UK COVID-19 Data | /data/datasets/data_lake/UK COVID-19 D... |
| 5 | Populations_for_UK_and_Devolved_Nations.csv | UK COVID-19 Data | /data/datasets/data_lake/UK COVID-19 D... |

kglids.search_tables_on()

    table_info = kglids.search_tables_on(conditions=[['player', 'club']])
    table_info

Showing recommendations as per the following conditions: Condition = [['player', 'club']]

| | Dataset | Table | Number_of_columns | Number_of_rows | Path_to_table |
|---|---|---|---|---|---|
| 0 | FIFA 21 complete player dataset | players_21.csv | 106 | 18944 | /data/datasets/data_lake/FIFA 21 compl... |
| 1 | FIFA 21 complete player dataset | players_20.csv | 106 | 18483 | /data/datasets/data_lake/FIFA 21 compl... |
| 2 | FIFA 20 complete player dataset | players_20.csv | 104 | 18278 | /data/datasets/data_lake/FIFA 20 compl... |
| 3 | FIFA 21 complete player dataset | players_19.csv | 106 | 18085 | /data/datasets/data_lake/FIFA 21 compl... |
| 4 | FIFA 20 complete player dataset | players_19.csv | 104 | 17770 | /data/datasets/data_lake/FIFA 20 compl... |
| 5 | FIFA 20 complete player dataset | players_18.csv | 104 | 17592 | /data/datasets/data_lake/FIFA 20 compl... |
| 6 | FIFA 21 complete player dataset | players_18.csv | 106 | 17954 | /data/datasets/data_lake/FIFA 21 compl... |
| 7 | FIFA 21 complete player dataset | players_17.csv | 106 | 17597 | /data/datasets/data_lake/FIFA 21 compl... |
| 8 | FIFA 20 complete player dataset | players_17.csv | 104 | 17009 | /data/datasets/data_lake/FIFA 20 compl... |
| 9 | FIFA 20 complete player dataset | players_16.csv | 104 | 14881 | /data/datasets/data_lake/FIFA 20 compl... |
| 10 | FIFA 21 complete player dataset | players_16.csv | 106 | 15623 | /data/datasets/data_lake/FIFA 21 compl... |
| 11 | FIFA 21 complete player dataset | players_15.csv | 106 | 16155 | /data/datasets/data_lake/FIFA 21 compl... |
| 12 | FIFA 20 complete player dataset | players_15.csv | 104 | 15465 | /data/datasets/data_lake/FIFA 20 compl... |
| 13 | open-source-sports.mens-professional-basketball | basketball_player_allstar.csv | 23 | 1609 | /data/datasets/data_lake/open-source-s... |
| 14 | open-source-sports.mens-professional-basketball | basketball_draft.csv | 11 | 9003 | /data/datasets/data_lake/open-source-s... |
| 15 | open-source-sports.mens-professional-basketball | basketball_awards_players.csv | 6 | 1719 | /data/datasets/data_lake/open-source-s... |
| 16 | FIFA22 OFFICIAL DATASET | FIFA22_official_data.csv | 65 | 16710 | /data/datasets/data_lake/FIFA22 OFFICI... |
| 17 | FIFA22 OFFICIAL DATASET | FIFA21_official_data.csv | 65 | 17108 | /data/datasets/data_lake/FIFA22 OFFICI... |
| 18 | FIFA22 OFFICIAL DATASET | FIFA20_official_data.csv | 65 | 17104 | /data/datasets/data_lake/FIFA22 OFFICI... |
| 19 | FIFA22 OFFICIAL DATASET | FIFA19_official_data.csv | 64 | 17943 | /data/datasets/data_lake/FIFA22 OFFICI... |
| 20 | FIFA22 OFFICIAL DATASET | FIFA18_official_data.csv | 64 | 17927 | /data/datasets/data_lake/FIFA22 OFFICI... |
| 21 | FIFA22 OFFICIAL DATASET | FIFA17_official_data.csv | 63 | 17560 | /data/datasets/data_lake/FIFA22 OFFICI... |
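
The source does not spell out how the nested `conditions` list is interpreted. One plausible reading, assumed here and not confirmed by the source, is that keywords inside an inner list are alternatives (OR), that the inner lists are combined with AND, and that each keyword is matched as a case-insensitive substring of a column name. The sketch below illustrates that reading on a toy catalog; `matches_conditions` and the column names are hypothetical.

```python
from typing import Dict, List


def matches_conditions(columns: List[str], conditions: List[List[str]]) -> bool:
    """Return True if every inner list has at least one keyword found in some column name."""
    lowered = [c.lower() for c in columns]
    return all(
        any(keyword.lower() in col for keyword in group for col in lowered)
        for group in conditions
    )


# Toy catalog: table name -> column names (made up for illustration).
catalog: Dict[str, List[str]] = {
    "players_21.csv": ["short_name", "club_name", "player_positions"],
    "sales_outlet.csv": ["sales_outlet_id", "store_city"],
}

hits = [t for t, cols in catalog.items() if matches_conditions(cols, [["player", "club"]])]
print(hits)
```

Under this reading, `[['player', 'club']]` matches any table with a column mentioning either keyword, while `[['player', 'club'], ['age']]` would additionally require a column mentioning `age`.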

kglids.recommend_k_unionable_tables(table_info: pd.Series, k: int)

    recommendations_union = kglids.recommend_k_unionable_tables(table_info.iloc[0], k=5)
    recommendations_union

Showing the top-5 unionable table recommendations:

| | Dataset | Recommended_table | Score | Path_to_table |
|---|---|---|---|---|
| 0 | FIFA 20 complete player dataset | players_20.csv | 1.00 | |
| 1 | FIFA 20 complete player dataset | players_19.csv | 0.85 | |
| 2 | FIFA 20 complete player dataset | players_18.csv | 0.85 | |
| 3 | FIFA 20 complete player dataset | players_17.csv | 0.85 | |
| 4 | FIFA 20 complete player dataset | players_15.csv | 0.84 | |
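
How KGLiDS computes the unionability score is not described in this section. As a rough intuition for what a top-k unionable recommendation involves, the sketch below ranks candidate tables by the Jaccard overlap of their column names and keeps the k best. This is a simplified stand-in rather than the KGLiDS scoring method, and the function name and toy schemas are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def top_k_unionable(
    query_cols: Set[str], candidates: Dict[str, Set[str]], k: int
) -> List[Tuple[str, float]]:
    """Rank candidate tables by Jaccard similarity of column-name sets."""

    def jaccard(a: Set[str], b: Set[str]) -> float:
        union = a | b
        return len(a & b) / len(union) if union else 0.0

    scored = ((name, round(jaccard(query_cols, cols), 2)) for name, cols in candidates.items())
    # Keep only the k highest-scoring candidates, best first.
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy schemas (made up for illustration).
query = {"name", "age", "club", "overall"}
candidates = {
    "players_a.csv": {"name", "age", "club", "overall"},
    "players_b.csv": {"name", "age", "club"},
    "outlets.csv": {"outlet_id", "city"},
}
print(top_k_unionable(query, candidates, k=2))
```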

kglids.recommend_k_joinable_tables(table_info: pd.Series, k: int)

    recommendations_join = kglids.recommend_k_joinable_tables(table_info.iloc[0], k=2)
    recommendations_join

Showing the top-2 joinable table recommendations:

| | Dataset | Recommended_table | Score | Path_to_table |
|---|---|---|---|---|
| 0 | FIFA 20 complete player dataset | players_20.csv | 1.0 | |
| 1 | FIFA22 OFFICIAL DATASET | FIFA22_official_data.csv | 0.5 | |
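
Joinability, unlike unionability, depends on shared values rather than shared schemas. A common proxy, used here purely as an illustration and not as the KGLiDS method, is containment: the fraction of distinct values in one column that also occur in another. The helper name and sample values are hypothetical.

```python
from typing import Iterable


def containment(left: Iterable, right: Iterable) -> float:
    """Fraction of distinct values in `left` that also appear in `right`."""
    left_set, right_set = set(left), set(right)
    if not left_set:
        return 0.0
    return len(left_set & right_set) / len(left_set)


# Toy key columns (made up for illustration).
player_ids_a = [101, 102, 103, 104]
player_ids_b = [103, 104, 105, 106]

score = containment(player_ids_a, player_ids_b)
print(score)
```

Containment is asymmetric: a small lookup table can be fully contained in a large fact table while the reverse score stays low, which is often what matters when choosing a join key.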

kglids.get_path_between_tables(source_table_info: pd.Series, target_table_info: pd.Series, hops: int)

    kglids.get_path_between_tables(table_info.iloc[0], recommendations_join.iloc[1], hops=1)
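
The `hops` argument bounds how far the search may travel from the starting table. The source does not show the underlying traversal, so the following is only a sketch under the assumption that tables are nodes, that an edge links two tables sharing a similar or joinable column, and that `hops` is the maximum number of edges on a path. The graph and the function `paths_within_hops` are hypothetical.

```python
from typing import Dict, List


def paths_within_hops(
    graph: Dict[str, List[str]], start: str, target: str, hops: int
) -> List[List[str]]:
    """Enumerate simple paths from start to target that use at most `hops` edges."""
    found: List[List[str]] = []
    stack = [[start]]
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == target:
            found.append(path)
            continue
        if len(path) - 1 >= hops:  # hop budget exhausted
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in path:  # keep paths simple (no revisits)
                stack.append(path + [neighbour])
    return found


# Toy table graph (made up for illustration).
graph = {
    "players_21.csv": ["players_20.csv", "FIFA22_official_data.csv"],
    "players_20.csv": ["players_21.csv", "FIFA22_official_data.csv"],
    "FIFA22_official_data.csv": ["players_21.csv", "players_20.csv"],
}

one_hop = paths_within_hops(graph, "players_21.csv", "FIFA22_official_data.csv", hops=1)
two_hop = paths_within_hops(graph, "players_21.csv", "FIFA22_official_data.csv", hops=2)
print(one_hop)
print(len(two_hop))
```

With `hops=1` only the direct edge qualifies; raising the limit to 2 also admits the indirect route through `players_20.csv`.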
