# KGLac Knowledge Graph Builder
The KG builder component leverages the profiles stored in the document database to build the KG. To do so, it uses RDF-star to represent the entities and their relationships. The KG builder uses the profiles to determine the different relationships between the entities, which include semanticSimilarity, schemaSimilarity, primary key-foreign key (pkfk), and inclusion dependency. The generated graph is then hosted on an RDF store supporting RDF-star (for now, Apache Jena or Blazegraph).
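RDF-star makes it possible to annotate a relationship edge itself by quoting the triple and using it as the subject of a metadata triple. The sketch below illustrates the idea only; the `lac:` prefix, the certainty predicate, and the score are illustrative assumptions, not the builder's actual vocabulary.

```python
def annotated_edge(subj: str, pred: str, obj: str, score: float) -> str:
    """Serialize an edge plus a certainty annotation in Turtle-RDR:
    the quoted triple << s p o >> becomes the subject of the annotation."""
    return f"<< {subj} {pred} {obj} >> lac:certainty {score} ."

print(annotated_edge(":movies.title", "lac:semanticSimilarity", ":films.name", 0.87))
# << :movies.title lac:semanticSimilarity :films.name >> lac:certainty 0.87 .
```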
### Components
------
#### SRC:
1. **api**
1. api.py: Contains the different APIs to be used in a Jupyter notebook.
2. **dataanalysis.py:** This module was used to determine the semanticSimilarity relationships before the embedding-based approach was adopted. It should be removed.
3. **enums:**
1. relation.py: An enum defining the various relationships reflected in the KG.
4. **out:** A folder containing the resultant KG
1. triples.ttl: A file containing the triples of the KG in Turtle serialization.
5. **storage:** A package containing the different source files dedicated to querying the document DB and the RDF store.
1. elasticsearch_client.py: A source file containing the functions used to communicate with Elasticsearch.
2. kglac_client.py: A source file containing the functions used to communicate with the RDF store. The functions in this file are the implementations of the functions used in the api.py source file under the **api** package.
3. kwtype.py: An enum containing some metadata that was previously used. It should be removed.
4. query_templates: A source file containing the query templates used by the kglac_client.py functions to interact with the RDF store.
6. **word_embedding:** A package containing the source files used to launch the word-embedding server that determines the existence of the semanticSimilarity relationship (a sketch of this check appears after this list).
1. embeddings_client: A source file representing the client to the embedding server.
2. libclient.py: A source file containing core functionalities used by the embeddings_client source file.
3. word_embeddings: A source file containing the core functionalities offered by the server.
4. word_embeddings_services.py: A source file containing the services offered by the embedding server. Here we also specify the path of the embeddings to load.
7. config.py: A source file containing parameters to run the KG builder.
8. label.py: A class to represent the object of the label predicate. In addition to the text of the label, the object specifies its language.
9. rdf_builder.py: A source file containing the different functions used to create the entities, determine the relationships mentioned above, and dump the triples into the triples.ttl file found under **out**.
10. rdf_builder_coordinator.py: The main source file to run in order to fire the KG builder. It interacts with the various stores and rdf_builder.py to create the KG.
11. rdf_resource.py: A class for an RDF triple component; it can be the subject, predicate, or object.
12. triplet.py: A class to represent the composed triples. It uses recursion to model RDF-star, since a triple can itself be the subject of another triple (see the second sketch after this list).
13. utils.py: A source file containing helper functions, such as label generation, used across source files in different packages (e.g., api.py and rdf_builder.py). It also contains the code to generate the graph visualization using the graphviz library.
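A minimal sketch of the semanticSimilarity check mentioned above, assuming the embeddings are plain vectors: two column names are considered semantically similar when the cosine similarity of their embeddings exceeds a threshold. The vectors, names, and the 0.75 threshold below are illustrative, not the server's actual values.

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical embeddings; the real server loads them from the path
# configured in word_embeddings_services.py.
embeddings = {
    "salary": np.array([0.9, 0.1, 0.3]),
    "income": np.array([0.8, 0.2, 0.35]),
}

if cosine_similarity(embeddings["salary"], embeddings["income"]) > 0.75:
    print("emit a semanticSimilarity triple between the two columns")
```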
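And a minimal sketch (not the actual rdf_resource.py/triplet.py code) of how recursion models RDF-star: a Triplet whose subject is itself a Triplet serializes as a quoted triple `<< s p o >>`.

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class RDFResource:
    """One component of a triple: subject, predicate, or object."""
    name: str

    def __str__(self) -> str:
        return self.name

@dataclass
class Triplet:
    subject: Union["Triplet", RDFResource]  # recursion enables RDF-star
    predicate: RDFResource
    object: RDFResource

    def serialize(self, nested: bool = False) -> str:
        subj = (self.subject.serialize(nested=True)
                if isinstance(self.subject, Triplet) else str(self.subject))
        body = f"{subj} {self.predicate} {self.object}"
        return f"<< {body} >>" if nested else f"{body} ."

edge = Triplet(RDFResource(":colA"), RDFResource("lac:pkfk"), RDFResource(":colB"))
meta = Triplet(edge, RDFResource("lac:certainty"), RDFResource('"0.9"'))
print(meta.serialize())
# << :colA lac:pkfk :colB >> lac:certainty "0.9" .
```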
#### tests
This folder contains the different tests to run to make sure that the code works properly.
### How to run?
------
1. Connect to the VM and run Elasticsearch:
1. Open a terminal and connect to the VM
```
ssh -i path/to/ahmed-keypairs.pem ubuntu@206.12.92.210
```
2. Go to the **app_servers** folder
```
cd /mnt/discovery/app_servers
```
3. Run ES 7.10.2
```
elasticsearch-7.10.2/bin/elasticsearch
```
**Note:** You can use Kibana as a UI to inspect the profiles and the raw data stored in the profiles and raw_data indexes, respectively.
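If you prefer checking from Python instead of Kibana, the following is a quick sanity check, assuming the elasticsearch-py 7.x client and Elasticsearch's default port 9200:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
for index in ("profiles", "raw_data"):
    if es.indices.exists(index=index):
        print(index, "has", es.count(index=index)["count"], "documents")
    else:
        print(index, "is missing")
```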
2. Run the KG builder
```
python rdf_builder_coordinator.py -opath out
```
3. Launch Blazegraph
1. Open a new terminal and connect to the VM
```
ssh -i path/to/ahmed-keypairs.pem -L 9999:localhost:9999 ubuntu@206.12.92.210
```
2. Run Blazegraph
```
cd /mnt/discovery/app_servers/blazegraph
java -server -Xmx4g -jar blazegraph.jar
```
3. Create a namespace
1. Open your browser
2. Go to http://localhost:9999/blazegraph
3. Go to namespaces.
4. Create your namespace by specifying the name and setting the mode to rdr to support RDF-star.
5. Go to UPDATE. If you paste the data directly, set the type to RDF Data and the format to Turtle-RDR; otherwise, specify the path of triples.ttl.
6. Once the data is loaded, go to WELCOME to start writing your queries or using the APIs.
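You can also query the endpoint programmatically over the standard SPARQL HTTP protocol instead of using the WELCOME tab. A minimal sketch using the requests library, assuming your namespace is named kglac (substitute the name you created above):

```python
import requests

# Blazegraph exposes one SPARQL endpoint per namespace.
ENDPOINT = "http://localhost:9999/blazegraph/namespace/kglac/sparql"

query = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"
resp = requests.post(
    ENDPOINT,
    data={"query": query},
    headers={"Accept": "application/sparql-results+json"},
)
resp.raise_for_status()
for row in resp.json()["results"]["bindings"]:
    print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```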