
Complying with the EU AI Act - Rationale and Code

Introduction

Welcome to the code repository for the project titled "Complying with the EU AI Act." This repository contains the code and resources related to exploring areas that organizations should focus on when considering compliance with the Artificial Intelligence Act (AIA) in the European Union. The project is divided into two main parts:

Part 1: Questionnaire Data Analysis

In the first part of the project, we delve into analyzing questionnaire data to gain insights and assess compliance with the EU AI Act. This analysis is crucial in understanding an organization's current state of compliance. Please refer to the "data_processing-3.pdf" file to access the rules set up as part of the rule-based system and examine the code used to generate the scores. It's important to note that the complete CSV data is not published due to privacy and confidentiality considerations.
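To make the idea of a rule-based score concrete, here is a minimal sketch of how questionnaire answers could be turned into a single number. It is not the rule set from "data_processing-3.pdf": the question keys, rule names, and point values below are all hypothetical and only illustrate the general pattern of checking each rule against the answers and summing the points.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Answers = Dict[str, str]


@dataclass
class Rule:
    """One scoring rule: award `points` when `condition` holds for the answers."""
    name: str
    points: float
    condition: Callable[[Answers], bool]


# Hypothetical rules for illustration only; the project's real rules are in the PDF.
RULES: List[Rule] = [
    Rule("has_risk_management", 2.0, lambda a: a.get("risk_management") == "yes"),
    Rule("keeps_technical_docs", 1.0, lambda a: a.get("technical_documentation") == "yes"),
    Rule("human_oversight", 1.0, lambda a: a.get("human_oversight") in {"yes", "partially"}),
]


def score_response(answers: Answers, rules: List[Rule] = RULES) -> float:
    """Return the fraction of available points earned, between 0.0 and 1.0."""
    total = sum(rule.points for rule in rules)
    earned = sum(rule.points for rule in rules if rule.condition(answers))
    return earned / total if total else 0.0


# Hypothetical answers from one organization.
example = {
    "risk_management": "yes",
    "technical_documentation": "no",
    "human_oversight": "partially",
}
print(score_response(example))  # 3 of 4 points -> 0.75
```

Normalizing by the total available points keeps scores comparable if rules are added or reweighted later.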

Part 2: Custom Large Language Models (LLMs)

The second part of our project focuses on developing custom Large Language Models (LLMs) for organizations to help them with their questions concerning the AIA. Different LLMs are created and compared to assist organizations in meeting the requirements of the EU AI Act. This part involves two libraries:

SearchWithOpenAI (SWOAI)

The "SWOAI_AIA_2.zip" archive contains the files we adapted for the SearchWithOpenAI model. The model relies on many files, most of which are unchanged from the original source code. For copyright reasons, not all of those files are copied into this repository, so if you want to try the model yourself, it is best to start from the original repository. The model is easy to set up, and with the cluster data we used you should get similar results.

Llama Index

The "llama_index.py" script is used for the Llama Index model. This model is well documented, and the script should work on its own, provided you supply an API key.
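Since the script needs an API key, one common approach is to read the key from an environment variable instead of hard-coding it. The sketch below assumes the variable is called OPENAI_API_KEY; that name is an assumption, because this README does not state which provider or variable "llama_index.py" expects. The helper name `require_api_key` is likewise hypothetical.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in `env_var`, or fail with a clear message.

    The default variable name is an assumption; adjust it to whatever
    llama_index.py actually reads.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before running llama_index.py"
        )
    return key
```

Calling `require_api_key()` at the top of a script makes a missing key fail early with a readable error rather than deep inside a library call.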

Cluster Data

AIA.zip contains the different clusters. You can use them to train your own custom LLM and to find out which documents give the best results.
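Before choosing documents, it can help to see what each cluster contains. The following sketch lists the files in a zip archive grouped by their top-level folder. It assumes each cluster is stored as its own top-level folder inside the archive, which this README does not confirm; the function name `group_by_cluster` is hypothetical.

```python
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def group_by_cluster(archive_path: str) -> Dict[str, List[str]]:
    """Map each top-level folder in the archive to the file names it contains.

    Files that sit directly at the archive root are grouped under "(top level)".
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    with zipfile.ZipFile(archive_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, keep only files
            parts = PurePosixPath(info.filename).parts
            cluster = parts[0] if len(parts) > 1 else "(top level)"
            clusters[cluster].append(parts[-1])
    return dict(clusters)


# Example usage (assumes the archive is in the working directory):
# for cluster, files in group_by_cluster("AIA.zip").items():
#     print(cluster, len(files))
```

Comparing the file counts and names per cluster gives a quick starting point for deciding which subsets to try.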

Repository Contents

Here's a brief overview of the files within this repository:

  • data_processing-3.pdf: The code for the rule-based system to analyze questionnaires.
  • SWOAI_AIA_2.zip: Resources and files for the SearchWithOpenAI model.
  • aia.zip: The files of the training clusters.
  • db.zip: The DB for SWOAI
  • llama_index.py: The script for the Llama Index custom Language Model.

Getting Started

This repo can be used in two ways:

  • Fill out the questionnaire for your organization and compare your answers using the rules given in data_processing-3.pdf.
  • Rebuild the CustomLLM and examine which documents should be used as fine-tuning data to get the best results for your specific needs.

We hope this repository is a valuable resource in your journey towards compliance with the EU AI Act. If you have any questions or need further assistance, please don't hesitate to reach out.

Thank you!