Competition Link: https://hackaichallenge.devpost.com/
LLMs hallucinate and cannot think critically. They miss simple questions like "How many r's are in 'strawberry'?" and "Is 2.11 greater than 2.9?". We believe this is because they "think" only associatively, without a robust understanding of the world tempered by formulaic, algorithmic thinking.
To borrow a metaphor from cognitive science, they are capable only of implicit learning, not explicit learning. Though they learn extremely complex patterns across words and paragraphs, their lack of self-awareness means they cannot directly observe their own thinking; thus, they cannot explore how to think more efficiently. This leads to computational bloat: an explosion in the complexity and size of LLMs.
This has many drawbacks, such as increased demand on the energy grid and models too complex for us to understand their inner workings. Humans, on the other hand, are imprecise but extremely computation- and energy-efficient. Perhaps if we could give models some of our characteristics, chiefly executive function, they could become more efficient and understandable while retaining their superior precision.
18 days until submission date
Use RAG to serve as working memory for LLM-level "thinking", as a proof of concept of executive function.
Use RAG to feed the LLM datasets of English and math questions, asking it to solve both associative and algorithmic problems in each subject.
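As a rough illustration of the retrieval step, here is a minimal sketch that scores documents by keyword overlap with the question. A real pipeline would use an embedding model and a vector store (as in the NVIDIA Workbench RAG examples linked below); the corpus entries and scoring function here are purely illustrative placeholders.

```python
def retrieve(question: str, documents: list[str], k: int = 2) -> list[str]:
    """Return the k documents sharing the most words with the question.

    Stand-in for embedding-based retrieval; scores by raw word overlap.
    """
    q_words = set(question.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

# Toy "working memory" of previously explained algorithms.
corpus = [
    "Addition combines two numbers digit by digit, carrying overflow.",
    "Counting letters: scan the string and tally each matching character.",
    "Subtraction removes one quantity from another, borrowing when needed.",
]

context = retrieve("How do I count the r's in a word?", corpus)
```

The retrieved `context` would then be prepended to the LLM prompt so the model can select and apply the relevant algorithm rather than answer purely associatively.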
Level 1: It can accurately answer both types of question by selecting and executing the correct algorithm on simple problems, and it can describe how each algorithm works.
E.g. addition and subtraction, counting the "r"s in "strawberry", etc.
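The Level 1 targets above are exactly the kind of small, deterministic algorithms the model should select and run. A sketch (function names are illustrative, not part of the project yet):

```python
from decimal import Decimal

def count_char(text: str, char: str) -> int:
    """Count occurrences of a single character, case-insensitively."""
    return sum(1 for c in text.lower() if c == char.lower())

def greater_decimal(a: str, b: str) -> str:
    """Return the larger of two decimal strings, avoiding the '2.11 > 2.9' trap."""
    return a if Decimal(a) > Decimal(b) else b

count_char("strawberry", "r")     # → 3
greater_decimal("2.11", "2.9")    # → "2.9"
```

Passing the model these algorithm descriptions via RAG, then checking its answers against the deterministic output, gives a direct measure of Level 1 competence.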
Level 2: It can compose previously seen algorithms to hypothesize how to solve slightly more complex problems.
E.g. multiplication is described to it; it is then asked to devise an algorithm for multiplication problems with increasing numbers of digits. The process is then repeated for division.
The purpose of this stage is to train it to hypothesize.
Level 3: It can generalize and learn other fields, such as biology.
E.g. read scientific papers and assist with research based on an accurate understanding of the topic's mechanics.
Does the project demonstrate quality software development? Does the project leverage NVIDIA AI Workbench? How is the quality of the code? Is there a balanced blend of frontend and backend in the software?
Does it have a user-friendly interface? How much work is required to get the application in the Project started?
Is the demo video and explanation of the project well thought out? Is it clear how this project showcases NVIDIA AI Workbench features?
How big of an impact could the project have on the Dell & NVIDIA developer community? How big of an impact could it have beyond the target community?
How creative and unique is the project? Does the concept exist already? If so, how much does the project improve on it?
https://youtu.be/TRjq7t2Ms5I?si=NwU0mi2SNJ-Faan7
https://cobusgreyling.medium.com/fine-tuning-llms-with-retrieval-augmented-generation-rag-c66e56aec858
https://medium.com/@techsachin/metrag-a-multi-layered-thoughts-enhanced-retrieval-augmented-generation-framework-894e59ee56da
https://github.com/NVIDIA/workbench-example-agentic-rag
https://github.com/NVIDIA/workbench-example-hybrid-rag
https://huggingface.co/datasets/ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
https://paperswithcode.com/dataset/math
https://huggingface.co/datasets/deepmind/math_dataset