# Miniature Llama 3.1 trained on TinyStories

This is a miniaturized implementation of the Llama 3.1 model, trained on the TinyStories dataset. The implementation includes Grouped Query Attention, Rotary Positional Embeddings (RoPE), and the AdamW optimizer (see `src/`). Outputs of the model for classification on the SST and CFIMDB datasets are included in `outputs/`.

### Acknowledgement

This code was developed as part of the 11-711 Advanced NLP class at Carnegie Mellon University. Parts of the codebase were created by the course staff. This code is based on llama2.c by Andrej Karpathy. Parts of the code are also from the [`transformers`](https://github.com/huggingface/transformers) library (Apache License 2.0).
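As a rough illustration of the Rotary Positional Embeddings mentioned above, here is a minimal NumPy sketch. The function name `apply_rope` and the even/odd channel-pairing convention are illustrative assumptions for this sketch, not necessarily the exact layout used in `src/`:

```python
import numpy as np

def apply_rope(x, base=10000.0):
    """Illustrative RoPE: rotate channel pairs of x (shape: seq_len x head_dim).

    Each pair of channels (2i, 2i+1) is rotated by a position-dependent angle
    with per-pair frequency base^(-2i/head_dim), so relative positions are
    encoded in the dot products between rotated queries and keys.
    """
    seq_len, head_dim = x.shape
    assert head_dim % 2 == 0, "head_dim must be even for pairwise rotation"
    # Per-pair inverse frequencies, then per-(position, pair) angles.
    inv_freq = base ** (-np.arange(0, head_dim, 2) / head_dim)
    angles = np.outer(np.arange(seq_len), inv_freq)  # (seq_len, head_dim // 2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin  # 2D rotation of each channel pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

# Rotations preserve vector norms, and position 0 is left unchanged
# (all angles are zero there).
x = np.random.default_rng(0).standard_normal((4, 8))
y = apply_rope(x)
print(np.allclose(np.linalg.norm(x, axis=-1), np.linalg.norm(y, axis=-1)))  # prints: True
```

Because the rotation angle grows linearly with position, the attention score between a rotated query at position m and a rotated key at position n depends only on the offset m - n, which is what makes RoPE a relative positional encoding.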