## Objective
Build a model from scratch that translates Italian to English.

## Basic Info
1. Download the Italian-to-English translation dataset from <a href="http://www.manythings.org/anki/ita-eng.zip">here</a>.

2. The model is an encoder-decoder architecture with attention: a 1-layer LSTM encoder, a 1-layer LSTM decoder, and an attention layer.

3. Global attention has three types of scoring function.
 For this assignment, we built three models, one per scoring function.
    Model 1 implements the "dot" score function.
    Model 2 implements the "general" score function.
    Model 3 implements the "concat" score function.
    

4. The attention weights are used to draw the attention plots.

5. BLEU score is the evaluation metric and SparseCategoricalCrossentropy is the loss.

6. A detailed observation follows each plot.
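
The three scoring functions named in item 3 can be sketched in NumPy as follows. This is a minimal illustration that assumes the usual Luong-style definitions of "dot", "general" and "concat", which this README does not spell out. The function names, the weight shapes and the random inputs are all hypothetical and are not taken from the project's code.

```python
import numpy as np

def dot_score(h_t, h_s):
    """score(h_t, h_s) = h_t . h_s for every encoder time step."""
    return h_s @ h_t                                  # shape (T,)

def general_score(h_t, h_s, W_a):
    """score = h_t^T W_a h_s, with W_a of shape (d, d) (assumed shape)."""
    return (h_s @ W_a.T) @ h_t                        # shape (T,)

def concat_score(h_t, h_s, W_a, v_a):
    """score = v_a^T tanh(W_a [h_t; h_s]); W_a is (units, 2d), v_a is (units,)."""
    tiled = np.tile(h_t, (h_s.shape[0], 1))           # repeat decoder state: (T, d)
    stacked = np.concatenate([tiled, h_s], axis=1)    # (T, 2d)
    return np.tanh(stacked @ W_a.T) @ v_a             # shape (T,)

def attend(scores, h_s):
    """Softmax the scores into attention weights and build the context vector."""
    exp = np.exp(scores - scores.max())               # subtract max for stability
    weights = exp / exp.sum()
    return weights, weights @ h_s                     # (T,), (d,)

# Toy inputs: T encoder steps, hidden size d (illustrative values only).
rng = np.random.default_rng(0)
T, d, units = 5, 4, 3
h_s = rng.normal(size=(T, d))    # encoder outputs, one row per source token
h_t = rng.normal(size=d)         # current decoder hidden state
weights, context = attend(dot_score(h_t, h_s), h_s)
```

The `weights` vector, collected for every decoder step, is the kind of quantity an attention plot visualizes.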

## How to train your model?
There are many ways to train your model. Say you are translating Hindi to English:

1. Encoder input should be: <start> Hindi sentence <end>
2. Decoder input should be: <start> what is your name?
3. Decoder output should be: what is your name? <end>
4. Train with model.fit([encoder_input, decoder_input], decoder_output)
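
The three sequences above can be built from a sentence pair along these lines. This is a small sketch on plain strings. The helper name `make_training_pair` and the example sentences are hypothetical, and a real pipeline would still need tokenization and padding before calling `model.fit`.

```python
def make_training_pair(source_sentence, target_sentence):
    """Return (encoder_input, decoder_input, decoder_output) as strings."""
    encoder_input = f"<start> {source_sentence} <end>"
    decoder_input = f"<start> {target_sentence}"     # starts with <start>, no <end>
    decoder_output = f"{target_sentence} <end>"      # no <start>, ends with <end>
    return encoder_input, decoder_input, decoder_output

# Illustrative Italian/English pair (not taken from the dataset).
enc_in, dec_in, dec_out = make_training_pair("come ti chiami?", "what is your name?")
```

Dropping the first token of `dec_in` and the last token of `dec_out` leaves the same words, which is the one-step shift between decoder input and output.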

## What is teacher forcing?
Teacher forcing means that during training the decoder is fed the ground-truth previous token at each time step, rather than its own previous prediction. This is why the decoder input and output must not be identical. Say you are translating Hindi to English.
The English sentence is: <start> Hi How are you <end>
At the first time step you pass <start> and expect the model to predict Hi, not <start>.
If the model only had to reproduce its input as its output, there would be no need for such a complex network. So the decoder output is one time step ahead of the decoder input.
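
The difference between feeding ground-truth tokens and feeding the model's own predictions can be traced with a toy loop. This is only an illustration: `toy_step` is a hypothetical lookup table standing in for a trained decoder, with one deliberate mistake so the two modes diverge.

```python
# Hypothetical stand-in for a decoder step: previous token -> predicted next token.
# It wrongly predicts "who" after "hi".
TOY_PREDICTIONS = {"<start>": "hi", "hi": "who", "who": "is",
                   "how": "are", "are": "you", "you": "<end>"}

def toy_step(prev_token):
    return TOY_PREDICTIONS.get(prev_token, "<end>")

def decoder_inputs(target_tokens, teacher_forcing=True):
    """Return the tokens fed into the decoder at each time step."""
    fed = []
    prev = "<start>"
    for gold in target_tokens:
        fed.append(prev)
        predicted = toy_step(prev)
        # Teacher forcing feeds the ground-truth token next;
        # otherwise the model's own prediction is fed back.
        prev = gold if teacher_forcing else predicted
    return fed

target = ["hi", "how", "are", "you", "<end>"]
forced = decoder_inputs(target, teacher_forcing=True)
free_running = decoder_inputs(target, teacher_forcing=False)
```

With teacher forcing, `forced` is the target shifted right by one position, so one early mistake does not change what the decoder sees at later steps.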
 
## Observation:
![p1](https://user-images.githubusercontent.com/39815040/100618452-e141aa80-3341-11eb-82ff-160ed8bfe5c3.png)