We have to build a model that translates Italian to English from scratch.
Download the Italian-to-English translation dataset from here
Encoder–Decoder architecture with attention:
- Encoder — 1-layer LSTM
- Decoder — 1-layer LSTM with attention
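As a rough illustration of what the 1-layer LSTM encoder computes, here is a minimal NumPy sketch of one LSTM step and an encoder loop. This is not the assignment's Keras code; the weight packing (`W`, `U`, `b` holding the input/forget/cell/output gates) and function names are assumptions for illustration only.

```python
import numpy as np

def lstm_step(x, h, c, W, U, b):
    """One LSTM time step. W, U, b pack the input/forget/cell/output gates."""
    z = x @ W + h @ U + b
    i, f, g, o = np.split(z, 4)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # new cell state
    h_new = sigmoid(o) * np.tanh(c_new)               # new hidden state
    return h_new, c_new

def encode(source, W, U, b, units):
    """Run a 1-layer LSTM encoder over a sequence of input vectors.
    Returns every hidden state (needed later for attention) plus the
    final (h, c) used to initialize the decoder."""
    h = np.zeros(units)
    c = np.zeros(units)
    outputs = []
    for x in source:
        h, c = lstm_step(x, h, c, W, U, b)
        outputs.append(h)
    return np.stack(outputs), h, c
```

The decoder runs the same kind of recurrence, but at every step it also consumes a context vector computed from the encoder outputs via attention.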
In Global attention, there are 3 types of scoring functions. As part of this assignment, you will create 3 models, one for each scoring function:
- In model 1, implement the "dot" score function
- In model 2, implement the "general" score function
- In model 3, implement the "concat" score function
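The three scoring functions differ only in how they compare the decoder state `h_t` with each encoder state `h_s`; the attention weights are then a softmax over the scores. A minimal NumPy sketch, assuming `h_s` has shape `(T, units)` and `h_t` shape `(units,)` (parameter names `Wa`, `va` follow the usual Luong-attention notation and are illustrative):

```python
import numpy as np

def score_dot(h_t, h_s):
    # "dot": score(h_t, h_s) = h_t . h_s  (encoder/decoder dims must match)
    return h_s @ h_t

def score_general(h_t, h_s, Wa):
    # "general": score(h_t, h_s) = h_t^T Wa h_s, with a learned matrix Wa
    return h_s @ Wa @ h_t

def score_concat(h_t, h_s, Wa, va):
    # "concat": score(h_t, h_s) = va^T tanh(Wa [h_t; h_s])
    T = h_s.shape[0]
    paired = np.concatenate([np.tile(h_t, (T, 1)), h_s], axis=1)
    return np.tanh(paired @ Wa) @ va

def attention_weights(scores):
    # softmax over source positions -> weights that sum to 1
    e = np.exp(scores - scores.max())
    return e / e.sum()
```

Whichever scoring function is used, the resulting weights multiply the encoder states to produce the context vector fed to the decoder at that step.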
Using the attention weights, we plot the attention maps for the translations.
BLEU score is used as the metric to evaluate the model, and SparseCategoricalCrossentropy as the loss.
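For intuition, sentence-level BLEU is the geometric mean of clipped n-gram precisions, scaled by a brevity penalty. Below is a simplified pure-Python sketch (no smoothing, single reference); in practice you would use a library implementation such as NLTK's rather than this:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(reference, candidate, max_n=4):
    """Simplified sentence BLEU: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty. No smoothing,
    so any zero precision makes the score 0."""
    precisions = []
    for n in range(1, max_n + 1):
        ref_counts = Counter(ngrams(reference, n))
        cand_counts = Counter(ngrams(candidate, n))
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = max(sum(cand_counts.values()), 1)
        precisions.append(overlap / total)
    if min(precisions) == 0:
        return 0.0
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(candidate) > len(reference) else \
        math.exp(1 - len(reference) / max(len(candidate), 1))
    return bp * geo_mean

# A perfect translation scores 1.0
print(bleu("how are you doing today".split(),
           "how are you doing today".split()))  # prints 1.0
```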
There is a detailed observation under each plot.
There are many ways to train your model. Say you are translating Hindi to English:
Encoder input should be — the Hindi sentence
Decoder input should be — <start> what is your name?
Decoder output should be — what is your name? <end>
model.fit([encoder_input, decoder_input], decoder_output)
What if your decoder input and output were the same? Say you want to predict Hindi to English and the English sentence is — "Hi How are you". At the first time step you pass <start>, and you expect your model to predict "Hi", not <start>. If you want the model to predict the same input as output, why would you need such a complex network? So your decoder output will be one time step ahead of your decoder input.
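The one-step shift described above can be sketched as follows (the `<start>`/`<end>` token names are the usual convention and are assumed here for illustration):

```python
def make_decoder_pair(target_tokens):
    """Build teacher-forcing decoder input/output: the output is the
    input shifted one time step ahead."""
    seq = ["<start>"] + target_tokens + ["<end>"]
    decoder_input = seq[:-1]   # <start> w1 w2 ... wn
    decoder_output = seq[1:]   # w1 w2 ... wn <end>
    return decoder_input, decoder_output

inp, out = make_decoder_pair("what is your name ?".split())
print(inp)  # ['<start>', 'what', 'is', 'your', 'name', '?']
print(out)  # ['what', 'is', 'your', 'name', '?', '<end>']
```

At each position, the decoder sees everything up to the current token and is trained to predict the next one, which is exactly what the shifted pair encodes.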