Embedding

In Part 1, we learned how to build a neural network with one hidden layer to generate words. The model performed fairly well: it reproduced the words generated by the counting-based approach. However, the bigram model is limited by its assumption that each character depends only on the character before it. Suppose only one bigram in the training data starts with a particular character. In that case, the model will always generate the second character of that bigram, regardless of the surrounding context. This lack of context can lead to poor performance. In this lecture, Andrej shows us how to build a multilayer neural network to improve the model's performance. ...
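To make the bigram limitation concrete, here is a minimal sketch of a counting-based bigram sampler. The word list, the "." start/end marker, and the helper name `sample_next` are assumptions chosen for illustration; they are not taken from the lecture's code or dataset.

```python
import random
from collections import Counter, defaultdict

# Toy training words (hypothetical, not the lecture's dataset).
words = ["qua", "que", "qui", "aqua"]

# Count how often each character follows another.
# "." is an assumed marker for the start and end of a word.
counts = defaultdict(Counter)
for w in words:
    chars = ["."] + list(w) + ["."]
    for prev, nxt in zip(chars, chars[1:]):
        counts[prev][nxt] += 1


def sample_next(prev, rng):
    """Sample the next character given only the previous one."""
    successors = counts[prev]
    chars = list(successors)
    weights = [successors[c] for c in chars]
    return rng.choices(chars, weights=weights, k=1)[0]


rng = random.Random(0)

# "q" is only ever followed by "u" in the toy data, so every draw is forced.
q_samples = {sample_next("q", rng) for _ in range(100)}

# "u" has several observed successors, so its draws can vary.
u_successors = dict(counts["u"])
```

In this toy data, every sample after "q" is "u" no matter which characters came earlier in the word, because the model conditions on a single previous character and has seen only one continuation for it.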