Responses- July 28
1) As discussed in the TensorFlow exercise and Laurence Moroney’s video, one-hot encoding is an interesting (but somewhat inefficient) way to encode words. With one-hot encoding, one goal is to keep every vector a uniform length (matching the longest sentence), so each word’s vector is all 0’s except for a single 1 in the position for that word. For example, if the sentence were “I love apples”, one-hot encoding would look something like this (I created this table in Word, so I apologize if the formatting is a little weird):
As we can see from the table above, the majority of entries are 0 (a sparse vector), which makes one-hot encoding pretty inefficient. A denser alternative is to assign a different integer to each word: for the sentence “the chocolate is in the cabinet”, map the to 1, chocolate to 2, is to 3, in to 4, and cabinet to 5. This integer encoding is much more compact than one-hot encoding, and it is the first step toward word embeddings, where each integer index is then mapped to a dense vector of learned values:
[1, 2, 3, 4, 1, 5]
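The two encodings above can be sketched in plain Python (a minimal sketch, assuming the vocabulary is built in order of first appearance with indices starting at 1):

```python
# Minimal sketch of integer encoding vs. one-hot encoding for the
# example sentence. Vocabulary indices are assigned in order of
# first appearance, starting at 1.
sentence = "the chocolate is in the cabinet"
words = sentence.split()

# Build the vocabulary: each new word gets the next index.
vocab = {}
for w in words:
    if w not in vocab:
        vocab[w] = len(vocab) + 1

# Integer encoding: one number per word (dense).
int_encoded = [vocab[w] for w in words]
print(int_encoded)  # [1, 2, 3, 4, 1, 5]

# One-hot encoding: one vector per word, mostly zeros (sparse).
one_hot = [[1 if vocab[w] == i else 0 for i in range(1, len(vocab) + 1)]
           for w in words]
print(one_hot[0])  # "the" -> [1, 0, 0, 0, 0]
```

Note how the one-hot version needs a 5-wide vector for every word, while the integer version needs only one number per word.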
2.
The first plot (the loss graph) shows that the training loss decreased for the most part, while the validation loss decreased from epoch 2 to 4 and again from 4 to 6, but then spiked up from epoch 6 to 10. This is a very clear sign of overfitting: the model performed well on the training data, but it failed to generalize to the validation data. The second plot (the accuracy graph) shows that the training accuracy increased for the most part, while the validation accuracy fluctuated greatly; it rose from epoch 2 to 3 and from 5 to 6, but dropped sharply from 3 to 5. As I said before, this is a clear sign of overfitting.
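The pattern described above (training loss steadily falling while validation loss bottoms out and then climbs) can be sketched with illustrative numbers; the values below are made up for illustration, not read from the actual plots:

```python
# Made-up loss values illustrating the overfitting pattern described
# above: training loss keeps falling while validation loss bottoms
# out and then climbs.
train_loss = [1.9, 1.5, 1.2, 0.9, 0.7, 0.5, 0.4, 0.3, 0.25, 0.2]
val_loss = [1.8, 1.6, 1.4, 1.3, 1.25, 1.3, 1.5, 1.8, 2.1, 2.4]

def best_epoch(val_loss):
    """Return the 1-based epoch with the lowest validation loss;
    training past this point is where overfitting sets in."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1

print(best_epoch(val_loss))  # 5
```

Stopping training around the epoch where validation loss is lowest is the idea behind early stopping.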
D. Text Classification with an RNN
1.
The first plot (the accuracy graph) shows that both the training and validation accuracy began to plateau at the 3rd epoch (the training accuracy kept increasing, but only very slightly). This is also a big sign of overfitting. The second plot (the loss graph) tells a very similar story: the training loss decreased steadily from epoch 0 to epoch 6, while the validation loss dropped drastically from epoch 0 to 1 but then plateaued and slightly increased from epoch 4 to 6. This can also be a sign of overfitting. The bottom two graphs come from the model with LSTM layers included. These graphs look very similar to the original accuracy and loss graphs without LSTM layers, meaning the model still shows overfitting even with the addition of the LSTM layers.
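For reference, here is a minimal sketch of the kind of model behind the bottom two graphs: an Embedding layer followed by a bidirectional LSTM. The vocabulary size and layer widths here are assumptions, not the tutorial’s exact values:

```python
import tensorflow as tf

VOCAB_SIZE = 1000  # assumed vocabulary size, not the tutorial's exact value

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, 64),                # dense word embeddings
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),  # the added LSTM layer
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                                 # binary sentiment logit
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              optimizer="adam",
              metrics=["accuracy"])
```

Since the LSTM curves look about as overfit as the originals, the usual next steps would be regularization such as dropout or early stopping rather than just swapping layer types.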