Positional encoding: the key idea that lets deep learning models process all tokens in parallel
For notes on tokenization, see https://medium.com/@anudhamittal/tokenization-in-nlp-9b36b5cc2590
STEPS for a Deep Learning Model for Text Data
1/ Tokenize text
choice of tokenization schemes: Byte-Pair Encoding (BPE), WordPiece, SentencePiece (see the tokenization sketch below)
2/ Embed the tokens (inside attention, these embeddings are later projected into Query, Key, and Value vectors)
choice of embedding: many options, e.g. a learned embedding layer trained with the model or pretrained word vectors (see the embedding sketch below)
3/ Attach a positional encoding to each token
keeps track of each token's position so the model can process all tokens in parallel instead of one at a time (see the positional-encoding sketch below)
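
A minimal sketch of step 1, assuming the Hugging Face transformers library (not named in these notes). "bert-base-uncased" ships a WordPiece tokenizer; GPT-2-style models use Byte-Pair Encoding instead.

```python
# Step 1: tokenize text into sub-word units and integer ids.
# Assumes the Hugging Face "transformers" package is installed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # WordPiece tokenizer

text = "Positional encoding lets transformers train in parallel."
tokens = tokenizer.tokenize(text)   # sub-word strings, e.g. ['positional', 'encoding', ...]
token_ids = tokenizer.encode(text)  # integer ids, with special tokens like [CLS]/[SEP] added

print(tokens)
print(token_ids)
```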
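A sketch of step 2 in PyTorch (an assumption; the notes don't fix a framework): token ids are mapped to dense vectors by a learned embedding layer, and the Query/Key/Value vectors used by attention are linear projections of those embeddings. The vocabulary size, model width, and ids below are placeholder values.

```python
# Step 2: token ids -> dense embedding vectors -> Q, K, V projections.
import torch
import torch.nn as nn

vocab_size, d_model = 30522, 512            # hypothetical sizes
embed = nn.Embedding(vocab_size, d_model)   # learned lookup table: id -> vector

token_ids = torch.tensor([[101, 2003, 2023, 102]])  # (batch=1, seq_len=4), placeholder ids
x = embed(token_ids)                        # shape: (1, 4, 512)

# Query, Key, Value are separate learned linear projections of the same embeddings
w_q, w_k, w_v = (nn.Linear(d_model, d_model) for _ in range(3))
q, k, v = w_q(x), w_k(x), w_v(x)
```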
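A sketch of step 3 using the fixed sinusoidal positional encoding from "Attention Is All You Need" (one common choice; learned position embeddings are another). Adding it to the token embeddings gives every token its position, so the model can attend to the whole sequence at once instead of stepping through it token by token.

```python
# Step 3: build a sinusoidal positional encoding and add it to the embeddings.
import torch

def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    positions = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)    # (seq_len, 1)
    dims = torch.arange(0, d_model, 2, dtype=torch.float32)                # even dimensions
    freqs = torch.exp(-torch.log(torch.tensor(10000.0)) * dims / d_model)  # 1 / 10000^(2i/d)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(positions * freqs)   # sine on even dimensions
    pe[:, 1::2] = torch.cos(positions * freqs)   # cosine on odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=4, d_model=512)
# x = x + pe.unsqueeze(0)   # added to the token embeddings from step 2
```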