To move the knowledge around the relative dependencies of different tokens showing up at diverse places from the sequence, a relative positional encoding is calculated by some kind of Finding out. Two famed varieties of relative encodings are:With this instruction aim, tokens or spans (a sequence of tokens) are masked randomly as well as model is a