SSamDav

@SSamDav@lemmy.pt


Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org)

This is an exciting new paper that replaces attention in the Transformer architecture with a set of decomposable matrix operations that retain the modeling capacity of Transformer models while allowing parallel training and efficient RNN-like inference, all without attention (there is no softmax)....
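The core "retention" trick can be sketched in a few lines. Below is a minimal NumPy sketch of the idea as I read it from the paper, for a single head, with a real-valued decay and without the xPos-style rotation or group norm; the function names and toy dimensions are mine, not the authors' reference implementation. It checks that the parallel form (used for training) and the recurrent form (used for inference) produce the same outputs, with no softmax anywhere.

```python
import numpy as np

def retention_parallel(Q, K, V, gamma):
    # Training-time form: D[i, j] = gamma**(i - j) for i >= j, else 0 (causal decay mask)
    n = Q.shape[0]
    idx = np.arange(n)
    D = np.where(idx[:, None] >= idx[None, :],
                 float(gamma) ** (idx[:, None] - idx[None, :]), 0.0)
    return (Q @ K.T * D) @ V  # no softmax anywhere

def retention_recurrent(Q, K, V, gamma):
    # Inference-time form: one (d x d) state updated per token,
    # so memory does not grow with sequence length
    n, d = Q.shape
    S = np.zeros((d, d))
    out = np.zeros((n, d))
    for t in range(n):
        S = gamma * S + np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

rng = np.random.default_rng(0)
n, d = 6, 4        # toy sequence length and head dimension
gamma = 0.9        # per-head decay factor
Q, K, V = (rng.normal(size=(n, d)) for _ in range(3))

print(np.allclose(retention_parallel(Q, K, V, gamma),
                  retention_recurrent(Q, K, V, gamma)))   # True
```

The constant-size recurrent state is what gives the RNN-like inference cost, while the masked matrix product gives Transformer-style parallel training.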

SSamDav,

Would love to know how it compares with Hyena on the LRA.

SSamDav,

One cool thing about this work is that there was a concurrent discussion of the proposed method on Twitter, from different authors.
