Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org)
This is an exciting new paper that replaces attention in the Transformer architecture with a set of decomposable matrix operations (no softmax), retaining the modeling capacity of Transformers while allowing parallel training and efficient RNN-like inference.
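The core idea behind the parallel/recurrent duality can be sketched with a toy single-head retention layer. This is a simplified illustration only, with made-up dimensions and decay value, and it omits the paper's xPos-style rotations, gating, and multi-scale heads; it just shows that a causal decay-masked matrix product (used for parallel training) and an O(1)-state recurrence (used at inference) compute the same outputs:

```python
import numpy as np

np.random.seed(0)
T, d = 5, 4          # toy sequence length and head dimension (illustrative)
gamma = 0.9          # exponential decay factor (illustrative value)
Q = np.random.randn(T, d)
K = np.random.randn(T, d)
V = np.random.randn(T, d)

# Parallel form (training): O = (Q K^T * D) V, where D is a causal mask
# whose (n, m) entry decays as gamma^(n-m) for m <= n and is 0 otherwise.
n = np.arange(T)
D = np.where(n[:, None] >= n[None, :],
             gamma ** (n[:, None] - n[None, :]), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form (inference): a single d x d state S summarizes the
# entire history, so each step costs O(d^2) regardless of sequence length.
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])   # S_t = gamma * S_{t-1} + k_t^T v_t
    O_recurrent[t] = Q[t] @ S              # o_t = q_t S_t

assert np.allclose(O_parallel, O_recurrent)
```

Because there is no softmax coupling all positions together, the sum over past tokens factors into this running state, which is what makes the attention-free RNN-like decoding possible.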
Extending Context Window of Large Language Models via Positional Interpolation (arxiv.org)
An interesting technique for extending the context window of language models by fine-tuning on a small number of samples after pretraining.
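The trick, roughly, is to rescale positions rather than extrapolate them: instead of feeding RoPE position indices beyond the trained range, positions are linearly interpolated back into it. A minimal sketch, assuming standard RoPE frequencies and illustrative lengths (2048 trained, 8192 target):

```python
import numpy as np

def rope_angles(positions, dim, base=10000.0):
    # Standard RoPE frequencies: theta_i = base^(-2i/dim)
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return np.outer(positions, inv_freq)   # shape: (num_positions, dim/2)

L_train, L_new, dim = 2048, 8192, 64       # illustrative lengths
scale = L_train / L_new                    # interpolation factor (0.25 here)

# Naive extrapolation: positions past L_train yield rotation angles the
# model never saw during pretraining.
angles_extrap = rope_angles(np.arange(L_new), dim)

# Positional interpolation: shrink the position indices so all 8192
# positions map into the trained angle range before computing rotations.
angles_interp = rope_angles(np.arange(L_new) * scale, dim)

# The largest interpolated angle stays inside the trained domain.
assert angles_interp.max() < L_train
```

After this rescaling, a brief fine-tuning pass adapts the model to the denser position spacing, which is why only a small number of samples is needed.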