![@Deliverator@kbin.social avatar](http://kbin.fedi.cr/media/cache/resolve/avatar_thumb/8c/15/8c15c0f5174cf7ec58030e16b9aedc8029668c922d09d56d2d836e9c7caa0d8e.jpg)
Tales from the Terrordrome
This profile is from a federated server and may be incomplete. Browse more content on the original instance.
r/Place 2023: The Good Ending (peertube.io)
Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org)
This is an exciting new paper that replaces softmax attention in the Transformer architecture with a retention mechanism built from decomposable matrix operations. It retains the modeling capacity of Transformer models while allowing parallel training and efficient RNN-like inference, with no softmax involved....
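The core idea can be sketched in a few lines: because retention drops the softmax, the same output can be computed either in parallel over the whole sequence or recurrently one token at a time. A minimal single-head sketch (toy sizes, fixed decay, omitting the paper's rotation and group norm):

```python
import numpy as np

# Toy retention head: hypothetical sizes, no xPos rotation or group norm.
rng = np.random.default_rng(0)
T, d = 5, 4                      # sequence length, head dimension
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))
gamma = 0.9                      # fixed exponential decay per step

# Parallel form: O = (Q K^T * D) V, with causal decay mask D[n, m] = gamma^(n-m)
n = np.arange(T)
D = np.where(n[:, None] >= n[None, :],
             gamma ** (n[:, None] - n[None, :]), 0.0)
O_parallel = (Q @ K.T * D) @ V

# Recurrent form: S_n = gamma * S_{n-1} + K_n^T V_n,  o_n = Q_n S_n
# (constant memory and O(1) work per generated token)
S = np.zeros((d, d))
O_recurrent = np.zeros((T, d))
for t in range(T):
    S = gamma * S + np.outer(K[t], V[t])
    O_recurrent[t] = Q[t] @ S

assert np.allclose(O_parallel, O_recurrent)  # the two forms agree
```

This is exactly what softmax attention cannot do: the softmax couples every score in a row, so attention has no equivalent constant-size recurrent state.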
Extending Context Window of Large Language Models via Positional Interpolation (arxiv.org)
An interesting technique for increasing the context window of language models by fine-tuning on a small number of samples after pretraining....
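The trick is simple: instead of letting rotary position embeddings extrapolate past the trained context length, rescale the position indices so every angle stays inside the range seen during pretraining. A minimal sketch (toy dimensions are hypothetical, not the paper's setup):

```python
import numpy as np

def rope_angles(positions, dim=8, base=10000.0):
    """Rotary-embedding angles for each position (one row per position)."""
    inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
    return np.outer(positions, inv_freq)      # shape (len(positions), dim // 2)

L_train, L_target = 2048, 8192
scale = L_train / L_target                    # 0.25

m = np.arange(L_target)
angles_extrapolated = rope_angles(m)          # positions >= L_train are unseen
angles_interpolated = rope_angles(m * scale)  # all within the trained range

# Every interpolated position index stays below the trained context length,
# so no angle falls outside the distribution the model was pretrained on.
assert (m * scale).max() < L_train
```

Fine-tuning then only has to adapt the model to the denser spacing between neighboring positions, rather than to entirely out-of-distribution angles, which is why a small number of samples suffices.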