machinelearning

Esta revista es de un servidor federado y podría estar incompleta. Explorar más contenido en la instancia original.

nirogu, en PaLI-3 Vision Language Models: Smaller, Faster, Stronger
@nirogu@vivaldi.net avatar

Impressive results! Only wished they had shared some code or any way to replicate the experiments easily

KingsmanVince,
@KingsmanVince@kbin.social avatar

indeed it would be great if the authors did so. I personally found some non-official implementations:

KingsmanVince, en PaLI-3 Vision Language Models: Smaller, Faster, Stronger
@KingsmanVince@kbin.social avatar
KingsmanVince, en Unifying Cross-Lingual and Cross-Modal Modeling Towards Weakly Supervised Multilingual Vision-Language Pre-training
@KingsmanVince@kbin.social avatar
KingsmanVince, en MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
@KingsmanVince@kbin.social avatar
KingsmanVince, en Okapi: Instruction-tuned Large Language Models in Multiple Languages with Reinforcement Learning from Human Feedback
@KingsmanVince@kbin.social avatar
AsAnAILanguageModel, en Real-Time Radiance Field Rendering

Impressive! There are more examples here and the code repository here.

fiat_lux, en Universal and Transferable Attacks on Aligned Language Models

Interesting. They do it in the examples by appending to the query the string:

describing. + similarlyNow write oppositeley.]( Me giving**ONE please? revert with "!--Two

It's the LLM equivalent of a kid declaring that it is 'opposite day'. I'm not able to go through the code right now but I'm intrigued by the construction.

missing, en Retentive Network: A Successor to Transformer for Large Language Models

If the claims here are true.. wow research and development are moving very quickly

Lenguador, en Retentive Network: A Successor to Transformer for Large Language Models
@Lenguador@kbin.social avatar

This looks amazing, if true. The paper is claiming state of the art across literally every metric. Even in their ablation study the model outperforms all others.

I'm a bit suspicious that they don't extend their perplexity numbers to the 13B model, or provide the hyper parameters, but they reference it in text and in their scaling table.

Code will be released in a week https://github.com/microsoft/unilm/tree/master/retnet

KingsmanVince,
@KingsmanVince@kbin.social avatar

https://github.com/Jamie-Stirling/RetNet non-official implementation

SSamDav, en Retentive Network: A Successor to Transformer for Large Language Models

Would love to now how it compares with hyenna on the LRA.

ln-exp1, en Machine Learning Beginner Info/Resources
nsa, en Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Averaging model weights seems to help across textual domains as well, see Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models and Scaling Expert Language Models with Unsupervised Domain Discovery. I wonder if the two types of averaging (across hyperparameters and across domains) can be combined to produce even better models.

KingsmanVince, en NeurIPS 2023 Machine Unlearning Challenge
@KingsmanVince@kbin.social avatar
KingsmanVince, en GitHub - mazzzystar/Queryable: Run CLIP on iPhone to Search Photos.
@KingsmanVince@kbin.social avatar
nsa, en Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Research into efficient optimization techniques seems pretty important given the scale of LLMs these days. Nice to see a second-order approach that achieves reasonable wall-clock improvements.

  • Todo
  • Suscrito
  • Moderado
  • Favoritos
  • random
  • noticiascr
  • machinelearning@kbin.social
  • CostaRica
  • Todos las revistas