machinelearning

This magazine is from a federated server and may be incomplete. Browse more content on the original instance.

nsa, in The Curse of Recursion: Training on Generated Data Makes Models Forget

If the effect is strong enough, it could have a very negative impact on LLM training in the near future, considering that more and more of the internet contains ChatGPT and GPT-4 output and automatic detectors are currently quite poor.
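For intuition, here's a toy, self-contained illustration of the tail-forgetting effect the paper describes (this is not the paper's actual experiments; the Zipf vocabulary and the unigram "model" are just stand-ins):

```python
import random
from collections import Counter

random.seed(0)

# Toy "human" corpus: word frequencies follow a rough Zipf law, so the tail is rare.
vocab = [f"w{i}" for i in range(1000)]
zipf_weights = [1.0 / (rank + 1) for rank in range(1000)]
corpus = random.choices(vocab, weights=zipf_weights, k=5000)

def train(corpus):
    """Toy 'language model': just the empirical unigram distribution over words."""
    counts = Counter(corpus)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

def generate(model, n):
    """Synthetic corpus sampled from the toy model."""
    words, probs = zip(*model.items())
    return random.choices(words, weights=probs, k=n)

# Each generation trains only on the previous generation's output.
for gen in range(10):
    model = train(corpus)
    corpus = generate(model, len(corpus))
    # Rare words drop out of the sampled corpus and never come back.
    print(f"gen {gen}: distinct words remembered = {len(model)}")
```

The count of distinct words shrinks each generation because rare words that fail to be sampled once are gone for good, which is the same kind of irreversible tail loss the paper reports for real models.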

Deliverator,
@Deliverator@kbin.social avatar

Yeah, it does not bode well for the future, especially combined with the current explosion of low-quality, profit-driven content. I fear that, if left unchecked, we could approach some kind of Kessler Syndrome-style scenario where the desire for rapid growth and profit poisons the well in the long term. "Garbage in, garbage out."

KingsmanVince, in Machine Learning Beginner Info/Resources
@KingsmanVince@kbin.social avatar

I also want to share some resources.
For PyTorch,

For TPU,

ragnarokonline, in r/MachineLearning finally received a warning from u/ModCodeOfConduct

Got eem

nsa, in VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks

Also reminds me of this ICLR paper: Linearly Mapping from Image to Text Space.
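The gist of that one, as I understand it, is that a single trained linear layer mapping a frozen image encoder's features into the LM's input embedding space already transfers a surprising amount of information. A minimal PyTorch sketch of that setup (the dimensions and number of prefix tokens are arbitrary placeholders, not the paper's values):

```python
import torch
import torch.nn as nn

# Minimal sketch: a single linear layer maps frozen image-encoder features into an
# LM's input embedding space as a few "soft prompt" tokens. Dimensions and the
# number of prefix tokens are arbitrary placeholders.
image_feat_dim, lm_embed_dim, num_prefix_tokens = 768, 4096, 4

projector = nn.Linear(image_feat_dim, lm_embed_dim * num_prefix_tokens)

def image_to_soft_prompt(image_features: torch.Tensor) -> torch.Tensor:
    """(batch, image_feat_dim) -> (batch, num_prefix_tokens, lm_embed_dim)."""
    return projector(image_features).view(-1, num_prefix_tokens, lm_embed_dim)

# The soft prompt is prepended to the text token embeddings of a frozen LM;
# only `projector` receives gradients during training.
soft_prompt = image_to_soft_prompt(torch.randn(2, image_feat_dim))
print(soft_prompt.shape)  # torch.Size([2, 4, 4096])
```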

miro, in Extending Context Window of Large Language Models via Positional Interpolation

Is this similar to what MPT did to extend its context length?

nsa,

hmmm... not sure which model you're referring to. do you have a paper link?

Blaed,

I believe it's a different technique (at least as far as I understand the topic).

According to Mosaic, MPT (i.e. MPT-7B-StoryWriter-65k+) uses a different underlying architecture (ALiBi attention biases instead of standard positional embeddings), which is what enables its long context lengths.

The original author of this new method (SuperHOT, by kaiokendev) shares what he has learned about it here:
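The short version of the interpolation idea, as I understand it: instead of feeding the model position indices beyond what it was trained on, you scale the indices down so an extended context still lands inside the trained range. A rough RoPE-flavoured sketch (the lengths and scale factor below are placeholders):

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE angles for the given (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    return torch.outer(positions.float(), inv_freq)  # (seq_len, dim // 2)

seq_len, dim = 8192, 128        # target context and per-head dimension (placeholders)
trained_ctx = 2048              # context length the model was pretrained with
scale = trained_ctx / seq_len   # 0.25: squeeze 8k positions into the trained 2k range

positions = torch.arange(seq_len)

# Extrapolation: positions beyond trained_ctx produce angles the model never saw.
angles_extrapolated = rope_angles(positions, dim)

# Interpolation: rescale the indices so every position stays within [0, trained_ctx).
angles_interpolated = rope_angles(positions * scale, dim)

print(angles_interpolated.shape, angles_interpolated[-1, 0].item())
```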

SSamDav, in Extending Context Window of Large Language Models via Positional Interpolation

One cool thing about this work is that there was a concurrent discussion on Twitter about the proposed method from different authors.

nsa,

do you have a link?

nsa, in Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models

It seems that, for creative text generation tasks, automatic metrics have been shown to be deficient; this holds even for the newer model-based metrics. That leaves human evaluation (both intrinsic and extrinsic) as the gold standard for those types of tasks. I wonder if the results from this paper (and other future papers that look at automatic CV metrics) will lead reviewers to demand more human evaluation in CV tasks, as they do for certain NLP tasks.
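For concreteness, one of the standard metrics in question (FID) just compares Gaussians fitted to feature embeddings of real and generated images. A minimal numpy/scipy sketch (the random feature arrays are placeholders for, e.g., Inception activations):

```python
import numpy as np
from scipy import linalg

def frechet_distance(real_feats: np.ndarray, fake_feats: np.ndarray) -> float:
    """FID-style Fréchet distance between Gaussians fitted to two feature sets."""
    mu_r, mu_f = real_feats.mean(axis=0), fake_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_f = np.cov(fake_feats, rowvar=False)

    covmean = linalg.sqrtm(cov_r @ cov_f)  # matrix square root of the covariance product
    if np.iscomplexobj(covmean):
        covmean = covmean.real             # discard tiny imaginary parts from numerics

    diff = mu_r - mu_f
    return float(diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean))

# Placeholder features; in practice these would be embeddings from a pretrained
# vision encoder (classically Inception-v3) for real and generated images.
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 64))
fake = rng.normal(loc=0.1, size=(2000, 64))
print(frechet_distance(real, fake))
```

Everything downstream of the choice of feature extractor and the Gaussian fit is baked into a single number, which is part of why human evaluation is hard to replace.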

KingsmanVince, in Hardwiring ViT Patch Selectivity into CNNs using Patch Mixing
@KingsmanVince@kbin.social avatar
nsa,

Please don't post links to reddit.

KingsmanVince,
@KingsmanVince@kbin.social avatar

I know we are moving away from Reddit. However, if I don't link, I feel like we may miss out on good threads on r/machinelearning. Moreover, the authors don't only post arXiv links; they post other stuff such as Summary, Key points, ... (e.g. this).

So can I at least put them in the posts instead of posting in a comment?

Lenguador,
@Lenguador@kbin.social avatar

I find the link valuable. Despite the proliferation of AI in pop culture, actual discussion of machine learning research is still niche. The community on Reddit is quite valuable and took a long time to form.

nsa,

If there isn't any discussion on reddit (as in this case), I don't see a reason to link to it; you can just link to the project page. That said, if you think there is important discussion happening that is helpful for understanding the paper, then use a teddit link instead, like:

https://teddit.net/r/MachineLearning/comments/14pq5mq/r_hardwiring_vit_patch_selectivity_into_cnns/

KingsmanVince,
@KingsmanVince@kbin.social avatar

I will follow that, then.

nsa,

That's appreciated!

KingsmanVince, in Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models
@KingsmanVince@kbin.social avatar
nsa, in Sophia: A Scalable Stochastic Second-order Optimizer for Language Model Pre-training

Research into efficient optimization techniques seems pretty important given the scale of LLMs these days. Nice to see a second-order approach that achieves reasonable wall-clock improvements.
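Roughly, the update keeps an EMA of the gradient and an EMA of an estimated Hessian diagonal, preconditions one by the other, and clips the result element-wise so nearly-flat directions can't blow up the step. A simplified sketch of that update (the Hessian-diagonal estimate and all hyperparameters below are placeholders; see the paper for the actual estimators and schedule):

```python
import torch

@torch.no_grad()
def sophia_like_step(param, grad, hess_diag_est, state,
                     lr=1e-4, betas=(0.965, 0.99), rho=0.04, eps=1e-12):
    """Simplified Sophia-style update: clipped, diagonal-Hessian-preconditioned momentum.

    hess_diag_est stands in for the paper's estimator of the Hessian diagonal
    (e.g. Hutchinson or Gauss-Newton-Bartlett); hyperparameters are illustrative only.
    """
    m = state.setdefault("m", torch.zeros_like(param))  # EMA of gradients
    h = state.setdefault("h", torch.zeros_like(param))  # EMA of Hessian diagonal

    m.mul_(betas[0]).add_(grad, alpha=1 - betas[0])
    h.mul_(betas[1]).add_(hess_diag_est, alpha=1 - betas[1])

    # Precondition the momentum by the estimated curvature, then clip element-wise
    # so low-curvature directions cannot produce arbitrarily large steps.
    update = (m / torch.clamp(rho * h, min=eps)).clamp_(-1.0, 1.0)
    param.add_(update, alpha=-lr)

# Toy usage with random tensors standing in for real gradients and curvature.
p, state = torch.randn(10), {}
sophia_like_step(p, grad=torch.randn(10), hess_diag_est=torch.rand(10), state=state)
```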

KingsmanVince, in GitHub - mazzzystar/Queryable: Run CLIP on iPhone to Search Photos.
@KingsmanVince@kbin.social avatar
KingsmanVince, in NeurIPS 2023 Machine Unlearning Challenge
@KingsmanVince@kbin.social avatar
nsa, in Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time

Averaging model weights seems to help across textual domains as well, see Branch-Train-Merge: Embarrassingly Parallel Training of Expert Language Models and Scaling Expert Language Models with Unsupervised Domain Discovery. I wonder if the two types of averaging (across hyperparameters and across domains) can be combined to produce even better models.
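For anyone who wants to try it, the simplest "uniform soup" recipe is just an element-wise average of the checkpoints' weights. A minimal PyTorch sketch (the checkpoint paths are placeholders, and all checkpoints must share one architecture):

```python
import torch

def uniform_soup(checkpoint_paths):
    """Element-wise average of the state dicts of several fine-tuned checkpoints."""
    soup = None
    for path in checkpoint_paths:
        state = torch.load(path, map_location="cpu")
        if soup is None:
            soup = {k: v.clone().float() for k, v in state.items()}
        else:
            for k in soup:
                soup[k] += state[k].float()
    return {k: v / len(checkpoint_paths) for k, v in soup.items()}

# Placeholder paths: the same architecture fine-tuned with different hyperparameters
# (model soups) or on different domains (Branch-Train-Merge style).
averaged = uniform_soup(["ft_run_a.pt", "ft_run_b.pt", "ft_run_c.pt"])
# model.load_state_dict(averaged)  # load into a model with the matching architecture
```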

ln-exp1, in Machine Learning Beginner Info/Resources
SSamDav, in Retentive Network: A Successor to Transformer for Large Language Models

Would love to know how it compares with Hyena on the LRA.
