![@nirogu@vivaldi.net avatar](http://kbin.fedi.cr/media/cache/resolve/avatar_thumb/7c/b6/7cb663c4632710fd1c3d9bff0cc42f1b27095b24d863339ad1b0dd76bba5a2d2.jpg)
![@nirogu@vivaldi.net](http://kbin.fedi.cr/media/cache/resolve/user_cover/19/6d/196d70c9677e0c1e8a8de6848105c56259a47cf7b8fbdea1a861ff947f48fc8d.gif)
Computer scientist and mathematician
This profile is from a federated server and may be incomplete. Browse more content on the original instance.
PaLI-3 Vision Language Models: Smaller, Faster, Stronger (arxiv.org)
This paper presents PaLI-3, a smaller, faster, and stronger vision language model (VLM) that compares favorably to similar models that are 10x larger. As part of arriving at this strong performance, we compare Vision Transformer (ViT) models pretrained using classification objectives to contrastively (SigLIP) pretrained ones. We...