Vision Language Transformers: A Survey en (arxiv.org)

Vision language tasks, such as answering questions about or generating captions that describe an image, are difficult tasks for computers to perform. A relatively recent body of research has adapted the pretrained transformer architecture introduced in \citet{vaswani2017attention} to vision language modeling. Transformer models...

  • Todo
  • Suscrito
  • Moderado
  • Favoritos
  • random
  • noticiascr
  • machinelearning@kbin.social
  • CostaRica
  • Todos las revistas