MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning (arxiv.org)

Large language models have shown remarkable capabilities as a general interface for various language-related applications. Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others. The...
