VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks en (arxiv.org)

Large language models (LLMs) have notably accelerated progress towards artificial general intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing them with immense potential across a range of applications. However, in the field of computer vision, despite the availability of numerous...