%% generate tags start %%
#ai
%% generate tags end %%
#ai/model
LLaVA is an end-to-end trained large multimodal model that connects a vision encoder with Vicuna for general-purpose visual and language understanding. It achieves impressive chat capabilities in the spirit of the multimodal GPT-4 and sets a new state-of-the-art accuracy on Science QA.
[LLaVA (llava-vl.github.io)](https://llava-vl.github.io/)
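A minimal inference sketch of the vision-encoder-plus-LLM setup described above, assuming the Hugging Face `transformers` port of LLaVA and the community `llava-hf/llava-1.5-7b-hf` checkpoint (the model id, prompt template, and image path are assumptions, not taken from the linked page):

```python
# Sketch: image + question in, text answer out, via the transformers LLaVA port.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"   # assumed checkpoint id
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id)

image = Image.open("example.jpg")       # any local RGB image (placeholder path)
# LLaVA-1.5 style prompt: an <image> placeholder followed by the user question.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"

inputs = processor(images=image, text=prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(out[0], skip_special_tokens=True))
```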