Alibaba Cloud has introduced open-source large vision language models (LVLMs) that can understand both images and text.
Two models, Qwen-VL and Qwen-VL-Chat, have been trained for image understanding and conversation. With 7 billion parameters, Qwen-VL-Chat can handle more complex image-based tasks, such as performing mathematical calculations from the content of an image, and can respond conversationally.
The model could also help people who cannot read Chinese understand signs written in the language, or assist people with visual impairments. Both Qwen-VL and Qwen-VL-Chat are available for download and use on ModelScope, Alibaba Cloud’s AI developer community, and on Hugging Face.