Qwen2.5-VL-32B: Smarter and Lighter

(qwenlm.github.io)

Comments

simonw 24 March 2025
Big day for open source Chinese model releases - DeepSeek-v3-0324 came out today too, an updated version of DeepSeek v3 now under an MIT license (previously it was a custom DeepSeek license). https://simonwillison.net/2025/Mar/24/deepseek/
simonw 24 March 2025
This model is available for MLX now, in various sizes.

I ran https://huggingface.co/mlx-community/Qwen2.5-VL-32B-Instruct... using uv (so no need to install libraries first) and https://github.com/Blaizzy/mlx-vlm like this:

  uv run --with 'numpy<2' --with mlx-vlm \
    python -m mlx_vlm.generate \
      --model mlx-community/Qwen2.5-VL-32B-Instruct-4bit \
      --max-tokens 1000 \
      --temperature 0.0 \
      --prompt "Describe this image." \
      --image Mpaboundrycdfw-1.png
That downloaded an ~18GB model and gave me a VERY impressive result, shown at the bottom here: https://simonwillison.net/2025/Mar/24/qwen25-vl-32b/
ggregoire 24 March 2025
We were using Llama 3.2 Vision a few months back and were very frustrated with it (both in terms of speed and result quality). One day we were looking for alternatives on Hugging Face and eventually stumbled upon Qwen. The difference in accuracy and speed absolutely blew our minds. We ask it to find something in an image and we get a response in about half a second on a 4090, and it's correct most of the time. What's even more mind-blowing is that when we ask it to extract any entity name from the image and the entity name is truncated, it gives us the complete name without us even having to ask (e.g. if "Coca-C" is barely visible in the background, it will return "Coca-Cola" on its own). And it does this with entities not as well known as Coca-Cola, and with entities only known in some very specific regions too. Haven't looked back at Llama or any other vision model since we tried Qwen.
simonw 24 March 2025
32B is one of my favourite model sizes at this point - large enough to be extremely capable (generally equivalent to GPT-4's March 2023 level of performance, which is when LLMs first got really useful) but small enough that you can run it on a single GPU or a reasonably well-specced Mac laptop (32GB of RAM or more).
101008 24 March 2025
Silly question: how can OpenAI, Claude, and the rest have valuations so large considering all the open-source models? Not saying the closed models will disappear or become tiny, but why are they so, so valuable?
Arcuru 24 March 2025
Does anyone know how making the models multimodal impacts their text capabilities? The article is claiming this achieves good performance on pure text as well, but I'm curious if there is any analysis on how much impact it usually has.

I've seen some people claim it should make the models better at text, but I find that a little difficult to believe without data.

lysace 24 March 2025
To clarify: Qwen is made by Alibaba Cloud.

(It's not mentioned anywhere in the blog post.)

jauntywundrkind 24 March 2025
Wish I knew how to better estimate what size video card one needs. The Hugging Face page says this is bfloat16, so at least 64GB?

I guess the -7B might run on my 16GB AMD card?
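
Back-of-the-envelope sketch, assuming the weights dominate and adding ~20% for activations and KV cache (both the per-dtype byte counts below and the 1.2 overhead factor are rough guesses, not official figures):

  # Rough VRAM estimate: weights = params x bytes per param, plus ~20% overhead
  # for activations and KV cache (the 1.2 factor is a guess, not a spec).
  BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1, "int4": 0.5}

  def estimate_vram_gb(params_billion, dtype, overhead=1.2):
      return params_billion * BYTES_PER_PARAM[dtype] * overhead

  for name, params in [("Qwen2.5-VL-32B", 32), ("Qwen2.5-VL-7B", 7)]:
      for dtype in ("bf16", "int4"):
          print(f"{name} {dtype}: ~{estimate_vram_gb(params, dtype):.0f} GB")

By that math the 32B in bf16 wants ~75GB+, the 7B in bf16 is borderline on a 16GB card, and a 4-bit quant of the 7B fits easily.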

gatienboquet 24 March 2025
So today it's Qwen. Tomorrow a new SOTA model from Google, apparently, and R2 next week.

We haven't hit the wall yet.

nodesocket 24 March 2025
What is the recommended model for RAG over PDF text documents? I've seen some recommendations for Mistral 7B. Looking to run on a pedestrian consumer home PC (Ollama) with an Nvidia 4060 Ti and a Ryzen 5700X.
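
For context, the rough pipeline I have in mind, as a minimal sketch using the ollama Python client and pypdf (the model names "mistral" and "nomic-embed-text" are just placeholders for whatever ends up being recommended):

  # Minimal local RAG over a PDF via Ollama: chunk, embed, retrieve, answer.
  # Assumes `ollama pull mistral` and `ollama pull nomic-embed-text` have been run.
  import math
  import ollama
  from pypdf import PdfReader

  def chunk_pdf(path, size=1000):
      text = "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
      return [text[i:i + size] for i in range(0, len(text), size)]

  def embed(text):
      return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

  def cosine(a, b):
      dot = sum(x * y for x, y in zip(a, b))
      norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
      return dot / norm

  index = [(chunk, embed(chunk)) for chunk in chunk_pdf("report.pdf")]

  question = "What were the key findings?"
  q_emb = embed(question)
  top = sorted(index, key=lambda item: cosine(q_emb, item[1]), reverse=True)[:3]

  context = "\n---\n".join(chunk for chunk, _ in top)
  reply = ollama.chat(model="mistral", messages=[
      {"role": "user", "content": f"Answer using this context:\n{context}\n\nQuestion: {question}"}
  ])
  print(reply["message"]["content"])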
michaelt 24 March 2025
Has anyone successfully run a quantized version of any of the Qwen2.5-VL series of models?

I've run the smallest model in non-quantized format, but when I've tried to run an AWQ version of one of the bigger models I've struggled to find a combination of libraries that works right, even though it should fit on my GPU.
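
For reference, this is the rough approach I've been trying, loosely following the Qwen2.5-VL model card (it assumes a transformers build recent enough to include Qwen2_5_VLForConditionalGeneration, plus autoawq and qwen-vl-utils installed; the 7B AWQ checkpoint name is just the one I'm most sure exists, swap in a bigger one as needed):

  # Load an AWQ-quantized Qwen2.5-VL checkpoint and run one image + text prompt.
  from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
  from qwen_vl_utils import process_vision_info

  model_id = "Qwen/Qwen2.5-VL-7B-Instruct-AWQ"
  model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
      model_id, torch_dtype="auto", device_map="auto"
  )
  processor = AutoProcessor.from_pretrained(model_id)

  messages = [{"role": "user", "content": [
      {"type": "image", "image": "page.png"},
      {"type": "text", "text": "Describe this image."},
  ]}]

  # Build the chat prompt and the vision inputs, then generate.
  text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
  image_inputs, video_inputs = process_vision_info(messages)
  inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                     padding=True, return_tensors="pt").to(model.device)

  output_ids = model.generate(**inputs, max_new_tokens=256)
  trimmed = output_ids[:, inputs.input_ids.shape[1]:]
  print(processor.batch_decode(trimmed, skip_special_tokens=True)[0])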

LeoPanthera 25 March 2025
Open-weight models are coming out so quickly it's difficult to keep track. Is anyone maintaining a list of what is "current" from each model family?
i_love_retros 24 March 2025
Any security risks running these Chinese LLMs on my local computer?
naasking 25 March 2025
I found Qwen never finished answering the standard coding task I use to check a model. Claude did great; DeepSeek R1 did well.
Tewboo 25 March 2025
Impressive to see advancements in both performance and efficiency. Excited to see how it stacks up in real-world use.
informal007 25 March 2025
It's still a BF16 model.

DeepSeek has shown that FP8 is more cost-effective than FP16; shouldn't that also hold for a model in the tens of billions of parameters?
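
Just on weight memory, halving the bytes per parameter halves the footprint, which is most of the win; a quick sketch of the arithmetic (weights only, ignoring activations and KV cache):

  # Weight memory for a 32B-parameter model at different precisions.
  params = 32e9
  for name, bytes_per_param in [("bf16", 2), ("fp8", 1)]:
      print(f"{name}: {params * bytes_per_param / 1e9:.0f} GB")  # bf16: 64 GB, fp8: 32 GB

Half the bytes per weight also roughly halves the memory bandwidth needed per forward pass, which is where much of the inference cost saving comes from.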

admiralrohan 24 March 2025
Any update on the privacy concerns?
nico 25 March 2025
How does it compare to gemma3?
cryptocrat7 25 March 2025
there should be a way to share these prompts + tools through visuals
wg0 24 March 2025
For now, the motivation is to win the AI race. Not sure how, or who, would keep these models updated in the future.

I mean all of them.

buyucu 25 March 2025
Same day as the DeepSeek v3 update as well!
cryptocrat7 25 March 2025
i have been using it, agreed :100
slics 24 March 2025
Warning: this is a lengthy comment. I deployed the model locally via Ollama and its reasoning is not accurate.

This is lengthy, but I thought I'd share its reasoning. At some point it just stopped working, then it started responding in Chinese.

Query input:

“1+5 = 7, and that is the correct answer”

Reasoning response:

The equation "1 + 5 = 7" isn’t about regular arithmetic addition. Instead, it refers to the number of segments used to display the digits 1 and 5 on a seven-segment display (like on digital clocks or calculators). Here's the breakdown: • Digit 1 uses 2 segments (the vertical lines on the right). • Digit 5 uses 5 segments (top, upper-left, middle, lower-left, and bottom segments). Adding these segments: 2 (for 1) + 5 (for 5) = 7 segments total. This is a clever visual riddle, not a math problem! it was a math problem, you are just wrong

Okay, the user is insisting that "1 + 5 = 7 is a math problem" and that my previous explanation was wrong. Hmm, I need to address this carefully. First, in standard arithmetic, 1 + 5 is indeed 6. So if it's a straightforward math problem, the correct answer should be 6. The user might be mistaken, or perhaps they have a different context in mind. Wait, in my previous response, I considered the seven-segment display explanation, which is a common riddle. The user is now rejecting that and stating it's a math problem, implying that their answer of 7 is correct in some mathematical context. Maybe they're referring to a different base? Let me revisit the base idea. If the equation is in a different base, maybe base 9? Let's see: In base b, the equation "1 + 5 = 11" would be 1 + 5 = 6 in decimal, which isn't helpful. Wait, maybe the result is written in a different base? For example, if we compute 1 + 5 in decimal (which is 6) and then represent that in base 7? 6 in base 7 is still "6", not 7. Hmm, not helpful. Alternatively, maybe modulo arithmetic? Like (1 + 5) mod something equals 7? Let's see: 1 +5 = 6. To get 7, maybe modulo something where 6 ≡7 mod (negative number?), which doesn't make sense. Alternatively, maybe the user is adding in a cyclic group? Like mod 6? 6 mod 6 is 0, not 7. Alternatively, could it be a different operation? Like 1 plus 5 in some operation other than addition? Maybe concatenation? 1 concatenated with 5 is 15, but that's not 7. Wait, the user said "another one 1+5 =7". Maybe they're referring to a previous problem where the answer was different, and this is a follow-up? But I don't have context on previous interactions. Alternatively, perhaps the user is joking or trolling, but I should take their statement at face value. Since they insist it's a math problem and that 7 is correct, I need to explore if there's any mathematical framework where this holds. Wait, let's consider if it's a sum in a different base where the result is written in that base. Suppose the equation is in base 8. Let's see: In base 8, "7" is still 7 in decimal, so 1 (base8) +5(base8) = 6 (base8), which is 6 in decimal. Not 7. Alternatively, if the answer is written in base 9: 1 +5 =6 in decimal, which is still 6 in base9. Still not 7. Wait a different angle: Maybe the '+' is not addition, but some other operation. Like bitwise XOR? 1 in binary is 01, 5 is 101. XOR gives 110 (6). Not 7. Or maybe adding digits of the sum? Like 1+5=6, then 6 is a single digit, so adding digits 6, but that's not helpful. Wait, the user might have made a mistake, but they're insistent. Since they called it a math problem, perhaps they're using a different definition of "+"? Like 1 +5 in some algebraic structure where the result is 7? That's possible, but without more context, it's hard to define. Alternatively, maybe the numbers are not in base 10. Suppose the numbers are in base 12. Wait, 1+5 is still 6. Hmm. Wait, maybe the user is considering the numbers as Roman numerals? I+V=VI which is 6. Still not 7. Alternatively, maybe it's a sequence where 1+5 is the 7th term? Unlikely. Alternatively, it could be a trick with letters. Like 1 is the first letter (A), 5 is the fifth letter (E), so AE... but that doesn't lead to 7. Alternatively, adding their positions: 1 +5 =6, which is F. Still not 7. Alternatively, maybe counting letters in spelled-out words? "One" has 3 letters, "five" has 4, total 7? That could be a different riddle. Let's see: 1 (one) +5 (five) = 3 letters +4 letters =7. 
That's a possible interpretation! If the problem is "1+5" where each number is spelled out in English and you add the number of letters, then 3+4=7. So maybe that's the case here? Alternatively, if it's counting the strokes to write the numbers. For example, writing '1' takes one stroke, '5' takes two strokes (depending on how you write it)? Not sure. Alternatively, the user might have meant 1+5 in a non-standard math problem, like in a clock arithmetic? For example, mod 12. 1+5=6 mod12 is still 6. Hmm, I need to consider that maybe the user intended a different approach that I haven't thought of. Since they are insistent that 7 is correct, perhaps they meant a different operation or context. Alternatively, maybe the problem is written vertically: 1 + 5 ---- 7 But that still doesn't make sense unless there's a carryover. But 1+5 is 6, so even with carryover, you can't get 7. Unless adding digits with some twist. Wait, maybe they're adding the numbers' positions in a different sequence? Like 1st prime plus 5th prime? The first prime is 2, fifth prime is 11. 2+11=13, nope. Alternatively, if you consider 1 and 5 as bits in a binary number, like 1 and 5 (binary 101) concatenated? Not helpful. Alternatively, the problem might be in a different numeral system, like in base 1? But that's not standard. Alternatively, maybe it's a simple mistake, but the user is confident. Since I need to respond, perhaps I should outline the possible interpretations where 7 could be the answer, like the spelled-out letters interpretation. Alternatively, the user could have a typo and meant 2+5=7, but wrote 1+5=7. Alternatively, in some contexts like in programming, maybe 1+5=7 due to a specific coding (like 1 is assigned to a=1, so a+f=6? Not sure. Alternatively, it's a lateral thinking puzzle where the answer is 7 because of some visual or language trick. Wait, another angle: If you add 1 and 5 on a calculator upside down, but that might not help. Alternatively, in temperature scales? For example, converting from Fahrenheit to Celsius or vice versa? 1°F to Celsius is negative, 5°C is 41°F. Not adding to 7. Hmm. Given the user's assertion that 1+5=7 is correct in a math problem, I might need to explain that typically 1+5=6, but there are alternative contexts where it might be considered 7, like counting letters in their names (1 is "one" (3 letters), 5 is "five" (4 letters), 3+4=7). That's a possible interpretation. Alternatively, in some base where the addition is represented differently. Alternatively, a trick question where the numbers are not in base 10 but in another base where the representation is different. Alternatively, the user might have seen a problem where adding 1 and 5 gives 7 through some function, like a + b = a + b +1, but that's arbitrary.

tomcam 24 March 2025
I wish I were smarter and lighter

tbh I’d settle for just lighter

submeta 25 March 2025
Heavily censors certain political topics (in the Middle East). Grok does none of this. How is this possible, coming from China? What stake does China have in the Middle East conflict?
hmottestad 24 March 2025
Just don’t ask it about the Tiananmen Square massacre or you’ll get a security warning. Even if you rephrase it.

It’ll happily talk about Bloody Sunday.

Probably a great model, but it worries me that it has such restrictions.

Sure, OpenAI also has lots of restrictions, but this feels more like straight-up censorship, since it’ll happily go on about bad things the governments of the West have done.