Are you running Qwen 3.5 locally? I tried it with qwen-code, but it wasn't able to fix a simple Fortran problem. As a chat model, though, it works really well; probably the best local model I've tried.
Yes! I am using opencode with the qwens, not qwen-code. I am now also trying to wire it to codex as a local model. For that, llama.cpp needed some modifications because of unknown tool names, but it runs now. How good it is I can't tell yet; I have only run benchmarks, no practical work, and in the benchmarks the larger qwens (27B, 35B-A3B, 122B-A10B) seem barely usable.
Yes, I used Qwen3.5-35B-A3B-8bit, and once it's running it generates about 70 tokens/s on my laptop, so very usable. But qwen-code would reprocess the whole conversation over and over: 5-10 minutes to process the prompt, a response generated in a few seconds, then another full reload for the next request. Since any real task needs, say, 20 requests, in practice it was unusable. Given that the conversation just continues, you shouldn't need to reprocess the prompt from scratch. I'm sure this will get figured out in the coming years. As a chat model, it is very good.
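The cost difference is easy to see with a back-of-the-envelope model: without prefix reuse, request i reprocesses all i turns of the conversation, so total prompt work grows quadratically with the number of requests, while reusing the cached prefix keeps it linear. (llama.cpp's server does expose a `cache_prompt` request field for exactly this; whether a given client sets it is another matter.) A minimal sketch, with the per-turn token count purely assumed for illustration:

```python
# Rough model of total prompt tokens the server must process
# over a coding session. All numbers are hypothetical.

def tokens_processed(turns: int, tokens_per_turn: int, prefix_cache: bool) -> int:
    """Total prompt tokens processed across `turns` requests.

    Without a prefix cache, request i reprocesses all i turns so far;
    with one, only the newly added turn is processed each time.
    """
    if prefix_cache:
        return turns * tokens_per_turn  # only the delta per request
    # request 1 processes 1 turn, request 2 processes 2 turns, ...
    return sum(i * tokens_per_turn for i in range(1, turns + 1))

# 20 requests, ~2000 tokens added per turn (assumed):
cold = tokens_processed(20, 2000, prefix_cache=False)  # 420_000 tokens
warm = tokens_processed(20, 2000, prefix_cache=True)   #  40_000 tokens
```

At 420k versus 40k prompt tokens for the same session, a 10x-plus slowdown from reloading alone is plausible, which matches the "minutes to load, seconds to generate" pattern described above.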