fortbench: a benchmark for agentic coding of Fortran

I was playing with the new Qwen 3.5 family of models and, inspired by SWE-Bench, benchmarked them on lazy-fortran/fortbench, a real-world Fortran coding benchmark for agent CLIs (https://github.com/lazy-fortran/fortbench). Qwen is still a bit worse than Claude and GPT, but it can solve more than 50% of my tasks. I would be curious about input or feedback on how to expand it.

PS: Changed my user from @ert to @krystophny to be consistent with GitHub.


Are you running Qwen 3.5 locally? I tried it with qwen-code, but it wasn't able to fix a simple Fortran problem. As a chat model, though, it works really well, probably the best local model I have tried.

Yes! I am using opencode with the Qwens, not qwen-code. I am now also trying to wire it up to codex as a local model. For this, llama.cpp needed some modifications because of unknown tool names, but it runs now. How good it is, I cannot tell yet: I have only run benchmarks, no practical work, and it seems that for benchmarks the larger Qwens (27B, 35B-A3B, 122B-A10B) are barely usable.
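For anyone trying a similar local setup, a minimal sketch of serving a model with llama.cpp for an agent CLI (model filename, port, and flags are illustrative, check `llama-server --help` on your build): llama-server exposes an OpenAI-compatible `/v1` endpoint that clients like opencode can be pointed at, and tool calling generally needs the model's own chat template enabled.

```shell
# Illustrative launch of llama.cpp's server (paths and flags are assumptions,
# verify against your llama.cpp version)
llama-server \
  -m Qwen3.5-35B-A3B-Q8_0.gguf \  # hypothetical quantized model file
  --port 8080 \                   # clients then use http://localhost:8080/v1
  --jinja                         # apply the model's chat template (tool calls)
```

The agent CLI is then configured with `http://localhost:8080/v1` as an OpenAI-compatible base URL and any placeholder API key.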

Yes, I used Qwen3.5-35B-A3B-8bit, and once it runs it reaches about 70 tokens/s on my laptop, so it is very usable. But qwen-code would reload the whole conversation over and over: it took 5-10 minutes to process the prompt, then quickly generated a response in a few seconds, then reloaded again for the next request. Since any task needs, say, 20 requests, it was unusable in practice. Given that the conversation simply continues, I would think the prompt does not need to be reprocessed from scratch. I am sure this will get figured out in the coming years. As a chat model, it is very good.
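The reload problem described above is essentially a missing prompt (KV) cache: if the client resends the full conversation each turn and the server re-evaluates it from scratch, total prefill work grows quadratically with the number of turns, whereas a reused cache only processes the newly appended tokens. A back-of-the-envelope sketch (token counts are made up for illustration):

```python
def prefill_tokens(turn_lengths, cache=False):
    """Total prompt tokens the server must (re)process over a conversation.

    turn_lengths: tokens appended per turn (user message + prior reply).
    cache=False models re-evaluating the whole context every turn;
    cache=True models a reused KV cache that only sees the new tokens.
    """
    total = 0
    context = 0
    for n in turn_lengths:
        context += n
        total += n if cache else context
    return total

# One big initial prompt, then 19 short follow-up turns:
turns = [8000] + [500] * 19
print(prefill_tokens(turns, cache=False))  # 255000 tokens re-processed
print(prefill_tokens(turns, cache=True))   # 17500 tokens, ~15x less work
```

At a few hundred prefill tokens per second on a laptop, that factor is exactly the difference between a 5-10 minute wait per request and a near-instant response.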