News
This organization contains the source code for Multi-SWE-bench, a multilingual benchmark for evaluating LLMs in real-world code issue resolution. Unlike existing Python-centric benchmarks (e.g., ...