Backless Benches Product

News

This organization contains the source code for Multi-SWE-bench, a multilingual benchmark for evaluating LLMs in real-world code issue resolution. Unlike existing Python-centric benchmarks (e.g., ...