News

"Human-computer interaction studies are far slower than even human-adjudicated benchmark evaluations, but as the systems grow more powerful, they will become even more essential," they write.
OpenAI's new MLE-bench challenges AI systems with real-world data science tasks, revealing both the progress and limitations of AI in machine learning engineering compared to human experts.