OpenAI’s Codex 5.3 now leads in a new benchmark for testing AI agents' ability to reproduce computational results from ...