OpenAI is evaluating the scientific judgment of agents with GeneBench-Pro