CAIS and Scale AI Unveil Results of “Humanity’s Last Exam,” a Groundbreaking New Benchmark
The Center for AI Safety (CAIS) and Scale AI today announced the results of a groundbreaking new AI benchmark designed to test the limits of AI knowledge and whether models are capable of chain-of-thought reasoning. The results showed a significant improvement over the reasoning capabilities of earlier models, but current models still […]