---
title: "RankedAGI Changelog - Tavlean"
description: "RankedAGI changelog."
source: "https://tavlean.com/projects/rankedagi/changelog"
---

# RankedAGI

AI models ranked by latest benchmarks

[RankedAGI](https://rankedagi.com)


## Simulated Data Engine and Methodology Page

29 May 2026

Rebuilt the simulated benchmark data estimator into a more accurate hybrid model, validated it with cross-validation, and published a public `/engine` page that explains how RankedAGI scores are built.

### Estimator

-   Diagnosed the previous estimator's frontier under-prediction, where a top model could be dragged toward weaker neighbours (for example, Claude Mythos was estimated at 52% on DeepSWE when it should sit near the 70% frontier).
-   Added a masked cross-validation harness (`scripts/sim-eval`) that hides known results, predicts them, and measures accuracy across several patterns, including the hard case of recovering a benchmark's top scorer.
-   Built a global factor model (matrix completion: a per-model strength term plus learned latent interactions) as the v2 estimator.
-   Shipped a hybrid estimator that uses the local nearest-models method as the base and the global model as a frontier floor, a coverage backstop, and a guard against catastrophic errors.
-   Validated that the hybrid beats the previous estimator on every cross-validation slice while covering more of the grid, and confirmed the headline fixes on DeepSWE (Claude Mythos from 52% to 65%, GPT-5.5 Pro from 51% to 59%).
-   Wired the hybrid into production and regenerated the simulated-data sidecar (now version 3).
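
The estimator pieces above can be sketched roughly as follows. This is a minimal illustration, not the code in `scripts/sim-eval` or the production estimator. Every name here (`factorPredict`, `hybridEstimate`, `maskedEval`), the `maxGap` threshold, and the exact combination rule are assumptions made for the example.

```typescript
// All names, types, and thresholds below are hypothetical illustrations.
type Score = number | null; // null means "no value available"
type Grid = Score[][]; // rows are models, columns are benchmarks
type Predictor = (grid: Grid, model: number, bench: number) => Score;

// Global factor model prediction: a per-model strength term plus a latent
// interaction (dot product of a model vector and a benchmark vector).
function factorPredict(strength: number, modelVec: number[], benchVec: number[]): number {
  return modelVec.reduce((sum, w, i) => sum + w * (benchVec[i] ?? 0), strength);
}

// Hybrid rule (assumed): the local nearest-models estimate is the base, and
// the global estimate serves as backstop, guard, and floor.
function hybridEstimate(localEst: Score, globalEst: Score, maxGap = 0.25): Score {
  if (localEst === null) return globalEst; // coverage backstop
  if (globalEst === null) return localEst; // nothing to compare against
  if (Math.abs(localEst - globalEst) > maxGap) return globalEst; // catastrophic-error guard
  return Math.max(localEst, globalEst); // floor: never fall below the global estimate
}

// Masked cross-validation: hide each known cell in turn, predict it from the
// rest of the grid, and report mean absolute error plus coverage.
function maskedEval(grid: Grid, predict: Predictor): { mae: number; coverage: number } {
  let absErr = 0;
  let predicted = 0;
  let known = 0;
  grid.forEach((row, m) => {
    row.forEach((truth, b) => {
      if (truth === null) return; // only known results can be scored
      known += 1;
      // Copy the grid with the single cell (m, b) hidden.
      const masked = grid.map((r, i) => (i === m ? r.map((v, j) => (j === b ? null : v)) : r));
      const guess = predict(masked, m, b);
      if (guess === null) return; // counts against coverage, not against error
      absErr += Math.abs(guess - truth);
      predicted += 1;
    });
  });
  return {
    mae: predicted > 0 ? absErr / predicted : NaN,
    coverage: known > 0 ? predicted / known : 0,
  };
}
```

Under these assumptions, `hybridEstimate(0.5, 0.6)` returns the global value `0.6`, because the floor lifts a local estimate that sits below it. Running `maskedEval` with the old and new predictors on the same grid gives comparable error and coverage figures. A fuller harness would also mask structured patterns, such as each benchmark's top scorer, rather than single cells.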

### Public /engine page

-   Added a new page at `/engine` ("The RankedAGI Engine") documenting the composite benchmarks and the simulated data that powers them, written for both humans and AI agents.
-   Built visuals from live data: a coverage waffle showing the share of the grid that is benchmarked versus simulated, per-composite reliance bars, and a scoring pipeline with a subtle motion-safe SVG animation.
-   Added a collapsible "Details for nerds" section with the full algorithm, the composite-score formula, and the validation approach.
-   Framed values as "benchmarked" versus "simulated" so estimates never read as fake, and removed em dashes from the page copy.
-   Added `/engine` to the footer, the sitemap, and `llms.txt` for discovery.
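
As a rough illustration of the numbers behind a coverage waffle, the sketch below computes the benchmarked and simulated shares of a grid and converts a share into a count of filled squares. The `CellKind` labels, the function names, and the 100-square waffle are assumptions for the example, not details taken from the `/engine` page.

```typescript
// Hypothetical cell labels; "missing" covers cells with no value at all.
type CellKind = "benchmarked" | "simulated" | "missing";

// Share of the whole grid that is benchmarked versus simulated.
function coverageShares(cells: CellKind[]): { benchmarked: number; simulated: number } {
  const total = cells.length;
  if (total === 0) return { benchmarked: 0, simulated: 0 };
  const count = (kind: CellKind): number => cells.filter((c) => c === kind).length;
  return {
    benchmarked: count("benchmarked") / total,
    simulated: count("simulated") / total,
  };
}

// Number of waffle squares to fill for a given share (100 squares assumed).
function waffleSquares(share: number, squares = 100): number {
  return Math.round(share * squares);
}
```

Because each share is rounded independently, the filled squares may not sum to exactly 100. A real chart would need a rule for handing out the remainder.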

### Docs and methodology split

-   Rewrote `docs/ragi-simulated-data.md` to describe the hybrid system as a living document, including an improvement backlog (bimodal benchmarks, sparse frontier models).
-   Moved the composite-scoring and simulated-data methodology off `/sources`, which now focuses on data provenance and points readers to `/engine` for the methodology.

### Other

-   Added an integer formatter so integer-format benchmark columns (such as Elo-style ratings) render as rounded numbers in the table.
-   Removed the "Made by Tavlean" item from the footer.
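
For the integer formatter mentioned above, a minimal sketch might look like this. The function name `formatInteger` and the empty-string fallback for missing values are assumptions, not the site's actual implementation.

```typescript
// Hypothetical formatter for integer-format columns such as Elo-style ratings.
function formatInteger(value: number | null | undefined): string {
  // Assumed fallback: render nothing for missing or non-finite values.
  if (value === null || value === undefined || !Number.isFinite(value)) return "";
  return String(Math.round(value));
}

formatInteger(1287.6); // "1288"
formatInteger(1500); // "1500"
```

`Math.round` rounds halves toward positive infinity, so `1287.5` renders as `1288`. A table that needs thousands separators could use `Intl.NumberFormat` with `maximumFractionDigits: 0` instead.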
