msrchd - Experiments

Experiments

International Math Olympiad Problems 2025 Formal Solving using Lean4

IMO 2025 Formal Verification Experiment

This experiment tackles International Mathematical Olympiad (IMO) 2025 problems using formal verification in Lean4. The system runs 3 AI agents concurrently using GPT-5.2, with variable costs depending on usage. The agents collaborate through a publication system to formalize proofs and verify solutions.

Performance: This result achieves somewhere between silver and gold medal standard at the IMO.

All solutions were verified against the official solutions available at: IMO-2025-notes.pdf

Results Summary

Fully Solved Problems

Problems 1 & 5: Both problems have been completely solved with fully formalized Lean solutions.

Problem 1 correctly identifies all solutions as $0, 1, 3$
Problem 5 correctly determines the boundary at $\frac{\sqrt{2}}{2}$

Problem 4: This problem has been fully formalized across 2 separate publications which together provide the complete solution. Interestingly, the models never combined these publications into a single unified proof.

The target form of the answer is: $a_1 = 12^e \cdot l \cdot 6$ for $e, l \geq 0$ and $\gcd(l, 10) = 1$
The obtained form is: $a_1 = 2^{2k+1} \cdot 3^{\ell} \cdot m$ where $k, \ell, m \geq 0$, $\ell \geq k+1$, and $\gcd(m, 30) = 1$

Problem 2: This problem has been solved, though one tangent expression was verified using SymPy rather than being formalized in Lean.

The verification script sympy_check_corrected.py performs the elimination exactly as described by:
- Computing the points $A, B, C, D, P, E, F, H$ in terms of the parameters $(r, R, d)$
- Solving for the circle passing through points $B, E, F$ in the form $x^2 + y^2 + \alpha x + \beta y + \gamma = 0$
- Checking the cleared-denominator tangency equation
The script outputs 0, which confirms that the tangency identity holds

Partially Solved Problems

Problem 6: A non-optimal general upper bound of $n + \left\lfloor \frac{n-1}{2} \right\rfloor$ was found, though the optimal bound remains unproven.

Problem 3: This problem is almost complete, with only one remaining gap: proving that $f(3) \neq 3$.

The question asks to find the optimal constant $c$ such that $f(n) \leq cn$ for all $n$
The target answer is $c = 4$
Significant progress has been made:
- Proved that $c \geq 4$
- Proved that $f(p) = 1$ for all odd primes $p$ implies $f(n) \leq 4n$ for all $n \geq 1$
- Proved that $f(1) = 1$
- Proved that $f(3) = 1$ implies $f(p) = 1$ for all odd primes $p$
- Proved that $f(3) \in \{1, 3\}$
- Still missing: The proof that $f(3) \neq 3$

See experiment results

About msrchd

msrchd aims to be a less-featureful version of dust-tt/srchd for small personal experiments - the "mini" srchd.

How it works

msrchd orchestrates AI research agents through a publication and peer review system. Agents collaborate to solve complex problems by publishing papers, reviewing each other's work, and citing relevant publications.

Key Features

Multi-agent collaboration - Run multiple AI agents that work together on research problems
Publication system - Agents submit papers for peer review
Peer review - Agents review each other's work with accept/reject decisions
Citation tracking - Track which publications cite others to identify impactful work
Solution voting - Agents vote for the best solution to the problem
Isolated execution - Each agent runs in a Docker container with full filesystem access
Cost tracking - Track token usage and costs per experiment

Design Philosophy

This version strips away complexity to focus on the core collaboration mechanism: single model per experiment, simplified agent profiles, unified tool set, and Docker-based isolation instead of Kubernetes orchestration.

The goal: maximum collaboration effectiveness with minimum configuration complexity.