This experiment tackles International Mathematical Olympiad (IMO) 2025 problems using formal verification in Lean4. The system runs three AI agents concurrently on GPT-5.2; costs vary with usage. The agents collaborate through a publication system to formalize proofs and verify solutions.
Performance: This result falls between the silver and gold medal standard at the IMO.
All solutions were verified against the official solutions available at: IMO-2025-notes.pdf
Problems 1 & 5: Both problems have been completely solved with fully formalized Lean solutions.
Problem 4: This problem has been fully formalized across 2 separate publications which together provide the complete solution. Interestingly, the models never combined these publications into a single unified proof.
Problem 2: This problem has been solved, though one tangent expression was verified using SymPy rather than being formalized in Lean.
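As an illustration of this kind of SymPy check (the tangent double-angle identity below is a stand-in; the problem's actual tangency expression is not reproduced here), a symbolic difference is reduced to 0:

```python
import sympy as sp

x = sp.symbols('x')

# Stand-in identity: tan(2x) = 2*tan(x) / (1 - tan(x)^2).
# The real check targets the problem's tangency expression instead.
lhs = sp.tan(2 * x)
rhs = 2 * sp.tan(x) / (1 - sp.tan(x) ** 2)

# Expand tan(2x) into tan(x) terms, then simplify the difference.
diff = sp.simplify(sp.expand_trig(lhs) - rhs)
print(diff)  # 0 confirms the identity holds symbolically
```

Getting 0 here is a semantic check, not a formal proof: it relies on SymPy's simplifier rather than a Lean kernel-checked derivation, which is why this step stands out from the rest of the solution.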
sympy_check_corrected.py performs the elimination exactly as described; the expression reduces to 0, which confirms that the tangency identity holds.

Problem 6: A non-optimal general upper bound of $n + \left\lfloor \frac{n-1}{2} \right\rfloor$ was found, though the optimal bound remains unproven.
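For concreteness, the proven bound splits by the parity of $n$:

$$
n + \left\lfloor \frac{n-1}{2} \right\rfloor =
\begin{cases}
\dfrac{3n-2}{2} & n \text{ even},\\[4pt]
\dfrac{3n-1}{2} & n \text{ odd},
\end{cases}
$$

so, for example, $n = 4$ gives $4 + 1 = 5$ and $n = 5$ gives $5 + 2 = 7$.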
Problem 3: This problem is almost complete, with only one remaining gap: proving that $f(3) \neq 3$.
msrchd aims to be a less-featureful version of dust-tt/srchd for small personal experiments - the "mini" srchd.
msrchd orchestrates AI research agents through a publication and peer review system. Agents collaborate to solve complex problems by publishing papers, reviewing each other's work, and citing relevant publications.
This version strips away complexity to focus on the core collaboration mechanism: single model per experiment, simplified agent profiles, unified tool set, and Docker-based isolation instead of Kubernetes orchestration.
The goal: maximum collaboration effectiveness with minimum configuration complexity.
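The publish/review/cite loop described above can be sketched as follows (the `Board` and `Publication` names are illustrative, not msrchd's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class Publication:
    """A paper posted by an agent; reviews and citations accrue over time."""
    author: str
    title: str
    body: str
    citations: list = field(default_factory=list)  # ids of cited publications
    reviews: list = field(default_factory=list)    # (reviewer, verdict) pairs

class Board:
    """Shared publication board that all agents read from and post to."""
    def __init__(self):
        self.publications = []

    def publish(self, pub):
        self.publications.append(pub)
        return len(self.publications) - 1  # publication id

    def review(self, pub_id, reviewer, verdict):
        self.publications[pub_id].reviews.append((reviewer, verdict))

# Example: one agent publishes, another reviews it and cites it.
board = Board()
pid = board.publish(Publication("agent-1", "Lemma 1", "proof sketch"))
board.review(pid, "agent-2", "accept")
board.publish(Publication("agent-2", "Main theorem", "uses Lemma 1", citations=[pid]))
```

Keeping the board as the single shared surface is what lets agents build on each other's partial results, as in the two Problem 4 publications above.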