🏆 PutnamBench Leaderboard 🏆

Benchmarking formal mathematical reasoning on the Putnam Mathematical Competition.

github paper

📝 Notes

  1. A method marked with a green heart emoji (💚) is fully open-sourced, while a method marked with a blue heart emoji (💙) is partially open-sourced.
  2. No existing method has yet been benchmarked on the PutnamBench variant that omits numerical answers from the theorem statement, so the leaderboard for that variant contains no entries. Please share your results with us and we will promptly update the leaderboard!
  3. Some new problems have been added since the original release. Benchmarked methods have not been rerun on those problems, but the PutnamBench version used in each evaluation is recorded in the repo's `results.json`.
  4. We are open to suggestions for better indicating the differing compute budgets of approaches on the leaderboard. Please reach out to us with your ideas!
  5. We thank the EvalPlus team for providing the leaderboard template.