We choose low temperature: 0.1. High temperature may lead to more random errors, which may be harmful. We use the maximum thinking budget (32768 reasoning tokens) of Gemini 2.5 Pro. We do not use web search (of course), code, or any other tools. We share the most important prompts below.
### Core Instructions ###