Google rolls out Gemini 2.5 Deep Think, outperforms Grok-4 and OpenAI o3 on 2 key benchmarks

Summary

Gemini 2.5 Deep Think, Google's advanced AI model, excels in reasoning and creativity, outperforming Grok-4 and OpenAI o3. Available to AI Ultra subscribers, it uses 'parallel thinking' for complex tasks.

Tested by mathematicians, it shows promise in math and coding. Google plans wider testing to ensure its safety and effectiveness in various use cases.

Google DeepMind has rolled out Gemini 2.5 Deep Think, which it describes as a major upgrade in advanced AI reasoning. Available to Google's AI Ultra subscribers, the feature gives the model more time to work through complex tasks and uses "parallel thinking" to explore several ideas at once, which Google says raises the chances of more creative and accurate answers.


Google says Gemini 2.5 Deep Think incorporates feedback from early trusted testers and researchers. It is a variation of the model that achieved the gold-medal standard at this year’s International Mathematical Olympiad (IMO), tuned to be faster and more usable day-to-day; according to Google, this version still reaches Bronze-level performance on the IMO benchmark. Google adds that a small group of mathematicians and academics is also reviewing the Gemini 2.5 Deep Think model.

How Deep Think works

Google claims Deep Think pushes the frontier of thinking capabilities by brainstorming with parallel thinking techniques. This lets Gemini generate many ideas at once, weigh them against each other, and even revise or combine them before sharing a final response. The model is given extended "thinking time" to explore these options, an approach similar in principle to the "Tree of Thoughts" technique described in the research literature. Google says it also developed novel reinforcement learning techniques that encourage the model to use these longer reasoning paths, making it a more intuitive problem-solver.
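
To make the idea concrete, here is a toy sketch of the general pattern described above: launch several independent attempts at once, let each revise its own answer, then keep the best one. It is only an illustration and not Google's implementation, which has not been published in this detail. The task (estimating a square root) and the function names explore, score and parallel_think are assumptions made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def explore(target, start, steps=3):
    """One hypothetical 'reasoning path': start from a guess and revise it."""
    guess = float(start)
    for _ in range(steps):
        # Revision step: a Newton update that moves the guess toward sqrt(target).
        guess = 0.5 * (guess + target / guess)
    return guess


def score(target, guess):
    """Lower is better: distance between guess squared and the target."""
    return abs(guess * guess - target)


def parallel_think(target, starts):
    """Run several paths concurrently and keep the best-scoring candidate."""
    with ThreadPoolExecutor() as pool:
        candidates = list(pool.map(lambda s: explore(target, s), starts))
    return min(candidates, key=lambda g: score(target, g))


if __name__ == "__main__":
    # Three starting guesses stand in for three independent lines of thought.
    print(parallel_think(2.0, [1, 5, 50]))
```

Paths that begin far from the answer, such as the one starting at 50, are still rough after three revisions, which is why the final selection step matters: the result is only as good as the best path explored.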

How Deep Think stacks up

Google highlights three core strengths of Gemini 2.5 Deep Think: creativity, strategic planning and step-by-step improvement. It is suited to building complex things iteratively, and Google says it can improve both the aesthetics and the functionality of web development tasks. Because of its reasoning capability, Deep Think could have major use cases in math and coding. In Google's reported results on challenging benchmarks in coding, science, knowledge and reasoning, it outscored OpenAI o3, Gemini 2.5 Pro and Grok 4, without tool use, on both LiveCodeBench V6 and Humanity’s Last Exam.


How can you use Deep Think in the Gemini app?

If you’re a Google AI Ultra subscriber, you can use Deep Think in the Gemini app for a fixed number of prompts a day by selecting 2.5 Pro in the model drop-down and toggling “Deep Think” in the prompt bar. Deep Think automatically works with tools such as code execution and Google Search, and Google says it can produce much longer responses. Google also plans to release Deep Think, with and without tools, to a set of trusted testers via the Gemini API in the coming weeks, which it says will help it better understand the model's usability for developer and enterprise use cases.

On safety and responsibility, Google says that in testing, Gemini 2.5 Deep Think showed improved content safety and tone-objectivity compared with Gemini 2.5 Pro, but also a higher tendency to refuse benign requests. Google says that as Gemini's problem-solving abilities advance, it will take a deeper look at the risks that come with increased complexity.
