Elon Musk unveils Grok-3 and Grok-3 mini that outperform other AI chatbots


In a highly anticipated online livestream, Elon Musk, owner of X and xAI and CEO of Tesla and SpaceX, unveiled the latest versions of the Grok AI chatbot: Grok-3 and Grok-3 mini.

Top members of xAI’s engineering team joined Musk in the presentation to showcase the capabilities of Grok-3 and its scaled-down version, Grok-3 mini.

According to xAI’s internal testing and public benchmarking tools, both Grok-3 and Grok-3 mini have surpassed existing chatbots like ChatGPT, DeepSeek R1, and Gemini-2 Flash Thinking by a large margin.

Elon Musk founded xAI in July 2023 to compete with companies like OpenAI, where he was an early investor before leaving the company over conflicts with other board members.

Elon Musk and his team at xAI unveil the Grok-3 AI chatbot to the world on Monday 17th Feb 2025. “Our mission is to understand the universe”. Credit: Elon Musk / xAI / X (Twitter).

Benchmarking Against Other Chatbots

Elon Musk’s xAI tested its Grok-3 and Grok-3 mini chatbots against other advanced AI chatbots in three major areas: mathematics, science, and computer programming (coding and game development).

In these benchmarks, Grok-3 topped the charts against existing chatbots in all three areas.

The xAI Grok-3 AI chatbot is designed to think deeply. According to its creators, it reasons through a problem in multiple passes, solving the same problem several times before settling on the solution it considers correct.
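The presentation did not spell out how Grok-3 combines its repeated attempts. One common way to "solve the same problem many times" is majority voting over independent attempts, and the sketch below assumes that approach purely for illustration. The names `solve_once`, `majority_answer`, and `solve_repeatedly` are hypothetical; `solve_once` stands in for a single model call and is not an xAI API.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple


def majority_answer(answers: Sequence[str]) -> Tuple[str, float]:
    """Return the most common answer and the share of attempts that agree with it."""
    if not answers:
        raise ValueError("need at least one answer")
    # Normalize whitespace so "42" and " 42 " count as the same answer.
    counts = Counter(a.strip() for a in answers)
    best, votes = counts.most_common(1)[0]
    return best, votes / len(answers)


def solve_repeatedly(
    solve_once: Callable[[str], str],  # hypothetical stand-in for one model call
    problem: str,
    attempts: int = 5,
) -> Tuple[str, float]:
    """Attempt the same problem several times and keep the majority answer."""
    answers = [solve_once(problem) for _ in range(attempts)]
    return majority_answer(answers)
```

The agreement share returned alongside the answer gives a rough confidence signal: if only a slim majority of attempts agree, the result deserves more scrutiny than one where nearly all attempts converge.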

Graph 1: Grok-3 and Grok-3 mini benchmarking results in Maths, Science, and Coding abilities against Gemini-2 Pro, DeepSeek-V3, Claude 3.5 Sonnet, and ChatGPT 4o. Credit: xAI / Elon Musk.

Math: AIME 2024 and 2025

During the presentation, an xAI engineer said “Grok-3 is ready to go to college”. As Graph 1 above shows, both Grok-3 and Grok-3 mini outperformed the competition in the AIME’24 mathematics test.

Grok-3 mini scored 40, just ahead of DeepSeek-V3 at 39 points. The larger Grok-3 version outperformed every other AI chatbot in the comparison with a benchmarking score of 52. The closest to it is its own younger brother, Grok-3 mini. Interestingly, ChatGPT’s GPT-4o did the worst in this area with a score of only 9.

xAI has named the current early version of Grok-3 “Chocolate”. Elon Musk’s team at his artificial intelligence company is constantly improving the Grok model to solve even more complex mathematical challenges.

Graph 2: AIME 2025 Performance Bar Chart: Grok-3 Reasoning Beta and Grok-3 mini reasoning compared to ChatGPT o3 mini (high), o1, DeepSeek-R1, and Gemini-2 Flash Thinking. Credit: xAI / Elon Musk via X.

Science: GPQA

The mini version of Grok-3 scored 65 points on Ph.D.-level science questions, matching Gemini-2 Pro and Claude 3.5 Sonnet. In this specific area, China’s DeepSeek-V3 scored only 59 points.

The larger Grok-3 version did exceptionally well in science with 75 points in the GPQA benchmarking test.

Grok acing the science test is a positive sign that it may be able to understand the universe better than other AI bots in the future.

Coding / Game Development

Looking at Graph 1 above, we can see that both Grok-3 and Grok-3 mini have a fair advantage in coding and game development over their competing AI chatbots.

Using Grok-3 and its Big Brain option (extra compute), Elon Musk created a combo of Tetris and Bejeweled using the following prompt in plain English:

Using pygame, make a game that is a mix of Tetris and Bejeweled. The code could be very long. Output it as one file. Make it insanely great.

No prompt engineering was involved in making this game. The prompt also gave Grok-3 creative freedom to choose its own criteria in putting the video game together.

“We’re seeing the beginnings of creativity,” Elon Musk said.

Screenshot of Grok-3-created hybrid video game of Tetris and Bejeweled. Credit: xAI / Elon Musk via X (live-stream recording video below).

ChatBot Arena (LMSYS) Benchmarking

xAI’s Grok-3 Chocolate (early version) outperformed all other AI chatbots tested on the ChatBot Arena benchmarking system.

Grok-3 scored 1400 points in the ChatBot Arena benchmarking test. The closest AI chatbot to Grok-3 is Google’s Gemini-2 Flash-Thinking, which scored between 1380 and 1400 (see Graph 3 below).

Graph 3: Grok-3 Chocolate vs other AI chatbots in ChatBot Arena benchmarking test (LMSYS). Credit: xAI / Elon Musk via X.
Video: Recording of the Grok-3 AI chatbot unveiling live-stream.


Iqtidar Ali (http://www.teslaoracle.com)
Author of more than 1500 articles on Tesla, SpaceX, and EVs. His work has been liked and tweeted by Elon Musk and other prominent influencers. You can reach him on Twitter @IqtidarAlii
