In a highly anticipated livestream, Elon Musk, owner of X and xAI and CEO of Tesla and SpaceX, unveiled the latest versions of the Grok AI chatbot: Grok-3 and Grok-3 mini.
Senior members of xAI’s engineering team joined Musk to showcase the capabilities of Grok-3 and its scaled-down version, Grok-3 mini.
According to xAI’s internal testing and public benchmarking tools, both Grok-3 and Grok-3 mini outperform existing chatbots such as ChatGPT, DeepSeek R1, and Gemini-2 Flash Thinking.
Elon Musk founded xAI in July 2023 to compete with companies like OpenAI, where he was an early investor but later left because of conflicts with other board members.

Benchmarking Against Other Chatbots
Elon Musk’s xAI tested its Grok-3 and Grok-3 mini chatbots against other advanced AI chatbots in 3 major areas — Mathematics, Science, and Computer Programming (coding and game development).
On these benchmarks, Grok-3 topped the charts against existing chatbots in all three areas.
Grok-3 is designed to think deeply. According to its creators, it reasons through a problem multiple times, solving it repeatedly before settling on what it judges to be the right solution.
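xAI did not detail how Grok-3 combines its repeated attempts. One widely used way to turn many attempts into a single answer is majority voting, sketched below. This illustrates the general idea only, not xAI's method; `noisy_solver` and its 60% success rate are hypothetical stand-ins for a single reasoning attempt.

```python
import random
from collections import Counter


def noisy_solver(rng):
    """Hypothetical single attempt: returns the right answer (42) 60% of the time."""
    if rng.random() < 0.6:
        return 42
    # Otherwise return one of a few plausible wrong answers.
    return rng.choice([41, 43, 44])


def majority_vote(answers):
    """Return the answer that appears most often among the attempts."""
    return Counter(answers).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the run is repeatable
attempts = [noisy_solver(rng) for _ in range(64)]
final_answer = majority_vote(attempts)
print(final_answer)
```

Because the wrong answers are spread across several values while the right one keeps recurring, the vote is far more reliable than any single attempt.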

Math: AIME 2024 and 2025
During the presentation, an xAI engineer said, “Grok-3 is ready to go to college.” As the benchmark comparison bar chart above shows, Grok-3 and Grok-3 mini performed better than the competition on the AIME’24 mathematics test.
Even Grok-3 mini scored 40, while DeepSeek-V3 scored 39. The larger Grok-3 outperformed every other AI chatbot on the market with a score of 52; the closest to it is its own smaller sibling, Grok-3 mini. Interestingly, ChatGPT’s GPT-4o did the worst in this area, with a score of only 9.
xAI has named the current early version of Grok-3 “Chocolate”. Elon Musk’s team at xAI is continuing to improve the model so that it can solve even more complex mathematical challenges.

Science: GPQA
The mini version of Grok-3 scored 65 points on Ph.D.-level science questions, the same score as Gemini-2 Pro and Claude 3.5 Sonnet. In this area, China’s DeepSeek-V3 scored only 59 points.
The larger Grok-3 did exceptionally well in science, scoring 75 points on the GPQA benchmark.
Grok acing the science test is a positive sign that it may come to understand the universe better than other AI bots.

Coding / Game Development
Looking at Graph 1 above, we can see that both Grok-3 and Grok-3 mini hold an advantage in coding and game development over competing AI chatbots.
Using Grok-3 with its Big Brain option (extra compute), Elon Musk created a mash-up of Tetris and Bejeweled with the following plain-English prompt:
Using pygame, make a game that is a mix of Tetris and Bejeweled. The code could be very long. Output it as one file. Make it insanely great.
No prompt engineering was involved in making this game. The prompt also left Grok-3 free to be creative and to choose its own criteria for putting the video game together.
“We’re seeing the beginnings of creativity,” Elon Musk said.

ChatBot Arena (LMSYS) Benchmarking
xAI’s Grok-3 Chocolate (early version) outperformed all other AI chatbots tested on the ChatBot Arena benchmarking system.
Grok-3 scored 1400 points on the ChatBot Arena benchmark. The closest AI chatbot to Grok-3 is Google’s Gemini-2 Flash-Thinking, which scored between 1380 and 1400 (see Graph 3 below).
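To get a feel for what a gap of this size means, the scores can be turned into a head-to-head win rate. The sketch below assumes Arena scores follow the standard Elo convention (base 10, scale factor 400), which the presentation did not spell out. The function name `expected_win_rate` is illustrative, and 1380 is taken as the low end of the range quoted for Gemini-2 Flash-Thinking.

```python
def expected_win_rate(score_a, score_b):
    """Expected probability that model A is preferred over model B,
    under the standard Elo formula (assumed: base 10, scale 400)."""
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400))


# Grok-3 at 1400 versus a rival at 1380 (low end of the quoted range).
print(round(expected_win_rate(1400, 1380), 3))  # prints 0.529
```

Under these assumptions, a 20-point lead corresponds to being preferred in roughly 53% of head-to-head comparisons, so even a first-place finish can reflect a fairly narrow edge.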
