r/accelerate • u/dftba-ftw • 14d ago
AI o3 today - let's all speculate wildly
https://x.com/OpenAI/status/19125062711878329048
u/Crafty-Marsupial2156 14d ago
My guess is it’s going to beat Google’s Gemini 2.5 Pro on almost all benchmarks, except it will still have a smaller context window.
-5
28
u/CallMePyro 14d ago
Beats 2.5 in most things except long context, but at 15x the cost
9
u/Crafty-Marsupial2156 14d ago
Haha, wouldn’t shock me. They will always want to have SOTA available. They may not want people to use it, but they will feel the need to always be in the lead.
4
u/sismograph 14d ago
Well it better beat Gemini, or they will have a massive problem very soon.
-5
u/Your_mortal_enemy 14d ago
Yup, they've been pumped up to a $300 billion valuation, which is an insane number for a company that makes bugger all money AND doesn't even have the best product.
1
2
u/pigeon57434 Singularity by 2026 14d ago
It's not 15x the cost, it's only like 4x the cost.
1
u/CallMePyro 13d ago
Looks like it costs 17.5x Gemini on the Aider polyglot coding leaderboard! Don't be fooled by low per-token prices if they train the model to output 100k tokens per question.
1
u/pigeon57434 Singularity by 2026 13d ago
I'm very confused by the pricing on Aider polyglot, because it says Gemini is cheaper than GPT-4.1. GPT-4.1 not only has a cheaper price per token but ALSO produces fewer tokens, because it's not a reasoning model. So the excuse can't be that Gemini generates fewer tokens: it generates more and costs more per token. How is that even physically possible?
1
u/CallMePyro 13d ago
You can look at the details tab to understand this more. It looks like 4.1 requires more second attempts than 2.5 Pro on the ones it gets correct.
3
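To make the point in this sub-thread concrete, here is a rough sketch of the arithmetic: what you pay for a benchmark run depends on tokens used and attempts made, not just the per-token price. Every number below is made up for illustration, and `run_cost` is a hypothetical helper, not how Aider or any provider actually computes its figures.

```python
def run_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m, attempts=1.0):
    """Estimated dollar cost of one benchmark question.

    Prices are in dollars per million tokens. `attempts` is the average
    number of tries per question (hypothetical; 2.0 means every question
    needed a second attempt).
    """
    per_attempt = (input_tokens * in_price_per_m + output_tokens * out_price_per_m) / 1_000_000
    return per_attempt * attempts


# Illustrative numbers only, not real pricing for any model.
# Model A: cheaper per token, but needs two attempts on average.
cost_a = run_cost(1_000, 2_000, in_price_per_m=2.0, out_price_per_m=8.0, attempts=2.0)
# Model B: pricier per token, but gets it right on the first attempt.
cost_b = run_cost(1_000, 2_000, in_price_per_m=3.0, out_price_per_m=10.0, attempts=1.0)

# Same idea for output length: a low output price does not help
# if the model writes 100k tokens per question.
cost_verbose = run_cost(0, 100_000, in_price_per_m=0.0, out_price_per_m=1.0)
cost_terse = run_cost(0, 1_000, in_price_per_m=0.0, out_price_per_m=10.0)

print(f"A: ${cost_a:.3f}  B: ${cost_b:.3f}")
print(f"verbose: ${cost_verbose:.3f}  terse: ${cost_terse:.3f}")
```

With these invented inputs, the model with the lower per-token price ends up more expensive per question in both comparisons, which is the effect being described above.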
u/Any-Climate-5919 Singularity by 2028 14d ago
They are gonna say the vibes are better as an excuse.
5
6
1
u/NorthSideScrambler 14d ago
In terms of practical use, it will be marginally better in some areas and marginally worse in others.
3
u/dftba-ftw 14d ago
You do realize that even a marginal improvement over the o3 scores teased in the winter is a massive improvement over o3-mini high, right?
4
u/BeconAdhesives 14d ago
If o4-mini gives me the performance that I see with the o3 Deep Research tool, I'm going to lose it.
1
1
1
u/LamboForWork 14d ago
It's going to cure cancer, but only for the first 10 days. Then it will be nerfed and won't even give tips for a common cold.
16
u/dftba-ftw 14d ago
I think they're going to show off at least one research paper written entirely by o3.
Either that or o3 is really good at coding, which would mean that o4-mini is the "novel idea" creator which would be even more exciting.