Hacker News

by Ryan Harman

GPT‑5.3‑Codex‑Spark (openai.com)

412 points by meetpateltech 5 hours ago

beklein 4 hours ago

I love this! I use coding agents to generate web-based slide decks where “master slides” are just components, and we already have rules + assets to enforce corporate identity. With content + prompts, it’s straightforward to generate a clean, predefined presentation. What I’d really want on top is an “improv mode”: during the talk, I can branch off based on audience questions or small wording changes, and the system proposes (say) 3 candidate next slides in real time. I pick one, present it, then smoothly merge back into the main deck. Example: if I mention a recent news article / study / paper, it automatically generates a slide that includes a screenshot + a QR code link to the source, then routes me back to the original storyline. With realtime voice + realtime code generation, this could turn the boring old presenter view into something genuinely useful.

sva_ 3 hours ago

I love the probabilistic nature of this. Presentations could be anywhere from extremely impressive to hilariously embarrassing.

clickety_clack 2 hours ago

It would be so cool if it generated live in the presentation and adjusted live as you spoke, so you’d have to react to whatever popped on screen!

crystal_revenge an hour ago

Etheryte 2 hours ago

onionisafruit 2 hours ago

m_mueller 19 minutes ago

You're describing almost verbatim what we're building at Octigen [1]! Happy to provide a demo and/or give you free access to our alpha version already online.

[1] https://octigen.com

deepGem 2 hours ago

I built something similar at a hackathon, a dynamic teleprompter that adjusts the speed of tele-prompting based on speaker tonality and spoken wpm. I can see extending the same to an improv mode. This is a super cool idea.

jorgenveisdal 2 hours ago

As an associate professor who spends a ridiculous amount of time preparing for lectures, I would love to try this in one of my courses

esafak 3 hours ago

Can you show one?

beklein an hour ago

The end result would be a normal PPT presentation. Check https://sli.dev as an easy start, then ask Codex/Claude/... to generate the slides using that framework with data from something.md. The interesting part here is generating these otherwise boring slide decks not with PowerPoint itself but with AI coding agents, using master slides and AGENTS.md as context. I’ll be showing this to a small group (normally members only) at IPAI in Heilbronn, Germany on 03/03. If you’re in the area and would like to join, feel free to send me a message and I will squeeze you in.

orochimaaru 4 hours ago

How do you handle the diagrams?

beklein 3 hours ago

In my AGENTS.md file I have a _rule_ that tells the model to use Apache ECharts; the data comes from the prompt and normally from .csv/.json files. A prompt would be like: "After slide 3 add a new content slide that shows a bar chart with data from @data/somefile.csv" ... This works great, and these charts can even be interactive.
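A minimal sketch of the CSV-to-chart step described above, assuming the agent ends up emitting a plain ECharts `option` object. The helper name `bar_chart_option`, the column names, and the sample data are hypothetical and are not taken from the commenter's setup; only the `xAxis` / `yAxis` / `series` layout follows the ECharts option format.

```python
import csv
import io
import json


def bar_chart_option(csv_text, label_col, value_col, title=""):
    """Build an ECharts-style bar chart option dict from CSV text.

    label_col and value_col are hypothetical column names chosen by the caller.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "title": {"text": title},
        "xAxis": {"type": "category", "data": [r[label_col] for r in rows]},
        "yAxis": {"type": "value"},
        "series": [{"type": "bar", "data": [float(r[value_col]) for r in rows]}],
    }


# Illustrative data only; a real deck would read something like data/somefile.csv.
sample_csv = "quarter,revenue\nQ1,10\nQ2,12.5\n"
option = bar_chart_option(sample_csv, "quarter", "revenue", title="Revenue")
print(json.dumps(option, indent=2))
```

The resulting JSON could be passed to `setOption` in a slide component; how the slide framework mounts the chart is left out here.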

orochimaaru 2 hours ago

turnsout 3 hours ago

I love the idea of a living slide deck. This feels like a product that needs to exist!

postalcoder 3 hours ago

First thoughts using gpt-5.3-codex-spark in Codex CLI:

Blazing fast but it definitely has a small model feel.

It's tearing up bluey bench (my personal agent speed benchmark), which is a file system benchmark where I have the agent generate transcripts for untitled episodes of a season of bluey, perform a web search to find the episode descriptions, and then match the transcripts against the descriptions to generate file names and metadata for each episode.

Downsides:

- It has to be prompted to do actions in my media library AGENTS.md that the larger models adhere to without additional prompting.

- It's less careful with how it handles context which means that its actions are less context efficient. Combine that with the smaller context window and I'm seeing frequent compactions.

  Bluey Bench* (minus transcription time):

  Codex CLI
  gpt-5.3-codex-spark low        20s
  gpt-5.3-codex-spark medium     41s
  gpt-5.3-codex-spark xhigh   1m 09s (1 compaction)

  gpt-5.3-codex low           1m 04s
  gpt-5.3-codex medium        1m 50s

  gpt-5.2 low                 3m 04s
  gpt-5.2 medium              5m 20s

  Claude Code
  opus-4.6 (no thinking)      1m 04s

  Antigravity
  gemini-3-flash              1m 40s
  gemini-3-pro low            3m 39s

  *Season 2, 52 episodes
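For readers wondering what the matching step of such a benchmark might look like, here is a rough sketch, assuming transcripts and episode descriptions are already available as plain text. It uses simple word-overlap (Jaccard) scoring with greedy one-to-one assignment; this is a stand-in technique, not what the agent in the benchmark actually does, and the file names, titles, and texts are made up.

```python
def jaccard(a, b):
    """Word-set overlap between two strings, from 0.0 to 1.0."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def match_episodes(transcripts, descriptions):
    """Greedy best-first one-to-one matching.

    transcripts: {filename: transcript text}
    descriptions: {episode title: description text}
    Returns {filename: episode title}.
    """
    scored = sorted(
        (
            (jaccard(text, desc), filename, title)
            for filename, text in transcripts.items()
            for title, desc in descriptions.items()
        ),
        reverse=True,
    )
    used_files, used_titles, result = set(), set(), {}
    for score, filename, title in scored:
        if filename in used_files or title in used_titles:
            continue
        result[filename] = title
        used_files.add(filename)
        used_titles.add(title)
    return result


# Hypothetical inputs for illustration.
transcripts = {
    "untitled_01.mkv": "the kids play a balloon game in the living room",
    "untitled_02.mkv": "dad takes the kids to the pool and forgets the towels",
}
descriptions = {
    "Balloon Game": "a balloon game in the living room",
    "Pool Trip": "a trip to the pool without towels",
}
matches = match_episodes(transcripts, descriptions)
print(matches)
```

A real run would likely need something stronger than word overlap, since transcripts are long and descriptions are short.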

alexdobrenko an hour ago

can we please make the bluey bench the gold standard for all models always

mnicky 2 hours ago

Can you compare it to Opus 4.6 with thinking disabled? It seems to have very impressive benchmark scores. Could also be pretty fast.

postalcoder 2 hours ago

Added a thinking-disabled Opus 4.6 timing. It took 1m 4s – coincidentally the same as 5.3-codex-low.

Squarex 2 hours ago

I wonder why they named it so similarly to the normal codex model when it is much worse, while cool of course.

pjs_ 3 hours ago

Continue to believe that Cerebras is one of the most underrated companies of our time. It's a dinner-plate sized chip. It actually works. It's actually much faster than anything else for real workloads. Amazing

onlyrealcuzzo 3 hours ago

Nvidia seems cooked.

Google is crushing them on inference. By TPUv9, they could be 4x more energy efficient and cheaper overall (even if Nvidia cuts their margins from 75% to 40%).

Cerebras will be substantially better for agentic workflows in terms of speed.

And if you don't care as much about speed and only cost and energy, Google will still crush Nvidia.

And Nvidia won't be cheaper for training new models either. The vast majority of chips will be used for inference by 2028 instead of training anyway.

Nvidia has no manufacturing reliability story. Anyone can buy TSMC's output.

Power is the bottleneck in the US (and everywhere besides China). By TPUv9 - Google is projected to be 4x more energy efficient. It's a no-brainer who you're going with starting with TPUv8 when Google lets you run on-prem.

These are GW scale data centers. You can't just build 4 large-scale nuclear power plants in a year in the US (or anywhere, even China). You can't just build 4 GW solar farms in a year in the US to power your less efficient data center. Maybe you could in China (if the economics were on your side, but they aren't). You sure as hell can't do it anywhere else (maybe India).

What am I missing? I don't understand how Nvidia could've been so far ahead and just let every part of the market slip away.

sailingparrot 2 hours ago

> let every part of the market slip away.

Which part of the market has slipped away, exactly? Everything you wrote is supposition and extrapolation. Nvidia has a chokehold on the entire market. All other players still exist in the small pockets that Nvidia doesn’t have enough production capacity to serve. And their dev ecosystem is still so far ahead of anyone else. Which provider gets chosen to equip a 100k-chip data center goes so far beyond the raw chip power.

onlyrealcuzzo 2 hours ago

mnicky 2 hours ago

> What am I missing?

Largest production capacity maybe?

Also, market demand will be so high that every player's chips will be sold out.

onlyrealcuzzo 2 hours ago

wing-_-nuts 2 hours ago

Man I hope someone drinks Nvidia's milk shake. They need to get humbled back to the point where they're desperate to sell gpus to consumers again.

Only major road block is cuda...

whism 2 hours ago

I believe they licensed smth from groq

Handy-Man 2 hours ago

Well they `acquired` groq for a reason.

zozbot234 3 hours ago

It's "dinner-plate sized" because it's just a full silicon wafer. It's nice to see that wafer-scale integration is now being used for real work but it's been researched for decades.

arcanemachiner 3 hours ago

Just wish they weren't so insanely expensive...

azinman2 3 hours ago

The bigger the chip, the worse the yield.

speedgoose 2 hours ago

moralestapia 3 hours ago

dalemhurley 2 hours ago

Yet investors keep backing NVIDIA.

vimda an hour ago

At this point Tech investment and analysis is so divorced from any kind of reality that it's more akin to lemmings on the cliff than careful analysis of fundamentals

latchkey 3 hours ago

Not for what they are using it for. It is $1m+/chip and they can fit 1 of them in a rack. Rack space in DCs is a premium asset. The density isn't there. AI models need tons of memory (this product announcement is a case in point) and they don't have it, nor do they have a way to get it since they are last in line at the fabs.

Their only chance is an acquihire, but nvidia just spent $20b on groq instead. Dead man walking.

p1esk 3 hours ago

The real question is what’s their perf/dollar vs nvidia?

zozbot234 3 hours ago

xnx 3 hours ago

latchkey 3 hours ago

spwa4 3 hours ago

Oh don't worry. Ever since the power issue started developing rack space is no longer at a premium. Or at least, it's no longer the limiting factor. Power is.

latchkey 3 hours ago

femiagbabiaka 3 hours ago

yep

xnx 3 hours ago

Cerebras is a bit of a stunt like "datacenters in spaaaaace".

Terrible yield: one defect can ruin a whole wafer instead of just a chip region. Poor perf./cost (see above). Difficult to program. Little space for RAM.

the_duke 3 hours ago

They claim the opposite, though, saying the chip is designed to tolerate many defects and work around them.

perdomon an hour ago

This has been the industry standard for the last 20 minutes. I can't believe people are still using GPT-5.3-Codex.

sam_goody 25 minutes ago

I read this headline and was like, "Ah look, an announcement by GPT!! That means that Google or Anthropic must have had a release today!"

And, yup, there is Gemini in item 3!

simonw an hour ago

My stupid pelican benchmark proves to be genuinely quite useful here, you get a visual representation of the quality difference between GPT-5.3-Codex-Spark and full GPT-5.3-Codex: https://simonwillison.net/2026/Feb/12/codex-spark/

lacoolj an hour ago

These are the ones I look for every time a new model is released. Incorporates so many things into one single benchmark.

Also your blog is tops. Keep it up, love the work.

jryio 4 hours ago

This is interesting for offloading "tiered" workloads / priority queue with coding agents.

If 60% of the work is "edit this file with this content", or "refactor according to this abstraction" then low latency - high token inference seems like a needed improvement.

Recently someone made a Claude plugin to offload low-priority work to the Anthropic Batch API [1].

Also I expect both Nvidia and Google to deploy custom silicon for inference [2]

1: https://github.com/s2-streamstore/claude-batch-toolkit/blob/...

2: https://www.tomshardware.com/tech-industry/semiconductors/nv...

zozbot234 4 hours ago

Note that Batch APIs are significantly higher latency than normal AI agent use. They're mostly intended for bulk work where time constraints are not essential. Also, GPT "Codex" models (and most of the "Pro" models also) are currently not available under OpenAI's own batch API. So you would have to use non-agentic models for these tasks and it's not clear how well they would cope.

(Overall, batches do have quite a bit of potential for agentic work as-is but you have to cope with them taking potentially up to 24h for just a single roundtrip with your local agent harness.)

dehugger 4 hours ago

I built something similar using an MCP that allows claude to "outsource" development to GLM 4.7 on Cerebras (or a different model, but GLM is what I use). The tool allows Claude to set the system prompt, instructions, specify the output file to write to and crucially allows it to list which additional files (or subsections of files) should be included as context for the prompt.

I've had great success with it, and it rapidly speeds up development time at fairly minimal cost.

cheema33 4 hours ago

Why use MCP instead of an agent skill for something like this when MCP is typically context inefficient?

pertymcpert 2 hours ago

wahnfrieden 4 hours ago

nikkwong 4 hours ago

> Our latest frontier models have shown particular strengths in their ability to do long-running tasks, working autonomously for hours, days or weeks without intervention.

I have yet to see this (produce anything actually useful).

simonw 4 hours ago

How hard have you tried?

I've been finding that the Opus 4.5/4.6 and GPT-5.2/5.3 models really have represented a step-change in how good they are at running long tasks.

I can one-shot prompt all sorts of useful coding challenges now that previously I would have expected to need multiple follow-ups to fix mistakes the agents made.

I got all of this from a single prompt, for example: https://github.com/simonw/research/tree/main/cysqlite-wasm-w... - including this demo page: https://simonw.github.io/research/cysqlite-wasm-wheel/demo.h... - using this single prompt: https://github.com/simonw/research/pull/79

aeyes 4 hours ago

What do you mean? The generated script just downloads the sources and runs pyodide: https://github.com/simonw/research/blob/main/cysqlite-wasm-w...

There are maybe 5 relevant lines in the script and nothing complex at all that would require it to run for days.

simonw 3 hours ago

andai 3 hours ago

basilgohar 4 hours ago

Can you share any examples of these one-shot prompts? I've not gotten to the point where I can get those kind of results yet.

simonw 3 hours ago

gamegoblin 4 hours ago

I routinely leave codex running for a few hours overnight to debug stuff

If you have a deterministic unit test that can reproduce the bug through your app front door, but you have no idea how the bug is actually happening, having a coding agent just grind through the slog of sticking debug prints everywhere, testing hypotheses, etc — it's an ideal usecase

nikkwong 3 hours ago

I have a hard time understanding how that would work — for me, I typically interface with coding agents through cursor. The flow is like this: ask it something -> it works for a min or two -> I have to verify and fix by asking it again; etc. until we're at a happy place with the code. How do you get it to stop from going down a bad path and never pulling itself out of it?

The important role for me, as a SWE, in the process, is verify that the code does what we actually want it to do. If you remove yourself from the process by letting it run on its own overnight, how does it know it's doing what you actually want it to do?

Or is it more like with your usecase—you can say "here's a failing test—do whatever you can to fix it and don't stop until you do". I could see that limited case working.

gamegoblin 14 minutes ago

woah 3 hours ago

zem an hour ago

p1esk 3 hours ago

vel0city 2 hours ago

tsss 4 hours ago

How can you afford that?

wahnfrieden 4 hours ago

addaon 3 hours ago

> it's an ideal usecase

This is impressive, you’ve completely mitigated the risk of learning or understanding.

arcanemachiner 3 hours ago

XCSme 4 hours ago

Their ability to burn through tokens non-stop for hours, days or weeks without intervention.

raw_anon_1111 3 hours ago

You’re mixing up Open AI for Anthropic.

Anthropic is actually sort of concerned with not burning through cash and charging people a reasonable price. Open AI doesn’t care. I can use Codex CLI all day and not approach any quotas with just my $20 a month ChatGPT subscription.

I treat coding agents like junior developers and never take my hand off the wheel except for boilerplate refactoring.

TheMuenster an hour ago

Can I just say how funny this metric is?

"Our model is so slow and our tokens/second is so low that these tasks can take hours!" is not the advertising they think it is.

johnfn 3 hours ago

The other day I got Codex to one-shot an upgrade to Vite 8 at my day job (a real website with revenue). It worked on this for over 3 hours without intervention (I went to sleep). This is now in production.

seunosewa 2 hours ago

How did you verify it?

girvo an hour ago

wahnfrieden 4 hours ago

It worked for me several times.

It's easy to say that these increasingly popular tools are only able to produce useless junk. You haven't tried, or you haven't "closed the loop" so that the agent can evaluate its own progress toward acceptance criteria, or you are monitoring incompetent feeds of other users.

nikkwong 3 hours ago

I'm definitely bullish on LLM's for coding. It sounds to me as though getting it to run on its own for hours and produce something usable requires more careful thought and setup than just throwing a prompt at it and wishing for the best—but I haven't seen many examples in the wild yet

foobar10000 2 hours ago

rcarmo 2 hours ago

bitwize 3 hours ago

PEBKAC

raahelb 2 hours ago

Interesting to note that the reduced latency is not just due to the improved model speed, but also because of improvements made to the harness itself:

> "As we trained Codex-Spark, it became apparent that model speed was just part of the equation for real-time collaboration—we also needed to reduce latency across the full request-response pipeline. We implemented end-to-end latency improvements in our harness that will benefit all models [...] Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon."

I wonder if all other harnesses (Claude Code, OpenCode, Cursor etc.,) can make similar improvements to reduce latency. I've been vibe coding (or doing agentic engineering) with Claude Code a lot for the last few days and I've had some tasks take as long as 30 minutes.
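To get a feel for what the quoted percentages could mean for one agent turn, here is a back-of-the-envelope sketch. Only the three reductions (80%, 30%, 50%) come from the quote; the baseline millisecond figures, the number of roundtrips and tokens, and the simple additive model are all assumptions made for illustration.

```python
def turn_overhead_ms(roundtrips, tokens, per_roundtrip_ms, per_token_ms, ttft_ms):
    """Additive toy model: each roundtrip pays fixed overhead plus time-to-first-token,
    and every generated token pays a per-token overhead."""
    return roundtrips * (per_roundtrip_ms + ttft_ms) + tokens * per_token_ms


# Hypothetical baseline figures, NOT measured values.
baseline = {"per_roundtrip_ms": 100.0, "per_token_ms": 1.0, "ttft_ms": 500.0}

# Apply the reductions quoted in the announcement.
improved = {
    "per_roundtrip_ms": baseline["per_roundtrip_ms"] * 0.20,  # 80% lower
    "per_token_ms": baseline["per_token_ms"] * 0.70,          # 30% lower
    "ttft_ms": baseline["ttft_ms"] * 0.50,                    # 50% lower
}

# Assumed workload: 20 roundtrips and 2000 generated tokens in one turn.
before = turn_overhead_ms(20, 2000, **baseline)
after = turn_overhead_ms(20, 2000, **improved)
print(f"before: {before:.0f} ms, after: {after:.0f} ms, saved: {1 - after / before:.0%}")
```

Under these made-up numbers the overhead roughly halves, but the real saving depends entirely on how many roundtrips and tokens a task involves.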

2001zhaozhao an hour ago

This might actually be hard for open source agents (e.g. Opencode) to replicate, barring a standardized WebSocket LLM API being widely adopted.

jbellis 6 minutes ago

really too bad that the codex models are so tightly coupled to the codex harness as to be useless for everything else

kachapopopow 4 hours ago

Is this the first time one of the big 3 is using Cerebras? I've been waiting for this day...

arisAlexis 4 hours ago

They were afraid of the untested tech, but it looks like a leap in speed now

rvz 4 hours ago

This is nonsense. What do you mean? Mistral uses Cerebras for their LLMs as well. [0]

It's certainly not "untested".

[0] https://www.cerebras.ai/blog/mistral-le-chat

lemming 3 hours ago

mbm an hour ago

Works pretty well as a general-purpose computer. The speed is really enjoyable. Could replace some of my Claude Code use actually. For coding, set to xhigh and use it for personal tools or small projects.

Example repo that Codex with spark made in about 15 minutes for me since `claude --resume` has been finicky lately: https://github.com/mzxrai/claude-sessions

mudkipdev 4 hours ago

Off topic but how is it always this HN user sharing model releases within a couple of minutes of their announcement?

casefields 4 hours ago

The account isn’t a normal user. They literally only post stuff like this. Their comments are just official links back to said announcements.

sho_hn 4 hours ago

Maybe they set up an agent for it.

Squarex 4 hours ago

or a simple cron :)

lacoolj an hour ago

Google Alerts

pdeva1 4 hours ago

This is closer to 5.1 mini it seems and tied to Pro account. GLM 4.7 is available on-demand on Cerebras today [1] and performs better and cheaper... [1] https://www.cerebras.ai/blog/glm-4-7

ehzb2827 4 hours ago

GLM 4.7 scores 41.0% on Terminal Bench 2.0 [1] compared to 58.4% for GPT-5.3-Codex-Spark [2].

[1] https://z.ai/blog/glm-4.7 [2] https://openai.com/index/introducing-gpt-5-3-codex-spark/

ttul 2 hours ago

Great move by OpenAI. With coding agents, if you have access to a fast and cheap model, you can afford to let it rip, making lots of mistakes, and iterate until it gets things right. With the right scaffolding (AGENTS.md, SKILLS.md, etc.), a fast and light model can do great things. And when it's done, you can still have the heavyweight model come in to clean up any messes.

alecco an hour ago

This could probably work amazingly with an orchestrator on 5.3-high and coding agents with Spark. But it would need some decent instructions for both.

antirez 4 hours ago

The search for speed is vain. Often Claude Code Opus 4.6, on hard enough problems, can give the impression of acting fast without really making progress because of a lack of focus on what matters. Then you spin up the much slower GPT 5.3-Codex and it fixes everything in 3 minutes of doing the right thing.

mickeyp 4 hours ago

I disagree. This is great for bulk tasks: renaming, finding and searching for things, etc

ghosty141 24 minutes ago

What codex often does for this is write a small python script and execute that, to bulk rename for example.

I agree that there is use for fast "simpler" models, there are many tasks where the regular codex-5.3 is not necessary but I think it's rarely worth the extra friction of switching from regular 5.3 to 5.3-spark.

Aurornis 3 hours ago

I will always take more speed. My use of LLMs always comes back to doing something manually, from reviewing code to testing it to changing direction. The faster I can get the LLM part of the back-and-forth to complete, the more I can stay focused on my part.

jusgu 4 hours ago

disagree. while intelligence is important, speed is especially important when productionizing AI. it’s difficult to formalize the increase in user experience per increase in TPS but it most definitely exists.

capevace 4 hours ago

Seems like the industry is moving further towards having low-latency/high-speed models for direct interaction, and having slow, long thinking models for longer tasks / deeper thinking.

Quick/Instant LLMs for human use (think UI). Slow, deep thinking LLMs for autonomous agents.

gaigalas 4 hours ago

You always want faster feedback. If not a human leveraging the fast cycles, another automated system (eg CI).

Slow, deep tasks are mostly for flashy one-shot demos that have little to no practical use in the real world.

foobar10000 2 hours ago

I mean, yes, one always does want faster feedback - cannot argue with that!

But some of the longer stuff - automating kernel fusion, etc, are just hard problems. And a small model - or even most bigger ones, will not get the direction right…

gaigalas 2 hours ago

varispeed 4 hours ago

Are they really thinking or are they sprinkling them with Sleep(x)?

storus 3 hours ago

Anyone using OpenClaw to manage a bunch of coding agents so that you only set the high-level vision and leave all the prompting, testing, debugging, forking to agents? If yes, how did you glue it all together? Are you using local models? What is the SOTA for what I can run locally with a 512GB M3 Ultra, 2x DGX Spark, 2x RTX Pro 6000 Max-Q in one machine and 1x RTX Pro 6000 WS in another machine?

OsrsNeedsf2P 4 hours ago

No hint on pricing. I'm curious if faster is more expensive, given a slight trade-off in accuracy

sauwan 3 hours ago

It's either more expensive or dumber.

Aeroi 34 minutes ago

open ai naming is a meme at this point

wxw 4 hours ago

Great stuff. People are getting used to agents as the interface for everything, even work as simple as "change label X to label Y". More speed on that front is welcome. The Codex "blended mode" they refer to will be useful (similar to Claude Code bouncing between haiku and opus).

I imagine it's a win-win. This could significantly help their tokenomics.

The example showing a plan being generated instantaneously is interesting. Human understanding will end up as the last, true bottleneck.

dalemhurley 2 hours ago

This is a win for agents; speed and intelligence are crucial to the loop. If the time and token cost is small you can iterate many times to correct mistakes.

Got to wonder why Wall Street is dumping NVIDIA.

SamDc73 2 hours ago

I mean, they are only running a small version of codex. Can they run the full one? Or is the technology not there yet?

mynti 3 hours ago

With the rough numbers from the blog post at ~1k tokens a second in Cerebras it should put it right at the same size as GLM 4.7, which also is available at 1k tokens a second. And they say that it is a smaller model than the normal Codex model

Computer0 17 minutes ago

128k context window!

hchak 2 hours ago

Cerebras out here catching dubs. Does anyone know if Groq is running DGX Cloud inference or am I tripping?

rprend 2 hours ago

Damn, this is the first thing to make me decide to try Codex, as a loyal Claude Code user.

cjbarber 4 hours ago

It'll be nice when there's smarter routing between models, or easier routing, so some things get sent to the fast model, some get sent to the cheap model, some get sent to the smart model, etc.
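A toy sketch of what such routing could look like, assuming each task arrives with a couple of hints attached. The `Task` fields, the tier names "fast", "cheap" and "smart", and the rule order are all hypothetical; no existing harness is claimed to work this way.

```python
from dataclasses import dataclass


@dataclass
class Task:
    prompt: str
    needs_deep_reasoning: bool = False  # e.g. architecture changes, hard bugs
    latency_sensitive: bool = False     # e.g. interactive edits while the user waits


def route(task):
    """Pick a model tier for a task. Tier names are placeholders."""
    if task.needs_deep_reasoning:
        return "smart"
    if task.latency_sensitive:
        return "fast"
    return "cheap"


# Example tasks, invented for illustration.
tasks = [
    Task("change label X to label Y", latency_sensitive=True),
    Task("find the root cause of the flaky test", needs_deep_reasoning=True),
    Task("summarize yesterday's logs"),
]
for t in tasks:
    print(f"{route(t):>5}  <-  {t.prompt}")
```

The hard part in practice is deciding the hints automatically; here they are simply supplied by the caller.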

jannniii 2 hours ago

This would be interesting if it was an open weights model.

alexhans 4 hours ago

When I saw Spark my mind went to Apache Spark and wondered if we were learning all the lessons in orchestration of driver/worker and data shuffling from that space.

modeless 3 hours ago

Why are they obscuring the price? It must be outrageously expensive.

chaos_emergent 3 hours ago

I think it's a beta so they're trying to figure out pricing by deploying it.

throwup238 4 hours ago

Your move, Anthropic.

(Yes I know they released /fast last week but I’m loving the constant oneupsmanship)

bearjaws 2 hours ago

/fast is insanely expensive.

Last night it got stuck in a loop (in plan mode, I use vanilla CC) and burnt through $22 in 15 minutes.

dude250711 4 hours ago

They asked Google to cover them this time. They will owe them a reciprocal favour.

rvz 4 hours ago

ok. [0]

[0] https://www.anthropic.com/news/anthropic-raises-30-billion-s...

anonzzzies 3 hours ago

Been using glm 4.7 for this with opencode. Works really well.

system2 2 hours ago

I stopped using OpenAI tools recently after they increased the censorship. I can't even tell it to read a screencapture software I am building because it thinks I might use it for evil purposes.

nusl 4 hours ago

These graphs are really weird. One only shows 30-60% range with the model(s) close to 60%, the other shows 80% but the top model is at 77%.

guessmyname 3 hours ago

Lying with charts → https://handsondataviz.org/how-to-lie-with-charts.html

Also → https://medium.com/@hypsypops/axes-of-evil-how-to-lie-with-g...

More → https://researchguides.library.yorku.ca/datavisualization/li...

And → https://vdl.sci.utah.edu/blog/2023/04/17/misleading/

desireco42 an hour ago

Is it not available in Codex? I think this is fantastic and can't wait to try it, this is exactly the usecase I need, something fast, perform based on my instruction.

Cerebras is a winner here.

arpinum an hour ago

update codex, it's there.

tsss 3 hours ago

Does anyone want this? Speed has never been the problem for me, in fact, higher latency means less work for me as a replaceable corporate employee. What I need is the most intelligence possible; I don't care if I have to wait a day for an answer if the answer is perfect. Small code edits, like they are presented as the use case here, I can do much better myself than trying to explain to some AI what exactly I want done.

vessenes 3 hours ago

Yes, we want this.

cjbarber 4 hours ago

For a bit, waiting for LLMs was like waiting for code to compile: https://xkcd.com/303/

> more than 1000 tokens per second

Perhaps, no more?

(Not to mention, if you're waiting for one LLM, sometimes it makes sense to multi-table. I think Boris from Anthropic says he runs 5 CC instances in his terminal and another 5-10 in his browser on CC web.)

deskithere 4 hours ago

Anyway token eaters are upgrading their consumption capabilities.

allisdust 4 hours ago

Normal codex itself is subpar compared to opus. This might be even worse.

cactusplant7374 3 hours ago

I was really hoping it would support codex xhigh first.

jauntywundrkind 4 hours ago

Wasn't aware there was an effort to move to websockets. Is there any standards work for this, or is this just happening purely within the walled OpenAI garden?

> Under the hood, we streamlined how responses stream from client to server and back, rewrote key pieces of our inference stack, and reworked how sessions are initialized so that the first visible token appears sooner and Codex stays responsive as you iterate. Through the introduction of a persistent WebSocket connection and targeted optimizations inside of Responses API, we reduced overhead per client/server roundtrip by 80%, per-token overhead by 30%, and time-to-first-token by 50%. The WebSocket path is enabled for Codex-Spark by default and will become the default for all models soon.

behnamoh 4 hours ago

In my opinion, they solved the wrong problem. The main issue I have with Codex is that the best model is insanely slow, except at nights and weekends when Silicon Valley goes to bed. I don't want a faster, smaller model (already have that with GLM and MiniMax). I want a faster, better model (at least as fast as Opus).

When they partnered with Cerebras, I kind of had a gut feeling that they wouldn't be able to use their technology for larger models because Cerebras doesn't have a track record of serving models larger than GLM.

It pains me that five days before my Codex subscription ends, I have to switch to Anthropic because despite getting less quota compared to Codex, at least I'll be able to use my quota _and_ stay in the flow.

But even Codex's slowness aside, it's just not as good of an "agentic" model as Opus: here's what drove me crazy: https://x.com/OrganicGPT/status/2021462447341830582?s=20. The Codex model (gpt-5.3-xhigh) has no idea about how to call agents smh

properbrew 4 hours ago

I was using a custom skill to spawn subagents, but it looks like the `/experimental` feature in codex-cli has the SubAgent setting (https://github.com/openai/codex/issues/2604#issuecomment-387...)

behnamoh 4 hours ago

Yes, I was using that. But the prompt given to the agents is not correct. Codex sends a prompt to the first agent and then sends the second prompt to the second agent, but then in the second prompt, it references the first prompt, which is completely incorrect.

kachapopopow 4 hours ago

That's why I built oh-my-singularity (based on oh-my-pi - see the front page from can.ac): https://share.us-east-1.gotservers.com/v/EAqb7_Wt/cAlknb6xz0...

video is pretty outdated now, this was a PoC - working on a dependency free version.

cjbarber 4 hours ago

> In my opinion, they solved the wrong problem. The main issue I have with Codex is that the best model is insanely slow, except at nights and weekends when Silicon Valley goes to bed. I don't want a faster, smaller model (already have that with GLM and MiniMax). I want a faster, better model (at least as fast as Opus).

It's entirely possible that this is the first step and that they will also do faster better models, too.

behnamoh 4 hours ago

I doubt it; there's a limit on model size that can be supported by Cerebras tech. GPT-5.3 is supposedly +1T parameters...

joshuastuden an hour ago

re-thc 4 hours ago

> In my opinion, they solved the wrong problem

> I don't want a faster, smaller model. I want a faster, better model

Will you pay 10x the price? They didn't solve the "wrong problem". They did what they could with the resources they have.

cowpig 2 hours ago

> Today, we’re releasing

Releasing for real? Is it an open model?

rvz 4 hours ago

> Today, we’re releasing a research preview of GPT‑5.3-Codex-Spark, a smaller version of GPT‑5.3-Codex, and our first model designed for real-time coding. Codex-Spark marks the first milestone in our partnership with Cerebras, which we announced in January.

Nevermind. [0]

[0] https://news.ycombinator.com/item?id=35490837
