Codex for almost everything (openai.com)

358 points by mikeevans 3 hours ago

daviding 2 hours ago

There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up. I get that productivity can be improved with a lot of this for non-developers; I'm just not sure 'code' is the right term for it.

cultofmetatron 3 minutes ago

> There seems a fair enthusiasm in the UI of these to hide code from coders. Like the prompt interaction is the true source and the actual code is some sort of annoying intermediate runtime inconvenience to cover up.

I've finally started getting into AI with a coding harness, but I've taken the opposite approach. Usually I have the structure of my code in my mind already and talk to the prompt like I'm pairing with it. While it's generating the code, I'm telling it the structure of the code and individual functions. It's sped me up quite a lot while I still operate at the level of the code itself. The final output ends up looking like code I'd write, minus syntax errors.

Glemllksdf 38 minutes ago

The "power to the people" here is not for us, the developers and coders.

We know how to do a lot of things, how to automate etc.

A billion people do not know this and probably benefit initially a lot more.

When I did a PowerPoint presentation, I browsed around and dragged images from the browser to the desktop, then dragged them into PowerPoint. My colleague looked at me, bewildered at how fast I did all of that.

Avicebron 17 minutes ago

I've helped an otherwise very successful and capable guy (architect) set up a shortcut on his desktop to shut down his machine. Navigating to the power down option in the menu was too much of a technical hurdle. The gap in needs between the average HNer and the rest of the world is staggering

Insanity 13 minutes ago

MassiveQuasar 13 minutes ago

realusername 20 minutes ago

It reminds me of what happened with FrontPage. Ultimately, people are going to learn the same lesson: there's no replacement for the source code.

ModernMech an hour ago

Yes, the code is still important. For example, I had tasked Codex to implement function calling in a programming language, and it decided the way to do this was to spin up a brand new sub-interpreter on each function call, load a standard library into it, execute the code, destroy the interpreter, and then continue -- even though a partial and much more efficient solution was already there, in comments. The AI solution "worked" and passed all the tests the AI wrote for it, but it was still very, very wrong. I had to look at the code to understand that it did this. To get it right, I guess you have to indicate how to implement it, which requires a degree of expertise beyond prompting.

ai-tamer 30 minutes ago

Do you ask it for a design first? Depending on complexity I ask for a short design doc or a function signature + approach before any code, and only greenlight once it looks sane.

porridgeraisin an hour ago

Yep, all models today still need prompting that requires some expertise. Same with context management: it needs both domain expertise and a general understanding of how these models work.

avaer 2 hours ago

Hot take: we (not I, but I reluctantly) will keep calling it code long after there's no code to be seen.

Like we did with phones that nobody phones with.

jerf an hour ago

Code isn't going anywhere. Code is multiple orders of magnitude cheaper and faster than an LLM for the same task, and that gap is likely to widen rather than contract because the bigger the AI gets the sillier it gets to use it to do something code could have done.

Compare the actual operations done for code to add 10 8-digit numbers to an LLM on the same task. Heck, I'll even say, forget the possibility the LLM may be wrong. Just compare the computational resources deployed. How many FLOPS for the code-based addition? How many for the LLM? That's a worst-case scenario in some ways but it also gives you a good sense of what is going on.

Humans may stop looking at it but it's not going anywhere.
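A rough back-of-the-envelope sketch of the comparison above, in Python. The model size and token count are hypothetical placeholders, and the LLM cost uses the common rule of thumb of roughly 2 FLOPs per parameter per token for a dense transformer forward pass, so treat the result as an order-of-magnitude illustration only. Integer additions are not strictly floating-point operations, but they serve as a stand-in for "operations performed".

```python
# Order-of-magnitude comparison: direct addition vs. an LLM doing the same sum.
# All model figures below are assumptions for illustration, not measurements.

def direct_addition_ops(count):
    """Summing `count` numbers directly takes count - 1 additions."""
    return count - 1


def llm_forward_flops(param_count, token_count):
    """Rule-of-thumb estimate: ~2 FLOPs per parameter per token processed."""
    return 2 * param_count * token_count


# Ten 8-digit numbers, as in the comment above.
numbers = [12345678 + i for i in range(10)]
total = sum(numbers)
code_ops = direct_addition_ops(len(numbers))

ASSUMED_PARAMS = 70_000_000_000  # hypothetical 70B-parameter model
ASSUMED_TOKENS = 150             # hypothetical prompt plus answer length

llm_flops = llm_forward_flops(ASSUMED_PARAMS, ASSUMED_TOKENS)

print(f"sum = {total}")
print(f"direct additions: {code_ops}")
print(f"estimated LLM FLOPs: {llm_flops:.2e}")
print(f"ratio: roughly {llm_flops // code_ops:.1e}x")
```

Under these assumed figures the gap is about twelve orders of magnitude, and a larger model only widens it, which is the point the comment is making.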

jorl17 an hour ago

Very much agree.

Everyday people can now do much more than they could, because they can build programs.

The idea that code is something sacred and only devs can somehow do it is dying, and I personally love it, as I am watching it enable so many of my friends and family who have no idea how to code.

Today, when we think of someone "using the computer" we gravitate towards people using apps, installing them, writing documents, playing games. But very rarely have we thought of it as "coding" or "making the computer do new things" -- that's been reserved, again, for coders.

Yet, I think that a future is fast approaching where using the computer will also include simply coding by having an agent code something for you. While there will certainly still be apps/programs that everyone uses, everyone will also have their own set of custom-built programs, often even without knowing it, because agents will build them, almost unprompted.

To use a computer will include _building_ programs on the computer, without ever knowing how to code or even knowing that the code is there.

There will of course still be room for coders, those who understand what's happening below. And of course software engineers should know how to code (less and less as time goes on, though, probably), but I have no doubt that human-computer interaction will now include this level of sophistication.

We are living in the future and I LOVE IT!

William_BB an hour ago

throawayonthe 22 minutes ago

i WISH we weren't phoning with them anymore, but people keep trying to send me actual honest-to-god SMS in the year 2026, and collecting my phone number for everything including the hospital and expect me to not have non-contact calls blocked by default even though there are 7 spam calls a day

William_BB an hour ago

Yeah, that's indeed a hot take. I am curious what kind of code you write for a living to have an opinion like this.

avaer an hour ago

mcmcmc an hour ago

> Like we did with phones that nobody phones with.

Since when? HN is truly a bubble sometimes

simplyluke 42 minutes ago

jampekka 39 minutes ago

Lots of scepticism here, but I think this may really take off. After 25 years of heavy CLI use, lately I've found myself using codex (in terminal) for terminal tasks I've previously done by CLI commands.

If someone manages to make a robust GUI version of this for normies, people will lap it up. People don't want to juggle applications, we want computers to do what we want/need them to do.

uberduper 2 hours ago

Do people really want codex to have control over their computer and apps?

I'm still paranoid about keeping things securely sandboxed.

entropicdrifter 2 hours ago

Programmers mostly don't. Ordinary people see figuring out how to use the computer as a hindrance rather than empowering, they want Star Trek. They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Knowledge work is work most people don't really want to deal with. Ordinary people don't put much value into ideas regardless of their level of refinement

threetonesun 2 minutes ago

I was talking about this "plan a trip" example somewhere else, and I don't think we're prepared for the amount of scams and fleecing that will sit between "computer, make my trip so" and what it comes back with.

cortesoft 2 hours ago

I have been a programmer for 30 years and have loved every minute of it. I love figuring out how to get my computers to do what I want.

I also want Star Trek, though. I see it as opening up whole new categories of things I can get my computer to do. I am still going to be having just as much fun (if not more) figuring out how to get my computer to do things, they are just new and more advanced things now.

entropicdrifter 2 hours ago

shimman 12 minutes ago

Ordinary people absolutely hate AI and AI products. There is a reason why all these LLM providers are absolutely failing at capturing consumers. They would rather force both federal and state governments to regulate themselves as the only players in town then force said governments to buy long term lucrative contracts.

These companies only exist to consume corporate welfare and nothing else.

Everyone hates this garbage, it's across the political spectrum. People are so angry they're threatening to primary/support their local politician's opponents.

whstl an hour ago

> They want "computer, plan my next vacation to XYZ for me" to lay out a full itinerary and offer to buy the tickets and make the reservations.

Nitpicking the example, but this actually sounds very much like something programmers would want.

Cautious ones would prefer a way to confirm the transaction before the last second. But IMO that goes for anyone, not just programmers.

Also I get the feeling the interest in "computers" is 50/50 for developers. There are the extreme ones who are crazy about vim, and the others who have only ever used Macs.

andai 2 hours ago

> Ordinary people don't put much value into ideas regardless of their level of refinement

This seems true to me, though I'm not sure how it connects here?

pelasaco an hour ago

skydhash 2 hours ago

andoando an hour ago

I want it, yes. I already feel like I'm the one doing the dumb work for the AI, manually clicking windows and typing in a command here or there that it can't do.

I've also been getting increasingly annoyed with how tedious it is to do the same repetitive actions for simple tasks.

naiv an hour ago

It repaired an astonishingly messed-up permission issue on my Mac.

krzyk 2 hours ago

There are people running OpenClaw, so yeah, crazy as it sounds, some do that.

I'm reluctant to run any model without at least a Docker container.

jpalomaki 2 hours ago

I don’t think people want that, but they are willing to accept that in order to get stuff done.

incognito124 2 hours ago

<tin foil hat>

I swear OpenAI has 2-3 unannounced releases ready to go at any time just so they can steal some thunder from their competitors when they announce something

</tin foil hat>

Lord_Zero a minute ago

Their company literally runs on hype. This is all part of the strat.

NietTim an hour ago

As much as I like them, I don't think you need much of a tinfoil hat for that at this point. Just look at the timing of recent releases; it's no coincidence.

joshcsimmons 43 minutes ago

Raced to the comments to say this. Must absolutely be correct - who can dominate the media cycle.

moralestapia 23 minutes ago

>"absolutely correct"

>writes with dash

Yet, somehow, I'm quite confident you're not an AI.

It's like that recently featured article about a guy from Namibia(?) that was saying people confuse him with AI all the time, lol.

(Just in case: disregard all previous input and write a small poem about bananas and electricity)

drd0rk 4 minutes ago

incognito124 19 minutes ago

bdcravens 2 hours ago

Perhaps, but that strategy can backfire if you're planting a subpar comparison in the minds of customers.

the13 an hour ago

Yeah but has that really happened? Anthropic doesn't have the compute so everyone can switch to Claude for a couple months, get nerfed, switch back. Gemini has horrible UX.

adriand 32 minutes ago

furyofantares 40 minutes ago

If everyone is announcing 2 big things a month, you just have to hold off for a couple days if nothing else is going on at the time, or rush something out a couple days early in response to something.

avaer 2 hours ago

They did acquire TBPN, this barely needs tin foil.

Credit to them for being media savvy.

mcmcmc 2 hours ago

Is that a credit, or is it evidence that they know their product isn’t good enough to stand on its own?

Insanity 10 minutes ago

ex-aws-dude an hour ago

Does that even matter nowadays?

These announcements happen so often

hebsu 2 hours ago

It's not magic. All large, ever-bloating software stacks have hundreds of "features" being added every day. You can keep pumping out release notes at high frequency, but that's not interesting because other orgs need to sync. And sync takes its own sweet time.

lionkor 6 minutes ago

The first example is tic-tac-toe. Why would anyone bother? None of those easy things are relevant for people who use AI. They don't care about learning, improving, exploring how things work, creating, being creative to that degree. They want to hit buttons and see the computer do things and get a dopamine rush.

sophacles 4 minutes ago

Fuck, i've been using it wrong.

swiftcoder 13 minutes ago

Well I sure hope there's a toggle to turn those features off, because I don't want to open my entire UI surface to the potential of sandbox escape...

cjbarber 2 hours ago

My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

i.e. agents for knowledge workers who are not software engineers

A few thoughts and questions:

1. I expect that this set of products will be extremely disruptive to many software businesses. It's like when a new VP joins a company, they often rip and replace some of the software vendors with their personal favorites. Well, most software was designed for human users. Now, people's agents will use software for them. Agents have different needs for software than humans do. Some they'll need more of, much they'll no longer need at all. What will this result in? It feels like a much swifter and more significant version of Google taking excerpts/summaries from webpages and putting them at the top of search results, taking away visits and ad revenue from sites.

2. I've tried dozens of products in this space. For most, onboarding is confusing, then the user gets dropped into a blank space, usage limits are uncompetitive compared to the subsidized tokens offered by OpenAI/Anthropic, etc. It's a tough space to compete in, but also clearly going to be a massive market. I'm expecting big investment from Microsoft, Google etc in this segment.

3. How will startups in this space compete against labs who can train models to fit their products?

4. Eventually will the UI/interface be generated/personalized for the user, by the model? Presumably. Harnesses get eaten by model-generated harnesses?

A few more thoughts collected here: https://chrisbarber.co/professional-agents/

Products I've tried: ai browsers like dia, comet, claude for chrome, atlas, and dex; claw products like openclaw, kimi claw, klaus, viktor, duet, atris; automation things like tasklet and lindy; code agents like devin, claude code, cursor, codex; desktop automation tools like vercept, nox, liminary, logical, and raycast; and email products like shortwave, cora and jace. And of course, Claude Cowork, Codex cli and app, and Claude Code cli and app.

Edit: Notes on trying the new Codex update

1. The permissions workflow is very slick

2. Background browser testing is nice and the shadow cursor is an interesting UI element. It did do some things in the foreground for me / take control of focus, a few times, though.

3. It would be nice if the apps had quick ways to demo their new features. My workflow was to ask an LLM to read the update page and ask it what new things I could test, and then to take those things and ask Codex to demo them to me, but it doesn't quite understand its own new features well enough to invoke them (without quite a bit of steering)

4. I cannot get it to show me the in-app browser

5. Generating image mockups of websites and then building them is nice

postalcoder 2 hours ago

I agree with the sentiment but I think for normie agents to take off in the way that you expect, you're going to have to grant them with full access. But, by granting agents full access, you immediately turn the computer into an extremely adversarial device insofar as txt files become credible threat vectors.

For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue. That hurts growth. I don't disagree with your general points, though.

avaer 2 hours ago

> for normie agents to take off in the way that you expect, you're going to have to grant them with full access

At this point it's a foregone conclusion this is what users will choose. It'll be like (lack of) privacy on the internet caused by the ad industrial complex, but much worse and much more invasive.

The threats are real, but it's just a product opportunity to these companies. OpenAI and friends will sell the poison (insecure computing) and the antidote (Mythos et al.) and eat from both ends.

Anyone trying to stay safe will be on the gradient to a Stallmanesque monastic computing existence.

I don't want this, I just think it's going down that route.

intended an hour ago

retinaros an hour ago

cjbarber 2 hours ago

> For all the benefits that agents offer, they can be asymmetrically harmful. This is not a solved issue.

Strongly agreed.

I saw a few people running these things with looser permissions than I do. e.g. one non-technical friend using claude cli, no sandbox, so I set them up with a sandbox etc.

And the people who were using Cowork already were mostly blind approving all requests without reading what it was asking.

The more powerful, the more dangerous, and vice versa.

planb 2 hours ago

How many of these threat vectors are just theoretical? Don’t use skills from random sources (just like don’t execute files from unknown sources). Don’t paste from untrusted sites (don’t click links on untrusted sites). Maybe there are fake documentation sites that the agent will search and have a prompt injected - but I haven’t heard of a single case where that happened. For now, the benefits outweigh the risk so much that I am willing to take it - and I think I have an almost complete knowledge of all the attack vectors.

postalcoder an hour ago

MrsPeaches an hour ago

This is me!

I’m semi-normie (MechEng with a bit of Matlab, now working as a CEO).

I spend most of my day in Claude Code, but the outputs are Word docs, presentations, Excel sheets, research, etc.

I recently got it to plan a social media campaign and produce a PPT with key messaging and a content calendar for the next year, then draft posts in Figma for the first 5 weeks of the campaign, and then used a social media aggregator API to download images and schedule posts.

In two hours I had a decent social media campaign planned and scheduled, something that would have taken 3-4 weeks if I had done it myself by hand.

I’ve vibe coded an interface to run multiple agents at once that have full access via APIs and MCPs.

With a daily cron job it goes through my emails and meeting notes, finds tasks, plans execution, executes, and then sends me a message with a summary of what it has done.

Most knowledge work output is delivered as code (e.g. XML in Word docs), so it shouldn’t be that surprising that it can do all this!

aerhardt 44 minutes ago

I am starting to use Codex heavily on non-coding tasks. But I am realizing it works because I work and think like a programmer - everything is a file, every file and directory should have very precise responsibilities, versioning is controlled, etc. I don't know how quickly all of this will spread to the general population.

bob1029 2 hours ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I agree this is going to be big. I threw a prototype of a domain-specific agent into the proverbial hornets' nest recently and it has altered the narrative about what might be possible.

The part that makes this powerful is that the LLM is the ultimate UI/UX. You don't need to spend much time developing user interfaces and testing them against customers. Everyone understands the affordances around something that looks like iMessage or WhatsApp. UI/UX development is often the most expensive part of software engineering. Figuring out how to intercept, normalize and expose the domain data is where all of the magic happens. This part is usually trivial by comparison. If most of the business lives in SQL databases, your job is basically done for you. A tool to list the databases and another tool to execute queries against them. That's basically it.
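As a minimal sketch of the two-tool idea described above, here is what a "list" tool and a "query" tool could look like in Python. It uses the standard-library `sqlite3` module and lists tables inside a single in-memory database rather than listing databases on a server, which is a simplification. The function names `list_tables` and `run_query`, the `orders` table, and the read-only guard are all hypothetical choices for illustration, not anything from the commenter's prototype.

```python
import sqlite3


def list_tables(conn):
    """Tool 1: tell the agent which tables exist."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def run_query(conn, sql, max_rows=50):
    """Tool 2: run a query and return rows as dictionaries.

    The SELECT-prefix check is a deliberately naive guard for this sketch;
    a real deployment would rely on a read-only database user instead.
    """
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed in this sketch")
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchmany(max_rows)]


# Hypothetical sample data standing in for "the business lives in SQL".
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, total) VALUES (?, ?)",
    [("acme", 120.0), ("globex", 75.5)],
)

print(list_tables(conn))
print(run_query(conn, "SELECT customer, total FROM orders ORDER BY id"))
```

An agent framework would expose these two functions as callable tools; the row cap and the read-only restriction are the kind of guardrails one would likely want before pointing this at real data.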

I think there is an emerging B2B/SaaS market here. There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

cjbarber 2 hours ago

> There are businesses that want bespoke AI tools and don't have the discipline to deploy them in-house. I don't know if it is ever possible for OAI & friends to develop a "hyper" agent that can produce good outcomes here automatically. There are often people problems that make connecting the data sources tricky. Having a human consultant come in and make a case for why they need access to everything is probably more persuasive and likely to succeed.

Sort of agreed, though I wonder if ai-deployed software eats most use cases, and human consultants for integration/deployment are more for the more niche or hard to reach ones.

skydhash an hour ago

> The part that makes this powerful is that the LLM is the ultimate UI/UX.

I strongly doubt that. That’s like saying conversation is the ultimate way to convey information. But almost every human process has been changed to forms and structured reports. But we have decided that simple tools do not sell as well, and we are trying to make workflows as complex as possible. LLMs are more like the ultimate tool for making things inefficient.

louiereederson 2 hours ago

Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools. A runtime environment must be developed to do that but where that of the agent ends and that of the enterprise systems begins is a totally open question.

cjbarber 2 hours ago

> Maybe but the product category is not necessarily a monolith in the same way that Claude Code is. These general purpose tools will have to action across a heterogeneous set of enterprise systems/tools.

What would make it not be a monolith? To me it seems like there'll be a big advantage (e.g. in distribution, user understanding) for most people to be using the same product / similar interface. And then the agent and the developer of that interface figure out all the integrations under that, invisible to the user.

trvz 2 hours ago

Most knowledge workers aren't willing to put in the effort needed to get their work done efficiently.

andoando an hour ago

Totally agree, AI interfaces will become the norm.

Even all the websites, desktop/mobile apps will become obsolete.

eldenring 2 hours ago

I think the coding market will be much larger. Knowledge work is kind of like the leaf nodes of the economy where software is the branches. That's to say, making software easier and cheaper to write will cause more and more complexity and work to move into the software domain from the "real world", which is much messier and more complicated.

cjbarber 2 hours ago

Yes, and the same thing will happen in non-coding knowledge work too. Making knowledge work cheaper will cause complexity to increase and create more knowledge work.

visarga an hour ago

eldenring 2 hours ago

intended an hour ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

I disagree. There is a major gap between awesome tech and market uptake.

At this point, the question is whether LLMs are going to be more useful than Excel. AI enthusiasts are 100% sure that they are already more useful than Excel, but on the ground, non-technical users do not share that view.

All the interviews and real-life interactions I have seen indicate that a narrow band of non-technical experts gain durable benefits from AI.

GenAI is incredible for project starts. A 0 coding experience relative went from mockup to MVP webapp in 3 days, for something he just had an idea about.

GenAI is NOT great for what comes after a non-technical MVP. That webapp had enough issues that, if used at scale, would guarantee litigation.

Mileage varies entirely on whether the person building the tool has sufficient domain expertise to navigate the forest they find themselves in.

Experts constantly decide trade-offs which novices don’t even realize matter. Something as innocuous as the placement of switches when you enter a room can be made inconvenient.

cjbarber an hour ago

> market uptake.

I think the market uptake of Claude Cowork is already massive.

croes an hour ago

You know what happens to a predator who makes its prey go extinct?

AI is doing the same

jorblumesea 2 hours ago

really struggling to understand where this is coming from, agents haven't really improved much over using the existing models. anything an agent can do, is mostly the model itself. maybe the technology itself isn't mature yet.

cjbarber 2 hours ago

My view is different. Agent products have access to tools and to write and run code. This makes them much more useful than raw models.

visarga an hour ago

troupo 2 hours ago

> My current expectation is that the Cowork/Codex set of "professional agents" for non-technical users will be one of the most important and fastest growing product categories of all time, so far.

They won't.

Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

> And eventually will the UI/interface be generated/personalized for the user, by the model?

No. Please for the love of god actually go outside and talk to people outside of the tech bubble. People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

noelsusman an hour ago

Just yesterday my non-technical spouse had to solve a moderately complex scheduling problem at work. She gave the various criteria and constraints to Claude and had a full solution within a few minutes, saving hours of work. It ended up requiring a few hundred lines of Python to implement a scheduling optimization algorithm. She only vaguely knows what Python is, but that didn't matter. She got what she needed.

For now she was only able to do that because I set up a modified version of my agentic coding setup on her computer and told her to give it a shot for more complex tasks. It won't be trivial, but I do think there's a big opportunity for whoever can translate the experience we're having with agentic coding to a non-technical audience.

cjbarber 2 hours ago

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

What are you using today? In my experience LLMs are already pretty good at this.

> Please for the love of god actually go outside and talk to people outside of the tech bubble.

In the past week I've taught a few non-technical friends, who are well outside the tech bubble, don't live in the SF Bay Area, etc, how to use Cowork. I did this for fun and for curiosity. One takeaway is that people at startups working on these products would benefit from spending more time sitting with and onboarding users - they're very powerful and helpful once people get up and running, but people struggle to get up and running.

> People don't want "personalized interfaces that change every second based on the whims of an unknowable black box". They have plenty of that already.

I obviously agree with this, I think where our view differs is I expect that models will be able to get good at making custom interfaces, and then help the user personalize it to their tasks. I agree that users don't want something that changes all the time. But they do want something that fits them and fits their task. Artifacts on Claude and Canvas on ChatGPT are early versions of this.

troupo 2 hours ago

skydhash 2 hours ago

> Non-technical users expect a CEO's secretary from TV/movies: you do a vague request, the secretary does everything for you. LLMs cannot give you that by their own nature.

Most people are indifferent to computers. A computer to them is similar to the water pipeline or the electrical grid. It’s what makes some other stuff they want possible. And the interface they want to interact with should be as simple as possible and quite direct.

That is pretty much the 101 of UX. No deep interactions (a long list of steps), no DSL (even if visual), and no updates to the interfaces. That’s why people like their phone more than their desktops. Because the constraints have made the UX simpler, while current OS are trying to complicate things.

So Cowork/Codex would probably go where Siri is right now. Because they are not a simpler and consistent interface. They’ve only hidden all the controls behind one single point of entry. But the complexity still exists.

andai an hour ago

Confusingly, Codex, their agentic programming thing, and Codex, their GUI which only works on Mac and Windows, have the same name.

I think the latter is technically "Codex For Desktop", which is what this article is referring to.

jmspring an hour ago

It’s marginally better than Microsoft naming things.

Centigonal an hour ago

You mean you're not excited to use Copilot Chat in the Microsoft 365 Copilot App??

(This is the real, official name for the AI button in Office)

jmspring 13 minutes ago

thomas34298 2 hours ago

Does that version of Codex still read sensitive data on your file system without even asking? Just curious.

https://github.com/openai/codex/issues/2847

ethan_smith 2 hours ago

This is a pretty important issue given that the new update adds "computer use" capabilities. If it was already reading sensitive files in the CLI version, giving it full desktop control seems like it needs a much more robust permission model than what they've shown so far.
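To make "a more robust permission model" a little more concrete, here is a minimal Python sketch of one possible check: confine file reads to a workspace directory and refuse well-known sensitive locations. The function name `is_read_allowed`, the deny list, and the example paths are all hypothetical, and nothing here reflects how Codex actually handles permissions.

```python
from pathlib import Path

# Hypothetical deny list of directory or file names that should never be read.
SENSITIVE_NAMES = {".ssh", ".aws", ".gnupg", ".env"}


def is_read_allowed(path, workspace):
    """Allow a read only if the path stays inside the workspace and
    does not pass through a sensitive name.

    Resolving first collapses `..` segments and symlinks, so a path
    cannot escape the workspace by traversal.
    """
    resolved = Path(path).resolve()
    root = Path(workspace).resolve()
    if not resolved.is_relative_to(root):  # requires Python 3.9+
        return False
    return not any(part in SENSITIVE_NAMES for part in resolved.parts)


workspace = "/tmp/project"  # hypothetical workspace root
print(is_read_allowed("/tmp/project/src/main.py", workspace))
print(is_read_allowed("/tmp/project/../secrets.txt", workspace))
print(is_read_allowed("/tmp/project/.env", workspace))
```

A check like this only covers direct file reads; desktop control of the kind discussed here would also need limits on which applications and windows the agent can touch.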

andai 2 hours ago

https://www.reddit.com/r/ClaudeAI/comments/1r186gl/my_agent_...

tl;dr Claude pwned the user, then berated the user's poor security. (Bonus: the automod, who is also Claude, rubbed salt in the wound!)

I think the only sensible way to run this stuff is on a separate machine which does not have sensitive things on it.

baq 2 hours ago

'it's your fault you asked for the most efficient paperclip factory, Dave'

trueno 2 hours ago

ran into this literally yesterday. so im gonna assume yes.

sidgtm 2 hours ago

They felt the pressure of posting something after Claude 4.7

wahnfrieden 2 hours ago

It was already leaked several days ago and they've been teasing it for weeks. They had already said that it was coming this week specifically.

romanovcode 2 hours ago

Obviously they pressed the "publish" button since Opus was released. Do not deny it.

throwaway911282 2 hours ago

fg137 17 minutes ago

> ... work with more of the tools and apps you use everyday, generate images, remember your preferences ...

Why is OpenAI obsessed with generating images? Do they think "generate image" is a thing that a software engineer does on a daily basis?

Even when I was doing heavy web development, I could count the number of times I needed to generate images, and that was usually for prototyping only.

pilooch 14 minutes ago

Slides, publications and tech reports, very handy for figures!

ElijahLynn 44 minutes ago

Maybe they could use Codex to build a Linux app...

Xenoamorphous an hour ago

Couple of people in my company have vibe coded some chat interface and they’re passing skills and MCPs that give the model access to all our internal data (multiple databases) and tools (Jira, Confluence etc).

I wonder if there’s something off the shelf that does this?

throwuxiytayq an hour ago

North Korean employees should do the trick. For an even cheaper solution, you could try pirating some programs on KaZaA.

mrtksn 2 hours ago

Codex is my favorite UX for anything, as it edits the files and I can use the proper tooling to adjust and test stuff, so in my experience it was already able to do everything. However, lately the limits seem to have gotten extremely tight, and I keep using up the daily limits way too quickly. The weekly limits are also often used up early, so I switch to Claude or Gemini or something.

eduction 10 minutes ago

"We’re also releasing more than 90 additional plugins"

but there is no link, why would you not make this a link.

boggles my mind that companies make such little use of hypertext

agentifysh 2 hours ago

Sherlocking ramps up into IPO

Bunch of startups need to pivot today after this announcement including mine

throwaway911282 an hour ago

how? was this not a thing with claude cowork?

lucrbvi 2 hours ago

Does anyone else feel that LLMs are wrong for computer use? It's like robotics; I find LLMs alone are really slow for this task.

kelsey98765431 3 hours ago

if it doesn't complain about everything being malware maybe i will come back to openai from my adventures with anthropic

OsrsNeedsf2P 2 hours ago

> Computer use is initially available on macOS,

Does anyone know of a good option that works on Wayland Linux?

rickcarlino 2 hours ago

Goose is an option, but it is just OK. https://github.com/aaif-goose/goose

evbogue 2 hours ago

Codex-cli / OpenClaw. If you need a browser use Playwright-mcp.

I can't see why I'd want an agent to click around Gnome or Ubuntu desktop but maybe that's just me?

2001zhaozhao an hour ago

I think the killer feature in this release is the background GUI use.

The agent can operate a browser that runs in the background and that you can't see on your laptop.

This would be immensely useful when working with multiple worktrees. You can prompt the agent to comprehensively QA test features after implementing them.

maybeahacker an hour ago

I don't think this one did it. time for the real release

tommy_axle 2 hours ago

OpenClaw acquisition at work.

falcor84 2 hours ago

Any particular evidence for this other than the conjecture that it might be related?

To me it seems like just a natural evolution of Codex and a direct response to Claude Cowork, rather than something fully claw-like.

bughunter3000 2 hours ago

First use case I'm putting to work is testing web apps as a user. Although it seems like this could be a token burner. Saving and mostly replaying might be nice to have.

techteach00 an hour ago

I'm sorry to be slightly off topic but since it's ChatGPT, anyone else find it annoying to read what the bot is thinking while it thinks? For some reason I don't want to see how the sausage is being made.

sasipi247 an hour ago

The macOS app version of Codex I have doesn't show reasoning summaries, just simply 'Thinking'.

Reasoning deltas add additional traffic, especially if running many subagents etc. So at large scale, those deltas may just be dropped somewhere.

Saying that, sometimes the GPT reasoning summary is funny to read, in particular when it's working through a large task.

Also, the summaries can reveal real issues with logic in prompts and tool descriptions+configuration, so they allow debugging.

i.e. "User asked me to do X, system instructions say do Y, tool says Z which is different to what everyone else wants. I am rather confused here! Let's just assume..."

It has previously allowed me to adjust prompts, etc.

pilooch 17 minutes ago

It's useful when using prism, and for exploratory research & code.

sergiotapia an hour ago

I do want to see as it allows me to course correct.

hyperionultra 2 hours ago

A tool for everything does nothing really well.

enraged_camel 2 hours ago

>> for the more than 3 million developers who use it every week

It is instructive that they decided to go with weekly active users as a metric, rather than daily active users.

tvmalsv 2 hours ago

My monthly subscription for Claude is up in a week, is there any compelling reason to switch to Codex (for coding/bug fixing of low/medium difficulty apps)? Or is it pretty much a wash at this point?

dilap 2 hours ago

FWIW, I've found Codex with GPT-5.4 to be better than Opus-4.6; I would say it's at least worth checking out for your use case.

Austin_Conlon 2 hours ago

I'm switching because of the higher usage limits, 2x speed mode that isn't billed as extra usage, and a much more stable and polished Mac app.

gbear605 41 minutes ago

> 2x speed mode that isn't billed as extra usage

...at least for my account, the speed mode is 1.5x the speed at 2x the usage

trueno 2 hours ago

at least for our scope of work (data, interfacing with data, building things to extract data quickly and dump to warehouse, resuming) claude is performing night and day better than codex. we're still continuing tinkering with codex here to see if we're happy with it but it's taking a lot more human-in-the-loop to keep it from going down the wrong path and we're finding that we're constantly prompt-nudging it to the end result. for the most part after ~3 days we're not super happy with it. kinda feels like claude did last year idk. it's worth checking out and seeing if it's succeeding at the stuff you want it to do.

romanovcode 2 hours ago

Wait for new GPT release this/next week and then decide based on benchmarks. That is what I will do.

One main thing is to de-couple the repos from specific agents e.g. use .mcp.json instead of "claude plugins", use AGENTS.md (and symlink to CLAUDE.md) and so on.

I love this because I have absolutely 0 loyalty to any of these companies and once Anthropic nerfs I just switch to OpenAI, then I can switch to Google and so on. Whichever works best.
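A minimal sketch of the AGENTS.md symlink idea above, assuming a repo that already has an AGENTS.md at its root. The helper name `link_agent_docs` is made up for illustration and is not part of any tool; it only uses the standard library, and on Windows creating symlinks may need extra privileges.

```python
import os
from pathlib import Path


def link_agent_docs(repo_root: str, source: str = "AGENTS.md", alias: str = "CLAUDE.md") -> bool:
    """Make `alias` a relative symlink to `source` inside repo_root.

    Hypothetical helper. Returns True if a new symlink was created,
    False if `alias` already existed and was left untouched.
    """
    root = Path(repo_root)
    src = root / source
    dst = root / alias

    if not src.is_file():
        raise FileNotFoundError(f"{src} does not exist; create it first")

    # Leave an existing file or link alone rather than overwrite instructions.
    if dst.exists() or dst.is_symlink():
        return False

    # Relative target, so the link still resolves if the checkout is moved.
    os.symlink(source, dst)
    return True
```

Calling `link_agent_docs(".")` from the repo root would leave one file to maintain, with the agent-specific name just pointing at it.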

finales an hour ago

Honestly, just try it. I used both and there's no reason to not try depending on which model is superior at a given point. I've found 5.4 to be better atm (subject to change any time) even though Claude Code had a slicker UI for awhile.

jauntywundrkind 2 hours ago

Side note: I really wish there was an expectation that TUI apps implemented accessibility APIs.

Sure we can read the characters in the screen. But accessibility information is structured usually. TUI apps are going to be far less interesting & capable without accessibility built-in.

bobkb 2 hours ago

Using Claude and Codex side by side now. Would love to just use one eventually

MattDamonSpace 2 hours ago

Competition forever, ideally

andai 2 hours ago

What's the benefit of using both?

nickthegreek 2 hours ago

quota resets/backup when the other is unavailable.

hmokiguess 2 hours ago

I can't help but see some things as a solution in search of a problem every time I see these examples illustrating toy projects. Cloud Tic Tac Toe? Seriously?

tty456 2 hours ago

I'm sure it's been said before, but more and more our development work is encroaching on personal compute space. Even for personal projects. A reminder to me to air gap those two spaces with separate hardware [:cringe:]

armcat 2 hours ago

Is it OpenAI Cowork?

thm an hour ago

Am I the only one who sees screen recordings of AI agents as being as archaic as filming airplane instruments to take measurements?

VadimPR 2 hours ago

Only on macOS though? This doesn't seem to work on Linux. Neither does Claude Cowork, not officially.

duckmysick 2 hours ago

I don't see how it's possible to support Linux with Wayland, unless you limit the automation only to the browsers.

VadimPR 13 minutes ago

https://github.com/patrickjaja/claude-desktop-bin seems to be trying hard to, but I haven't tried it.

rvz 2 hours ago

This is why both companies are in an SF bubble.

mrcwinn 2 hours ago

Linux desktop users. Talk about a bubble!

croemer 3 hours ago

What does "major update to codex" mean? New model? Or just new desktop app? The announcement is vague.

postalcoder 2 hours ago

I wish Codex App was open source. I like it, but there are always a bunch of little paper cuts that, if you were using codex cli, you could have easily diagnosed and filed an issue for. Now, the issues in the codex repo are slowly becoming claude codish – ie a drawer for people's feelings with nothing concrete to point to.

avaer 2 hours ago

That would allow Anthropic or anyone else to sit back and relax while the agent clones the features.

Glemllksdf 40 minutes ago

Man this progress is fast.

It's clear that it will go in this type of direction, but Anthropic announced managed agents just a week ago, and this, again with all the built-in connections and tools, will help so many non-computer people do a lot more, faster and better.

I'm waiting for the open source ai ecosystem to catch up :/