LLMs Are Complicated Now (ianbarber.blog)

125 points by matt_d 16 hours ago

vivzkestrel 6 minutes ago

- with all due respect, from a ux perspective, could you kindly add a page where i can see just the titles of all your blog posts

- https://ianbarber.blog/blogroll

- https://ianbarber.blog/archive

- https://ianbarber.blog/blog

- https://ianbarber.blog/posts

- none of the above links work

- i really dont want to scroll 200 pages just to see what your blog articles are

MarkSweep a minute ago

smasher164 2 minutes ago

lol yeah I guess the best move right now is to fetch their /feed and iterate through <post>s

jordanb 3 hours ago

It's the bitter-lesson to feature-engineering lifecycle.

When a technique or technology is new people are making massive gains by just applying it to some use case, or gathering more data for training, or giving it more resources.

As time goes on those "bitter lesson" gains start to hit the shallow part of the logistic curve and companies have to start investing more and more effort into engineering for each small, incremental gain.

zahlman 10 minutes ago

I assume the choice of phrase "bitter lesson" is intentional irony (since the original concept is that you get better results by just scaling up and not trying to be clever with domain-specific knowledge)?

sdenton4 an hour ago

I got a very different message from this, actually much closer to the problem of incumbent advantage.

The known-good thing has been heavily optimized for performance, making it much harder for new technologies to prove that they are better. This is similar to the problem of gas vs electric engines - we had a century of optimization and ecosystem development around gas engines, which creates an uphill battle for electric motors even though they are (eventually) superior on every way /except/ having that massive ecosystem.

The problem isn't as bad here, because software is much more flexible than hardware, and scaling laws give a reasonable way to try things out at smaller scale before going whole hog.

pezo1919 2 hours ago

Well put, thanks.

truvem 2 hours ago

One thing that makes LLMs complicated in production is that they're stateless — every call starts from zero. The complexity compounds when you need agents to maintain context across sessions and models. That's a layer that's largely missing from most stacks today.

ffsm8 an hour ago

If you think statefull LLMs would be easier to handle then stateless... Then I think you haven't done a lot of software engineering

zsyllepsis 14 minutes ago

Maybe a charitable reading of the parent comment, but my interpretation of it was that while the _models_ are stateless, modern deployments of these models for inference rely on state.

For example, tiered pricing for cached context relies on state, even if the models don’t.

zahlman 9 minutes ago

random3 10 minutes ago

This. lol. If you think state makes things easier you're in for a big surprise.

tossandthrow an hour ago

That does not seem to be related to llms? It is more about the harness that utilizes them, right?

talkin an hour ago

It costs tokens, so it helps the business model, so it’s not a bug but a feature.

charcircuit 6 hours ago

Why didn't this author compare Llama 3 with GLM 5.2 (released 1 week ago) which is a more standard attention based LLM? To compare 2 separate families of LLMs and then pointing out that they are different is not a surprising result and detracts from the point the author is trying to make.

https://sebastianraschka.com/llm-architecture-gallery/?compa...

If you look at it, the diagrams are very similar, but the main differences are that the feedforward is replaced with a MoE (router to multiple feedforwards) and the model has a different attention implementation.

segmondy 4 hours ago

The author is correct, the model architecture is now much more complicated. You can see this if you use llama.cpp and follow the project. The earlier models were always fully implemented. Yet with more contributors, as of today tons of latest models only have partial implementation. DeepSeekv3.2 isn't fully implemented, same with KimiK2.6, GLM5.2+, DeepSeekv4 has no implementation, MiniMaxM3 not supported yet, Hy3-preview no implementation. The latest models are just bare bones to run with lots of support missing for the advanced features.

KerrAvon 11 minutes ago

indeed, there's even a (pretty solid) custom server just for DS4 https://github.com/antirez/ds4

-- works very well on high-RAM Macs

embedding-shape 3 hours ago

> Why didn't this author compare Llama 3 with GLM 5.2 (released 1 week ago) which is a more standard attention based LLM? To compare 2 separate families of LLMs and then pointing out that they are different is not a surprising result and detracts from the point the author is trying to make.

The entire point of the comparison is that LLMs look vastly different today than before. Comparing more similar LLMs would detract from the point I thought the author was trying to make.

alecco 6 hours ago

Yeah, not a great apples-to-apples comparison.

I think the point stands: MoE, a myriad of complex attention approaches, shared layers, you name it. And making it all work together well is a huge trial-and-error pain even for small models, never mind getting to efficient hardware utilization.

lproven 6 hours ago

> If you look at it, the diagrams are very similar,

The page links to the same site you do. No wonder it is similar -- the source is the same!

charcircuit 6 hours ago

The source is the same in the original article too. He is using a different diagram from the same site on the right to justify his point on how much more complicated things have become.

christopherwxyz 6 hours ago

It’s written by AI.

Philpax 4 hours ago

I am _very_ familiar with Claudish, and to some extent, the other AIs' writing styles. This article is human-written and features human writing quirks.

The very first sentence

> Back in 2022 and 2023 there were two big branches of machine learning happening at Meta.

is unmistakably human. That's not how a LLM would phrase this sentence, and if it did, it would have put a comma after 2023.

fightforcause 3 hours ago

lproven 6 hours ago

[[citation needed]]

I am a professional writer and have been for over 30 years. (I do not use any form of LLM ever.) This means I read a lot. This also means that I have 30+ years of experience of readers not understanding what I wrote, or not getting further than the title, or not getting the main message, or inverting it in their heads, or inserting their own message and then complaining when I diverge, and an endless list of Ways People Do Not Get It.

I am also a trained TESOL teacher. Ability to capture gist is a skill we test for and measure, and many, maybe the majority, of native speakers don't have it and don't know.

In recent years I constantly see people going "this is written by AI" and I have yet to see a single of of them able to coherently prove their point. It's all just feelings and hunches.

So I am calling you on this:

How do you know? Show your working. Demonstrate your case.

ekidd 5 hours ago

sowbug 29 minutes ago

hnhg 5 hours ago

girvo 5 hours ago

skydhash 5 hours ago

jddj 6 hours ago

Highly doubtful

alecco 6 hours ago

Grammarly and GPTZero say 0% AI.