Unified Controllable and Faithful Text-to-CAD Generation with LLMs (arxiv.org)

55 points by PaulHoule 7 hours ago

avaer 5 hours ago

Text to CAD doesn't need papers. You can literally just try it and see that it works well with the frontier models. If you want reification/meshing I recommend [1] which is what Godot uses. You can throw the results in a physics engine in an afternoon and see for yourself.

This wasn't obvious a year ago, but today CAD literally reduces to Simon Willison's pelican test, since CAD is largely a matter of functional CSG, and CSG is really not that different from SVG. It's just one more dimension, which it turns out is not a problem.

LLMs consistently one-shot CSG-based video game levels with interesting physics puzzles (citing myself). Given this, I'm willing to conclude that the frontier models are good at automated CAD if given the correct harness. But I guess a lot of people don't know this yet.

[1] https://github.com/elalish/manifold
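
To make the "functional CSG" point concrete, here is a minimal sketch in plain Python. It assumes shapes are represented as signed distance functions (negative inside, positive outside), which is one common way to express CSG functionally. It is not how Manifold works (Manifold does booleans on meshes), and every name here (`sphere`, `box`, `difference`, `contains`) is hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, Tuple

Point = Tuple[float, float, float]
SDF = Callable[[Point], float]  # negative inside the solid, positive outside


def sphere(radius: float) -> SDF:
    """Sphere centred on the origin."""
    return lambda p: math.sqrt(p[0] ** 2 + p[1] ** 2 + p[2] ** 2) - radius


def box(hx: float, hy: float, hz: float) -> SDF:
    """Axis-aligned box centred on the origin, given half-extents."""
    def f(p: Point) -> float:
        qx, qy, qz = abs(p[0]) - hx, abs(p[1]) - hy, abs(p[2]) - hz
        outside = math.sqrt(max(qx, 0.0) ** 2 + max(qy, 0.0) ** 2 + max(qz, 0.0) ** 2)
        inside = min(max(qx, qy, qz), 0.0)
        return outside + inside
    return f


def translate(shape: SDF, dx: float, dy: float, dz: float) -> SDF:
    """Move a shape by evaluating it at the inversely shifted point."""
    return lambda p: shape((p[0] - dx, p[1] - dy, p[2] - dz))


def union(a: SDF, b: SDF) -> SDF:
    return lambda p: min(a(p), b(p))


def intersection(a: SDF, b: SDF) -> SDF:
    return lambda p: max(a(p), b(p))


def difference(a: SDF, b: SDF) -> SDF:
    """Points in a that are not in b."""
    return lambda p: max(a(p), -b(p))


def contains(shape: SDF, p: Point) -> bool:
    return shape(p) <= 0.0


# A 20 mm cube with a 6 mm radius spherical pocket cut into its top face.
part = difference(box(10, 10, 10), translate(sphere(6), 0, 0, 10))
```

Each boolean is a one-line combinator, which is roughly the sense in which CSG is "one more dimension" than composing 2D primitives. Note that `min`/`max` keep the inside/outside sign correct but do not always return exact distances.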

willis936 4 hours ago

I got a FreeCAD API script for a cat food colander / dust sieve out of Gemini / Cline in a single prompt supplied with 10 measurements and a good description of the vision. Printed first try, the two halves mated without issue, and the fit was what I pictured. Under a certain threshold of complexity and given sufficiently specific instructions it works. I've also spent many hours on more complicated tasks with little success.

VTimofeenko 2 hours ago

Same, I had reasonable success with LLM-generated OpenSCAD files fed into PrusaSlicer. Just needs a few manual tweaks for spatial transforms from time to time.

PaulHoule 5 hours ago

Personally I can complain to no end about arXiv papers being insufficiently rigorous, especially in the text classification area that I've been interested in since 2005 or so.

Show me the blog posts where people talk about the results they got "vibe coding" and those arXiv papers look great in comparison!

The insidious thing with LLMs is that they can get the general shape of something right, but that thing will not be useful if the last 1% is wrong. It might be that the operator sees the problem and fixes it, but it may also be that the LLM hypnotizes the operator into not seeing the errors and gaps.

I know there are many sorts of problems where I've had good experiences with LLMs but I know other people have had bad experiences and some of it might be my skill but some of it is just plain luck.

hadlock 4 hours ago

I've had pretty good success with using LLMs to generate basic shapes using Python and cadquery (which generates real parametric STEP files you can edit in Fusion, not glorified triangulated STLs). Yesterday I had GPT5.5 build a Python script to generate a Fender-style mandolin with separate neck and body, with correct bolt holes for the neck, Gibson-style bridge, and stop-tail, even a fretboard with the little dots and cutouts for the fret wire (I didn't ask for these, but it added them anyway). Everything looks correct and to scale. These should generate (after pip install cadquery) a .step and .stl which you can open in something like PrusaSlicer or Fusion:

Neck: https://pastebin.com/Sg3LmmUq Body: https://pastebin.com/FE9nikYB

edit: screenshot, too https://i.imgur.com/FZGyyVO.png

vmbm 3 hours ago

I am interested. Every few months, I loop back to using LLMs for this type of task but have always had fairly mixed results. Not sure if it is my prompt, model choice, or the part itself being too complex. And I haven't had the time to really dig into why things aren't working out. But it would be nice to find a workflow that gets good results, as I regularly 3D print stuff for hobby projects but find 3D modeling to be the most tedious and time-intensive task.

ActorNightly an hour ago

My general go-to for tasks that are n-level complex is to have the agent store summaries after every generation. I do this for interacting with websites - I'll sit there and type text for the agent to correctly inject JS to do something on a website, and on every iteration it asynchronously writes a history, in a background thread, of what it has done and what the result was. On every invocation, it injects that context.
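
A rough sketch of that pattern, using only the Python standard library. The class name `HistoryLog`, the JSON-lines file format, and the prompt layout in `build_prompt` are all assumptions made for illustration; the comment above does not say how the history is actually stored or injected.

```python
import json
import queue
import threading
from pathlib import Path


class HistoryLog:
    """Appends (action, result) summaries to a file from a background thread."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self._queue: "queue.Queue[dict]" = queue.Queue()
        # Daemon thread so a forgotten log never blocks interpreter exit.
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def record(self, action: str, result: str) -> None:
        """Non-blocking: hand the summary to the writer thread and return."""
        self._queue.put({"action": action, "result": result})

    def _drain(self) -> None:
        while True:
            entry = self._queue.get()
            with self.path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry) + "\n")
            self._queue.task_done()

    def flush(self) -> None:
        """Block until every queued summary has been written."""
        self._queue.join()

    def context(self, last_n: int = 5) -> str:
        """Render the most recent summaries as text for the next prompt."""
        self.flush()
        if not self.path.exists():
            return ""
        lines = self.path.read_text(encoding="utf-8").splitlines()
        entries = [json.loads(line) for line in lines[-last_n:]]
        return "\n".join(f"- did: {e['action']} -> {e['result']}" for e in entries)


def build_prompt(log: HistoryLog, request: str) -> str:
    """Prepend the stored history to the new request (hypothetical layout)."""
    return f"Previous steps:\n{log.context()}\n\nNew request: {request}"
```

A single writer thread draining a FIFO queue keeps the entries in the order they were recorded, and `flush()` before reading avoids injecting a stale history.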

nancyminusone 5 hours ago

I don't think I've ever heard a mechanical engineer say "torus" in my life unless they're talking about the car. When you are doing feedback with a human operator you use terms like "make this thicker" or "rotate that this way" while pointing at them. Text does not have this.

zonkerdonker 3 hours ago

The first image on the actual paper really tells the whole story. CAD for mechanical design, by necessity, requires pretty immense specificity. It is more onerous to type out "now raise the height of the torus relative to the base 4mm" than to click on "extrude" and type 4 or drag a handle.

Injecting a natural language layer into the workflow is just not optimal. CAD itself is not a difficult tool to learn and use effectively. There are essentially no layers of abstraction that an LLM can assist in cutting through, and no obfuscated rules or languages to learn.

I think of it this way. If there was someone sitting at my computer, and I had to do all of my CAD design by explaining what I wanted them to do verbally, I'd rip out my hair.

LLMs are doing for programmers what virtual CAD did for the drafter 35+ years ago: optimizing the effort expended to create the thing already in your brain.

8note 24 minutes ago

virtual cad did something major compared to drafting - you switched from only being able to represent certain views of an object, to having a stored full representation of the object.

you could sculpt a model à la ILM for star wars or the architecture models, but the only way to have a copy of the object was to make one.

the virtual cad also brought in the ability to do analysis with FEA and get approximately smooth understandings of the stress and strain on the piece, rather than manually calculating the critical points and stress raisers for doing analysis

coldtea 44 minutes ago

>I think of it this way. If there was someone sitting at my computer, and I had to do all of my CAD design by explaining what I wanted them to do verbally, I'd rip out my hair.

I, on the other hand, have used LLM + OpenSCAD to design stuff - while I pulled my hair out every time I had to sit and write OpenSCAD primitives or use a UI CAD like FreeCAD or Fusion360 and their horribly unintuitive interfaces.

abdullahkhalids 2 hours ago

> If there was someone sitting at my computer, and I had to do all of my CAD design by explaining what I wanted them to do verbally, I'd rip out my hair.

Isn't this how a lot of machine shops operate, or how things operate internally in larger manufacturing factories? Customer/person-from-different-team comes in and explains what they want to do. Maybe they have some sketches or pictures of similar parts. Then there is back and forth with the CAD guy to build the thing.

One critical difference is that the CAD guy is usually smart, so you explain things to them at a higher level, along with some written-down hard numbers that need to be obeyed.

rehevkor5 2 hours ago

While you're probably right about >90% of situations that fluent CAD users face, I think you might be suffering from a lack of imagination about situations where an LLM could help do work which would otherwise be tedious or mistake-prone. And then you have the non-fluent CAD users, just like the non-programmers who are now vibe-coding: this stuff can be a game changer for them even if it's far from "good" right now.

rehevkor5 2 hours ago

> Text does not have this.

Not in an informal way. But from a technical perspective, of course it does: serialize the feature steps to text or to code, job done.
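
As a sketch of what "serialize the feature steps" could look like, assuming a feature history is just an ordered list of named operations with numeric parameters. The `FeatureStep` type, the operation names, and the JSON layout below are made up for illustration and do not correspond to any particular CAD package's format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class FeatureStep:
    op: str                      # e.g. "extrude", "fillet" (hypothetical names)
    target: str                  # name of the sketch/face/edge the step acts on
    params: Dict[str, float] = field(default_factory=dict)


def serialize(steps: List[FeatureStep]) -> str:
    """Feature history -> JSON text an LLM (or a human) can read and edit."""
    return json.dumps([asdict(s) for s in steps], indent=2)


def deserialize(text: str) -> List[FeatureStep]:
    """JSON text -> feature history, ready to be replayed by a CAD kernel."""
    return [FeatureStep(**d) for d in json.loads(text)]


def describe(step: FeatureStep) -> str:
    """One-line, code-like rendering of a single step."""
    args = ", ".join(f"{k}={v}" for k, v in sorted(step.params.items()))
    return f"{step.op}({step.target}; {args})"


history = [
    FeatureStep("sketch_circle", "base_top", {"diameter_mm": 30.0}),
    FeatureStep("extrude", "sketch_1", {"distance_mm": 4.0}),
]
```

With named targets, "make this thicker" becomes an edit to `distance_mm` on the step whose `target` is `sketch_1`, which is the textual stand-in for pointing at the feature.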

contingencies an hour ago

I'd be keen to try this but can't see a URL.

nisegami 4 hours ago

The only time text-to-SCAD has ever worked for me was when Claude (the app, not even Claude Code) decided spontaneously to spin up an environment and check the renderings of its SCAD code. That session led to something 90-95% of the way to the finished product, and in some ways even surpassed my expectations by looking up measurements for relevant products instead of using placeholders.

Modifying the prompt and then trying it again did not lead to that self-verification loop and the output was unusable garbage.
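
The self-verification loop described here can be forced by the harness rather than left to the model's mood. A minimal sketch follows, assuming a `generate` callable that wraps the model and a `check` callable that returns a list of problems. Both are placeholders: a real `check` would render the SCAD and inspect the result, whereas the stand-in below only counts braces, and `fake_model` is a stub, not a real model call.

```python
from typing import Callable, List, Optional, Tuple


def refine_until_valid(
    generate: Callable[[List[str]], str],
    check: Callable[[str], List[str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Generate, check, and feed the problems back until the check passes.

    Returns (candidate, rounds_used), or (None, max_rounds) if it never passes.
    """
    feedback: List[str] = []
    for round_number in range(1, max_rounds + 1):
        candidate = generate(feedback)
        problems = check(candidate)
        if not problems:
            return candidate, round_number
        feedback = problems  # the next attempt sees what was wrong
    return None, max_rounds


def check_braces(scad_source: str) -> List[str]:
    """Placeholder check: a real one would render and inspect the model."""
    if scad_source.count("{") != scad_source.count("}"):
        return ["unbalanced braces"]
    return []


def fake_model(feedback: List[str]) -> str:
    """Stub standing in for an LLM: broken first, fixed once given feedback."""
    if not feedback:
        return "difference() { cube(20); sphere(6);"
    return "difference() { cube(20); sphere(6); }"


result, rounds = refine_until_valid(fake_model, check_braces)
```

The point of the design is that verification happens on every run, so a reworded prompt cannot silently skip it.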

hadlock 3 hours ago

GPT5.5 one-shotted an entire mandolin for me. Like, the whole thing, ready to 3D print or CNC. Well, two-shotted: the Fender-style neck came out so good (I didn't think it could do it) that I asked for the body, and it made the matching body with correct neck pocket etc. with bolt holes. SCAD is really rough, I agree, but cadquery is great for this sort of thing for whatever reason. I linked the pastebin upstream in this same comments section if you want to check it out.