Go Optimization Guide (goperf.dev)
470 points by jedeusus 2 days ago
nopurpose 2 days ago
Every perf guide recommends minimizing allocations to reduce GC times, but if you look at a pprof of a Go app, the GC mark phase is what takes time, not GC sweep. GC mark always starts with known live roots (goroutine stacks, globals, etc.) and traverses references from there, colouring every pointer. To minimize GC time it is best to avoid _long-lived_ allocations. Short-lived allocations, those which the GC mark phase will never reach, have an almost negligible effect on GC times.
Allocations of any kind have an effect on triggering GC earlier, but in real apps it is almost hopeless to avoid GC entirely, except for very carefully written programs with no dependencies, and if GC happens, then reducing GC mark times gives a bigger bang for the buck.
liquidgecka 2 days ago
It's worth calling out that abstractions can kill you in unexpected ways with Go.
Anytime you use an interface it forces a heap allocation, even if the object is only used read-only and within the same scope. That includes calls to things like fmt.Printf(), so a for loop that prints the value of i forces the integer backing i to be heap-allocated, along with every other value that you printed. So if you helpfully make every API in your library use an interface, you are forcing the callers to use heap allocations for every single operation.
slashdev 2 days ago
I thought surely an integer could be inlined into the interface; I thought Go used to do that. But I tried it on the playground, and it heap-allocates it:
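Something along these lines reproduces it (a sketch of my own, not the exact playground snippet; note the runtime special-cases very small integers, so use a value >= 256):

    package main

    import (
        "fmt"
        "math/rand"
        "testing"
    )

    var sink any // package-level, so the compiler can't prove the value doesn't escape

    func main() {
        n := rand.Intn(1_000_000) + 256 // non-constant and outside the runtime's small-int cache

        allocs := testing.AllocsPerRun(1000, func() {
            sink = n // boxing the int into the interface heap-allocates here
        })
        fmt.Println("allocs per boxing:", allocs) // typically prints 1
    }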
MarkMarine 2 days ago
Are you including in this analysis the amount of time/resources it takes to allocate? GC isn't the only thing you want to minimize when you're building a high-performance system.
nopurpose a day ago
From that perspective it boils down to "do less", which is what any perf guide already includes; allocations are no different from anything else the app does.
My comment is more about the "reduce allocations to reduce GC pressure" advice seen everywhere. It doesn't tell the whole story. Short-lived allocations don't introduce much GC pressure: you'll be hard-pressed to see the GC sweep phase on a pprof without zooming. People take this advice, spend time and energy hunting down allocations, just to see that total GC time remained the same after all that effort, because they were focusing on the wrong type of allocations.
aktau a day ago
Side note: see https://tip.golang.org/doc/gc-guide for more on how the Go GC works and what triggers it.
GC frequency is directly driven by allocation rate (in terms of bytes) and live heap size. Some examples:
- If you halve the allocation rate, you halve the GC frequency.
- If you double the live heap size, you halve the GC frequency (barring changes away from the default `GOGC=100`).
> ...but if you look at pprof of a Go app, GC mark phase is what takes time, not GC sweep.
It is true that sweeping is a lot cheaper than marking, which makes your next statement:
> Short lived allocations, those which GC mark phase will never reach, has almost neglible effect on GC times.
...technically correct. Usually, this is the best kind of correct, but it omits two important considerations:
- If you generate a ton of short-lived allocations instead of keeping them around, the GC will trigger more frequently.
- If you reduce the live heap size (by not keeping anything around), the GC will trigger more frequently.
So now you have cheaper GC cycles, but many more of them. On top of that, you have vastly increased allocation costs. It is not a priori clear to me this is a win. In my experience, it isn't.
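A rough way to see the frequency effect for yourself (a sketch; absolute numbers depend on machine, GOGC and Go version):

    package main

    import (
        "fmt"
        "runtime"
    )

    var sink []byte

    // churn allocates iters short-lived slices of size bytes and reports how
    // many GC cycles ran in the meantime.
    func churn(iters, size int) uint32 {
        var before, after runtime.MemStats
        runtime.ReadMemStats(&before)
        for i := 0; i < iters; i++ {
            buf := make([]byte, size) // non-constant size, so this is a heap allocation
            buf[0] = byte(i)
            if i == iters-1 {
                sink = buf // keep the last one so the loop can't be optimized away
            }
        }
        runtime.ReadMemStats(&after)
        return after.NumGC - before.NumGC
    }

    func main() {
        // Same number of allocations, double the bytes: expect roughly double the GC cycles.
        fmt.Println("GC cycles at 1 KiB per alloc:", churn(1_000_000, 1<<10))
        fmt.Println("GC cycles at 2 KiB per alloc:", churn(1_000_000, 1<<11))
    }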
deepsun 20 hours ago
Interesting, thank you. But I think those two points are not that strongly correlated. For example, if I create unnecessary wrappers in a loop, I might double the allocation rate, but I will not halve the live heap size, because I did not have those wrappers outside the loop before.
Basically, I'm trying to come up with a real-world example of a style change (like creating wrappers for every error, or using naked integers instead of time.Time) to estimate its impact. And my feeling is that any such example would affect one of your points way more than the other, so we can still argue that e.g. "creating short-lived iterators is totally fine".
nopurpose a day ago
I enjoyed your detailed response, it adds value to this discussion, but I feel you missed the point of my comment.
I am against blanket statements like "reduce allocations to reduce GC pressure", which lead people the wrong way: they compare libraries based on "allocs/op" from go bench, they trust ridiculous (who allocates 8KB per iteration in a tight loop??) microbenchmarks of sync.Pool like in the article above, hoping to resolve their GC problem, and spend a considerable amount of effort just to find that they barely moved the needle on GC times.
If we generalize, then my "avoid long-lived allocations" or your "reduce allocation rate in terms of bytes" are much more useful in practice than what this and many other articles preach.
zmj 2 days ago
Pretty similar story in .NET. Make sure your inner loops are allocation-free, then ensure allocations are short-lived, then clean up the long tail of large allocations.
neonsunset 2 days ago
.NET is far more tolerant to high allocation traffic since its GC is generational and overall more sophisticated (even if at the cost of tail latency, although that is workload-dependent).
Doing huge allocations which go to LOH is quite punishing, but even substantial inter-generational traffic won't kill it.
kgeist 19 hours ago
The runtime also forces a GC every 2 minutes. So yeah, a lot of long-lived allocations can stress the GC, even if you don't allocate often. That's why Discord moved from Go to Rust for their Read States server.
ncruces a day ago
The point is not to avoid GC entirely, but to reduce allocation pressure.
If you can avoid allocs in a hot loop, it definitely pays to do so. If you can't for some reason, and can use sync.Pool there, measure it.
Cutting allocs in half may not matter much, but if you can cut them by 99% because you were allocating in every iteration of a million-iteration loop, and now aren't, it will make a difference, even if all those allocs die instantly.
I've gotten better than two fold performance increases on real code with both techniques.
zbobet2012 2 days ago
Eh, kind of. If you are allocating in a hot loop it's going to suck regardless. Object pools are really key if you want high perf, because the general-purpose allocator is way less efficient in comparison.
bboreham a day ago
Agree that mark phase is the expensive bit. Disagree that it’s not worth reducing short-lived allocations. I spend a lot of time analyzing Go program performance, and reducing bytes allocated per second is always beneficial.
felixge a day ago
+1. In particular []byte slice allocations are often a significant driver of GC pace while also being relatively easy to optimize (e.g. via sync.Pool reuse).
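For reference, the usual shape of that pattern looks something like this (a sketch; the names are mine, and callers must treat a pooled buffer as uninitialised):

    package bufpool

    import "sync"

    var bufPool = sync.Pool{
        New: func() any {
            b := make([]byte, 0, 4096) // pre-size to the common case
            return &b                  // store a pointer to avoid an extra allocation on Put
        },
    }

    func getBuf() *[]byte { return bufPool.Get().(*[]byte) }

    func putBuf(b *[]byte) {
        *b = (*b)[:0] // reset length, keep capacity; never rely on old contents
        bufPool.Put(b)
    }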
raggi 2 days ago
You might wanna look at a system profiler too, pprof doesn't show everything.
deepsun 20 hours ago
Interesting, and I think that is not specific to Go, other mark-and-sweep GCs (Java, C#) should behave the same.
Which means that creating short lived objects (like iterators for loops, or some wrappers) is ok.
ted_dunning 17 hours ago
Not entirely. Go still doesn't have a generational collector, so high allocation rates cause more GCs that must examine long-lived objects.
As such, short-lived objects have little impact in Java (thank god for that!). They will have second order effects in Go.
int_19h 18 hours ago
It should be noted that in C#, at least, the standard pattern is to use value types for enumerators, precisely so as to avoid heap allocations. This is the case for all (non-obsolete) collections in the .NET stdlib - e.g. List<T>.Enumerator is a struct.
Capricorn2481 2 days ago
Aren't allocations themselves pretty expensive regardless of GC?
nu11ptr 2 days ago
Go allocations aren't that bad. A few years ago I benchmarked them at about 4x as expensive as a bump allocation. That is slow enough to make an arena beneficial in high allocation situations, but fast enough to not make it worth it most of the time.
epcoa 2 days ago
No. If you have a moving multi generational GC, allocation is literally just an increment for short lived objects.
nurettin a day ago
Is it worth making short lived allocations just to please the GC? You might just end up with too many allocations which will slow things down even more.
aktau a day ago
It is not. Please see my answer (https://news.ycombinator.com/item?id=43545500).
stouset a day ago
Checking out the first example—object pools—I was initially blown away that this is not only possible but it produces no warnings of any kind:
pool := sync.Pool{
New: func() any { return 42 }
}
a := pool.Get()
pool.Put("hello")
pool.Put(struct{}{})
b := pool.Get()
c := pool.Get()
d := pool.Get()
fmt.Println(a, b, c, d)
Of course, the answer is that this API existed before generics, so it just takes and returns `any` (née `interface{}`). It just feels as though golang might be strongly typed in principle, but in practice there are APIs left and right that escape out of the type system and lose all of the actual benefits of having it in the first place. Is a type system all that helpful if you have to keep turning it off any time you want to do something even slightly interesting?
Also I can't help but notice that there's no API to reset values to some initialized default. Shouldn't there be some sort of (perhaps optional) `Clear` callback that resets values back to a sane default, rather than forcing every caller to remember to do so themselves?
ncruces a day ago
This is still strong typing, even if it's not static typing.
It's static vs. dynamic and strong vs. weak.
9rx a day ago
It is strong, static, and structural. But structural typing is effectively compile-time duck typing, so it is understandable that some might confuse it with dynamic typing.
zaphodias a day ago
While I think you're right (generics might be useful there), it's fairly easy to wrap the `sync` primitives such as `sync.Pool` and `sync.Map` into your specific use case.
Go is pretty strict about breaking changes, so they probably won't change the current implementations; maybe we'll see a v2 version, or maybe not. The more code you have, the more code you have to maintain, and given Go's backward-compatibility promises, that's a lot of work.
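For what it's worth, such a wrapper is only a few lines with generics (a sketch, not a stdlib API; the names are illustrative):

    package pool

    import "sync"

    // Pool is a thin type-safe wrapper over sync.Pool.
    type Pool[T any] struct {
        inner sync.Pool
    }

    func New[T any](newFn func() *T) *Pool[T] {
        return &Pool[T]{inner: sync.Pool{New: func() any { return newFn() }}}
    }

    func (p *Pool[T]) Get() *T  { return p.inner.Get().(*T) }
    func (p *Pool[T]) Put(v *T) { p.inner.Put(v) }

Callers then get compile-time checking: a pool built with New(func() *bytes.Buffer { return new(bytes.Buffer) }) can only ever hand back *bytes.Buffer.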
aktau a day ago
Upstream thinks a type-safer `sync.Pool` is a good idea too. It's being discussed in https://go.dev/issue/71076.
Someone a day ago
> While I think you're right (generics might be useful there), it's fairly easy to wrap the `sync` primitives such as `sync.Pool` and `sync.Map` into your specific use case.
That’s not a strong argument. You can easily (but sometimes tediously) wrap any API with one that (further) restricts what types you can use with it. Generics make it possible to avoid doing that work, and code you don’t write won’t have errors.
strangelove026 a day ago
sync.Map is meant to have poor performance, I believe
jlouis a day ago
It is fairly common that your type system ends up with escape hatches allowing you to violate the type rules in practice. See e.g. OCaml and the function "magic" in the Obj module.
It serves as a way around a limitation in the type system which you don't want to deal with.
You can still have the rest of the code base be safe, as long as you create a wrapper which is.
The same can be said about having imperative implementations with functional interfaces wrapping said implementation. From the outside, you have a view of a system which is functionally sound. Internally, it might break the rules and use imperative code (usually for the case of efficiency).
stouset a day ago
Obviously every type system in practice has escape hatches. But I've never seen another statically-typed language where you need to break out of the type system so regularly.
Go’s type system has your back when you’re writing easy stuff.
But it throws up its hands and leaves you to fend for yourself when you need to do nearly anything interesting or complex, which is precisely when I want the type system to have my back.
I should not have to worry (or worse, not worry and be caught off guard) that my pool of database connections suddenly starts handing back strings.
jfwwwfasdfs a day ago
A lot of languages have top types
tgv a day ago
You never programmed in Go, I assume? Then you have to understand that the type of `pool.Get()` is `any`, the wildcard type in Go. It is a type, and if you want the underlying value, you have to get it out by asserting the correct type. This cannot be solved with generics. There's no way in Java, Rust or C++ to express this either, unless it is a pool for a single type, in which case Go generics indeed could handle that as well. But since Go is backwards compatible, this particular construct has to stay.
> Also I can't help but notice that there's no API to reset values to some initialized default.
That's what the New function does, isn't it?
BTW, the code you posted isn't syntactically correct. It needs a comma on the second line.
gwd a day ago
> That's what the New function does, isn't it?
But that's only run when the pool needs to allocate more space. What GP seems to expect is that sync.Pool() would always return a zeroed structure, just as Golang allocation does.
I think Golang's implementation does make sense, as sync.Pool() is clearly an optimization you use when performance is an issue; and in that case you almost certainly want to only initialize parts of the struct that are needed. But I can see why it would be surprising.
> [any] is a type
It's typed the way Python is typed, not the way Rust or C are typed; so it loses the "if it compiles there's a good chance it's correct" property that people want from statically typed languages.
I don't use sync.Pool, but it does seem like now that we have generics, having a typed pool would be better.
zaphodias a day ago
I assume they're referring to the fact that a Pool can hold different types instead of being a collection of items of only one homogeneous type.
eptcyka a day ago
Is there a time in your career where an object pool absolutely had to contain an unbounded set of types? Any time when you wouldn't know at compile time the total set of types a pool should contain?
pyrale a day ago
> There's no way in Java, Rust or C++ to express this either
You make it look like it's a good thing to be able to express it.
There's no way in Java, Rust or C++ to express this, praised be the language designers.
As for expressing a pool value that may be multiple things without a horrible any type and a horrible cast, you could make a union type in Rust, or an interface in Java implemented by multiple concrete objects. Both ways would force the consumer to explicitly check the value without requiring unchecked duck typing.
gf000 a day ago
How is it different than pre-generic Java?
Map/List<T> etc. are erased to basically an array of Objects (or a more specific supertype) at compile time, but you can still use the non-generic version (with a warning) if you want, and put any object into a map/list and get it out as any other type, with you having to cast it to the correct type.
sapiogram a day ago
> You never programmed in Go, I assume?
You might want to step off that extremely high horse for a second, buddy. It's extremely reasonable to expect a type-safe pool that only holds a single type, since that's the most common use case.
inadequatespace an hour ago
Why doesn’t the compiler pack structs for you if it’s as easy as shuffling around based on type?
kevmo314 2 days ago
Zero-copy is totally underrated. Like the site alludes to, Go's interfaces make it reasonably accessible to write zero-copy code but it still needs some careful crafting. The payoff is great though, I've often been surprised by how much time is spent allocating and shuffling memory around.
jasonthorsness 2 days ago
I once built a proxy that translated protocol A to protocol B in Go. In many cases, protocol A and B were just wrappers around long UTF-8 or raw bytes content. For large messages, reading the content into a slice then writing that same slice into the outgoing socket (preceded and followed by slices containing the translated bits from A to B) made a significant improvement in performance vs. copying everything over into a new buffer.
Go's network interfaces and slices makes this kind of thing particularly simple - I had to do the same thing in Java and it was a lot more awkward.
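The shape of that trick, roughly (a sketch, not the actual proxy code; the A -> B header translation is left out):

    package proxy

    import "io"

    // passThrough forwards n bytes of payload from src to dst, reusing one
    // caller-provided buffer instead of assembling a new message in memory.
    func passThrough(dst io.Writer, src io.Reader, n int64, buf []byte) error {
        _, err := io.CopyBuffer(dst, io.LimitReader(src, n), buf)
        return err
    }

If dst is a *net.TCPConn, io.CopyBuffer may even hand off to its ReadFrom and skip the user-space copy entirely on some platforms.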
roundup 2 days ago
Additionally...
- https://go101.org/optimizations/101.html
- https://github.com/uber-go/guide
I wish this content existed as a model context protocol (MCP) tool to connect to my IDE along w/ local LLM.
After 6 months of switching between different language projects, it's challenging to remember all the important things.
jigneshdarji91 2 days ago
Additionally... - https://www.uber.com/en-AU/blog/how-we-saved-70k-cores-acros...
This has saved Uber a lot of money on compute (I'm one of the devs). If your compute fleet is large and has memory to spare (stateless), performing dynamic GOGC tuning to tradeoff higher memory utilization for fewer GC events will save quite a lot of compute.
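The core idea is small enough to sketch (a much-simplified illustration, not Uber's actual tuner; softLimitBytes is an assumed input, e.g. derived from the container memory limit):

    package gctuner

    import (
        "runtime"
        "runtime/debug"
        "time"
    )

    // Tune periodically raises GOGC when there is memory headroom so the GC
    // runs less often, and falls back to the default when headroom shrinks.
    func Tune(softLimitBytes uint64) {
        for range time.Tick(10 * time.Second) {
            var m runtime.MemStats
            runtime.ReadMemStats(&m)
            live := m.HeapAlloc
            if live == 0 || live >= softLimitBytes {
                debug.SetGCPercent(100) // no headroom: default behaviour
                continue
            }
            // Let the heap grow toward the soft limit before the next GC,
            // instead of the "live heap x2" implied by GOGC=100.
            target := int(float64(softLimitBytes-live) / float64(live) * 100)
            if target < 100 {
                target = 100
            }
            debug.SetGCPercent(target)
        }
    }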
TechDebtDevin 2 days ago
Embedding those docs in your MCP server takes about 5 seconds with mcp-go's AddResource method
https://github.com/mark3labs/mcp-go/blob/main/examples/every...
donatj a day ago
Unpopular opinion maybe, but sync.Pool is so sharp, dangerous and leaky that I'd avoid using it unless it's your absolute last option. And even then, maybe consider a second server first.
infogulch a day ago
A new sync/v2 NewPool() is being discussed that eliminates the sharp edges by making it generic: https://github.com/golang/go/issues/71076
I haven't personally found it to be problematic; just keep it private, give it a default new func, and be cautious about only putting things in it that you got out.
nasretdinov 21 hours ago
I think in general people understand that sync.Pool introduces essentially an equivalent of uninitialised memory (since objects aren't required to be cleaned up before being returned to the pool), and mostly use it for something like []byte, slicing it like buf[0:0] to avoid accidentally reading someone else's memory.
But the instrument itself is really sharp and is indeed kind of last resort
jrockway 2 days ago
GOMEMLIMIT has saved me a number of times. In containerized production, it's nice, because sometimes jobs are ephemeral and don't even do enough allocations to hit the memory limit, so you don't spend any time in GC. But it's saved me the most times in CI where golangci-lint or govulncheck can't complete without running out of memory on a kind-of-large CI machine. Set GOMEMLIMIT and it eventually completes. (I switched to nogo, though, so at least golangci-lint isn't a problem anymore.)
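For reference, the same limit can also be set from code, which is handy when the value is computed at startup (a trivial sketch; in CI or a container spec the GOMEMLIMIT env var form is usually enough):

    package main

    import (
        "fmt"
        "runtime/debug"
    )

    func main() {
        // Equivalent to GOMEMLIMIT=4GiB: a soft limit the runtime tries to stay
        // under by running the GC more aggressively as the limit is approached.
        prev := debug.SetMemoryLimit(4 << 30)
        fmt.Println("previous limit was:", prev)
    }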
__turbobrew__ 10 hours ago
Calling mmap "zero copy" is generous. I guess we gloss over the whole page-fault thing, or the fact that performance is heavily dependent on how much memory pressure the process is under.
This is the same n00b trap that derailed the llama.cpp project last year because people don’t understand how memory maps and paging works, and the tradeoffs.
dennis-tra a day ago
Can someone explain to me why the compiler can’t do struct-field-alignment? This feels like something that can easily be automated.
CamouflagedKiwi a day ago
Because the order of fields can be significant. It's very relevant for syscalls, and is observable via the reflect package; it'd be strange if the field order was arbitrarily changed (and might change further between releases).
I assume the thinking was that this is pretty easy to optimise if you care, and if it's on by default there'd then have to be some opt-out which there isn't a good mechanism for.
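For anyone who hasn't hit this before, the manual version of that optimisation is just reordering fields (a sketch, assuming a 64-bit platform):

    package main

    import (
        "fmt"
        "unsafe"
    )

    // Same fields, different order.
    type padded struct {
        a bool  // 1 byte + 7 bytes padding before b
        b int64 // 8 bytes
        c bool  // 1 byte + 7 bytes trailing padding
    }

    type packed struct {
        b int64 // 8 bytes
        a bool  // 1 byte
        c bool  // 1 byte + 6 bytes trailing padding
    }

    func main() {
        fmt.Println(unsafe.Sizeof(padded{}), unsafe.Sizeof(packed{})) // 24 16
    }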
9rx a day ago
> and if it's on by default there'd then have to be some opt-out which there isn't a good mechanism for.
Good is subjective, but the mechanism is something already implemented: https://pkg.go.dev/structs#HostLayout
kbolino a day ago
In particular, struct field alignment matches C (even without cgo) and so any change to the default would break a lot of code.
masklinn 10 hours ago
It can. Rust does.
That requires a way to opt out though, because there are situations where you need a specific field ordering, so the language then needs to provide a way to tune struct layout behaviour.
9rx a day ago
Like the answer to all "Why doesn't Go have X?" questions: lack of manpower. There has been some work done to support it, but it is far from complete. Open source doesn't mean open willingness to contribute, unfortunately. Especially when you're not the cool kid on the block.
parhamn 2 days ago
Noticed the object pooling doc, had me wondering: are there any plans to make packages like `sync` generic?
arccy 2 days ago
eventually: https://github.com/golang/go/issues/71076
neillyons a day ago
Curious to know what people are building where you need to optimise like this? eg Struct Field Alignment https://goperf.dev/01-common-patterns/fields-alignment/#avoi...
dundarious a day ago
False sharing is an absolutely classic Concurrency 101 lesson, nothing remarkable about it.
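The textbook fix, for anyone who hasn't seen it (a sketch; it assumes 64-byte cache lines):

    package hotpath

    import "sync/atomic"

    // Two counters updated by different goroutines. Without the padding they
    // would share a cache line, and every write would invalidate the other
    // core's copy of it.
    type counters struct {
        a atomic.Int64
        _ [56]byte // push b onto the next 64-byte cache line
        b atomic.Int64
    }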
kubb a day ago
Something that shouldn’t be written in a GC language.
Cthulhu_ a day ago
GC is not relevant in this case, it's about whether you can make structs fit in cache lines and CPU registers. Mechanical sympathy is the googleable phrase. GC is a few layers further away.
piokoch a day ago
I don't think GC has anything to do with it here; doing manual memory allocation, we might hit the same problem.
EdwardDiego 2 days ago
Huh, this surprises me about Golang, didn't realise it was so similar to C with struct alignment. https://goperf.dev/01-common-patterns/fields-alignment/#why-...
Cthulhu_ a day ago
Yup, it's a fairly low-level language intended as a replacement for C/C++, but for modern-day systems (networked, concurrent, etc.). You don't have manual memory management per se, but you still need to decide on heap vs stack and consider the hardware.
jerf a day ago
"you still need to decide on heap vs stack"
No, you can't decide on heap vs stack. Go's compiler decides that. You can get feedback about the decision if you pass the right debug flags, and then based on that you may be able to tickle the optimizer into changing its mind based on code changes you make, but it'll always be an optimization decision subject to change without notice in any future versions of Go, just like any other language where you program to the optimizer.
If you need that level of control, Go is generally not the right language. However, I would encourage developers to be sure they need that level of control before taking it, and that's not special pleading for Go but special pleading for the entire class of "languages that are pretty fast but don't offer quite that level of control". There's still a lot of programmers running around with very 200x ideas of performance, even programmers who weren't programmers at the time, who must have picked it up by osmosis.
(My favorite example to show 200x perf ideas is paginated APIs where the "pages" are generally chosen from the set {25, 50, 100} for "performance reasons". In 2025, those are terribly, terribly small numbers. Presenting that many results to humans makes sense, but my default size for paginating API calls nowadays is closer to 1000, and that's the bottom end, for relatively expensive things. If I have no reason to think it's expensive, tack another order of magnitude on to my minimum.)
jensneuse 2 days ago
You can often fool yourself by using sync.Pool. pprof looks great because there are no allocs in benchmarks, but memory usage goes through the roof. It's important to measure real-world benefits, if any, and not just synthetic benchmarks.
makeworld 2 days ago
Why would Pool increase memory usage?
jensneuse a day ago
Let's say you constantly have 1k requests per second, and for each request you need one buffer of 1 MiB. That means you have 1 GiB in the pool. Without a pool, there's a high likelihood that you're using less. Why? Because in reality most requests need a 1 MiB buffer, but SOME require a 5 MiB buffer. As such, your pool grows over time, because you don't have control over the distribution of the size of the pool items.
So, if you have predictable object sizes, the pool will stay flat. If the workloads are random, you have a new problem because, like in this scenario, your pool grows 5x more.
You can solve this problem. E.g. you can only give back items into the pool that are small enough. Alternatively, you could have a small pool and a big pool, but now you're playing cat and mouse.
In such a scenario, it could also work to simply allocate and use GC to clean up. Then you don't have to worry about memory and the lifetime of objects, which makes your code much simpler to read and reason about.
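That first mitigation is just a guard on the Put side (a sketch; maxPooled is an assumed threshold):

    package bufpool

    import "sync"

    const maxPooled = 1 << 20 // 1 MiB

    // release drops oversized buffers instead of pooling them, so a few large
    // requests can't permanently ratchet the pool's memory usage upward.
    func release(pool *sync.Pool, b *[]byte) {
        if cap(*b) > maxPooled {
            return // let the GC reclaim the outlier
        }
        *b = (*b)[:0]
        pool.Put(b)
    }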
xyproto 2 days ago
I guess if you allocate more than you need upfront that it could increase memory usage.
nopurpose a day ago
Also, no one GCs sync.Pool: after a spike in utilization, you live with increased memory usage until the program restarts.
ncruces a day ago
That's just not true. Pool contents are GCed after two cycles if unused.
nikolayasdf123 2 days ago
Nicely organised. I feel like this could grow into a community-driven, current state-of-the-art collection of optimisation tips for Go. It just needs to let people edit/comment their input easily (preferably in place). I see there is a GitHub repo, but my bet is people would not actively add their input/suggestions/research there; it is hidden too far from the content/website itself.
whalesalad a day ago
For sure. Feels like the broader dev community could use a generic wiki platform like this, where every language or toolkit can have its own section. Not just for performance/optimization, but also for idiomatic ways to use a language in practice.
kunley a day ago
"Although the struct Data contains a [1024]int array, which is 4 KB (assuming int is 4 bytes on the architecture used)"
Huh, what?
I mean, who uses a 32-bit architecture by default?
bombela a day ago
Most C/C++ compilers have 32b int on 64b arch. Maybe the confusion comes from that.
Also it would be 4KiB not 4KB.
_345 2 days ago
Anyone know of a resource like this but for Python 3?
asicsp a day ago
This might help: https://pythonspeed.com/datascience/
nikolayasdf123 2 days ago
Nice article. Good to see statements backed up by benchmarks right there.
ljm 2 days ago
You're not really writing 'Go' anymore when you're optimising it; it defeats the point of the language as a simple but powerful interface over networked services.
jrockway 2 days ago
Why? You have control over the parts where control yields noticeable savings, and the rest just kind of works with reasonable defaults.
Taken to the extreme, Go is still nice even with constraints. For example, tinygo is pretty nice for microcontroller projects. You can say upfront that you don't want GC, and just allocate everything at the start of the program (kind of like how DJB writes C programs) and writing the rest of the program is still a pleasant experience.
ashf023 a day ago
100%. I work in Go and use optimizations like the ones in the article, but only in a small percentage of the code. Go has a nice balance where it's not pessimized by default, and you can just write 99% of code without thinking about these optimizations. But having this control in performance critical parts is huge. Some of this stuff is 10x, not +5%. Also, Go has very good built-in support for CPU and memory profiling which pairs perfectly with this.
emmelaich 2 days ago
I think you have a point that there's generic advice for optimising: don't.
i.e. Make it simple, then measure, then make it fast if necessary.
Perhaps all this is understood by readers of the article.
Cthulhu_ a day ago
What do you mean? If you don't want that level of control over e.g. memory allocation, registers, cache lines etc., there are higher-level languages than Go you can pick from, e.g. Java / C# / JS.
mariusor a day ago
I think at least some of the patterns shared in the document, like using zero-copy and ordering struct fields, are very idiomatic. Writing code in this manner is writing good Go code.