Show HN: ClickStack – Open-source Datadog alternative by ClickHouse and HyperDX (github.com)
224 points by mikeshi42 a day ago
Hey HN! Mike & Warren here from HyperDX (now part of ClickHouse)! We’ve been building ClickStack, an open source observability stack that helps you collect, centralize, search/viz/alert on your telemetry (logs, metrics, traces) in just a few minutes - all powered by ClickHouse (Apache2) for storage, HyperDX (MIT) for visualization and OpenTelemetry (Apache2) for ingestion.
You can check out the quick start for spinning things up in the repo here: https://github.com/hyperdxio/hyperdx
ClickStack makes it really easy to instrument your application so you can go from bug reports of “my checkout didn’t go through” to a session replay of the user, backend API calls, to DB queries and infrastructure metrics related to that specific request in a single view.
For those that might be migrating from Very Expensive Observability Vendor (TM) to something that's open source, more performant, and doesn't require extensive culling of retention limits and sampling rates - ClickStack gives you a batteries-included way to start that migration journey.
For those that aren’t familiar with ClickHouse, it’s a high performance database that’s already used by companies such as Anthropic, Cloudflare, and DoorDash to power their core observability at scale due to its flexibility, ease of use, and cost effectiveness. However, this has historically required teams to dedicate engineers to building a custom observability stack: it’s difficult not only to get telemetry data into ClickHouse easily, but also to work without a native UI experience.
That’s why we’re building ClickStack - we wanted to bundle an easy way to get started ingesting your telemetry data, whether it’s logs & traces from Node.js or Ruby, or metrics from Kubernetes or your bare metal infrastructure. Just as important, we wanted a visualization experience that lets users quickly search using a familiar lucene-like search syntax (similar to what you’d use in Google!). We recognise, though, that a SQL mode is needed for the most complex of queries. We've also added high cardinality outlier analysis by charting the delta between outlier and inlier events - which we've found really helpful in narrowing down causes of regressions/anomalies in our traces - as well as log patterns to condense clusters of similar logs.
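If you're curious what we mean by log patterns, here's a toy sketch of the core idea (this is a deliberately simplified illustration, not our actual implementation): mask the variable tokens in each log line so that structurally similar lines collapse into one template, then group by template.

```python
import re
from collections import defaultdict

def template(line: str) -> str:
    """Mask variable tokens (hex ids, numbers) so similar log lines
    collapse into one pattern."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<*>", line)   # hex literals
    line = re.sub(r"\b[0-9a-fA-F]{8,}\b", "<*>", line)  # long hex-ish ids
    line = re.sub(r"\d+", "<*>", line)                  # plain numbers
    return line

def cluster(lines):
    """Group raw log lines by their masked template."""
    groups = defaultdict(list)
    for line in lines:
        groups[template(line)].append(line)
    return groups

logs = [
    "connection from 10.0.0.1 port 52344",
    "connection from 10.0.0.7 port 41211",
    "request 7f3a9c21 took 120 ms",
    "request 99b0d4e8 took 87 ms",
]
for pattern, members in cluster(logs).items():
    print(len(members), pattern)
```

The real feature does smarter clustering than regex masking, but the end result is the same shape: thousands of near-identical lines condensed into a handful of patterns with counts.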
We’re really excited about the roadmap ahead in terms of improving ClickStack as a product and the ClickHouse core database to improve observability. Would love to hear everyone’s feedback and what they think!
Spinning up a container is pretty simple: `docker run -p 8080:8080 -p 4317:4317 -p 4318:4318 docker.hyperdx.io/hyperdx/hyperdx-all-in-one`
In-browser live demo (no sign ups or anything silly, it runs fully in your browser!): https://play.hyperdx.io/
Landing page: https://clickhouse.com/o11y
GitHub repo: https://github.com/hyperdxio/hyperdx
Discord community: https://hyperdx.io/discord
Docs: https://clickhouse.com/docs/use-cases/observability/clicksta...
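Once the container is up, anything that speaks OTLP can send data to port 4317 (gRPC) or 4318 (HTTP). As a quick illustration, here's roughly what a minimal OTLP/HTTP JSON log payload looks like (field names follow the OTLP spec; the endpoint and service name are placeholders for your own setup):

```python
import json

# Assumed default OTLP/HTTP logs endpoint of the all-in-one container.
endpoint = "http://localhost:4318/v1/logs"

# One log record, with a service.name resource attribute so the UI can
# group telemetry by service.
payload = {
    "resourceLogs": [{
        "resource": {"attributes": [
            {"key": "service.name", "value": {"stringValue": "demo-app"}},
        ]},
        "scopeLogs": [{
            "logRecords": [{
                "severityText": "INFO",
                "body": {"stringValue": "hello from ClickStack"},
            }],
        }],
    }],
}

# POST this with any HTTP client, e.g.:
#   curl -X POST http://localhost:4318/v1/logs \
#        -H 'Content-Type: application/json' -d '<payload JSON>'
print(json.dumps(payload)[:60])
```

In practice you'd use an OTel SDK rather than hand-rolling payloads, but this is handy for smoke-testing the ingest endpoint.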
theogravity 12 hours ago
This is really cool considering how expensive DataDog can get. I'm the author of LogLayer (https://loglayer.dev), a structured logger for TypeScript that lets you use multiple loggers together. I've written transports that allow shipping to other loggers like pino and to cloud providers such as DataDog.
I spent some time writing an integration for HyperDX after seeing this post and hope you can help me roll it out! Would love to add a new "integrations" section to my page that links to the docs on how to use HyperDX with LogLayer.
wrn14897 9 hours ago
Hey this looks awesome! We will take a look at it
readdit a day ago
I use HyperDX in production and like it a lot. So kudos to the team for building it and for merging with Clickhouse. I found a lot of monetary value in switching over to HyperDX, considering it's significantly more cost efficient for our needs.
Should we be starting to prepare for the original HyperDX product to be deprecated and potentially move over to ClickStack?
mikeshi42 19 hours ago
First off, always really excited to hear from our production users - glad to hear you're getting good value out of the platform!
HyperDX isn't being deprecated, you can probably see on the marketing page it's still really prominently featured as an integral part of the stack - so nothing changing there.
We do of course want to get users onto HyperDX v2 and the overall ClickStack pattern. This doesn't mean HyperDX is going away by any means - just that HyperDX is focused a lot more on the end-user experience, and we get to leverage the flexibility, learnings and performance of a more exposed ClickHouse-powered core which is the intent of ClickStack. On the engineering side, we're working on making sure it's a smooth path for both open source and cloud.
side note: weird I thought I replied to this one already but I've been dealing with spotty wifi today :)
HatchedLake721 17 hours ago
Still confused where HyperDX ends and where ClickStack starts.
Is HyperDX === ClickStack?
Is ClickStack = HyperDX + something closed source?
Is ClickStack just a cloud version of HyperDX?
Is it same thing, HyperDX, rebranded as ClickStack?
wiradikusuma 10 hours ago
Can I say it's similar to Signoz, in that both are ClickHouse-powered and available as both open-source and hosted versions? How are you guys different compared to Signoz?
(The UI looks similar too, although I guess a lot of observability tools seem to adopt that kind of UI).
oulipo 8 hours ago
Interested in a comparison between both too!
hosh 21 hours ago
I liked Otel for traces and maybe logging -- but I think the Otel metrics is over-engineered.
Does ClickStack have a way to ingest statsd data, preferably with Datadog extensions (which adds tagging)?
Does ClickStack offer correlations across traces, logging, and metrics via unified service tagging? Does the UI offer the ability to link to related traces, logging, and metrics?
Why does the Elixir sdk use the hyperdx library instead of the otel library?
Are Notebooks in the roadmap?
phillipcarter 21 hours ago
> but I think the Otel metrics is over-engineered.
What about OTel metrics is difficult?
You can set up receivers for other metrics sources like statsd or even the DD agent, so there's no need to immediately replace your metrics stack.
carefulfungi 20 hours ago
My foray into otel with aws lambda was not a success (about 6 months ago). Many of my issues were with the prom remote writer that I had to use. The extension was not reliable. Queue errors were common in the remote writer. Interop with Prometheus labels was bad. And the various config around delta and non-delta metrics was a bit of a mess. The stack I was using at least didn’t support exponential histograms. Got it to work mostly after days of fiddling but never reliably. Ripped it out and was happier. Maybe a pure OTEL stack would have been a much better experience than needing the prom remote writer - which I’d like to try in the future.
I’d certainly appreciate hearing success stories of OTEL + serverless.
cyberax 16 hours ago
One critical problem for me: no support for raw metrics.
Sometimes, you just want to export ALL of your metrics to the server and let it deal with histograms, aggregation, etc.
Another annoyance is the API, you can't just put "metrics.AddMeasurement('my_metric', 11)", you have to create a `Meter` (which also requires a library name), and then use it.
mikeshi42 20 hours ago
Great questions!
OTel Metrics: I get it, it's specified as almost a superset of everyone's favorite metric standards with config for push/pull, monotonic vs delta, exponential/"native" histograms, etc. I have my preferences as well which would be a subset of the standard but I get why a unifying standard needed to be flexible.
Statsd: The great thing about the OTel collector is that it allows ingesting a variety of different data formats, so you can take in statsd and output OTel or write directly to ClickHouse: https://github.com/open-telemetry/opentelemetry-collector-co...
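To make that concrete, here's a rough sketch of a collector config that takes statsd in and forwards it over OTLP (the ports/endpoints are placeholders, and I believe the contrib statsd receiver also understands DogStatsD-style tags, but double-check the receiver docs for your version):

```yaml
receivers:
  statsd:
    endpoint: 0.0.0.0:8125        # listen for statsd UDP packets
    aggregation_interval: 30s     # flush aggregated metrics every 30s

exporters:
  otlphttp:
    endpoint: http://localhost:4318   # the ClickStack OTLP ingest endpoint

service:
  pipelines:
    metrics:
      receivers: [statsd]
      exporters: [otlphttp]
```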
We correlate across trace/span id as well as resource attributes. The correlation across logs/traces with span/trace id is a pretty well worn path across our product. Metrics to the rest is natively done via resource attributes and we primarily expose correlation for K8s-based workloads with more to come. We don't do exemplars _yet_ to solve the more generic correlation case for metrics (though I don't think statsd can transmit exemplars)
Elixir: We try to do our best to support wherever our users are, the OTel SDK and ours have continued to change in parallel over time - we'll want to likely re-evaluate if we should start pointing towards the base OTel SDK for Elixir. We've been pretty early on the OTel SDK side across the board so things continue to evolve, for example our Deno OTel integration came out I think over a year before Deno officially launched one with native HyperDX documentation <3
Notebooks: Yes, it should land in an experimental state shortly, stay tuned :) There's a lot of exciting workflows we're looking to unlock with notebooks as well. If you have any thoughts in this direction, please let me know. I'd love to get more user input ahead of the first release.
hosh 13 hours ago
Thank you. I saw a different thread about Otel statsd receiver, so that works out better. The last time I had looked into it, the otel metrics specs were very complex.
I think this is enough features for me to seriously take a look at it as a Datadog alternative.
atombender 18 hours ago
I'm looking for a new logging solution to replace Kibana. I have very good experience with ClickHouse, and HyperDX looks like a decent UI for it.
I'm primarily interested in logs, though, and the existing log shipping pipeline is around Vector on Kubernetes. Admittedly Vector has an OTel sink in beta, but I'm curious if that's the best/fastest way to ship logs, especially given that the original data comes out of apps as plain JSON rather than OTel.
The current system is processing several TB/day and needs fairly serious throughput to keep up.
mikeshi42 18 hours ago
Luckily ClickHouse and serious throughput are pretty synonymous. Internally we're at 100+PB of telemetry stored in our own monitoring system.
Vector supports directly writing into ClickHouse - several companies use this at scale (iirc Anthropic does exactly this, they spoke about this recently at our user conference).
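As a minimal sketch of what that sink config can look like (the inputs, endpoint, and table here are placeholders - check Vector's clickhouse sink docs for the full option list):

```toml
[sinks.clickhouse_logs]
type = "clickhouse"
inputs = ["kubernetes_logs"]          # whatever source/transform feeds your logs
endpoint = "http://clickhouse:8123"   # ClickHouse HTTP interface
database = "default"
table = "otel_logs"
skip_unknown_fields = true            # drop fields the table schema doesn't have
```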
Please give it a try and let us know how it goes! Happy to help :)
atombender 17 hours ago
Thanks! Very familiar with ClickHouse, but can logs then be ingested into CH without going through HyperDX? Doesn't HyperDX require a specific schema that the Vector pipeline would have to adapt the payloads to?
smetj 10 hours ago
I think settling to otel as transport/wire-format is an excellent strategic choice offering most possibilities towards the future. Two concerns less.
atombender 5 hours ago
I'm less concerned about the wire format than reducing complexity and bottlenecks in a high-volume, high-throughput system. Needing an intermediate API just to ingest into ClickHouse adds another step where things can slow down or break, not to mention that a gRPC API just to convert JSON payloads into INSERTs is quite wasteful if you can just insert directly.
JimDabell 8 hours ago
I’m not sure what this is intended to do, but when I created an account, I saw in the left sidebar a widget saying “Was this search result helpful?” with thumbs up and thumbs down buttons. I hadn’t searched for anything. I pressed the “Hide” button instead, and the widget changed to an “Any feedback?” button. I thought I would tell you about this weird bug, so I clicked the feedback button. The widget changed back into the “Was this search result helpful?” widget.
I found the UX very difficult to read. The monospace font, the unusually small text, the bold white and bright green text on a dark background… I found it a little more readable by changing the font to system-ui, but not by much. Please consider a more traditional style instead of leaning into the 80s terminal gimmick. This factor alone makes me want to not use it. It needs to be easy to read, not a pain to read.
codegeek a day ago
How are you different than Signoz, another YC company that also does Observability using clickhouse ?
mikeshi42 18 hours ago
Echoing the comment below, I guess one obvious thing is that we are a team at ClickHouse and an official first-party product on top. That translates into:
- We're flexible on top of any ClickHouse instance: you can use virtually any schema in ClickHouse and things will still work. Custom schemas are pretty important for tuned high performance, or once you're at a scale like Anthropic's. This also makes it incredibly easy to get started (especially if you already have data in ClickHouse).
- The above also means you don't need to buy into OTel. I love OTel, but some companies choose to use Vector, Cribl, S3, a custom writing script, etc. for good reasons. All of that is supported natively via the various ClickHouse integrations, which naturally means you can use ClickStack/HyperDX in those scenarios as well.
- We also have some cool tools around wrangling telemetry at scale, from Event Deltas (high cardinality correlation between slow spans and normal spans to root cause issues) to Event Patterns (clustering similar logs or spans together automatically with ML) - all of these help users dive into their data in easier ways than just searching & charting.
- We also have session replay capability - to truly unify everything from click to infra metrics.
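The Event Deltas idea, stripped down to a toy sketch (not the real implementation, just the gist): compare how often each attribute value shows up in outlier (slow) spans vs inlier (normal) spans, and surface the biggest gaps.

```python
from collections import Counter

def event_delta(outliers, inliers):
    """Rank (attribute, value) pairs by how much more often they
    appear in outlier events than in inlier events."""
    def freq(events):
        counts = Counter()
        for attrs in events:
            counts.update(attrs.items())
        return {k: v / len(events) for k, v in counts.items()}

    out_f, in_f = freq(outliers), freq(inliers)
    deltas = {k: out_f.get(k, 0.0) - in_f.get(k, 0.0)
              for k in set(out_f) | set(in_f)}
    return sorted(deltas.items(), key=lambda kv: -kv[1])

# Hypothetical data: slow spans mostly hit pod "api-7", normal spans are spread out.
slow = [{"pod": "api-7", "region": "us-east"}] * 4 + \
       [{"pod": "api-2", "region": "us-east"}]
fast = [{"pod": "api-1", "region": "us-east"}, {"pod": "api-2", "region": "us-east"},
        {"pod": "api-3", "region": "us-east"}, {"pod": "api-7", "region": "us-east"}]

top = event_delta(slow, fast)
print(top[0])  # the attribute value overrepresented in slow spans
```

The "region" attribute cancels out (same frequency in both sets), while the bad pod jumps to the top - that's the delta that points you at the root cause.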
We're built to work at the 100PB+ scale we run internally here for monitoring ClickHouse Cloud, but flexible enough to pinpoint, end to end, the specific user issues that get brought up once in a support case.
There's probably a lot more I'm missing. Ultimately from a product philosophy standpoint, we aren't big believers in the "3 pillars" concept, which tends to manifest as 3 silos/tabs for "logs", "metrics", "traces" (this isn't just Signoz - but across the industry). I'm a big believer that we're building tools to unify and centralize signals/clues in one place and giving the right datapoint at the right time to the engineer. During an incident I just think about what's the next clue I can get to root cause an issue, not if I'm in the logging product or the tracing product.
oatsandsugar 21 hours ago
"You" here is ClickHouse
codegeek an hour ago
Yes but that is because they got acquired by Clickhouse. But my question still remains.
dustedcodes 5 hours ago
Very cool, reminds me of SigNoz.
How would I self host this in k8s? Would I deploy a ClickHouse cluster using the Altinity operator and then connect it using the HyperDX local mode or what is the recommended approach to self-host ClickStack?
regnerba 10 hours ago
We run a full Grafana stack (Loki, Tempo, Prometheus, Alloy agent, Grafana) backed by self-hosted S3 (we are all on-prem physical hardware).
While I do like the stack we have, it is a lot of components to run and configure. Don’t think we have ever had any issues once it was up and running.
Does anyone have any thoughts about how this compares? We don’t have a huge amount of data: 1 month of metrics is about 200GB, and logs aren’t a whole lot more - less than a TB, I think, for 2 weeks.
gigatexal 12 hours ago
Datadog is expensive, this is true. But I have never felt it to be slow. Speed is not its killer feature - it’s everything you can do with it once you have logs and/or metrics flowing into it.
The dashboards and their creation are intuitive. Creating alerts and things from airflow logs is easy using their DSL. Connecting and sending notifications to things like Slack just works (tm).
So this is how we justify the Datadog costs: all the engineering time it saves (engineers are still expensive, AI hasn’t replaced us yet) and how quickly we can move from raw logs and metrics to useful insights.
mikeshi42 10 hours ago
Totally agree - you use an observability tool because it answers your questions quickly, not just because it returns searches quickly.
Beyond raw performance and cost effectiveness, which is quite important at scale, we work a lot on making sure the application layer itself is intuitive to use. You can always play around with what ours looks like at play.hyperdx.io :)
bilalq 21 hours ago
This is really interesting.
Is Clickhouse the only stateful part of this stack? Would love to see compatibility with Rotel[0], a Rust implementation of the OTEL collector, so that this becomes usable for serverless runtime environments.
One key thing Datadog has is their own proprietary alternative to the OTEL collector that is much more performant.
mikeshi42 21 hours ago
I agree - rotel seems like a really good fit for a lightweight lambda integration for OTel, it of course should work already since we stand up an OTel ingest endpoint so it should be seamless to send data over! (Kind of the beauty of OTel of course)
I've also been in touch with Mike & Ray for a bit, who've told me they've added ClickHouse support recently which makes the story even better :)
mike_heffner 18 hours ago
Hi all — one of the authors of Rotel here. Thanks for the kind words, Bilal and Michael.
We're excited to test our Clickhouse integration with Clickstack, as we believe OTel and Clickhouse make for a powerful observability stack. Our open-source Rust OpenTelemetry collector is designed for high-performance, resource-constrained environments. We'd love for you to check it out!
user3939382 21 hours ago
There are so many of these log aggregators that I’ve completely lost track. I used Datadog extensively and found it overpriced, with a very confusing UI.
RhodesianHunter 20 hours ago
That's what happens when there's a need for something.
You see an explosion in offerings, and then eventually it's whittled down to a handful of survivors.
secondcoming 18 hours ago
Everyone has found Datadog to be overpriced!
So they switch to Prometheus and Grafana and now have to manage a Prometheus cluster. Far cheaper, but far more annoying.
wvh 7 hours ago
I have no experience with Datadog, but I'm not sure "cheaper" is an easy adjective to quantify. The whole metrics/logs/traces thing in Kubernetes is still painful, a lot of work and there's no end to the confusion. After several years in the trenches, it still takes me longer (i.e. more money) to install, configure and make sense of a monitoring stack than to set up the software it is monitoring.
It doesn't help that typically most software is ancient, spits out heaps of stack traces and wall-of-text output, doesn't use structured logging and generally doesn't let itself be monitored easily.
So yeah, getting meaningful insights from a highly available observability stack will take some serious time and resources, and I can understand smaller companies just handing it over to a third party so they can get on with their core business (AKA easy billing).
landl0rd 15 hours ago
Datadog is a good product but one of the most blatantly overpriced things I’ve had the displeasure to use.
ensignavenger 18 hours ago
Really interesting. Unfortunately, it looks like HyperDX depends on Mongo? I wonder if there are any open source document stores (possibly a Mongo-compatible one) that could work with it?
ensignavenger 16 hours ago
FerretDB looks like a great alternative, thanks! I'll be keeping Ferret and ClickStack on my radar!
mikeshi42 17 hours ago
In theory you should be able to try using FerretDB, for example.
We have this on the medium-term roadmap: investigating proper support for a compatibility layer such as Ferret, or more likely just using ClickHouse itself as the operational data store.
ptrfarkas 17 hours ago
FerretDB maintainer here - we'll be looking at this
buserror 20 hours ago
I am absolutely amazed at the amount of garbage being "logged" - enough that it is not just a huge business, but also one of the primary tasks for some devops folks. It's like a goal in itself: you have a look at the output and it is absolutely scary, HUGE messages being "logged" for purposes unknown.
I've seen single traces over 100KB of absolute pure randomness encoded as base64... Because! Oh and also, we have to pay for the service, so it looks important.
Sure they tell you it is super helpful for debugging issues, but in a VERY large proportion of cases, it is 1) WAY too much, and 2) never used anyway. And most of the time what's interesting is the last 10 minutes of the debug version, you don't need a "service" for that.
/me gets down off his horse :-)
smetj 9 hours ago
I totally agree with this. Same for metrics.
metta2uall 12 hours ago
I think you're at least partially right - not everything, but a lot of data is not useful, wasting money, bandwidth, electricity, etc. There should be more dynamic controls over what gets logged/filtered on the client side.
SOLAR_FIELDS 18 hours ago
Comparison to the other player in this space, Signoz? Also uses clickhouse as backend
ah27182 14 hours ago
Do I need to sign in when using the Docker container?
mikeshi42 13 hours ago
There's a version that we call local mode which is intended for engineers using it as part of their local debugging workflow: https://clickhouse.com/docs/use-cases/observability/clicksta...
Otherwise, yes - you can authenticate against the other versions with an email/password (the email doesn't actually do anything in the open source distribution, it's just a user identifier, but we keep it there to be consistent).
Immortalin 21 hours ago
I remember back in the day Mike was building Huggingface before Huggingface was a thing. He was ahead of his time. It's a pity model depot is no longer around.
mikeshi42 19 hours ago
Wow this is an incredible throwback! Can't believe your memory is this good. It's quite funny and I totally agree - I met the Gradio founders in an accelerator (when they were just getting started) after we shut down ModelDepot - and they of course ended up getting acquired into Hugging Face. It's funny how things end up sometimes :)
ksec 21 hours ago
It would have been much better if the link pointed to https://github.com/hyperdxio/hyperdx - the actual source code.
Right now, without this HN post, I wouldn't know what "open source observability stack" meant: the webpage does not explain what HyperDX is, nor does it provide a link to it or its code. I was expecting the whole "Open Source Datadog" thing to be a ClickStack repo inside the Clickhouse GitHub, which is nowhere to be found.
But other than that, congrats! I have long wondered why no one has built anything on top of Clickhouse to compete with Datadog / New Relic.
The Clickhouse DB opened up an ocean of open source "scalable" web analytics that wasn't previously available or possible. I am hoping we see this change happen again for observability platforms as well.
ankit01-oss 14 hours ago
check out SigNoz: https://github.com/SigNoz/signoz
We started building SigNoz as an open-source alternative to Datadog/New Relic four years back, and it has been opentelemetry-native from day 1. We have shipped some good features on top of OpenTelemetry, and thanks to OTel's semantic conventions & our query builder, you can correlate any telemetry across signals.
mikeshi42 21 hours ago
Hey that's a good point on the link! Not something I can change now unfortunately, I was hoping having it near the top of the text post would help too for those that wanted to dig in more :)
That being said - as you've mentioned, so many different "store tons of data" apps have been enabled by ClickHouse. Observability is at the same point: ClickHouse can store a ton of data, OTel can help you collect/process it, and now we just need that analytics user experience layer to present it to the engineers who need an intuitive way to dive into it all.
sirfz 21 hours ago
SigNoz is a dd/nr alternative built on clickhouse that I know of
cbhl 19 hours ago
Looks like it is pointing there now; old link was https://clickhouse.com/use-cases/observability for posterity