Google's 200M-parameter time-series foundation model with 16k context (github.com)
271 points by codepawl 14 hours ago
EmilStenstrom 13 hours ago
I somehow find the concept of a general time series model strange. How can the same model predict egg prices in Italy, and global inflation in a reliable way?
And how would you even use this model, given that there are no explanations that help you trust where the prediction comes from…
teruakohatu 13 hours ago
What is not generally understood is that these models don’t predict egg prices or inflation in Italy.
They decompose a time series into trends, seasonality and residuals. That’s what they are actually modelling.
They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).
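The trend/seasonality/residual decomposition described above can be sketched in a few lines of numpy (a toy illustration of the classical method, not how TimesFM itself works):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(240)  # 20 "years" of monthly points
# Synthetic series: linear trend + yearly seasonality + noise.
series = 0.05 * t + 2.0 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.3, t.size)

period = 12
# Trend: moving average over one full period (averaging a whole
# period cancels the seasonal component).
kernel = np.ones(period) / period
trend = np.convolve(series, kernel, mode="same")

# Seasonality: average the detrended values at each phase of the period.
detrended = series - trend
seasonal = np.array([detrended[p::period].mean() for p in range(period)])
seasonal_full = np.tile(seasonal, t.size // period)

# Residual: whatever trend + seasonality don't explain.
resid = series - trend - seasonal_full
```

statsmodels' seasonal_decompose implements the same idea with proper edge handling; the point is just that "what these models see" is this kind of structure, not egg prices as such.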
jcelerier 9 hours ago
> They cannot predict wars in the Middle East influencing inflation unless there is a seasonal pattern(s).
well...
guntars 6 hours ago
morkalork 5 hours ago
lordgrenville 11 hours ago
That's what traditional time-series modelling does. This is a foundational model, which means it's just a neural network trained on lots of time series. (So maybe OP's question still stands? But it's the same question as "how can LLMs be good at so many different kinds of conversations?")
dist-epoch 8 hours ago
cybrox 13 hours ago
Wars in the middle east seem to have increasingly regular patterns tied to stock market opening hours, unfortunately.
jofzar 12 hours ago
rubyn00bie 12 hours ago
amelius 10 hours ago
What makes these models different from models used for e.g. audio?
Or other low-dimensional time domain signals?
carschno 6 hours ago
perks_12 12 hours ago
I am not familiar with time series models, but judging from your answer, it would be necessary to feed long time series into this model for it to detect trends. What is a token here? Can it, for the lack of a better example, take in all intraday movements of a stock for a day, a week, a month, etc?
teruakohatu 12 hours ago
graemep 11 hours ago
Do these models predict on just a single time series then?
It is far more useful for predictions to look for correlations between time series. This is also far more complex than looking for correlations in general, because most time series trend up or down and therefore correlate with each other spuriously.
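A quick toy illustration of that last point: two independent trending series correlate strongly on their levels, while their differences do not (differencing being the usual fix):

```python
import numpy as np

rng = np.random.default_rng(7)
t = np.arange(200)
# Two completely independent series that both happen to trend upward.
a = 0.1 * t + rng.normal(0, 1, t.size)
b = 0.1 * t + rng.normal(0, 1, t.size)

r_levels = np.corrcoef(a, b)[0, 1]                    # spuriously high
r_diffs = np.corrcoef(np.diff(a), np.diff(b))[0, 1]   # near zero
```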
a-dub 5 hours ago
ar(k) stuff, sure. that's old news. i would expect the newfangled stuff to be good at 0-shot learning of pre-event signatures spread across multiple series, at a minimum.
visarga 13 hours ago
ARIMA and ARMA models
ReptileMan 11 hours ago
It is the Middle East. Wars are always in season. And supply is more than the demand.
d--b 13 hours ago
The main issue is that people do use them to predict bitcoin prices intraday and that sort of things.
nico 13 hours ago
lovelearning 13 hours ago
My understanding is that the synthetic training data helps capture abstract time-series patterns that are common in all domains.
As they say in appendix 8:
> We create the synthetic data to reflect common time-series patterns using traditional statistical models. We start with four simple time-series patterns:
> • Piece-wise linear trends (I), where the number of the piece-wise linear components is randomly chosen between 2 and 8.
> • ARMA(p, q) (II), where 1 ≤ p, q ≤ 8 and the corresponding coefficients are generated from either a multivariate Gaussian or a uniform, then normalized.
> • Seasonal patterns. In particular we create the sine (III) and the cosine (IV) waves of different random periods between 4 and max context length / 2 time-points and time delays.
If there were no such underlying patterns in the class of all time-series data, then even the idea of traditional time-series models would be fundamentally misplaced.
And since this is a transformer model, it also looks for patterns in the problem-specific input data at inference time, just like how the input context to an LLM influences its output's relevance.
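The quoted recipe can be loosely sketched as follows (my reconstruction of the appendix's description, with a simplified stability normalization for the AR coefficients; not the paper's actual generator):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 512  # series length (an arbitrary choice for the sketch)

# (I) Piece-wise linear trend: 2-8 random segments joined end to end.
k = rng.integers(2, 9)
knots = np.sort(rng.choice(np.arange(1, n - 1), size=k - 1, replace=False))
slopes = rng.normal(0, 1, size=k)
segments = np.split(np.ones(n), knots)
trend = np.concatenate([s * m for s, m in zip(segments, slopes)]).cumsum()

# (II) ARMA(p, q) with 1 <= p, q <= 8, coefficients drawn then normalized
# (here: scaled so the AR part is guaranteed stable).
p, q = rng.integers(1, 9, size=2)
phi = rng.normal(0, 1, size=p)
phi /= 2 * np.abs(phi).sum()
theta = rng.normal(0, 1, size=q)
theta /= np.abs(theta).sum()
m = max(p, q)
eps = rng.normal(0, 1, size=n + m)
arma = np.zeros(n + m)
for i in range(m, n + m):
    arma[i] = phi @ arma[i - p:i][::-1] + eps[i] + theta @ eps[i - q:i][::-1]
arma = arma[m:]

# (III)/(IV) Seasonal sine wave of random period and phase
# (period between 4 and half the context length).
period = rng.integers(4, n // 2)
phase = rng.uniform(0, 2 * np.pi)
seasonal = np.sin(2 * np.pi * np.arange(n) / period + phase)

synthetic = trend + arma + seasonal
```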
strongpigeon 3 hours ago
When I worked on Google Ads, we used time series forecasting to compute the odds of an ad campaign reaching its goal (and to tell users how likely they were to hit them).
A ton of (unsophisticated) advertisers would just draw a line from zero to the number they are at today and project that line to the end of the month to forecast the amount of conversions/spend they were going to hit. This of course doesn't take into account various seasonalities (day-of-week, time-of-year, etc.) and gives you a pretty poor forecast. Compared to those, time-series forecasting is much more accurate.
Is it perfectly accurate? No, that's impossible. But when you can train a model on all advertising campaigns, you can give good 95% confidence intervals.
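The straight-line projection described above is easy to write down, and a toy day-of-week pattern shows where it goes wrong (the numbers are made up for illustration, not from Google Ads):

```python
import numpy as np

def naive_pace_forecast(spend_so_far, day_of_month, days_in_month):
    """Draw a line from zero through today's total and extend it."""
    return spend_so_far / day_of_month * days_in_month

# Hypothetical campaign: spends 2 units on weekdays, 1 on weekends,
# in a 30-day month that starts on a Monday.
daily = np.array([2, 2, 2, 2, 2, 1, 1] * 5)[:30]
total = daily.sum()  # the true month-end number: 52

# After 5 days (all weekdays so far) the naive line badly overshoots;
# after a full week it happens to be close again.
after_5_days = naive_pace_forecast(daily[:5].sum(), 5, 30)
after_7_days = naive_pace_forecast(daily[:7].sum(), 7, 30)
```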
thesz 7 hours ago
> How can the same model predict egg prices in Italy, and global inflation in a reliable way?
For one, there's Benford's law: https://en.wikipedia.org/wiki/Benford%27s_law
So: predict the sign (branch predictors in modern CPUs also use neural networks of sorts), then the exponent (which most probably changes slowly), and then predict the mantissa using Benford's law.
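For reference, Benford's law says the leading digit d occurs with probability log10(1 + 1/d); a quick sanity check against powers of 2, a classic Benford-distributed sequence:

```python
import math
from collections import Counter

# Benford's leading-digit distribution: digit 1 leads ~30.1% of the
# time, digit 9 only ~4.6%.
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Empirical check: leading digits of 2^1 .. 2^999.
lead = Counter(int(str(2 ** k)[0]) for k in range(1, 1000))
freq1 = lead[1] / 999  # close to log10(2) ~ 0.301
```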
benob 13 hours ago
I would say:
- decomposition: discover a more general form of the Fourier transform to untangle the underlying factors
- memorization: some patterns, such as power laws, recur across many domains
- multitask: exploit cross-domain connections, such as weather vs. electricity
eru 12 hours ago
> How can the same model predict egg prices in Italy, and global inflation in a reliable way?
How can the same lossy compression algorithm (eg JPG) compress pictures of everything in a reliable way?
cenamus 12 hours ago
It can't compress pictures of everything in a reliable way.
Text and anything with lots of high-frequency components look terrible
eru 11 hours ago
at_compile_time 12 hours ago
ludicrousdispla 4 hours ago
It's best to think of it as a giant tree, from which you can pick cherries.
JackeJR 9 hours ago
Actually it can. See https://youtu.be/FUQwijSDzg8?si=LWd5gVNYRd3HH9rJ
Or just search for the James-Stein paradox.
samuelknight 5 hours ago
I think that a model designed to ignore semantic chatter like financial news and deeply inspect the raw data is a very powerful perspective.
annie511266728 11 hours ago
It’s not really predicting “egg prices” or “inflation” — it’s mostly fitting patterns that happen to show up in those series.
The problem isn’t domain generalization, it’s that we keep pretending these models have any notion of what the data means.
People ask how one model can understand everything, but that assumes there’s any understanding involved at all.
At some point you have to ask: how much of “forecasting” is actually anything more than curve fitting with better marketing?
fjdjshsh 9 hours ago
"curve-fitting" has a long history (centuries old) and could be regarded more as a numerical method issue.
Rigorous understanding of what is over fitting, techniques to avoid it and select the right complexity of the model, etc, are much newer. This is a statistical issue.
My point is that forecasting isn't just curve fitting, even though curve fitting is one element of it.
kuu 12 hours ago
It would be nice to add (2024) to the title, this is not news (see: https://research.google/blog/a-decoder-only-foundation-model...)
mrklol 7 hours ago
Not directly 2024, there was a big update end 2025
EmilStenstrom 14 hours ago
Here is the link to the blogpost, that actually describe what this is: https://github.com/google-research/timesfm?tab=readme-ov-fil...
nels 13 hours ago
I think you meant to link this page: https://research.google/blog/a-decoder-only-foundation-model...
OliverGuy 12 hours ago
Wish they gave some numbers for total GPU hours to train this model. It seems comparatively tiny compared to LLMs, so I'm interested to know how close this is to something trainable by your average hobbyist/university/small lab
OliverGuy 12 hours ago
Edit, it looks like the paper does
TPUv5e with 16 tensor cores for 2 days for the 200M param model.
Claude reckons this is about 60 hours on an 8xA100 rig, so very accessible to smaller labs compared to LLMs
refulgentis 13 hours ago
That takes me to the same content as the submission, a GitHub repo (Chrome on iOS)
rockwotj 13 hours ago
Probably the better link: https://research.google/blog/a-decoder-only-foundation-model...
akshayshah 13 hours ago
Cyuonut 13 hours ago
I suppose they tried to link this: https://research.google/blog/a-decoder-only-foundation-model...
pplonski86 11 hours ago
Can someone ELI5 how it works? And how many data points can it read?
wiradikusuma 13 hours ago
mijailt 11 hours ago
dash2 13 hours ago
So the time series are provided with no context? It's just trained on lots of sets of numbers? Then you give it a new set of numbers and it guesses the rest, again with no context?
My guess as to how this would work: the machine will first guess, from the data alone, whether this is one of the categories it has already seen/inferred (share prices, Google Trends cat searches, etc.). Then it'll output a plausible completion for that category.
That doesn't seem as if it will work well for any categories outside the training data. I would rather just use either a simple model (ARIMA or whatever) or a theoretically-informed model. But what do I know.
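For scale, the "simple model" baseline meant here can be tiny; below is a minimal AR(1) fit by least squares, assuming nothing beyond numpy (ARIMA proper adds differencing and a moving-average term):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
# Simulate an AR(1) process x_t = 0.8 * x_{t-1} + noise.
x = np.zeros(n)
for i in range(1, n):
    x[i] = 0.8 * x[i - 1] + rng.normal(0, 1)

# Fit the AR coefficient by ordinary least squares on lagged pairs.
phi_hat = np.linalg.lstsq(x[:-1, None], x[1:], rcond=None)[0][0]

# One-step-ahead forecast from the last observation.
forecast = phi_hat * x[-1]
```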
Tarq0n 12 hours ago
If it works for predicting the next token in a very long stream of tokens, why not. The question is what architecture and training regimen it needs to generalize.
mikert89 3 hours ago
I'm willing to bet an intelligent LLM with a dataset and a pandas stats package could outperform this model by running its own experiments and making predictions
doruk101 an hour ago
Instead of willing to bet, you can do it yourself and prove it. It is not like there is a ceiling for doing what you are proposing. I am willing to bet that you are wrong.
ra 13 hours ago
This has been around a few months now, has anyone built anything on it?
konschubert 9 hours ago
Let's say I have long time series of past solar irradiation and long time series of past weather forecasts. Can this model make use of weather forecasts for time X in the future to predict electricity prices in the future?
That is, can it use one time series at time X to predict another time series at time X?
Or is this strictly about finding patterns WITHIN a single time series?
etrautmann 9 hours ago
The paper suggests it’s for forecasting. How this doesn’t just represent the relatively small number of training samples isn’t obvious to me. If most of the time series for training go up and to the right then I assume that’s what the model will (generally) do, but who knows.
Foobar8568 14 hours ago
Somehow I missed that one. Are there any competition on this?
I always had difficulties with ML and time series, I'll need to try that out.
bitshiftfaced 6 hours ago
There are some other transformer based models on the GIFT leaderboard: https://huggingface.co/spaces/Salesforce/GIFT-Eval
rockwotj 13 hours ago
https://www.datadoghq.com/blog/datadog-time-series-foundatio...
https://moment-timeseries-foundation-model.github.io/
https://arxiv.org/abs/2403.07815
A friend at work used one to predict when our CEO would post in Slack, which is very entertaining to see if correct.
Foobar8568 7 hours ago
Many thanks for the links!
_1 6 hours ago
chwzr 10 hours ago
there is TabPFN [1] which also has time series capabilities.
aris0 3 hours ago
Has anyone gotten this to run on MLX yet?
htrp 5 hours ago
isn't this basically prophet?
emsign 12 hours ago
Can this finally break the stock markets?
GTP 8 hours ago
The safe bet is no. Based on other comments, this would depend a lot on the specific trends you're trying to predict. But it wouldn't work for everything in the stock market.
raghavMultilipi 12 hours ago
This has been around a few months now, has anyone built anything on it?
magimas 12 hours ago
we did some internal tests. The quality isn't bad; it works quite well. But it's essentially on the same level as an ARIMA model trained on the data, just much bigger and slower.
So in my opinion it currently falls into a kind of void: if your use case is worth predicting and you put a data scientist on it, you're better off just training cheaper ARIMA models.
clarionbell 10 hours ago
That is disappointing. One would think that with all the budget and compute, Google would be able to create something that beats methods from the 70s. Maybe we are hitting some hard limits.
Maybe it would be better to train an LLM with various tuning methodologies and make a dedicated ARIMA agent: you throw in data, some metadata and the requested forecast window, and out come the parameters for an "optimal" conventional model.
magimas 9 hours ago
croemer 11 hours ago
(2024)
jdthedisciple 13 hours ago
Let me be blunt: Shannon would tell us that time forecasting is bullshit:
There is infinitely more entropy in the real world out there than any model can even remotely capture.
The world is not minecraft.
drzaiusx11 7 hours ago
Time series forecasting has proven useful in a number of different domains from weather to health monitoring. Sure you can easily over fit on the training data, but in general that's a data source/input problem where you need many high quality data sources to find the signal in the noise.
The world is chaotic sure, but there are still truths to be found in noisy time series data; saying that the world is too random to be knowable is a bit dismissive, no?
jdthedisciple 36 minutes ago
I agree when it comes to highly niche applications with a generous SNR.
Universal models though?
And I haven't even mentioned the fact that en masse forecasting ITSELF may influence the subject of forecasting.
bwfan123 4 hours ago
> time forecasting is bullshit
for a model to be useful, it doesn't need to capture the full behavior of a system. It only needs to capture signals which can be useful. For example, for a biased coin toss, a model is already useful if it can predict a little better than random.
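A toy version of the biased-coin point (the 55/45 bias is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000
flips = rng.random(n) < 0.55  # biased coin: "heads" 55% of the time

# A random guesser vs. a "model" that just always calls heads.
random_guess = rng.random(n) < 0.5
acc_random = (random_guess == flips).mean()  # around 50%
acc_model = flips.mean()                     # around 55%
```

A few percentage points over random is worthless for weather but very valuable for a coin you can bet on repeatedly.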
mikkom 13 hours ago
Yeah all weather forecasts are just magic
tgv 12 hours ago
Weather forecasts are notoriously iffy, and accuracy drops with time, but we understand the physics behind it (to a large extent). There's also a lot of fine-grained data available. For some arbitrary time series, there's only one data sequence, and the model is unknown. Extrapolation then becomes a lot more magical.
kgwgk 12 hours ago
Whether forecasting is simple: it either rains or it doesn’t. 50/50 probability!
eru 12 hours ago
And JPG doesn't work either..
FartyMcFarter 10 hours ago
> Shannon would tell us that time forecasting is bullshit
If you're trying to forecast random data, then yes, it's bullshit. Otherwise you have a chance.
GTP 8 hours ago
But if you don't have the information required for a forecast, then the outcome can look random. We know the physics needed to predict the outcome of a dice throw, but since predicting the outcome would require a lot of information you don't have, the output is random to you.