A Better R Programming Experience Thanks to Tree-sitter (ropensci.org)

101 points by sebg 7 hours ago

tylermw 2 hours ago

I read this article a week or so ago and immediately implemented a VS Code extension that I've always wanted: a static analysis tool for targets pipelines. targets is an R package which provides Make-like pipelines for data science and analysis work. You write your pipeline as a DAG and targets orchestrates the analysis and only re-runs downstream nodes if upstream ones are invalidated and the output changes. Fantastic tool, but at a certain level of complexity the DAG becomes a bit hard to navigate and reason about ("wait, what targets are downstream of this one again?"). This isn't really a targets problem, as this will happen with any analysis of decent complexity, but the structure targets adds to the analysis actually allows for a decent amount of static analysis of the environment/code. Enter tree-sitter.

I wrote a VS Code extension that analyzes the pipeline and provides useful hover information (like size, time last invalidated, computation time for that target, and children/parent info) as well as links to quickly jump to different targets and their children/parents. I've dogfooded the hell out of it and it's already vastly improved my targets workflow within a week. Things like providing better error hints in the IDE for targets-specific malformed inputs and showing which targets are emitting errors really take lots of the friction out of an analysis.
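To make the "what targets are downstream of this one?" question concrete, here is a minimal Python sketch of the underlying graph walk. It is not tarborist's code: the target names, the `DEPENDS_ON` table, and the `downstream` helper are all hypothetical, and a real tool would recover the edges by parsing the pipeline definition rather than hard-coding them.

```python
from collections import deque

# Hypothetical pipeline: each target maps to the targets it depends on.
# Names are made up for illustration only.
DEPENDS_ON = {
    "raw_data": [],
    "clean_data": ["raw_data"],
    "model": ["clean_data"],
    "plot": ["clean_data"],
    "report": ["model", "plot"],
}


def downstream(target, depends_on):
    """Return every target that directly or transitively depends on `target`."""
    # Invert the edges: parent -> list of direct children.
    children = {}
    for name, parents in depends_on.items():
        for parent in parents:
            children.setdefault(parent, []).append(name)

    # Breadth-first walk over the inverted edges.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(downstream("clean_data", DEPENDS_ON))  # ['model', 'plot', 'report']
```

Inverting the edges once and walking them breadth-first keeps the lookup linear in the size of the graph, which is plenty for a hover tooltip.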

All that to say: nice work on extending tree-sitter to R!

tarborist: targets + tree-sitter https://open-vsx.org/extension/tylermorganwall/tarborist

GH: https://github.com/tylermorganwall/tarborist

nomilk 4 hours ago

The article makes out like auto completion and help on hover are new things, but RStudio IDE has had them for years and years.

R/RStudio was my first language/IDE. I was horribly shocked when moving into other languages to discover they didn't have things you got out of the box with R/RStudio. "You mean I have to look up documentation for a function/method!?! - that's supposed to be automatic!".

R has a bunch of features whose absence in other ecosystems comes as a rude shock. One is the REPL with extremely convenient RStudio keyboard shortcuts to run lines of code (to achieve similar with ruby, I have an elaborate neovim/slime setup that took hours to configure and still isn't as good as what RStudio gives out of the box).

A sign of a brilliant tool is when an idiot can get more done with it than an expert can with alternatives.

MostlyStable 4 hours ago

Maybe that explains why I was confused about this article. I kept wondering what exactly was on offer, and thinking that it couldn't be as simple as help on hover and auto-complete, because those seemed pretty basic and prevalent. It took me a few years to move to RStudio, but at this point, I literally don't know anyone who doesn't use it. To the point that I once had to explain to a labmate that R and RStudio were, in fact, not the same thing.

So either this is not that exciting, or else the additional things that are on offer are not very clearly explained to the point that I missed them.

nomilk 3 hours ago

I suspect the main benefits are twofold. One is portability: since tree-sitter can be compiled to wasm and used from JavaScript, it can run in any webpage, whereas the previous way of parsing R code needed an R runtime, so not just any old website could do it (e.g. a Shiny app probably could because it has an R runtime available, but a standard HTML page couldn't). The other is that tree-sitter is a widely used tool, so anything that uses tree-sitter can now work with R, since the R grammar is available.

Looks like R's tree-sitter grammar has been in use for GitHub search for a while (since 2024), so it's a nice improvement due to R/tree-sitter, although we've probably been benefitting from it for a while already, perhaps without knowing exactly how it worked!

https://github.com/orgs/community/discussions/120397#discuss...

epistasis 5 hours ago

Tree-sitter is one of the finer engineering products out there, it enables so much. Thanks to its creator and everyone who has contributed to this project and its many grammars!

fn-mote 4 hours ago

Do the tools built on this understand dplyr pipelines and columns in the data frames appearing as bare variables in the code? If so, I’m really impressed. R does some unusual stuff.

TacticalCoder 4 hours ago

I moved to tree-sitter inside Emacs a while ago and I'd say tree-sitter is much easier than it looks.

I had a first little use case... For whatever reason, I couldn't get let bindings in Clojure code aligned the way I wanted: not with the "semantic" style, not with Tonsky's semi-standard way of formatting Clojure code (several tools adopted Tonsky's suggestion), and not with any option/knob I turned on.

I really, really, really hate the pure horrible chaos of this:

    (let [abc (+ a 2)
          d (inc b)
          vwxyz (+ abc d)]
      ...

But I love the perfection of this [1]:

    (let [abc     (+ a 2)
          d       (inc b)
          vwxyz   (+ abc d)]
      ...

And cljfmt is pretty agnostic about it: I can both use cljfmt from Emacs and have a hook forcing cljfmt, and it'll align everything but it won't mess with those nice vertical alignments.

Now, I know, I know: it is supposed to work directly from cljfmt but many options are, still in the latest version, labelled as experimental and I simply couldn't make it work on my setup, no matter which knob I turned on.

So what did I do? Claude Code CLI, tree-sitter, and three elisp functions.

And I added my own vertical alignment for Clojure let bindings. And it's compatible with cljfmt (as in: if I run cljfmt it doesn't remove my vertical alignments).
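The alignment step itself is small once the binding pairs are in hand. A minimal Python sketch, assuming the name/expression pairs have already been pulled out of the syntax tree (the setup described above used tree-sitter and elisp for that part, which is not shown here); `align_bindings` and its `gap` parameter are hypothetical names for illustration:

```python
def align_bindings(pairs, gap=1):
    """Pad each binding name so all value expressions start in one column."""
    if not pairs:
        return []
    # The longest name decides the column where every expression starts.
    width = max(len(name) for name, _ in pairs)
    return [name.ljust(width + gap) + expr for name, expr in pairs]


pairs = [("abc", "(+ a 2)"), ("d", "(inc b)"), ("vwxyz", "(+ abc d)")]
for line in align_bindings(pairs):
    print(line)
```

The hard part in practice is locating the pairs inside the `let` vector, which is where a real syntax tree earns its keep over regexes.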

I'd say the tree-sitter syntax tree is incredibly verbose (and has to be) but it's not that hard to use tree-sitter.

P.S.: I'm not alone in liking this kind of alignment and, no, we're not receptive to the "but then you modify one line and several lines are detected as modified" argument. And we're less receptive by the day now that we're beginning to have diffing tools that are indentation-agnostic and only do AST diffs.
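The idea behind an AST-only diff can be shown in a few lines. This is a sketch using Python's standard `ast` module on Python source as a stand-in, not a Clojure or tree-sitter based diff tool; `same_ast` is a hypothetical helper name:

```python
import ast


def same_ast(source_a, source_b):
    """True if both sources parse to the same tree, ignoring layout."""
    # ast.dump omits line/column attributes by default, so spacing
    # differences do not show up in the comparison.
    return ast.dump(ast.parse(source_a)) == ast.dump(ast.parse(source_b))


plain = "abc = a + 2\nd = b + 1\n"
aligned = "abc     = a + 2\nd       = b + 1\n"
changed = "abc = a + 3\nd = b + 1\n"

print(same_ast(plain, aligned))  # True: only the spacing differs
print(same_ast(plain, changed))  # False: a literal changed
```

A tool that compares trees this way reports nothing when only the alignment padding moves, which is the point being made about re-aligned lines.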

eviks an hour ago

Can you move the closing ) to also be vertically aligned?

And the first +/inc in parentheses?