Pipelining might be my favorite programming language feature (herecomesthemoon.net)
Comments

Compare with a simple pipeline in bash: each of its components executes in parallel, with the intermediate results streaming between them. You get a similar effect with coroutines. Compare Ruby: in that case, each line is processed sequentially, with a complete array being created between each step. Nothing actually gets pipelined.
Despite being clean and readable, I don't tend to do it any more, because it's harder to debug. More often these days, I write things like this:
data = File.readlines("haystack.txt")
data = data.map(&:strip)
data = data.grep(/needle/)
data = data.map { |i| i.gsub('foo', 'bar') }
data = data.map { |i| File.readlines(i).count }
It's ugly, but you know what? I can set a breakpoint anywhere and inspect the intermediate states without having to edit the script in prod. Sometimes ugly and boring is better. However, I would be lying if I didn't secretly wish that all languages adopted the `|>` syntax from Elixir:
params
|> Map.get("user")
|> create_user()
|> notify_admin()
Lisp macros allow a general solution to this that doesn't just handle chained collection operators but allows you to decide the order in which you write any chain of calls.
For example, we can write:
(foo (bar (baz x))) as
(-> x baz bar foo)
If there are additional arguments, we can accommodate those too:
(sin (* x pi)) as
(-> x (* pi) sin)
The expression so far gets inserted as the first argument to each form. If you want it inserted as the last argument, you can use ->> instead:
(filter positive? (map sin x)) as
(->> x (map sin) (filter positive?))
You can also get full control of where to place the previous expression using as->.
Full details at https://clojure.org/guides/threading_macros
A pipeline operator is just partial application with less power. You should be able to bind any number of arguments in any positions in order to create a new function and "pipe" its output(s) to any other number of functions.
One day, we'll (re)discover that partial application is actually incredibly useful for writing programs and (non-Haskell) languages will start with it as the primitive for composing programs instead of finding out that it would be nice later, and bolting on a restricted subset of the feature.
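Python actually gets part of the way there already; a minimal sketch using functools.partial plus a hand-rolled pipe helper (the helper name and the haystack.txt input are just for illustration):
```
from functools import partial, reduce

def pipe(value, *fns):
    # Thread a value through a chain of single-argument functions.
    return reduce(lambda acc, fn: fn(acc), fns, value)

# Bind arguments up front, in whatever position we like, to make new
# single-argument functions; the pipe then just composes them.
strip_lines = partial(map, str.strip)
keep_needles = partial(filter, lambda line: "needle" in line)
swap_foo = partial(map, lambda line: line.replace("foo", "bar"))

with open("haystack.txt") as f:
    result = pipe(f, strip_lines, keep_needles, swap_foo, list)
```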
Pipelining looks nice until you have to debug it. And exception handling is also very difficult, because it means adding forks to your pipelines. Pipelines are only good for programming the happy path.
While the author claims "semantics beat syntax every day of the week," the entire article focuses on syntax preferences rather than semantic differences.
Pipelining can become hard to debug when chains get very long. The author doesn't address how hard it can be to identify which step in a long chain caused an error.
They do make fun of Python, however, but don't say much about why they don't like it beyond showing a low-res photo of a rock with a pipe routed around it.
Ambiguity about what constitutes "pipelining" is the real issue here. The definition keeps shifting throughout the article. Is it method chaining? Operator overloading? First-class functions? The author uses examples that function very differently.
Building pipelines: https://effect.website/docs/getting-started/building-pipelin...
Using generators: https://effect.website/docs/getting-started/using-generators...
Having both options is great (at the beginning, Effect had only pipe-based pipelines). After years of writing Effect, I'm convinced that most of the time you'd rather write and read imperative code than pipelines, which definitely have their place in code bases.
In fact most of the community at large has converged on using imperative-style generators over pipelines. Having onboarded many devs, and having seen many long-time pipeliners converge on classical imperative control flow, seems to confirm that both debugging and maintenance are easier.
I think the biggest win for pipelining in SQL is the fact that we no longer have to explain that SQL execution order has nothing to do with query order, and we no longer have to pretend that we're mimicking natural language. (That last point stops being the case when you go beyond "SELECT foo FROM table WHERE bar LIMIT 10".)
No longer do we have to explain that expressions are evaluated in the order of FROM -> JOIN -> ON -> SELECT -> WHERE -> GROUP BY -> HAVING -> ORDER BY -> LIMIT (and yes, I know I'm missing several other steps). We can simply just express how our data flows from one statement to the next.
(I'm also stating this as someone who has yet to play around with the pipelining syntax, but honestly anything is better than the status quo.)
The left associativity of functions really doesn't work well with English reading left to right.
I found this especially clear with the 'composition operator' of functions, where f.g has to mean f _after_ g because you really want:
(f.g)(x) = f(g(x))
Based on this, I think a reverse polish type of notation would be a lot better. Though perhaps it is a lot nicer to think of "the sine of an angle" than "angle sine-ed".
Not that it matters much, the switching costs are immense. Getting people able to teach it would be impossible, and collaboration with people taught in the other system would be horrible. I am doubtful I could make the switch, even if I wanted.
Computer scientists continue to pick terrible names. Pipelining is already an overloaded concept that implies some type of operation-level parallelism. Picking names like this does everyone in the field a disservice. Calling it something like “composition chain” would be much clearer with respect to existing literature in the field. Maybe I’m being nitpicky, but sometimes it feels like the Tower of Babel parable when talking to folks who use different ecosystems.
Pipelining is great! Though sometimes you want to put the value in the first argument of a function, or a different location, or else call a method... it can be nice to simply refer to the value directly with `_` or `%` or `$` or something.
In fact, I always thought it would be a good idea for all statement blocks (in any given programming language) to allow an implicit reference to the value of the previous statement. The pipeline operation would essentially be the existing semicolons (in a C-like language) and there would be a new symbol or keyword used to represent the previous value.
For example, the MATLAB REPL allows for referring to the previous value as `ans` and the Julia REPL has inherited the same functionality. You can copy-paste this into the Julia REPL today:
[1, 2, 3];
map(x -> x * 2, ans);
@show ans;
filter(x -> x > 2, ans);
@show ans;
sum(ans)
You can't use this in Julia outside the REPL, and I don't think `ans` is a particularly good keyword for this, but I honestly think the concept is good enough. The same thing could be done in JavaScript using `$`, for example.
I feel it would work best with expression-based languages having blocks that return their final value (like Rust) since you can do all sorts of nesting and so-on.
I suffer from (what I call) bracket claustrophobia. Whenever brackets get nested too deep, it makes me uncomfortable. But I fully realize that there are people who are the complete opposite. Lisp programmers are apparently as claustrophilic as cats and spelunkers.
In fact I tried to make some similar points in my CMU "SQL or Death" Seminar Series talk on PRQL (https://db.cs.cmu.edu/events/sql-death-prql-pipelined-relati...) in that I would love to see PRQL (or something like it) become a universal DSL for data pipelines. Ideally this wouldn't even have to go through some query engine and could just do some (byte)codegen for your target language.
P.S. Since you mentioned the Google Pipe Syntax HYTRADBOI 2025 talk, I just want to throw out that I also have a 10 min version for the impatient: https://www.hytradboi.com/2025/deafce13-67ac-40fd-ac4b-175d5...
That's just a PRQL overview though. The Universal Data Pipeline DSL ideas and comparison to LINQ, F#, ... are only in the CMU talk. I also go a bit into imperative vs declarative and point out that since "pipelining" is just function composition it should really be "functional" rather than imperative or declarative (which also came up in this thread).
Admittedly, the chaining is still better. But a fair number of the article's complaints are about the lack of newlines being used, not about chaining itself.
In concatenative languages with an implicit stack (Factor) that expression would read:
iter [ alive? ] filter [ id>> ] map collect
The beauty of this is that everything can be evaluated strictly left-to-right. Every single symbol. "Pipelines" in other languages are never fully left-to-right evaluated. For example, ".filter(|w| w.alive)" in the author's example requires one to switch from postfix to infix evaluation to evaluate the filter application.
The major advantage is that handling multiple streams is natural. Suppose you want to compute the dot product of two files where each line contains a float:
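The Factor code for that isn't reproduced here, but for reference, the task itself looks roughly like this in Python (file names invented):
```
# Dot product of two files where each line holds a float.
with open("a.txt") as fa, open("b.txt") as fb:
    dot = sum(float(x) * float(y) for x, y in zip(fa, fb))
print(dot)
```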
These articles never explain what’s wrong with calling each function separately and storing each return value in an intermediate variable.
Being able to inspect the results of each step right at the point you’ve written it is pretty convenient. It’s readable. And the compiler will optimize it out.
I always wondered how programming would be if we hadn't designed the assignment operator to be consistent with mathematics, and instead had it go LHS -> RHS, i.e. you perform the operation and then decide its destination, much like Unix pipes.
Pipelining in software is covered by Richard C. Waters (1989a, 1989b):
https://dspace.mit.edu/handle/1721.1/6035
https://dspace.mit.edu/handle/1721.1/6031
Wrangled this into a library that works with JavaScript (https://dapperdrake.neocities.org/faster-loops-javascript.ht...). Incredibly effective. Much faster at writing and composing code. And this code executes much faster.
I liked the pipelining syntax so much from pyspark and linq that I ended up implementing my own mini linq-like library for python to use in local development: https://datapad.readthedocs.io/en/latest/quickstart.html#ove... It's mainly used in quick data processing scripts that I run locally. The syntax just makes everything much nicer to work with.
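Not that commenter's actual library, just a toy sketch of the chaining style being described (all names invented):
```
class Seq:
    """Tiny LINQ-ish wrapper: every method returns a new Seq, so calls chain."""

    def __init__(self, items):
        self.items = items

    def where(self, pred):
        return Seq(x for x in self.items if pred(x))

    def select(self, fn):
        return Seq(fn(x) for x in self.items)

    def to_list(self):
        return list(self.items)

result = (Seq(range(10))
          .where(lambda x: x % 2 == 0)
          .select(lambda x: x * x)
          .to_list())
# [0, 4, 16, 36, 64]
```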
The one thing that I don’t like about pipelining (whether using a pipe operator or method chaining), is that assigning the result to a variable goes in the wrong direction, so to speak. There should be an equivalent of the shell’s `>` for piping into a variable as the final step. Of course, if the variable is being declared at the same time, whatever the concrete syntax is would still require some getting used to, being “backwards” compared to regular assignment/initialization.
> At this point you might wonder if Haskell has some sort of pipelining operator, and yes, it turns out that one was added in 2014! That’s pretty late considering that Haskell exists since 1990.
The tone of this (and the entire Haskell section of the article, tbh) is rather strange. Operators aren't special syntax and they aren't "added" to the language. Operators are just functions that by default use infix position. (In fact, any function can be called in infix position. And operators can be called in prefix position.)
The commit in question added & to the prelude. But if you wanted & (or any other character) to represent pipelining you have always been able to define that yourself.
Some people find this horrifying, which is a perfectly valid opinion (though in practice, when working in Haskell it isn't much of a big deal if you aren't foolish with it). But at least get the facts correct.
Maybe it's because I love the Unix shell environment so much, but I also really love this style. I try to make good use of it in every language I write code in, and I think it helps make my control flow very simple. With lots of pipelines, and few conditionals or loops, everything becomes very easy to follow.
Hack (Facebook's PHP fork) has this feature. It's called pipes [1]:
$x = vec[2,1,3]
|> Vec\map($$, $a ==> $a * $a) // $$ with value vec[2,1,3]
|> Vec\sort($$); // $$ with value vec[4,1,9]
It is a nice feature. I do worry about error reporting with any feature that combines multiple statements into a single statement, which is essentially what this does. In Java, there was always an issue with NullPointerExceptions being thrown, and if you chain several things together you're never sure which one was null.
[1]: https://docs.hhvm.com/hack/expressions-and-operators/pipe
After seeing LangChain abusing the "|" operator overload for a pipeline-like DSL, I followed suit at work and I loved it. It's especially good when you use it in a notebook environment where you literally build the pipeline incrementally through the REPL.
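Not LangChain's actual API, just a bare-bones sketch of the `|`-overloading trick (names invented):
```
class Step:
    """Wraps a function so steps can be glued together with `|`."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other):
        # Chaining two steps yields a new step that runs them in order.
        return Step(lambda x: other.fn(self.fn(x)))

    def __call__(self, x):
        return self.fn(x)

clean = Step(str.strip)
shout = Step(str.upper)
exclaim = Step(lambda s: s + "!")

pipeline = clean | shout | exclaim
print(pipeline("  hello  "))  # HELLO!
```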
PowerShell has the best pipeline capability of any language I have ever seen.
For comparison, UNIX pipes support only trivial byte streams from output to input.
PowerShell allows typed object streams where the properties of the object are automatically wired up to named parameters of the commands on the pipeline.
Outputs at any stage can not only be wired directly to the next stage but also captured into named variables for use later in the pipeline.
Every command in the pipeline also gets begin/end/cancel handlers automatically invoked so you can set up accumulators, authentication, or whatever.
UNIX scripting advocates don’t know what they’re missing out on…
Is pipelining the right term here? I've always used the term "transducer" to describe this kind of process, I picked it up from an episode of FunFunFunction if I'm not mistaken.
A thing I really like about pipelines in shell scripts, is all of the buffering and threading implied by them. Semantically, you can see what command is producing output, and what command is consuming it. With some idea of how the CPU will be split by them.
This is far different than the pattern described in the article, though. Small shame they have come to have the same name. I can see how both work with the metaphor; such that I can't really complain. The "pass a single parameter" along is far less attractive to me, though.
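A rough Python analogue of the streaming half of that: generator stages hand items along one at a time instead of building whole arrays between steps, though unlike a shell pipeline they don't run as separate processes (file name invented):
```
def read_lines(path):
    # Producer: yields one stripped line at a time; nothing is materialized.
    with open(path) as f:
        for line in f:
            yield line.strip()

def only_needles(lines):
    for line in lines:
        if "needle" in line:
            yield line

def swap_foo(lines):
    for line in lines:
        yield line.replace("foo", "bar")

# Each stage pulls from the previous one on demand.
for line in swap_foo(only_needles(read_lines("haystack.txt"))):
    print(line)
```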
To one-up this: of course it is even better if your language allows you to implement proper pipelining, with implicit argument passing, yourself. Then the standard language does not need to provide it and assign meaning to some symbols for pipelining. You can decide for yourself what symbols are used and what you find intuitive.
Pipelining can guide one to write a bit cleaner code, viewing steps of computation as such, and not as modifications of global state. It forces one to make each step return a result, write proper functions. I like proper pipelining a lot.
I think there's a language syntax to be invented that would make everything suffix/pipeline-based. Stack-based languages are kind of there, but I don't think they're exactly the same thing.
That new Rhombus language that was featured here recently has an interesting feature where you can use `_` in a function call to act as a "placeholder" for an argument. Essentially it's an easy way to partially apply a function. This works very well with piping because it allows you to pipe into any argument of a function (including optional arguments iirc) rather than just the first like many pipe implementations have. It seems really cool!
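Python has nothing like that built in; here is a toy sketch of the placeholder idea (the `_` helper and `pipe` function are invented purely for illustration):
```
class Placeholder:
    """Toy stand-in for Rhombus's `_`: marks where the piped value goes."""

_ = Placeholder()

def pipe(value, *steps):
    # Each step is (function, arg, ...); the placeholder in the argument
    # list is replaced by the value flowing through the pipe.
    for fn, *args in steps:
        value = fn(*[value if a is _ else a for a in args])
    return value

result = pipe(
    "  12,34,56  ",
    (str.strip, _),
    (str.split, _, ","),  # piped into the first argument
    (map, int, _),        # piped into the second argument
    (sum, _),
)
print(result)  # 102
```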
We had this - it was called variables. You could do:
x = iter(data);
y = filter(x, w=>w.isAlive);
z = map(y, w=>w.id);
return collect(z);
It doesn't need new syntax, but to implement this with the existing syntax you do have to figure out what the intermediate objects are. You have that problem with "pipelining" too, though, unless it compiles the whole chain into a single thing a la LINQ.
> (This is not real Rust code. Quick challenge for the curious Rustacean, can you explain why we cannot rewrite the above code like this, even if we import all of the symbols?)
Um, you can - and you can because it's lazy, which is also the same reason you can write it the other way around in Rust. I think the author was getting at an ownership trap, but that trap is avoided the same way for both arrangements; the call order is the same in both. If the calls were actually a pipeline (if collect didn't exist and didn't need to be called), then other considerations show up.
Every example of why this is meant to be good is contrived.
You have a create_user function that doesn't error? Has no branches based on type of error?
We're having arguments over the best way to break these over multiple lines?
Like.. why not just store intermediate results in variables? Where our branch logic can just be written inline? And then the flow of data can be very simply determined by reading top to bottom?
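For what it's worth, the intermediate-variable version with inline branches might look like this in Python (create_user here is a stub invented so the snippet runs):
```
def create_user(data):
    # Stub standing in for a real create_user; fails without an email.
    return {"id": 1, **data} if "email" in data else None

params = {"user": {"email": "a@example.com"}}

user_data = params.get("user")
if user_data is None:
    raise ValueError("no user in params")

user = create_user(user_data)
if user is None:
    print("could not create user")
else:
    print("notifying admin about user", user["id"])
```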
Pipelining is great. Currying is horrible. Though currying superficially looks similar to pipelining.
One difference is that currying returns an incomplete result (another function) which must be called again at a later time. On the other hand, pipelining usually returns raw values. Currying returns functions until the last step. The main philosophical failure of currying is that it treats logic/functions as if they were state which should be passed around. This is bad. Components should be responsible for their own state and should just talk to each other to pass plain information. State moves, logic doesn't move. A module shouldn't have awareness of what tools/logic other modules need to do their jobs. This completely breaks the separation of concerns principle.
When you call a plumber to fix your drain, do you need to provide them with a toolbox? Do you even need to know what's inside their toolbox? The plumber knows what tools they need. You just show them what the problem is. Passing functions to another module is like giving a plumber a toolbox which you put together by guessing what tools they might need. You're not a plumber, why should you decide what tools the plumber needs?
Currying encourages spaghetti code which is difficult to follow when functions are passed between different modules to complete the currying. In practice, if one can design code which gathers all the info it needs before calling the function once; this leads to much cleaner and much more readable code.
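A small Python illustration of the distinction being drawn: a hand-rolled curried add that keeps returning functions, versus the same data pushed through value-returning steps:
```
# Curried: each call returns another function until the last argument arrives.
def add(a):
    return lambda b: lambda c: a + b + c

step = add(1)    # a function, not a result
step = step(2)   # still a function
total = step(3)  # 6 - the value only shows up at the last step

# Pipelined: every stage consumes a plain value and returns a plain value.
data = [1, 2, 3]
data = [x + 1 for x in data]  # [2, 3, 4]
total = sum(data)             # 9
```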
Why is the SQL syntax so unnecessarily convoluted? SQL is already an operator language, just an overly constrained one due to historical baggage. If you're going to allow new syntax at all, you can just do
from customer
left join orders on c_custkey = o_custkey and o_comment not like '%unusual%'
group by c_custkey
alias count(o_orderkey) as count_of_orders
group by count_of_orders
alias count(*) as count_of_customers
order by count_of_customers desc
select count_of_customers, count_of_orders;
I'm using 'alias' here as a strawman keyword for what the slide deck calls a free-standing 'as' operator; you can't reuse that keyword itself because it makes the grammar a mess.
The aliases aren't really necessary, you could just write the last line as 'select count(count(*)) ncust, count(*) nord' if you aren't afraid of nested aggregations, and if you are you'll never understand window functions, soo...
The |> syntax adds visual noise without expressive power, and the novelty 'aggregate'/'call' operators are weird special-case syntax for something that isn't that complex in the first place.
The implicit projection is unnecessary too, for the same reason any decent SQL linter will flag an ambiguous 'select *'
> Quick challenge for the curious Rustacean, can you explain why we cannot rewrite the above code like this, even if we import all of the symbols?
Probably for lack of
> weird operators like <$>, <*>, $, or >>=
Point-free style and pipelining were meant for each other. https://en.m.wikipedia.org/wiki/Tacit_programming
This is my biggest complaint about Python.
BTW. For people complaining about debug-ability of it: https://doc.rust-lang.org/std/iter/trait.Iterator.html#metho... etc.
Instead of writing: a().b().c().d(), it's much nicer to write: d(c(b(a()))), or perhaps (d ∘ c ∘ b ∘ a)().
Anyway, JS wins again, give it a try if you haven't, it's one of the best languages out there.
The |> operator is really cool.
A
.B
.C
I have no idea what this is trying to say, or what it has to do with the rest of the article.
... looking at you R and tidyverse hell.