The Cost of a Closure in C

(thephd.dev)

Comments

kazinator 11 December 2025
> It’s no wonder GCC is trying to add -ftrampoline-impl=heap to the story of GNU Nested Functions; they might be able to tighten up that performance and make it more competitive with Apple Blocks.

[disclaimer] Without brushing up on the details of this, I strongly suspect that this is about removing the need for executable stacks rather than about performance. Allocating a trampoline on the stack rather than the heap is good for efficiency.

These days, many GNU/Linux distros are disabling executable stacks by default in their toolchain configuration, both for building the distro and for the toolchain offered by the system to the user.

When you use GCC local functions, it overrides the linker behavior so that the executable is marked for executable stacks.

Of course, that is a security concession: when your stack is executable, it enables remote code execution exploits that inject code into the stack via a buffer overflow and trick the process into jumping to it.

If trampolines can be allocated in a heap, then you don't need an executable stack. You do need an executable heap, or an executable dedicated heap for these allocations. (Trampolines are all the same size, so they could be packed into an array.)

Programs which indirect upon GCC local functions are not aware of the trampolines. The trampolines are deallocated naturally when the stack rolls back on function return or longjmp, or a C++ exception passing through.

Heap-allocated trampolines have an obvious deallocation problem; it would be interesting to see what strategy is used for that.

Rochus 11 December 2025
The benchmark demonstrates that the modern C++ "Lambda" approach (creating a unique struct with fields for captured variables) is effectively a compile-time calculated static link. Because the compiler sees the entire definition, it can flatten the "link" into direct member access, which is why it wins. The performance penalty the author sees in GCC is partly due to the OS/CPU overhead of managing executable stacks, not just code inefficiency. The author correctly identifies that C is missing a primitive that low-level languages perfected decades ago: the bound method (wide) pointer.
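To make the "bound method (wide) pointer" idea concrete, here is a minimal sketch in plain C, assuming a hand-rolled two-word struct. The names `wide_fn`, `wide_call`, `counter`, `counter_add` and `bind_counter` are made up for illustration; they are not from the article or from any proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical wide pointer: a code pointer plus the environment it is
   bound to. The names here are illustrative, not from the article. */
typedef struct wide_fn {
    int (*fn)(void *env, int arg);
    void *env;
} wide_fn;

/* Calling through the wide pointer passes the environment explicitly. */
int wide_call(wide_fn w, int arg) {
    assert(w.fn != NULL);
    return w.fn(w.env, arg);
}

/* Example "object" and a "method" that operates on it. */
typedef struct counter {
    int total;
} counter;

int counter_add(void *env, int arg) {
    counter *c = env;
    c->total += arg;
    return c->total;
}

/* Binding is just filling in both words; no trampoline is generated. */
wide_fn bind_counter(counter *c) {
    return (wide_fn){ counter_add, c };
}
```

With a `counter c = { 10 };`, calling `wide_call(bind_counter(&c), 5)` updates `c.total` through the bound environment. Nothing here needs executable memory, which is the point of the wide pointer compared to a trampoline.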

The most striking surprise is the magnitude of the gap between std::function and std::function_ref. It turns out std::function (the owning container) forces copy-by-value semantics deep into the recursion. In the "Man-or-Boy" test, this apparently causes an exponential explosion of copying the closure state at every recursive step. std::function_ref (the non-owning view) avoids this entirely.

unwind 11 December 2025
This was very interesting, and it's obvious from the majority of the text that the author knows a lot about these languages, their implementation, benchmarking corners, and so on. Really!

So it's very jarring to read this text after the first C code example:

> This uses a static variable to have it persist between both the compare function calls that qsort makes and the main call which (potentially) changes its value to be 1 instead of 0

This feels completely made up, and/or some confusion about things that I would expect an author of a piece like this to really know.

In reality, in this usage (at the global outermost scope level) `static` has nothing to do with persistence. All it does is make the variable "private" to the translation unit (C parlance, read as "C source code file"). The value will "persist" since the global outermost scope can't go out of scope while the program is running.

It's different when used inside a function: there it makes the value persist between invocations, in practice typically by moving the variable from the stack to static storage (the program's data/BSS segment), which is set up as the program loads. Note that C does not mention the existence of a stack for local variables, but of course that is the typical implementation on modern systems.
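A small sketch of the two meanings, in case it helps. The variable name `in_reverse` follows the article's example; the function names are made up for illustration.

```c
#include <assert.h>

/* File scope: the object has static storage duration with or without the
   keyword; `static` here only restricts the name to this translation unit. */
static int in_reverse = 0;

void set_reverse(int value) {
    assert(value == 0 || value == 1);
    in_reverse = value;
}

int get_reverse(void) {
    return in_reverse;
}

/* Block scope: `static` changes the storage duration, so `calls` is
   initialized once and keeps its value between invocations. */
int count_calls_static(void) {
    static int calls = 0;
    return ++calls;
}

/* Without `static`, `calls` is a fresh automatic variable on every call. */
int count_calls_auto(void) {
    int calls = 0;
    return ++calls;
}
```

Calling `count_calls_static()` twice yields 1 and then 2, while `count_calls_auto()` yields 1 both times. Dropping `static` from `in_reverse` would change only its linkage, not how long it lives.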

RossBencina 11 December 2025
Good to see Borland's __closure extension got a mention.

Something I've been thinking about lately is having a "state" keyword for declaring variables in a "stateful" function. This works just like "static", except that instead of having a single global instance of each variable, the variables are added to an automatically defined struct whose type is available using "statetype(foo)" or some other mechanism. You can then invoke foo with an instance of the state (in C this would be an explicit first parameter, also marked with the "state" keyword). Stateful functions are colored in the sense that if you invoke a nested stateful function, its state gets added to the caller's state. This probably won't fly with separate compilation though.

Progge 11 December 2025
Long time ago I wrote C. Could anyone fill me in why the first code snippet is arg parsing the way it is?

    int main(int argc, char* argv[]) {
      if (argc > 1) {
        char* r_loc = strchr(argv[1], 'r');
        if (r_loc != NULL) {
          ptrdiff_t r_from_start = (r_loc - argv[1]);
          if (r_from_start == 1 && argv[1][0] == '-' && strlen(r_loc) == 1) {
            in_reverse = 1;
          }
        }
      }
      ...
    }

Why not

    if (argc > 1 && strcmp(argv[1], "-r") == 0) {
        in_reverse = 1;
    }

for example?
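As far as I can tell the two checks accept exactly the same strings (only "-r"). Here is a quick sketch that compares them on a handful of inputs. The function names and the sample list are mine, not from the article.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The check as written in the article's snippet, wrapped in a function
   (the function name is made up for this comparison). */
int is_reverse_flag_article(const char *arg) {
    assert(arg != NULL);
    const char *r_loc = strchr(arg, 'r');
    if (r_loc != NULL) {
        ptrdiff_t r_from_start = r_loc - arg;
        if (r_from_start == 1 && arg[0] == '-' && strlen(r_loc) == 1) {
            return 1;
        }
    }
    return 0;
}

/* The shorter strcmp version. */
int is_reverse_flag_strcmp(const char *arg) {
    assert(arg != NULL);
    return strcmp(arg, "-r") == 0;
}

/* Returns 1 if both checks agree on every sample input, 0 otherwise. */
int checks_agree(void) {
    const char *samples[] = {
        "-r", "r", "-", "", "-rr", "-rx", "--r", "-R", "xr", "-r "
    };
    size_t n = sizeof samples / sizeof samples[0];
    for (size_t i = 0; i < n; i++) {
        if (is_reverse_flag_article(samples[i]) !=
            is_reverse_flag_strcmp(samples[i])) {
            return 0;
        }
    }
    return 1;
}
```

The reasoning: the first 'r' must sit at index 1, index 0 must be '-', and nothing may follow the 'r', which leaves "-r" as the only match. A sample list is of course not a proof, just a sanity check.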

zzo38computer 22 hours ago
Something I had thought of (which does not fully solve the problems mentioned there, but would allow GNU nested functions to work in a way that can be implemented without trampolines and executable stacks, so that it can work in standard C and with the standard ABI), is to allow a nested function to optionally be defined with the "static" and/or "register" keywords.

With "static", it is implemented as an ordinary function, but the name is local to the function that contains it; it cannot access stuff within the function containing it unless those things are also declared as "static".

With "register", the address of the function cannot be taken, and if the function accesses other stuff within the function that contains it then the compiler will add additional arguments to the function so that its type does not necessarily match the type which is specified in the program.
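Roughly what that lowering might look like when done by hand in plain C. This is only a sketch: it assumes the compiler passes each captured variable by pointer, which the idea above does not specify, and `sum_scaled` and `add_scaled_lifted` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* What the programmer would conceptually write (not valid standard C):
 *
 *   int sum_scaled(const int *xs, size_t n, int scale) {
 *       int total = 0;
 *       register void add_scaled(int x) { total += x * scale; }
 *       for (size_t i = 0; i < n; i++) add_scaled(xs[i]);
 *       return total;
 *   }
 */

/* Hand-lifted form: the captured variables become extra parameters, so the
   real signature differs from the one written in the source. */
static void add_scaled_lifted(int x, int *total, const int *scale) {
    *total += x * *scale;
}

int sum_scaled(const int *xs, size_t n, int scale) {
    assert(xs != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Every direct call site is rewritten to pass the captures. */
        add_scaled_lifted(xs[i], &total, &scale);
    }
    return total;
}
```

Because the address of the nested function can never be taken, every call site is known to the compiler, and that is what makes changing the signature behind the programmer's back safe.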

This is not good enough for many uses though, so having the other extensions would also be helpful (possibly including implementing Apple Blocks in GCC).

uecker 11 December 2025
BTW: I wrote why the lambda design does not fit C well here:

https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3654.pdf

(and I am not impressed by micro benchmarks)

sirwhinesalot 11 December 2025
I think local functions (like the GNU extension) that behave like C++ by-reference (&) capturing lambdas make the most sense for C.

You can call the local functions directly and get the benefits of the specialized code.

There's no way to spell out this function's type, and no way to store it anywhere. This is true of regular functions too!

To pass it around you need to use the type-erased "fat pointer" version.

I don't see how anything else makes sense for C.

kazinator 11 December 2025
Defining a callback interface in C without a user context parameter is a capital crime.
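For anyone who hasn't seen the pattern, a minimal sketch of a callback interface that does take a context parameter. `sort_ints`, `compare_fn`, and `sort_options` are hypothetical names for illustration, not an existing API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback interface: every callback receives the caller's
   `ctx` pointer back, so no globals or trampolines are needed. */
typedef int (*compare_fn)(int a, int b, void *ctx);

/* Simple insertion sort that threads `ctx` through to the comparison. */
void sort_ints(int *xs, size_t n, compare_fn cmp, void *ctx) {
    assert(cmp != NULL);
    for (size_t i = 1; i < n; i++) {
        int key = xs[i];
        size_t j = i;
        while (j > 0 && cmp(xs[j - 1], key, ctx) > 0) {
            xs[j] = xs[j - 1];
            j--;
        }
        xs[j] = key;
    }
}

/* The context carries the state a closure would otherwise capture. */
struct sort_options {
    int in_reverse;
};

int compare_with_options(int a, int b, void *ctx) {
    const struct sort_options *opts = ctx;
    int order = (a > b) - (a < b);
    return opts->in_reverse ? -order : order;
}
```

The caller keeps a `struct sort_options` on its own stack and passes its address as `ctx`, so the "reverse" flag from the article's example needs neither a static variable nor a closure.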

dfawcus 12 December 2025
It is possible to transform the pure Rosetta form of the GNU nested function version, in a way similar to the pure C one, such that it doesn't need any stack trampoline. I wonder if that would be closer in performance to the pure C form.

(I can't be bothered to run his benchmarks)

    #include <stdio.h>
    typedef struct env_ E;
    typedef struct fat_ptr_ Fp;
    typedef int fn(E*);
    struct fat_ptr_ {
      fn *f;
      E  *e;
    };
    #define INT(body) ({ int lambda(E*){ return body; }; (Fp){lambda,0}; })
    struct env_ {
      int k;
      Fp x1; Fp x2; Fp x3; Fp x4;
    };
    #define FpMk(fn,e) {fn, e}
    #define FpCall(fn) (fn.f(fn.e))
    int main(){
      int a(E env, Fp x5){
        int b(E *ep){
          return a( (E){--(ep->k), FpMk(b, ep), ep->x1, ep->x2, ep->x3}, ep->x4 );
        }
        return env.k<=0 ? FpCall(env.x4) + FpCall(x5) : b(&env);
      }
      printf(" %d\n", a( (E){10, INT(1), INT(-1), INT(-1), INT(1)}, INT(0)) );
    }

groundzeros2015 11 December 2025
Thread locals do solve the problem. You create a wrapper around the original function: the wrapper stores the user data in a thread-local global and passes in a plain function, which in turn calls the real callback (the one accepting user data) with the value from that global.
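A sketch of that wrapper around the standard `qsort`, assuming C11 `_Thread_local` is available. `qsort_with_ctx`, `compare_ctx_fn`, and the other names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* The callback shape we wish qsort accepted (hypothetical typedef). */
typedef int (*compare_ctx_fn)(const void *a, const void *b, void *ctx);

/* Thread-local slots holding the "real" callback and its user data for the
   duration of one qsort call. `_Thread_local` is the C11 spelling. */
static _Thread_local compare_ctx_fn tl_compare;
static _Thread_local void *tl_ctx;

/* Context-free shim handed to qsort; it forwards to the stored callback. */
static int compare_shim(const void *a, const void *b) {
    return tl_compare(a, b, tl_ctx);
}

/* Wrapper: stash callback and context, sort, then restore the previous
   values so a nested call on the same thread does not clobber them. */
void qsort_with_ctx(void *base, size_t n, size_t size,
                    compare_ctx_fn cmp, void *ctx) {
    assert(cmp != NULL);
    compare_ctx_fn saved_compare = tl_compare;
    void *saved_ctx = tl_ctx;
    tl_compare = cmp;
    tl_ctx = ctx;
    qsort(base, n, size, compare_shim);
    tl_compare = saved_compare;
    tl_ctx = saved_ctx;
}

/* Example callback: the context is an int flag selecting reverse order. */
int compare_ints(const void *a, const void *b, void *ctx) {
    int x = *(const int *)a;
    int y = *(const int *)b;
    int order = (x > y) - (x < y);
    return *(const int *)ctx ? -order : order;
}
```

This only works while the callee invokes the callback synchronously on the calling thread, as `qsort` does. If the callback were stored and called later, or from another thread, the thread-local slot would no longer hold the right data.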

hyperbolablabla 11 December 2025
Stewart Lynch in his 10x VODs mentions his custom Function abstraction in C++. It's super clean and explicit, avoiding the `auto` requirement of C++ lambdas. Its use looks something akin to:

    // imagine my_function takes 3 ints, the first 2 args are captured and curried.
    Function<void(int)> my_closure(&my_function, 1, 2);
    my_closure(3);

I've never implemented it myself, as I don't use C++ features all too much, but as a pet project I'd like to someday. I wonder how something like that compares!

nesarkvechnep 11 December 2025
I'm thinking of using C++ for a personal project specifically for the lambdas and RAII.

I have a case where I need to create a static templated lambda to be passed to C as a pointer. Such thing is impossible in Rust, which I considered at first.

mgaunard 11 December 2025
I feel the results say more about the testing methodology and inlining settings than anything else.

Practically speaking all lambda options except for the one involving allocation (why would you even do that) are equivalent modulo inlining.

In particular, the caveat with the type erasure/helper variants is precisely that it prevents inlining, but given everything is in the same translation unit and isn't runtime-driven, it's still possible for the compiler to devirtualize.

I think it would be more interesting to make measurements when controlling explicitly whether inlining happens or the function type can be deduced statically.

ddtaylor 11 December 2025
I actually enjoy trampoline functions in C a bit and it's one of the GNU extensions I use sometimes.

keymasta 11 December 2025
It's a post about Man or Boy... and the only typo is... the word _son_. Pretty sure it's supposed to be "on"

psyclobe 11 December 2025
c++ for the win!! finally!!

capestart 11 December 2025
The breakdown of lambda, blocks, and nested functions demonstrates how important implementation and ABI details are in addition to syntax. I think the standard for C should include a straightforward, first class wide function pointer along with a closure story to stop people from adding these half portable, half spooky extensions.

trgn 11 December 2025
i wish JS gurus understood this before jumping all in on hooks and bloating the runtime footprint of every web app out there