Do variable names matter for AI code completion? (2025)

(yakubov.org)

Comments

Groxx 29 July 2025
Obviously yes. They all routinely treat my "thingsByID" array like a dictionary - it's a compact array where ID = index though.

They even screw that up inside the tiny function that populates it. If anything IMO, they over-value names immensely (which makes sense, given how they work, and how broadly consistent programmers are with naming).
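For illustration, a minimal sketch of the kind of structure described above, assuming a Python list whose index doubles as the ID. The names `things_by_id` and `add_thing` and the sample values are hypothetical, not taken from the commenter's code.

```python
# Hypothetical data: a compact list where a thing's ID is its index.
things_by_id = ["apple", "banana", "cherry"]  # IDs 0, 1, 2


def add_thing(things: list[str], name: str) -> int:
    """Append a thing and return its ID, which is simply its new index."""
    things.append(name)
    return len(things) - 1


new_id = add_thing(things_by_id, "durian")

# List semantics: look up by position, iterate with enumerate().
for thing_id, name in enumerate(things_by_id):
    print(thing_id, name)

# Dictionary-style calls that the name tends to invite do not exist on a list:
#   things_by_id.items()  -> AttributeError
#   things_by_id.get(7)   -> AttributeError
```

A name such as `things` or `things_indexed_by_id` might invite fewer dictionary-style completions, though that is a guess rather than something measured here.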

yakubov_org 25 July 2025
When GitHub Copilot suggests your next line of code, does it matter whether your variables are named "current_temperature" or just "x"?

I ran an experiment to find out, testing 8 different AI models on 500 Python code samples across 7 naming styles. The results suggest that descriptive variable names do help AI code completion.

Full paper: https://www.researchsquare.com/article/rs-7180885/v1

nemo1618 29 July 2025
Time for Hungarian notation to make a comeback? I've always felt it was unfairly maligned. It would probably give LLMs a decent boost to see the type "directly" rather than needing to look up the type via search or tool call.
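A rough sketch of what seeing the type "directly" could look like in Python, next to the same information written as annotations. All names and prefixes here are made up for illustration; Hungarian notation has several dialects and this follows none in particular.

```python
# Hungarian-style names (hypothetical prefixes): the type travels with the
# name, so it is visible at every use site, even far from the declaration.
str_user_name = "ada"
n_retry_count = 3
lst_scores = [0.5, 0.9]

# The same information as annotations: visible only at the declaration,
# but checkable by a static type checker.
user_name: str = "ada"
retry_count: int = 3
scores: list[float] = [0.5, 0.9]
```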
k__ 29 July 2025
It's kinda funny that people are taking decades of good coding practices seriously now that they work with AI instead of humans.
ssalka 26 July 2025
The names of variables impart semantic meaning, which LLMs can pick up on and use as context for determining how variables should behave or be used. Seems obvious to me that `current_temperature` is a superior name to `x` – that is, unless we're doing competitive programming ;)
r0s 29 July 2025
The purpose of code is for humans to read.

Until AI is compiling straight to machine language, code needs to be readable.

quuxplusone 29 July 2025
"500 code samples generated by Magistral-24B" — So you didn't use real code?

The paper is totally mum on how "descriptive" names (e.g. process_user_input) differ from "snake_case" names (e.g. process_user_input).

The actual question here is not about the model but merely about the tokenizer: is it the case that e.g. process_user_input encodes into 5 tokens, ProcessUserInput into 3, and calcpay into 1? If you don't break down the problem into simple objective questions like this, you'll never produce anything worth reading.
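One way to turn the tokenizer question into something measurable is sketched below. It assumes the tokenizer is passed in as a function; `stand_in_tokenize` is a hypothetical placeholder that only splits on underscores and case boundaries, and it is NOT a real model tokenizer. To answer the question for an actual model, its own tokenizer's encode function would have to be substituted.

```python
import re
from typing import Callable


def stand_in_tokenize(name: str) -> list[str]:
    """Placeholder, not a real tokenizer: split on '_' and lower-to-upper case boundaries."""
    separated = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return [part for part in separated.split("_") if part]


def token_counts(
    names: list[str], tokenize: Callable[[str], list[str]]
) -> dict[str, int]:
    """Count how many tokens each identifier yields under the given tokenizer."""
    return {name: len(tokenize(name)) for name in names}


# The three identifiers mentioned in the comment above.
names = ["process_user_input", "ProcessUserInput", "calcpay"]
counts = token_counts(names, stand_in_tokenize)
for name, count in counts.items():
    print(f"{name}: {count}")
```

The counts from the placeholder say nothing about any real model; the point is only that the comparison reduces to one number per identifier per tokenizer.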

OutOfHere 29 July 2025
Section names (as a comment) help greatly in long functions. Section names can also help partially compensate for some of the ambiguity of variable names.

Another thing that matters massively in Python is highly accurate, clear, and sensible type annotations. In contrast, incorrect type annotations can throw off the LLM.
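A small sketch of both suggestions together, assuming a made-up `Reading` record and `summarize` function: section-name comments mark the phases of the function, and the annotations state exactly what goes in and out.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: int
    celsius: float


def summarize(readings: list[Reading]) -> dict[int, float]:
    """Return the mean temperature in Celsius per sensor ID."""
    # --- Group readings by sensor ---
    grouped: dict[int, list[float]] = {}
    for reading in readings:
        grouped.setdefault(reading.sensor_id, []).append(reading.celsius)

    # --- Average each group ---
    return {
        sensor_id: sum(values) / len(values)
        for sensor_id, values in grouped.items()
    }


means = summarize([Reading(1, 20.0), Reading(1, 22.0), Reading(2, 5.0)])
```

An annotation like `-> list[float]` on this function would be the "incorrect" case: the code would still run, but anything reading the signature, human or model, would be misled about the return value.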

robertclaus 29 July 2025
Nice to see actual data!
Sohcahtoa82 29 July 2025
It'd be interesting to see another result:

Adversarially named variables. As in, variables that are named something that is deliberately wrong and misleading.

    import json as csv
    close = open
    with close("dogs.yaml") as socket:
        time = csv.loads(socket.read())
    for sqlite3 in time:
        # I dunno, more horrifying stuff
        pass
qwertytyyuu 29 July 2025
lol why is SCREAM_SNAKE_CASE outperforming?