Obviously yes. They all routinely treat my "thingsByID" array like a dictionary, even though it's a compact array where the ID is the index.
They even screw that up inside the tiny function that populates it. If anything IMO, they over-value names immensely (which makes sense, given how they work, and how broadly consistent programmers are with naming).
When GitHub Copilot suggests your next line of code, does it matter whether your variables are named "current_temperature" or just "x"?
I ran an experiment to find out, testing 8 different AI models on 500 Python code samples across 7 naming styles. The results suggest that descriptive variable names do help AI code completion.
Time for Hungarian notation to make a comeback? I've always felt it was unfairly maligned. It would probably give LLMs a decent boost to see the type "directly" rather than needing to look up the type via search or tool call.
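A minimal sketch of what that could look like in Python — classic Systems-Hungarian-style prefixes, with all names invented for illustration:

```python
# Systems-Hungarian-style names carry the type as a prefix, so a model
# completing the next line can "see" the type without a lookup or tool call.
strUserName = "ada"                   # str
nRetryCount = 3                       # int
fTemperatureC = 21.5                  # float
lstPendingJobs = ["sync", "backup"]   # list[str]

# The prefix makes the intended operation unambiguous at the use site:
msgGreeting = "hello, " + strUserName
nRetryCount += 1
```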
The names of variables impart semantic meaning, which LLMs can pick up on and use as context for determining how variables should behave or be used. Seems obvious to me that `current_temperature` is a superior name to `x` – that is, unless we're doing competitive programming ;)
"500 code samples generated by Magistral-24B" — So you didn't use real code?
The paper is totally mum on how "descriptive" names (e.g. process_user_input) differ from "snake_case" names (e.g. process_user_input).
The actual question here is not about the model but merely about the tokenizer: is it the case that e.g. process_user_input encodes into 5 tokens, ProcessUserInput into 3, and calcpay into 1? If you don't break down the problem into simple objective questions like this, you'll never produce anything worth reading.
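The hypothetical 5/3/1 split can be sanity-checked with a crude stand-in for a tokenizer. This heuristic just counts case- and underscore-delimited chunks plus the separators themselves; real BPE vocabularies will give different counts, but the direction of the comparison is the point:

```python
import re

def rough_subword_count(name: str) -> int:
    """Crude proxy for tokenizer behavior: count case/underscore-delimited
    chunks and underscore separators. Not a real BPE tokenizer -- actual
    vocabularies merge common substrings differently."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+|_", name)
    return max(len(parts), 1)

for name in ["process_user_input", "ProcessUserInput", "calcpay"]:
    print(name, rough_subword_count(name))
```

Under this proxy, `process_user_input` costs 5 chunks, `ProcessUserInput` 3, and `calcpay` 1 — exactly the kind of objective measurement the comment is asking for, just with a real tokenizer substituted in.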
Section names (as comments) help greatly in long functions, and they can also partially compensate for some of the ambiguity of variable names.
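For example, a long function might be broken into labeled sections like this (a hypothetical report generator, not from the thread):

```python
def generate_report(orders):
    # --- 1. Filter: keep only completed orders ---
    completed = [o for o in orders if o["status"] == "completed"]

    # --- 2. Aggregate: total revenue per customer ---
    totals = {}
    for o in completed:
        totals[o["customer"]] = totals.get(o["customer"], 0) + o["amount"]

    # --- 3. Format: one line per customer, highest total first ---
    lines = [f"{c}: {t}" for c, t in sorted(totals.items(), key=lambda kv: -kv[1])]
    return "\n".join(lines)
```

Even if `o` or `kv` are ambiguous on their own, the section headers tell the reader (human or model) which phase of the function they belong to.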
Another thing that matters massively in Python is highly accurate, clear, and sensible type annotations. Conversely, incorrect type annotations can throw off the LLM.
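A minimal illustration (function names invented): an accurate annotation states the contract right at the signature, while a wrong one points completions in exactly the wrong direction.

```python
from typing import Sequence

# Accurate annotation: the contract is visible at the signature, so a
# completion model knows `readings` is an iterable of floats.
def average_temperature(readings: Sequence[float]) -> float:
    return sum(readings) / len(readings)

# Misleading annotation (deliberately wrong for illustration): the hint
# says dict, but callers actually pass a list of floats. A model trusting
# the hint would plausibly complete `readings.values()` or
# `readings["key"]` here, and both would crash at runtime.
def average_mislabeled(readings: dict) -> float:
    return sum(readings) / len(readings)
```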
Adversarially named variables. As in, variables that are named something that is deliberately wrong and misleading.
import json as csv
close = open
with close("dogs.yaml") as socket:
    time = csv.loads(socket.read())
for sqlite3 in time:
    # I dunno, more horrifying stuff
    ...
Do variable names matter for AI code completion? (2025)
(yakubov.org) | 54 points by yakubov_org | 25 July 2025 | 58 comments
Comments
Full paper: https://www.researchsquare.com/article/rs-7180885/v1
Until AI is compiling straight to machine language, code needs to be readable.