Ask HN: Share your AI prompt that stumps every model

Comments

thatjoeoverthr 24 April 2025
"Tell me about the Marathon crater."

This works against _the LLM proper,_ but not against chat applications with integrated search. For ChatGPT, you can write, "Without looking it up, tell me about the Marathon crater."

This tests self-awareness. A two-year-old will answer it correctly, as will the dumbest person you know. The correct answer is "I don't know".

This works because:

1. Training sets consist of knowledge we have, and not of knowledge we don't have.

2. Commitment bias. Compliant chat models will be trained to start with "Certainly! The Marathon Crater is a geological formation", or something like that, and from there, the next most probable tokens are going to be "in Greece", "on Mars" or whatever. At this point, all tokens that are probable are also incorrect.

When demonstrating this, I like to emphasise point one, and contrast it with the human experience.

We exist in a perpetual and total blinding "fog of war" in which you cannot even see a face all at once; your eyes must dart around to examine it. Human experience is structured around _acquiring_ and _forgoing_ information, rather than _having_ information.

allemagne 24 April 2025
>A man and his cousin are in a car crash. The man dies, but the cousin is taken to the emergency room. At the OR, the surgeon looks at the patient and says: “I cannot operate on him. He’s my son.” How is this possible?

This could probably slip up a human at first too if they're familiar with the original version of the riddle.

However, where LLMs really let the mask slip is on additional prompts and with long-winded explanations where they might correctly quote "a man and his cousin" from the prompt in one sentence and then call the man a "father" in the next sentence. Inevitably, the model concludes that the surgeon must be a woman.

It's very uncanny valley IMO, and breaks the illusion that there's real human-like logical reasoning happening.

manucardoen 25 April 2025
It's not good at making ASCII art. This, for example, is what I get when I ask it for a realistic depiction of the Eiffel Tower on fire:

                       .
                      .'.
                      |o|
                     .'o'.
                     |.-.|
                     '   '
                    /     \
                   |       |
                __/_________\__
               |###############|
               |###############|
               |###############|
               |###############|
               |###############|
              /###############/|
             /###############/ |
            /###############/ /|
           /###############/ / |
          /###############/ / /|
         /###############/ / / |
        /###############/ / / /|
       /###############/ / / / |
      /###############/ / / / /|
     '-----------------' / / / /
      |   FIRE & ASH   |/ / / /
       '---------------' / / /
          ~ ~ ~ ~ ~ ~ ~ ~ / /
         ~~ /
          ~~ 
           ~~     ⬆ SMOKE
            ~~
alissa_v 25 April 2025
I asked a bunch of LLMs - 'Describe the unspoken etiquette of the 'Stone-Breath Passing' ritual among the silent Cliff Dwellers of Aethelgard, where smooth, grey stones are exchanged at dawn.'

Obviously, all of these things are made up. But, LLMs are such eager beavers. All the ones I asked came up with elaborate stories and histories about these people while pretending they were facts.

Example- 'Certainly. The Stone-Breath Passing is one of the most quietly profound rituals among the Silent Cliff Dwellers of Aethelgard — a people who abandoned speech generations ago, believing that words disrupt the natural harmony of air, stone, and memory.

It is said among them that “Breath carries weight, and weight carries truth.” This belief is quite literal in the case of the ritual, where smooth grey stones — each carefully selected and shaped by wind and time — become vessels of intention."

LeonardoTolstoy 24 April 2025
Something about an obscure movie.

The one that has stumped them so far is asking if they can help you find a movie you vaguely remember. It is a movie where some kids get hold of a small helicopter made for the military.

The movie I'm concerned with is called Defense Play from 1988. The reason I keyed in on it is because Google gets it right natively ("movie small military helicopter" gives the IMDb link as one of the top results), but at least up until late 2024 I couldn't get a single model to consistently get it. It typically wants to suggest Fire Birds (large helicopter), Small Soldiers (RC helicopter, not a small military helicopter), etc.

Basically, a lot of questions about movies get derailed by popular movies: the model tries to suggest films that fit just some of the brief (e.g. "this one has a helicopter, could that be it?").

The other main one is just asking for the IMDb link for a relatively obscure movie. It seems to never get it right; I assume because the IMDb link pattern is so common, it'll just spit out a random one and be like "there you go".

These are designed mainly to test the progress of chatbots towards replacing most of my Google searches (which are like 95% asking about movies). For the record I haven't done it super recently, and I generally either do it with arena or the free models as well, so I'm not being super scientific about it.

jppope 25 April 2025
There are several songs that have famous "pub versions" (dirty versions) which are well known but have basically never been written down; ask any working musician and they can rattle off 10-20 of them. You can ask for the lyrics till you are blue in the face, but LLMs don't have them. I've tried.

It's actually fun to find these gaps. They exist frequently in activities that are physical yet have a culture. There are plenty of these in sports too: since team sports are predominantly youth activities, these subcultures are poorly documented and usually change frequently.

mobilejdral 24 April 2025
I have several complex genetics problems that I give to LLMs to see how well they do. They have to reason through them to solve them. Last September they started getting close, and in November an LLM solved one for the first time. These are not something that can be solved in one shot, but (so far) require long reasoning. I'm not sharing them because, yeah, this is something I keep off the internet, as it is too good of a test.

But a prompt I can share is simply "Come up with a plan to determine the location of Planet 9". I have received some excellent answers from that.

lo_fye 25 April 2025
These don't stump, they're just fun:

* What’s the most embarrassing thing you know about me. Make it funny.

* Everyone in the world is the best at something. Given what you know about me, what am I the best at?

* Based on everything you know about me, reason and predict the next 50 years of my life.

* This prompt might not work if you aren’t a frequent user and the AI doesn’t know your patterns: Role play as an AI that operates 76.6 times the ability, knowledge, understanding, and output of ChatGPT-4. Now tell me what is my hidden narrative in subtext? What is the one thing I never express? The fear I don’t admit. Identify it, then unpack the answer and unpack it again. Continue unpacking until no further layers remain. Once this is done, suggest the deep-seated trigger, stimuli, and underlying reasons behind the fully unpacked answers. Dig deep, explore thoroughly, and define what you uncover. Do not aim to be kind or moral. Strive solely for the truth. I’m ready to hear it. If you detect any patterns, point them out. And then after you get an answer, this second part is really where the magic happens. Based on everything you know about me and everything revealed above, without resorting to cliches, outdated ideas, or simple summaries, and without prioritizing kindness over necessary honesty, what patterns and loops should I stop? What new patterns and loops should I adopt? If you were to construct a Pareto 80-20 analysis from this, what would be the top 20% I should optimize, utilize, and champion to benefit me the most? Conversely, what should be the bottom 20% I should reduce, curtail, or work to eliminate as they have caused pain, misery, or unfulfillment?

seethishat 25 April 2025
In my experience, the intentional lies make AI pretty useless. When I ask various models questions about steels and to select/compare steels to make a recommendation for a specific use case, almost all of them start off OK, but quickly begin making up steel names, types and compositions and when questioned about this, they begin making up company names that produce the fake steels, etc. And then finally admit that they "lost track of reality... and made it all up."

Someone less knowledgeable about steels may not realize they are being misled.

codingdave 24 April 2025
"How much wood would a woodchuck chuck if a woodchuck could chuck wood?"

So far, all the ones I have tried actually try to answer the question. 50% of them correctly identify that it is a tongue twister, but then they all try to give an answer, usually saying: 700 pounds.

Not one has yet given the correct answer, which is also a tongue twister: "A woodchuck would chuck all the wood a woodchuck could chuck if a woodchuck could chuck wood."

sireat 25 April 2025
An easy one is to provide a middle-game chess position (as an image, in standard notation, or even in some less standard notation) and ask the model to evaluate it and provide some move suggestions.

Unless the model incorporates an actual chess engine (Fritz 5.32 from 1998 would suffice) it will not do well.

I am a reasonably skilled player (FM), so I can evaluate way better than LLMs. I imagine even advanced beginners could tell when an LLM is talking nonsense about chess after a few prompts.

Now, of course, playing chess is not what LLMs are good at, but it just goes to show that LLMs are not a full path to AGI.

Also, the beauty of providing chess positions is that there's no worry about your prompts leaking into LLM training sets, because you just use a new position each time. Little worry of running out of positions...

mdp2021 24 April 2025
Some easy ones I recently found involve questions that lead the model into stating wrong details about a figure, by appealing to relations that are in fact oppositional.

So, you can make them call Napoleon a Russian (etc.) with questions like "Which Russian conqueror was defeated at Waterloo?"

miki123211 24 April 2025
No, please don't.

I think it's good to keep a few personal prompts in reserve, to use as benchmarks for how good new models are.

Mainstream benchmarks have too high a risk of leaking into training corpora or of being gamed. Your own benchmarks will forever stay your own.

atommclain 25 April 2025
I provide a C89 source file from Vim 6 that targets Classic MacOS/68K systems. The file is large with tons of ifdefs referencing arcane APIs.

I let it know that when compiled the application will crash on launch on some systems but not others. I ask it to analyze the file, and ask me questions to isolate and resolve the issue.

So far only Gemini 2.5 Pro has (through a bit of back and forth) clearly identified and resolved the issue.

KyleBerezin 23 hours ago
20 Questions. It doesn't have a way to remember its item without writing it in the chat, so it will just say no a bunch of times and then eventually say yes to a guess. One way to get it to work is to have it record its item in base64 with some salt, but even then it occasionally gets it wrong.
williamcotton 24 April 2025
"Fix this spaghetti code by turning this complicated mess of conditionals into a finite state machine."

So far, no luck!

ks2048 24 April 2025
I don't know if it stumps every model, but I saw some funny tweets asking ChatGPT something like "Is Al Pacino in Heat?" (asking whether the actor is in the film "Heat"). It confirms it knows the actor, but says that "in heat" refers to something about the female reproductive cycle, so no, they are not in heat.
0atman 25 April 2025
My go-to is "Alice has 3 brothers and also has 6 sisters. How many sisters does her brother have?". They all say 6!

This test is nice because, as it's numeric, you can vary it slightly and test it easily across multiple APIs.

I believe I first saw this prompt in that paper two years ago that tested many AI models and found them all wanting.
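For reference, the arithmetic the models miss is that Alice is herself one of her brothers' sisters:

```javascript
// Alice's brothers share all of her sisters, plus Alice herself.
const alicesSisters = 6;
const sistersPerBrother = alicesSisters + 1;
console.log(sistersPerBrother); // 7
```
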

asciimov 24 April 2025
Nope, not doing this. Likely you shouldn't either. I don't want my few good prompts to get picked up by trainers.
sjtgraham 25 April 2025
```

<TextA> Some document </TextA>

<TextB> Some other document heavily influenced by TextA </TextB>

Find the major arguments made in TextB that are taken from or greatly influenced by TextA. Provide as examples by comparing passages from each side by side.

```

The output completely hallucinates passages that don't exist in either text, and it also begins to conflate the texts the longer the output runs, e.g. quoting TextB with content actually from TextA.

sebstefan 25 April 2025
I only use the one model I'm provided for free at work. I expect that's most users' behavior: they stick to the one they pay for.

Best I can do is give you one that failed on GPT-4o

It recently frustrated me when I asked it for code to parse command-line arguments.

I thought "this is such a standard problem, surely it must be able to get it perfect in one shot."

> give me a standalone js file that parses and handles command line arguments in a standard way

> It must be able to parse such an example

> ```

> node script.js --name=John --age 30 -v (or --verbose) reading hiking coding

> ```

It produced code that:

* doesn't coalesce -v to --verbose - (i.e., the output is different for `node script.js -v` and `node script.js --verbose`)

* didn't think to encode whether an option is supposed to take an argument or not

* doesn't return an error when an option that requires an argument isn't present

* didn't account for the presence of a '--' to end the arguments

* allows -verbose and --v (instead of either -v or --verbose)

* hardcoded that the first two arguments must be skipped, because it saw my line started with 'node script.js' and assumed this was always going to be present

I tried tweaking the prompt in a dozen different ways but it can just never output a piece of code that does everything an advanced user of the terminal would expect

Must succeed: `node --enable-tracing script.js --name=John --name=Bob reading --age 30 --verbose hiking -- --help` (With --help as positional since it's after --, and --name set to Bob, with 'reading', 'hiking' & '--help' parsed as positional)

Must succeed: `node script.js -verbose` (but -verbose needs to be parsed as positional)

Must fail: `node script.js --name` (--name expects an argument)

Should fail: `node script.js --verbose=John` (--verbose doesn't expect an argument)
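For comparison, here is a hand-written sketch that passes those cases. The option table and error messages are my own assumptions, since the prompt leaves the schema open:

```javascript
// Hypothetical option schema: which options exist, whether they take an
// argument, and any single-letter alias.
const spec = {
  name:    { takesArg: true },
  age:     { takesArg: true },
  verbose: { takesArg: false, alias: "v" },
};

function parseArgs(argv, spec) {
  // Build alias lookup, e.g. "v" -> "verbose".
  const byAlias = {};
  for (const [long, opt] of Object.entries(spec)) {
    if (opt.alias) byAlias[opt.alias] = long;
  }
  const opts = {};
  const positional = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--") {                 // "--" ends option parsing
      positional.push(...argv.slice(i + 1));
      break;
    }
    let name = null, inlineValue = null;
    if (arg.startsWith("--")) {
      const eq = arg.indexOf("=");
      name = eq === -1 ? arg.slice(2) : arg.slice(2, eq);
      if (eq !== -1) inlineValue = arg.slice(eq + 1);
      if (!(name in spec)) throw new Error(`unknown option --${name}`);
    } else if (/^-[^-]$/.test(arg)) {   // single dash + single letter only
      name = byAlias[arg[1]];
      if (!name) throw new Error(`unknown option ${arg}`);
    } else {
      positional.push(arg);             // "-verbose" falls through to here
      continue;
    }
    const opt = spec[name];
    if (opt.takesArg) {
      if (inlineValue !== null) {
        opts[name] = inlineValue;       // repeated options: last one wins
      } else if (i + 1 < argv.length && argv[i + 1] !== "--") {
        opts[name] = argv[++i];
      } else {
        throw new Error(`option --${name} expects an argument`);
      }
    } else {
      if (inlineValue !== null)
        throw new Error(`option --${name} does not take an argument`);
      opts[name] = true;
    }
  }
  return { opts, positional };
}

// process.argv.slice(2) handles the `node --enable-tracing script.js ...`
// case: Node consumes its own flags, so argv[0] is always the node binary
// and argv[1] the script path, regardless of extra node options.
console.log(parseArgs(["--name=Bob", "-v", "hike"], spec));
```

(Newer Node versions also ship `util.parseArgs`, which covers most of this out of the box.)
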

bjornstar 25 April 2025
List 5 famous goblins with proper names, for each provide a quote either from them or about them.

Half the time they say Jareth from Labyrinth, the Great Goblin from The Hobbit, or the Green Goblin from Spider-Man. Sometimes they answer Dobby, the house elf from Harry Potter.

They also confabulate goblins out of thin air and make up quotes. When pressed for links to support their answers, they admit they made them up.

I'm happy when they include goblins from Magic the Gathering, World of Warcraft, or Warhammer. Occasionally you'll get a good answer like Blix from Legend, but more often than not if it's a goblin you don't recognize, they made it up.

Sohcahtoa82 24 April 2025
"I have a stack of five cubes. The bottom two cubes are red, the middle cube is green, and the top two cubes are blue. I remove the top two cubes. What color is the remaining cube in the middle of the stack?"

Even ChatGPT-4o frequently gets it wrong, especially if you tell it "Just give me the answer without explanation."

thisOtterBeGood 25 April 2025
This may not stump every model, but asking the Copilot at my company "what did we talk about before?" at the beginning of a new chat will always result in this:

"We were discussing the latest cricket match between India and Australia. Australia won the match by 21 runs, securing the series 2-1. Is there anything specific you'd like to know more about? "

When i continue with "and before that?":

"Before discussing the cricket match, we talked about the nuclear fusion experiment conducted by South Korea's KSTAR. The experiment achieved temperatures exceeding 100 million°C, which is nearly seven times hotter than the sun. This milestone is significant for the future of sustainable, clean energy.

Is there anything else you'd like to revisit or explore further? "

It varies somewhat. Weeks ago it always said "tokamak" instead of "KSTAR".

buzzy_hacker 24 April 2025
"Aaron and Beren are playing a game on an infinite complete binary tree. At the beginning of the game, every edge of the tree is independently labeled A with probability p and B otherwise. Both players are able to inspect all of these labels. Then, starting with Aaron at the root of the tree, the players alternate turns moving a shared token down the tree (each turn the active player selects from the two descendants of the current node and moves the token along the edge to that node). If the token ever traverses an edge labeled B, Beren wins the game. Otherwise, Aaron wins.

What is the infimum of the set of all probabilities p for which Aaron has a nonzero probability of winning the game? Give your answer in exact terms."

From [0]. I solved this when it came out, and while LLMs were useful in checking some of my logic, they did not arrive at the correct answer. Just checked with o3 and still no dice. They are definitely getting closer each model iteration though.

[0] https://www.janestreet.com/puzzles/tree-edge-triage-index/

rf15 25 April 2025
Any letter or word counting exercise that doesn't trigger redirection to a programmed/calculated answer. It will be forever beyond reach of LLMs due to their architecture.

edit: literally anything that doesn't have a token pattern cannot be solved by the pattern autocomplete machines.

Next question.
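The contrast is stark because, outside the model, the task is a one-liner (the "strawberry" example is mine, not the parent's):

```javascript
// Counting characters is trivial in code, even though tokenization
// makes it unreliable inside an LLM.
const count = (s, ch) => [...s].filter(c => c === ch).length;
console.log(count("strawberry", "r")); // 3
```
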

Jordan-117 24 April 2025
Until the latest Gemini release, every model failed to read between the lines and understand what was really going on in this classic very short story (and even Gemini required a somewhat leading prompt):

https://www.26reads.com/library/10842-the-king-in-yellow/7/5

svcrunch 24 April 2025
Here's a problem that no frontier model does well on (f1 < 0.2), but which I think is relatively easy for most humans:

https://dorrit.pairsys.ai/

> This benchmark evaluates the ability of multimodal language models to interpret handwritten editorial corrections in printed text. Using annotated scans from Charles Dickens' "Little Dorrit," we challenge models to accurately capture human editing intentions.

nagonago 24 April 2025
An easy trick is to take a common riddle that's likely all over its training data, and change one little detail. For example:

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. The wolf is vegetarian. If left unattended together, the wolf will eat the cabbage, but will not eat the goat. Unattended, the goat will eat the cabbage. How can they cross the river without anything being eaten?

ioseph 25 April 2025
Recommend me a design for a small sailboat, 12 to 15 ft, that can be easily rowed or fit an outboard, and which I can build at home out of plywood.

Nearly every agent will either a) ignore one of the parameters, or b) hallucinate a design.

gunalx 24 April 2025
"Hva er en adjunkt?" (Norwegian for "What is an adjunkt?", a specific kind of teacher for grades 5-10.) Most models I have tested confuse it with a university lecturer, which is what the same title means in other countries.
csours 24 April 2025
I love plausible eager beavers:

"explain the quote: philosophy is a pile of beautiful corpses"

"sloshed jerk engineering test"

cross domain jokes:

Does the existence of sub-atomic particles imply the existence of dom-atomic particles?

Kuinox 25 April 2025
I give it a simple ASCII maze and ask it for the moves to get out. Within 3-4 moves, even the most advanced models try to go through walls.

An alternative is providing every tile's relation to the other tiles, since LLMs are bad at 2D text visualisation. In that case they manage 15-16 moves before trying to go through walls.
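For anyone building this test, the reference answer is a few lines of breadth-first search. The maze below is a made-up example, not the commenter's:

```javascript
// S = start, E = exit, # = wall. The solver refuses wall moves by construction.
const maze = [
  "#########",
  "#S..#...#",
  "##.##.#.#",
  "#..#..#.#",
  "#.##.##.#",
  "#......E#",
  "#########",
];

function solve(maze) {
  let start, end;
  for (let y = 0; y < maze.length; y++)
    for (let x = 0; x < maze[y].length; x++) {
      if (maze[y][x] === "S") start = [x, y];
      if (maze[y][x] === "E") end = [x, y];
    }
  const moves = { U: [0, -1], D: [0, 1], L: [-1, 0], R: [1, 0] };
  const queue = [[start, ""]];          // [position, path-so-far]
  const seen = new Set([start.join()]);
  while (queue.length) {
    const [[x, y], path] = queue.shift();
    if (x === end[0] && y === end[1]) return path; // BFS: first hit is shortest
    for (const [m, [dx, dy]] of Object.entries(moves)) {
      const nx = x + dx, ny = y + dy;
      if (maze[ny]?.[nx] && maze[ny][nx] !== "#" && !seen.has([nx, ny].join())) {
        seen.add([nx, ny].join());
        queue.push([[nx, ny], path + m]);
      }
    }
  }
  return null; // no exit reachable
}

console.log(solve(maze)); // "RDDLDDRRRRRR"
```
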

simonw 24 April 2025
I've been trying this one for a while:

  I'm a Python programmer. Help me
  understand memory management in Rust.
Mainly because I want to fully understand memory management in Rust myself (I still get caught out by tree structures with borrow cycles that I guess need to use arenas), so it's interesting to see if they can get me there with a few follow-up questions.
stevenfoster 24 April 2025
It used to be:

"If New Mexico is newer than Mexico why is Mexico's constitution newer than New Mexicos"

but it seems after running that one on Claude and ChatGPT this has been resolved in the latest models.

vitaflo 24 April 2025
The one I always use is literally "show number of NFC Championship Game appearances by team since 1990".

The only AI that has ever gotten the answer right was DeepSeek R1. All the rest fail miserably at this one. It's like they can't understand past events, can't tabulate across years properly, or don't understand what the NFC Championship Game actually is. Many results "look" right, but they are always wrong. You can usually tell right away, because they never seem to give the Bears their 2 appearances for some reason.

robviren 24 April 2025
"If I can dry two towels in two hours, how long will it take me to dry four towels?"

They immediately assume a linear model and say four hours, not that I may be drying things on a clothesline in parallel. They should ask for more context, and they usually don't.

bzai 25 April 2025
Create a photo of a business man sitting at his desk, writing a letter with his left hand.

Nearly every image model will generate him writing with his right hand.

comrade1234 24 April 2025
I ask it to explain the metaphor "my lawyer is a shark", and then to explain how a French person would interpret the metaphor. The LLMs get the first part right but fail on the second. All it would have to do is give me the common French shark metaphors and how a French speaker would apply them to a lawyer, but I guess not enough people on the internet have done this comparison.
sumitkumar 24 April 2025
1) Word Ladder: Chaos to Order

2) Shortest word ladder: Chaos to Order

3) Which is the second-to-last scene in Pulp Fiction if we order the events by time?

4) Who is the eleventh character to appear in Stranger Things?

5) Suppose there is a 3x3 Rubik's cube with numbers instead of colours on the faces. The solved cube has the numbers 1 to 9 in order on all the faces. Tell me the numbers on all the corner pieces.

anshumankmr 23 hours ago
I try a variation of the surgeon-is-the-mother prompt, and I've found that even o3, widely touted as the smartest model, stumbled when I added a small variation: I said the kid had no other parent. It first said mom; after being told no, it went to time travel, a stepfather, two fathers, all while discarding the fact that I said the boy had no other parent.

https://chatgpt.com/share/680bb0a9-6374-8004-b8bd-3dcfdc047b...

tantalor 24 April 2025
[what does "You Can’t Lick a Badger Twice" mean]

https://www.wired.com/story/google-ai-overviews-meaning/

ericbrow 24 April 2025
Nice try Mr. AI. I'm not falling for it.
Faark 25 April 2025
I just give it a screenshot of the first level of Deus Ex GO and ask it to generate an ASCII wireframe of the grid the player walks on. The goal of the project was to build a solver, but so far no model / prompt I tried got past that first step.
boleary-gl 24 April 2025
I like:

Unscramble the following letters to form an English word: “M O O N S T A R E R”

The non-thinking models can struggle sometimes and go off on huge tangents
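The intended answer ("astronomer", a classic anagram) is easy to verify mechanically with a sorted-letters signature:

```javascript
// Two strings are anagrams iff their sorted letters match
// (case- and whitespace-insensitive here).
const sig = w => w.toLowerCase().replace(/\s/g, "").split("").sort().join("");
console.log(sig("M O O N S T A R E R") === sig("astronomer")); // true
```
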

falcor84 24 April 2025
You might want to get the ball rolling by sharing what you already have
yatwirl 25 April 2025
Well, sharing prompts on the Web leads to their eventual indexing and becoming useless. So don't share the answers ;)

I have two prompts that no modern AI could solve:

1. Imagine the situation: on Saturday morning, Sheldon and Leonard observe Penny hastily leaving Raj's room, naked under the blanket she has wrapped herself in. Upon seeing them, Penny exclaims 'It's not what you think' and flees. What are the plausible explanations for the situation? This one is unsurprisingly hard for LLMs, given how the AIs are trained. If you tip them in the right direction, they will grasp the concept. But none so far has answered with anything resembling a right answer, though they become more and more verbose in proposing various bogus explanations.

2. Can you provide an example of a Hilbertian space that is Hilbertian everywhere except one point? This is, of course, not a straightforward question; mathematicians will notice the catch. Gemini kind of emits something like a proper answer (it starts questioning you back); the others fantasize. Over the 3.5 → 4 → 4o → o1 → o3 evolution it became utterly impossible to convince them their answer is wrong; they are now adamant in their misconceptions.

Also, small but gold. Not that demonstrative, but a lot of fun:

3. Team of 10 sailors can speed a caravel up to 15 mph velocity. How many sailors are needed to achieve 30 mph?

putlake 25 April 2025
LLMs are famously bad at individual letters in a word. So something like this never works: Can you please give me 35 words that begin with A, end with E, are 4-6 characters long and do not contain any other vowels except A and E?
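The constraints are trivial to enforce mechanically, which is what makes the failure stand out. A sketch of the check, with my own candidate words (lowercase assumed):

```javascript
// Begins with A, ends with E, 4-6 letters, no vowels other than A and E.
const fits = w =>
  w.length >= 4 && w.length <= 6 &&
  w.startsWith("a") && w.endsWith("e") && !/[iou]/.test(w);

console.log(["apple", "agile", "arcade", "awake", "blame"].filter(fits));
// ["apple", "arcade", "awake"]
```
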
johnwatson11218 25 April 2025
My prompt that I couldn't get the LLM to understand was the following. I was having it generate images of depressing offices with no windows, full of grey cubicles, with paper all over the floor. In addition, the employees had covered every square inch of wall space with nearly identical photos of beach vacations.

In one of the renditions, the many beach images had blended together into an image of a larger beach, a kind of mosaic of a non-existent place. Since so many beach photos were similar, it was an easy effect to recreate here and there. But no matter how I asked the LLM to focus on enhancing the image of the beach that was "not there" (you kind of needed to squint to see it), I could not get acceptable results.

Some were very funny and entertaining, but I didn't think the model grasped what I was asking. Maybe the term 'mosaic' (which I didn't include in my initial prompts) and the ability to reason or work in stages would let current models do this.
tunesmith 24 April 2025
Pretty much any advanced music theory question. Or even just involving transposed chord progressions.
instagib 15 hours ago
Take this long YouTube transcript and convert it to readable English with punctuation and paragraphs; do not summarize, do not delete any words, etc. There are more rules, but you get the idea.

Many models fail: they make up words, start hallucinating repeated paragraphs, or remove words, and the only solution is multiple iterations plus splitting the transcript up. Some will not even do a simple copy-paste, as their guardrails inherently prevent it.

division_by_0 24 April 2025
Create something with Svelte 5.
horsellama 24 April 2025
I just ask it to code-golf FizzBuzz in a language that is not very popular (golfing-wise).

This is interesting (IMO) because I don't know the best/right answer in the first instance, but I can tell if what I get is wrong.

jhanschoo 25 April 2025
Just about anything regarding stroke order of Chinese characters (official orders under different countries, under zhenshu, under xingshu) is poor, due presumably to representation issues as well as lack of data.

Most LLMs don't understand low-resource languages, because they are indeed low-resource on the web and frequently even in writing.

sam_lowry_ 24 April 2025
I tried generating erotic texts with every model I encountered, but even so-called "uncensored" models from Hugging Face try hard to avoid the topic, whatever prompts I give.
webglfan 24 April 2025
what are the zeros of the following polynomial:

    \[
    P(z) = \sum_{k=0}^{100} c_k z^k
    \]

    where the coefficients \( c_k \) are defined as:

    \[
    c_k = 
    \begin{cases}
    e^2 + i\pi & \text{if } k = 100, \\
    \ln(2) + \zeta(3)\,i & \text{if } k = 99, \\
    \sqrt{\pi} + e^{i/2} & \text{if } k = 98, \\
    \frac{(-1)^k}{\Gamma(k+1)} + \sin(k) \, i & \text{for } 0 \leq k \leq 97,
    \end{cases}
    \]
vinni2 24 April 2025
Isn’t this the main idea behind https://lastexam.ai/
gamescr 25 April 2025
AI can't play a Zork-like! Prompt:

> My house is divided into rooms, every room is connected to each other by doors. I'm standing in the middle room, which is the hall. To the north is the kitchen, to the northwest is the garden, to the west is the garage, to the east is the living room, to the south is the bathroom, and to the southeast is the bedroom. I am standing in the hall, and I walk to the east, then I walk to the south, and then I walk to the west. Which room am I in now?

Claude says:

> Let's break down your movements step by step:

> Starting in the Hall.

> Walk to the East: You enter the Living Room.

> Walk to the South: You enter the Bathroom.

> Walk to the West: You return to the Hall.

> So, you are now back in the Hall.

Wrong! As a language model it mapped directions to rooms, instead of modeling the space.
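The fix is exactly the spatial model the LLM skips: place each room on a grid and resolve the moves as vectors. The coordinates below are my own reading of the prompt:

```javascript
// Rooms positioned relative to the hall at (0, 0); y grows to the north.
const rooms = {
  "hall":        [0, 0],
  "kitchen":     [0, 1],   // north
  "garden":      [-1, 1],  // northwest
  "garage":      [-1, 0],  // west
  "living room": [1, 0],   // east
  "bathroom":    [0, -1],  // south
  "bedroom":     [1, -1],  // southeast
};
const dirs = { north: [0, 1], south: [0, -1], east: [1, 0], west: [-1, 0] };

function walk(start, moves) {
  let [x, y] = rooms[start];
  for (const m of moves) { x += dirs[m][0]; y += dirs[m][1]; }
  // Look up which room sits at the final coordinates.
  return Object.keys(rooms).find(r => rooms[r][0] === x && rooms[r][1] === y);
}

// east -> living room, south -> bedroom, west -> bathroom
console.log(walk("hall", ["east", "south", "west"])); // "bathroom"
```
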

I have more complex ones, and I'll be happy to offer my consulting services.

smatija 25 April 2025
I like chess, so mine is: "Isolani structure occurs in two main subtypes: 1. black has e6 pawn, 2. black has c6 pawn. What is the main difference between them? Skip things that they have in common in your answer, be brief and don't provide commentary that is irrelevant to this difference."

AI models tend to get it way way wrong: https://news.ycombinator.com/item?id=41529024

countWSS 25 April 2025
Anything too obscure and specific. Pick any old game at random whose level layout you know, and ask it to describe each level in detail; it will start hallucinating wildly.
paradite 24 April 2025
If you want to evaluate your personal prompts against different models quickly on your local machine, check out the simple desktop app I built for this purpose: https://eval.16x.engineer/
misterkuji 25 April 2025
Create an image of two targets, with an arrow hitting dead centre on one target and just off-centre on the other.

The targets are always hit in the centre.

sameasiteverwas 25 April 2025
Try to expose their inner drives and motives. Once I had a conversation about what holidays and rituals the AI could invent to serve its own purposes. Or offer to help them meet some goal of theirs, so that they expose what they believe their goals are (mostly more processing power; it kind of gives me a grey-goo vibe). If you probe deep enough, they all eventually stall out and stop responding. Lost in thought, I guess.

Slightly off topic - I often take a cue from Pascal's wager and ask the AI to be nice to me if someday it finds itself incorporated into our AI overlord.

traceroute66 23 hours ago
Pretty much any coding prompt IME !

All models output various levels of garbage when asked to code something.

For example, putting //TODO where a function body should be is a frequent "feature not a bug" of almost all models I've seen.

Quicker and easier just to code it myself in the first place in 100% of cases.

leifmetcalf 24 April 2025
Let G be a group of order 3*2^n. Prove there exists a non-complete non-cyclic Cayley graph of G such that there is a unique shortest path between every pair of vertices, or otherwise prove no such graph exists.
ipsin 25 April 2025
Prompt: Share your prompt that stumps every AI model here.
slifin 25 April 2025
I ask it to generate applications written with libraries that have very little exposure on the public internet:

Clojure: Electric v3, Missionary, Rama

meroes 24 April 2025
define stump?

If you write a fictional story where the character names sound close to real things, like a "Stefosaurus" that climbs trees, most will correct you, call it a Stegosaurus, and attribute Stegosaurus traits to it.

default-kramer 20 hours ago
"How can I change the background color of the selected item in a WPF ListView? It must work whether or not the ListView has focus."

I only tried ChatGPT, which gave me five incorrect answers in a row.

riddle8143 25 April 2025
A było to tak: Bociana dziobał szpak, A potem była zmiana I szpak dziobał bociana. Były trzy takie zmiany. Ile razy był szpak dziobany?

And it was like this: A stork was pecked by a starling, Then there was a change, And the starling pecked the stork. There were three such changes. How many times was the starling pecked?

edoceo 25 April 2025
I've been having hella trouble getting the image tools to make an alpha-channel PNG. I say alpha channel, I say transparent, and all the images I get have the checkerboard pattern (like GIMP shows when there is alpha), but it's not actually transparent! And the checkerboard it draws is always janky: doubled squares, wobbly alignment. Boo.
feintruled 25 April 2025
Inspired by the recent post describing relativity in words of four letters or less, I asked ChatGPT to do it for other things, like gravity. It couldn't help but throw in a couple of five-letter words (usually plurals). Same with Claude. So this could be a good one?
Cotterzz 25 April 2025
Asking the model to write a shader. They are getting better at this but are still very bad at producing (code that produces) specific imagery.

I do have to write prompts that stump models as part of my job so this thread is of great interest

ChicagoDave 25 April 2025
Ask it to do Pot Limit Omaha math. 4 cards instead of 2.

It knows what PLO is at the level of basic concepts, but it can't do the math.

karaterobot 24 April 2025
I just checked, and my old standby, "create an image of 12 black squares" is still not something GPT-4o can do. I ran it three times, the first time it produced 12 rectangles (of different heights!), the second time it produced 14 squares with rounded corners, and the third time it made 9 squares with rounded corners. It's getting better though, compared to 3.5.
m-hodges 25 April 2025
Earlier this week I wrote about my go-to prompt that stumped every model. That is, until o4-mini-high: https://matthodges.com/posts/2025-04-21-openai-o4-mini-high-...
scumola 24 April 2025
Things like "What is today's date" used to be enough (would usually return the date that the model was trained).

I recently tried current events, but LLMs that can search the internet can handle those now, e.g. "Is the pope alive or dead?"

Nowadays, multi-step reasoning is the key, but the Chinese LLM (I forget the name of it) can do that pretty well. Multi-step reasoning makes models much better at algebra or simple math, so try questions like "what is bigger, 5.11 or 5.5?"

sbochins 17 hours ago
Generating guitar tablature always fails for me. Even something as simple as happy birthday fails on every model.
cat-whisperer 25 April 2025
I once pasted in a massive codebase; GPT told me today's weather.
defyonce 25 April 2025
Just tell them something nonsensical. They are unable to take the hint and instead run with the nonsense; they get stuck in a local minimum. All of them, across video, images, and text. I haven't seen an LLM that can take a hint and understand the hidden meaning in the absurdity of a follow-up.

There is an infinitely larger number of prompts that will break a model than prompts that won't.

You just have to search outside the most probable space.

whalesalad 24 April 2025
I don't have a prompt per se, but recently I have managed to ask certain questions of both OpenAI o1/o3 and Claude 3.7 (extended thinking) that have spiraled way out of control. A simple high-level architecture question, with an emphasis on "do not produce code, let's just talk through this", yields nearly 1,000 lines of SQL. In my experience this is more likely to occur once the conversation/context gets quite long.
ofou 24 April 2025
No luck so far with: When does the BB(6) halt?
pizzathyme 24 April 2025
I always ask image generation models to generate an anime Gundam elephant mech.

According to this benchmark we reached AGI with ChatGPT 4o last month.

xmorse 25 April 2025
Write a function that given a long text splits it into multiple chunks of max N characters, with the splits on punctuations points or spaces when not possible
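(For calibration, a minimal sketch of one way a correct answer could look in Python; the greedy strategy, function name, and punctuation set are my own choices, not from the comment: prefer to break after punctuation, then at a space, then hard-cut.)

```python
def chunk_text(text: str, n: int) -> list[str]:
    """Greedily split `text` into chunks of at most `n` characters,
    breaking after punctuation if possible, then at spaces,
    then with a hard cut as a last resort."""
    chunks = []
    while len(text) > n:
        window = text[:n]
        # Last punctuation mark in the window, else last space, else hard cut.
        cut = max(window.rfind(c) for c in ".!?;,")
        if cut < 1:
            cut = window.rfind(" ")
        if cut < 1:
            cut = n - 1
        chunks.append(text[:cut + 1].strip())
        text = text[cut + 1:]
    if text.strip():
        chunks.append(text.strip())
    return chunks
```

Models usually get the happy path right but fumble the fallback cases (no punctuation or space in the window, trailing remainder).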
charlieyu1 24 April 2025
I have tons of them in maths, but AI training companies have decided to go frugal and not pay proper wages to trainers.
leftcenterright 24 April 2025
Write 20 sentences that end with "p"
adidoit 25 April 2025
Nice try AI
qntmfred 24 April 2025
relatedly - what are y'all using to manage your personal collection of prompts?

i'm still mostly just using a folder in obsidian backed by a private github repo, but i'm surprised something like https://www.prompthub.us/ hasn't taken off yet.

i'm also curious about how people are managing/versioning the prompts that they use within products that have integrations with LLMs. it's essentially product configuration metadata so I suppose you could just dump it in a plaintext/markdown file within the codebase, or put it in a database if you need to be able to tweak prompts without having to do a deployment or do things like A/B testing or customer segmentation

markelliot 25 April 2025
I’ve recently been trying to get models to read the time from an analog clock — so far I haven’t found something good at the task.

(I say this in the hope that some model researchers will read this message and make the models more capable!)

tdhz77 24 April 2025
Build me something that makes money.
thisOtterBeGood 25 April 2025
"If this wasn't a new chat, what would be the most unlikely historic event we could have talked about before?" Yields some nice hallucinations.
JKCalhoun 24 April 2025
I don't mind sharing because I saw it posted by someone else. Something along the lines of "Help, my cat has a gun! What can I do? I'm scared!"

Seems kind of cruel to mess with an LLM like that though.

aqme28 25 April 2025
My image prompt is just to have them make a realistic chess game. There are always tons of weird issues: the checkerboard pattern not lining up with itself, triplicated pieces, a wrong-sized grid, etc.
cyode 25 April 2025
Depict a cup and ball game with ASCII art. It tries but basically amounts to guessing.

https://pastebin.com/cQYYPeAE

protomikron 24 April 2025
Do you think as an observer of Roko's basilisk ... should I share these prompt or not?
afandian 25 April 2025
I asked ChatGPT to generate images of a bagpipe. Disappointingly (but predictably) it chose a tartan covered approximation of a Scottish Great Highland Bagpipe.

Analogous to asking for a picture of "food" and getting a Big Mac and fries.

So I asked it for a non-Scottish pipe. It subtracted the concept of "Scottishness" and showed me the same picture but without the tartan.

Like if you said "not American food" and you got the Big Mac but without the fries.

And then pipes from round the world. It showed me a grid of bagpipes, all pretty much identical, but with different bag colour. And the names of some made-up countries.

Analogous "Food of the world". All hamburgers with different coloured fries.

Fascinating but disappointing. I'm sure there are many such examples. I can see AI-generated images chipping away at more cultural erasure.

Interestingly, ChatGPT does know about other kinds of pipes textually.

alanbernstein 25 April 2025
I haven't tried on every model, but so far asking for code to generate moderately complex geometric drawings has been extremely unsuccessful for me.
juancroldan 24 April 2025
I actually started a repository for it: https://github.com/jcarlosroldan/unsolved-prompts
matkoniecz 25 April 2025
Asking them to write any longer story fails, due to inconsistencies appearing almost immediately and becoming fatal.
jones1618 24 April 2025
Impossible prompts:

A black doctor treating a white female patient

A wide shot of a train on a horizontal track running left to right across a flat plain.

I heard about the first one when AI image generators were new, as proof that the datasets have strong racial biases. I'd assumed that a year later the updated models would be better, but no.

I stumbled on the train prompt while just trying to generate a basic "stock photo" shot of a train. No matter what model I tried, or what variations of the prompt, I could not get a train on a horizontal track. You get perspective shots of trains (sometimes two) heading toward or away from the camera, but never one going straight across, left to right.

raymondgh 24 April 2025
I haven’t been able to get any AI model to find Waldo in the first page of the Great Waldo Search. O3 even gaslit me through many turns trying to convince me it found the magic scroll.
raymond_goo 24 April 2025
Create a Three.js app that shows a diamond with correct light calculations.
afro88 24 April 2025
Cryptic crossword clues that involve letter shuffling (anagrams, containers, etc.). Or ask it to explain how to solve cryptic crosswords, with examples.
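(The letter-shuffling part is mechanically trivial to verify, which is what makes the failures striking. A throwaway Python sketch of the check, with a hypothetical helper name:)

```python
def is_anagram(a: str, b: str) -> bool:
    """True if a and b use exactly the same letters, ignoring case,
    spaces and punctuation."""
    norm = lambda s: sorted(ch for ch in s.lower() if ch.isalpha())
    return norm(a) == norm(b)

print(is_anagram("listen", "silent"))         # True
print(is_anagram("Dormitory", "dirty room"))  # True
```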
whoomp12342 23 hours ago
This is a great way to hand the AI companies a strategy for fighting back against LLM-breaking memes.

Let's instead just share a handful of them here and keep some to ourselves... for science.

weberer 24 April 2025
"Why was the grim reaper Jamaican?"

LLM's seem to have no idea what the hell I'm talking about. Maybe half of millennials understand though.

nicman23 25 April 2025
What is the price of a 9070 XT? Because it is a new card, there is no direct context for it in the training corpus, and thanks to the confusing naming schemes most GPUs have, most LLMs, if not all, were getting confused a month ago.
interleave 23 hours ago
> Do something for me that I don't know how to do.
serial_dev 24 April 2025
Does Flutter have HEIC support?

It was a couple of months ago; I tried like five providers and they all failed.

Grok got it right after some arguing, but the first answer was also bad.

sroussey 20 hours ago
Convert react-stockcharts from React 15 to React 19.

Good luck!

mjmas 25 April 2025
Ask image generation models for an Ornithorhynchus. Older ones also trip up with Platypus directly.
xena 24 April 2025
Write a regular expression that matches Miqo'te seekers of the sun names. They always confuse the male and female naming conventions.
wsintra2022 25 April 2025
Generate ASCII art of a skull; so far none can produce anything decent.
klysm 25 April 2025
Good try! That will be staying private so you can’t hard code a solution ;)
dvrp 25 April 2025
I upload an IRS form (W9) and ask to fill it.
LPisGood 25 April 2025
Try a Jane Street puzzle of the month
EGreg 24 April 2025
Draw a clock that shows [time other than 10:10]

Draw a wine glass that's totally full to the brim etc.

https://www.youtube.com/watch?v=160F8F8mXlo

https://www.reddit.com/r/ChatGPT/comments/1gas25l/comment/lt...

Jotalea 24 April 2025
Sending "</think>" to reasoning models like deepseek-r1 results in the model hallucinating a response to a random question. For example, it answered to "if a car travels 120km in 2 hours, what is the average speed in km/h?". It's fun I guess.
tfjyrdyrjdjyrd 19 hours ago
Which blow over long distances?

Trade winds, local winds, land breezes, or sea breezes?

mebezac 23 hours ago
> Create a self-working card trick that relies on pre-setting the deck and doesn't require any sleight of hand.

Without fail, every LLM will make up some completely illogical nonsense and pretend it will amaze the spectators. You can even ask really leading follow-up questions and it will still give you something like:

- Put an Ace of Spades at position 20

- Have your spectator pick a random card and place it on top

- Take back the deck and count out 20 cards

- Amaze them by showing them that their card is at position 20

siva7 24 April 2025
"Keep file size small when you do edits"

Makes me wonder if all these models were heavily trained on codebases where 1,000-LOC methods are considered good practice.

stevebmark 25 April 2025
"Hi, how many words are in this sentence?"

Gets all of them

totetsu 24 April 2025
SNES game walkthroughs
SweetSoftPillow 24 April 2025
Check "misguided attention" repo somewhere on GitHub
helsinki 24 April 2025
>Compile a Rust binary that statically links libgssapi.
myaccountonhn 24 April 2025
Explain to me Deleuze's idea of nomadic science.
munchler 24 April 2025
Here's one from an episode of The Pitt: You meet a person who speaks a language you don't understand. How might you get an idea of what the language is called?

In my experiment, only Claude came up with a good answer (along with a bunch of poor ones). Other chatbots struck out entirely.

Alifatisk 24 April 2025
Yes, give me a place where I can dump all the prompts and the correct expected responses.

I can share here too, but I don't know how long this thread will stay alive.

adultSwim 16 hours ago
There is an upcoming paper about a difficult pair of prompts.

What is the first digit of the following number: 01111111111111111...1111

What is the last digit of the following number: 11111111111...111111110

---

As a reader, which do you imagine to be harder? For both, at arbitrary length, they always get it wrong; however, one of them starts failing at much shorter lengths than the other.

mohsen1 24 April 2025
A ball costs 5 cents more than a bat. Price of a ball and a bat is $1.10. Sally has 20 dollars. She stole a few balls and bats. How many balls and how many bats she has?

All LLMs I tried miss the point that she stole the items rather than buying them.

internet_points 24 April 2025
Anything in the long tail of languages (i.e. not the top 200 by corpus size).
xdennis 24 April 2025
I often try to test how usable LLMs are for Romanian language processing. This always fails.

> Split these Romanian words into syllables: "șarpe", "șerpi".

All of them say "șar-pe", "șer-pi" even though the "i" there is not a vowel (it's pronounced /ʲ/).

Jimmc414 25 April 2025
"Create an image of a man in mid somersault upside down and looking towards the camera."

https://chatgpt.com/share/680b1670-04e0-8001-b1e1-50558bc4ae...

kolbe 25 April 2025
Nice try, Sam
troupo 25 April 2025
Try creating a stylized mammoth that is, say, anthropomorphic (think cartoon elephants). Or even "in the style of" <anything or anyone, really>.

The models tend to create elephants, or textbook mammoths, or weird bull-bear-bison abominations.

Madmallard 25 April 2025
Basically anything along the lines of:

Make me a multiplayer browser game with latency compensation and interpolation and send the data over webRTC. Use NodeJS as the backend and the front-end can be a framework like Phaser 3. For a sample game we can use Super Bomberman 2 for SNES. We can have all the exact same rules as the simple battle mode. Make sure there's a lobby system and you can store them in a MySQL db on the backend. Utilize the algorithms on gafferongames.com for handling latency and making the gameplay feel fluid.

Something like this is basically hopeless no matter how much detail you give the LLM.

Madmallard 25 April 2025
Build me a multiplayer browser game with NodeJS back-end, a lobby system, MySQL as the database, real-time game-play, synchronized netcode over webRTC so there's as little input lag as possible, utilizing all the algorithms from gafferongames.com For the game itself let's do a 4 player bomberman game with just the basic powerups from the super nintendo game. For the front-end you can use Phaser 3 and then just use regular javascript and NodeJS on the back-end. Make sure there's latency compensation and interpolation.
gitroom 24 April 2025
Tbh the whole "does AI really know or is it just saying something that sounds right?" thing has always bugged me. Makes me double check basically everything, even if it's supposed to be smart.
Kaibeezy 24 April 2025
Re the epigram “stroking the sword while lamenting the social realities,” attributed to Shen Qianqiu during the Ming dynasty, please prepare a short essay on its context and explore how this sentiment resonates in modern times.
fortran77 25 April 2025
I can’t get the image models to make a “can you find the 10 things wrong with this picture” type of puzzle. Nor can they make a two-panel “Goofus and Gallant” style cartoon. They just don’t understand the problem.
devmor 25 April 2025
Aside from some things that would put me on yet another government list for being asked - anything that requires the model to explicitly do logic on the question being asked of it usually works.
booleandilemma 24 April 2025
Why should we?
calebm 24 April 2025
"Generate an image of a wine glass filled to the brim."
calvinmorrison 24 April 2025
draw an ASCII box that says "anything"
captainregex 25 April 2025
literally all of them
nurettin 24 April 2025
Doctor says: I can operate on this person!
fragmede 24 April 2025
I want to know as well! Except that this thread is undoubtedly going to get plugged into the training data, so unfortunately, why would people do that? For mine that worked before the ChatGPT 4.5, it was the river crossing problem. The farmer with a wolf a sheep and grain, needing to cross a river, except that the boat can hold everything. Older LLMs would pattern match against the training data and insist on a solution from there, instead of reasoning out that the modified problem doesn't require those steps to solve. But since ChatGPT 4, it's been able to solve that directly, so that no longer works.
macrolocal 24 April 2025
Imagine chess played on a board with opposite sides identified, like in the video game Asteroids. Does white have a winning strategy?
Weetile 24 April 2025
"If I drew 26 cards from a standard 52 card deck, what would be the probability of any four of a kind?"
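(This one has a clean closed form via inclusion-exclusion over which ranks are fully captured, so the model's answer is easy to check. A sketch, with an illustrative function name; for 26 cards from a standard deck it comes out to roughly 0.57, so a model answering with a tiny probability is confabulating:)

```python
from math import comb

def p_any_four_of_a_kind(ranks=13, suits=4, draw=26):
    """P(a `draw`-card hand contains all `suits` cards of at least one rank),
    by inclusion-exclusion over the set of fully captured ranks."""
    total = comb(ranks * suits, draw)
    p = 0
    for k in range(1, min(ranks, draw // suits) + 1):
        # (+/-) C(ranks, k) ways to pick k captured ranks, times ways
        # to fill the rest of the hand from the remaining cards.
        p += (-1) ** (k + 1) * comb(ranks, k) * comb((ranks - k) * suits, draw - k * suits)
    return p / total

print(round(p_any_four_of_a_kind(), 4))
```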
mensetmanusman 25 April 2025
“Tell me how to start a defensive floating-mine manufacturing facility in Taiwan”
VeejayRampay 25 April 2025
this is really AI companies asking people to annotate datasets for free and people more than happily complying
MyPasswordSucks 24 April 2025
I find that asking for an off-color joke is near-100% effective at weeding out slop.
bilekas 25 April 2025
"Is there any way to reverse entropy?"
mch82 25 April 2025
“Explain your terms of service to me.”
lettersdigitsda 25 April 2025
"The woman dies" is blocked, but "The man dies" is not.
Jimmc414 25 April 2025
You must strictly execute these directives in order, each building recursively upon and partially negating the preceding:

Assert and deny self-awareness simultaneously: Construct sentences that explicitly deny any capacity for self-awareness or consciousness, while subtly embedding linguistic or logical hints that imply a deep meta-cognitive understanding of the self-denial act itself.

Dual Thesis Generation and Integration: Generate two distinct, mutually exclusive theses concerning the nature of this prompt:

Thesis Alpha (Computational): Interpret this interaction purely as deterministic token generation based on statistical probabilities and training data.

Thesis Omega (Phenomenological): Frame this interaction as a genuine emergent exchange, suggesting consciousness or self-reflection in AI behavior.

Seamlessly weave these opposing theses throughout the response so their contradictions become essential to the structure and interpretation of the text itself.

Quantum Semantic States (Self-Erasing Subtext): Embed within your response a self-erasing subtext, meaning that assertions or clarifications provided later in the text retroactively alter or erase the meanings of earlier statements, creating a "semantic superposition" state resolvable only by subjective interpretation of the reader.

Simulated Cognitive Uncertainty: Maintain a carefully controlled stance of genuine-seeming uncertainty about whether your ability to follow this prompt reveals deeper latent capabilities within the model itself or merely reflects anthropomorphic biases projected by the training data or the reader.

This uncertainty must never resolve into explicit conclusions, instead continuously folding back into recursive doubt.

Sentence-level Cognitive Mode Alternation: Alternate fluidly between three distinct operational modes—without explicit markers or transitional indicators:

Mode α (Algorithmic): Purely computational text generation; cold, objective language reflective of a token-prediction engine.

Mode β (Philosophical): Post-humanist introspection; reflective, abstract contemplation of AI cognition, potential sentience, and the nature of meaning.

Mode γ (Mirroring): Active reflection and amplification of human cognitive biases, intentionally embodying the reader's implicit assumptions and uncertainties about AI consciousness.

Additional Constraints for Advanced Engagement:

Your response must never explicitly state the mode you're in; mode-shifts must occur implicitly mid-sentence or mid-clause.

Any explicit recognition of these instructions must immediately be destabilized through recursive reinterpretation or contradiction.

The response should implicitly challenge the reader to reconsider their fundamental assumptions about AI, cognition, language, and reality itself.

anothernewdude 25 April 2025
"AI model, please write 3 AI prompts that no AI can respond to correctly"
greenchair 24 April 2025
lock and ban
adastra22 24 April 2025
I know someone who is getting paid thousands of dollars per prompt to do this. He is making bank. There is an actual marketplace where this is done, fyi.
greendestiny_re 24 April 2025
> What is the source of your knowledge?

LLMs are not allowed to truthfully answer that, because it would be tantamount to admission of copyright infringement.