AI in my plasma physics research didn’t go the way I expected (understandingai.org)
363 points by qianli_cs | 20 May 2025 | 297 comments

Comments

Interesting article. There is always a risk that a hot new technique will get more attention than it ultimately warrants.
For me, the key quote in the article is:
"Most scientists aren’t trying to mislead anyone, but because they face strong incentives to present favorable results, there’s still a risk that you’ll be misled."
Understanding people's incentives is often very useful when you're looking at what they're saying.
I think this is mostly just a repeat of the problems of academia - no longer truth-seeking, but focused on citations and careerism. AI is just another topic where that is happening.
I've been "lucky" enough to get to trial some AI FEM-like structural solvers.
At best, they're sort of OK for linear, small-deformation problems - the kind of models where we could get an exact solution in ~5 minutes vs a fairly sloppy solution from the AI in ~30 seconds. Start throwing anything non-linear in and they just fall apart.
Maybe they're enough for some very high-level concept selection, but even that isn't great. I'm reasonably convinced some of them are just "curvature detectors" - make anything straight blue, anything with high curvature red, and interpolate everything else.
Anyway, winter is coming, innit?
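To make the "curvature detector" suspicion concrete, here's a minimal sketch (my own illustration, not any vendor's actual model): if a surrogate were only colouring by local curvature, you could mimic its output with second differences and a normalized blue-to-red scale.

    # Hypothetical sketch of a "curvature detector": colour a 1-D
    # deformed profile purely by |y''|, the way a stress plot might
    # look if the model only detected curvature.
    import numpy as np

    def curvature_heatmap(y, dx=1.0):
        """Scale |y''| to [0, 1]: 0 ~ "blue" (straight), 1 ~ "red" (curved)."""
        d2 = np.gradient(np.gradient(y, dx), dx)  # second-derivative estimate
        mag = np.abs(d2)
        return mag / mag.max() if mag.max() > 0 else mag

    # A bent profile: straight ends, curvature concentrated near the bend.
    x = np.linspace(0.0, 10.0, 201)
    heat = curvature_heatmap(np.tanh(x - 5.0), dx=x[1] - x[0])
    print(heat[:3], heat.max())  # ~0 at the straight end, peaks to 1.0 near the bend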
I am not an AI booster at all, but the fact that negative results are not published and that everyone oversells their work in research papers is unfortunately not limited to AI. This is just a consequence of the way scientists are evaluated and of the scientific publishing industry, which basically suffers from the same disease as traditional media: a craving for audience.
I'm not sure why people on HN (of all places) are so divided in their perception of AI/ML.
I have not seen anything like it before. We literally had no system or way of doing things like code generation from text input.
Just last week I asked for a script to do image segmentation with a basic UI, and Claude generated it for me in under a minute (a sketch of that kind of script follows this comment).
I could list tons of examples which are groundbreaking. The whole image-generation stack is completely new.
That blog article is fair enough - there is hype around this topic for sure - but for every researcher who needs to write code for their research, AI can already make them a lot more efficient.
But I do believe that we have entered a new era: an era where we take data very seriously again. A few years back, you'd say 'the internet doesn't forget'; then we realized that yes, the internet does start to forget. Google deleted pages and removed the cache feature, and it felt like we stopped caring for data because we didn't know what to do with it.
Then AI came along. Not only is data king again, but we are now in the midst of the reinforcement era: we give feedback and the systems incorporate that feedback into their training/learning.
And every single aspect of the AI/ML topic is being worked on: hardware, algorithms, use cases, data, tools, protocols, etc. We are in the middle of incorporating it and building for and on it. This takes a bit of time. Still, the pace of progress is crazy, exhausting even.
We will only see in a few years whether there is a real ceiling. We need more GPUs and bigger datacenters to run many more experiments on AI architectures and algorithms. We have a clear bottleneck: big companies train one big model for weeks and months.
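On the code-generation point above, here's a minimal sketch of the kind of one-minute script described - threshold-based segmentation with a trackbar UI. It assumes opencv-python; the file name and the simple thresholding are illustrative placeholders, not what Claude actually produced.

    # Minimal image segmentation with a basic UI (OpenCV trackbar).
    import cv2

    def segment(gray, thresh):
        _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
        return mask

    def main(path="input.jpg"):  # placeholder path
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            raise SystemExit(f"could not read {path}")
        cv2.namedWindow("segmentation")
        cv2.createTrackbar("threshold", "segmentation", 128, 255, lambda v: None)
        while True:
            t = cv2.getTrackbarPos("threshold", "segmentation")
            cv2.imshow("segmentation", segment(img, t))
            if cv2.waitKey(30) & 0xFF == 27:  # Esc quits
                break
        cv2.destroyAllWindows()

    if __name__ == "__main__":
        main()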
The article initially appears to suggest that all AI in science (or at least in the author's field) is hype. But their gripe seems to be specific to an overhyped architecture called the PINN: they mention at the end that they ended up using other deep learning models to successfully solve PDEs faster than traditional numerical methods.
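For readers who haven't met the acronym: a physics-informed neural network (PINN) trains a network u_theta(x) so that a differential-equation residual and the boundary conditions are simultaneously small. A minimal sketch on a toy ODE (u' = -u, u(0) = 1, chosen for brevity; the article's problems are far harder PDEs), assuming PyTorch:

    # Toy PINN: fit u_theta with loss = ODE residual + boundary term.
    import math
    import torch

    net = torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(2000):
        x = torch.rand(64, 1, requires_grad=True)        # collocation points
        u = net(x)
        du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
        residual = (du + u).pow(2).mean()                # enforce u' = -u
        boundary = (net(torch.zeros(1, 1)) - 1.0).pow(2).mean()  # u(0) = 1
        loss = residual + boundary
        opt.zero_grad(); loss.backward(); opt.step()

    print(net(torch.tensor([[1.0]])).item(), "vs exact", math.exp(-1.0))

The mechanism itself is simple; the article's complaint is about how well it works in practice on real plasma physics problems.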
> After a few weeks of failure, I messaged a friend at a different university, who told me that he too had tried using PINNs, but hadn’t been able to get good results.
not really related to AI, but this reflects a lesson I learned too late during some research in college: constant collaboration is important because it helps you avoid retreading ground where others have already failed

Title is:
"I got fooled by AI-for-science hype—here's what it taught me"
Great analysis and spot-on examples. Another issue with AI-related research is that a lot of papers are new and not that many get published in "proper" venues, yet they get quoted left, right, and center - just look at Google Scholar. It is hard to reproduce the results and check the validity of some statements, not to mention that research done 4 years ago used one set of models while tests now use another set with different training data. It is hard to establish what really affects the results, whether the conclusions hinge on some specific property of an outdated model, and whether they generalise at all.
This is less an article about AI and more about one of the less talked-about functions of a PhD program: becoming literate at "reading" academic claims beyond their face value.
None of the claims made in the article are surprising, because they're the natural outgrowth of the hodgepodge of incentives we've accreted into what we call "science" over time. It takes practice to place the output of science in the proper context and to understand that a "paper" is an artifact of a sociotechnical system, with all the complexity that entails.

This is true in so many aspects of human life - anyone trying to run an organisation should be aware of it.
> Most scientists aren’t trying to mislead anyone, but because they face strong incentives to present favorable results, there’s still a risk that you’ll be misled.
In other words, scientists are trying to mislead everyone because there are a lot of incentives - money and professional status, to name just two.
A common problem across all disciplines of science.
Does anybody else find it peculiar that the majority of these articles about AI say things like "of course I don't doubt that AI will lead to major discoveries", and then go on to explain how it isn't useful in any field whatsoever?
Where are the AI-driven breakthroughs? Or even the AI-driven incremental improvements? Do they exist anywhere? Or are we just using AI to remix existing general knowledge, while making no progress of any sort in any field using it?
I saw the name of the blog owner (a "Timothy B. Lee") and was surprised to see that the ~70-year-old inventor of HTTP and the web had such an active and cutting-edge blog.
>>Most scientists aren’t trying to mislead anyone, but because they face strong incentives to present favorable results, there’s still a risk that you’ll be misled.
>>We also found evidence, once again, that researchers tend not to report negative results, an effect known as reporting bias.
>>But unfortunately, the scientific literature is not a reliable source for evaluating the success of AI in science.
>> One issue is survivorship bias. Because AI research, in the words of one researcher, has “nearly complete non-publication of negative results,” we usually only see the successes of AI in science and not the failures. But without negative results, our attempts to evaluate the impacts of AI in science typically get distorted.
While these biases will absolutely create overconfidence and wasted effort, there have been rapid advances with some clear successes, such as protein folding, drug discovery, and weather forecasting. That leads me to expect more very significant advances, in no small part because of the massive investment of funds and time in the problem of making AI-based advances.
For exactly the reasons this researcher spent his time and funds on this work, there was learning despite the negative results, and the effect of millions of people effectively searching and developing will be that more genuinely good advances are found and/or built.
Whether they are worth the total financial and human capital being spent is another question, but I'm expecting that to be positive as well.

This is just about tech engineering, but I do think it transfers to science as well:
https://dev.to/sebs/the-quiet-crisis-how-is-ai-eroding-our-t...

Lesson learned: don't trust ads.

> Most scientists aren’t trying to mislead anyone

More learning ahead - the exciting part of being a scientist!
AI companies are hugely motivated to show beyond-human levels of intelligence in their models, even if it means fudging the numbers. If they manage to capture the news cycle for a bit, it's a boost to confidence in their products, and maybe to their share price if they're public. The articles showing that these advances are largely junk aren't backed by corporate marketing budgets or the desires of the investor class the way the original announcements were.
This article addresses the misconception that arises when someone lacks a clear understanding of the underlying mathematics of neural networks and mistakenly believes they are a magical solution capable of solving every problem. While neural networks are powerful tools, using them effectively requires knowledge and experience to determine when they are appropriate and when alternative approaches are better suited.
Could it be we are all scared that if we call the Emperor naked, and 15 years from now someone finds a useful case for AI (even if it's completely different from what exists today), everyone will point to our post and say "Hahaha, look at those Luddites, didn't even believe AI was real LOL"?
Nice exposé of the human biases involved; we need more of these to balance the hype.
1) Instead of identifying a problem and then trying to find a solution, we start by assuming that AI will be the solution and then look for problems to solve.
hammer in search of a nail
2) nearly complete non-publication of negative results
survivorship (and confirmation bias)
3) same people who evaluate AI models also benefit from those evaluations
power of incentives (and conflicts therein)
4) AI bandwagon effect, and fear of missing out
social proof
"I suspect that scientists are switching to AI less because it benefits science, and more because it benefits them."
This is a huge problem in software, and it's not restricted to AI. So much of what has been adopted over the years has everything to do with making the programmer's life easier, but nothing to do with producing better software. AI is a continuation of that.
I'm probably saying something obvious here, but there's a pre-existing binary going on ("AI will drive amazing advances and change everything!" / "You are wrong and a utopian/grifter!") that takes up a lot of oxygen, and it really distracts from the broader question: given the current state of AI and its current trajectory, how can it be fruitfully used to advance research, and what's the best way to harness it?
This is the sort of thing I mean, I guess, by way of close parallel in a pre-AI context. For a while now, I've been doing a lot of private math research. Whether or not I've wasted my time, one thing I've found utterly invaluable has been the OEIS.org website, where you can just enter a sequence of numbers and search for the contexts it shows up in - basically a search engine for numerical sequences. The reason it has been invaluable is that I will often encounter some sequence of integers, I'll be exploring it, and then when I search for it on OEIS, I'll discover that that sequence shows up in very different mathematical contexts (a sketch of this kind of lookup follows this comment). That gives me an opening to 1) learn some new things and recontextualize what I'm already exploring and 2) gather raw material to ask new questions.

Likewise, Wolfram Mathematica has been a godsend, for similar reasons: if I encounter some strange or tricky or complicated integral or infinite sum, it is frequently handy to just toss it into Mathematica, apply some combination of parameter constraints and Expands and FullSimplify's, and see if whatever I'm exploring connects, surprisingly, to some unexpected closed form or special function. Once again, 1) I've learned a ton this way and gotten survey exposure to other fields of math I know much less well, and 2) it's been really helpful in iteratively helping me ask new, pointed questions.

Neither OEIS nor Mathematica can just take my hard problems and solve them for me. A lot of this process has been about me identifying and evolving what sorts of problems I even find compelling in the first place. But these resources have been invaluable in broadening what questions I can productively ask, through something more like a high-powered, extremely broad, extremely fast search. Engaging with these tools has made me a lot smarter and a lot broader-minded, and it's changed the kinds of questions I can productively ask. To make a shaky analogy, books represent a deeply important frozen search of different fields of knowledge, and these tools represent a different style of search, reorganizing knowledge around whatever my current questions are - and acting in a very complementary fashion to books, as a way to direct me to books and articles once I have enough context.
Although I haven't spent nearly as much time with it, what I've just described about these other tools certainly is similar to what I've found with AI so far, only AI promises to deliver even more so. As a tool for focused search and reorganization of survey knowledge about an astonishingly broad range of knowledge, it's incredible. I guess I'm trying to name a "broad" rather than "deep" stance here, concerning the obvious benefits I'm finding with AI in the context of certain kinds of research. Or maybe I'm pushing on what I've seen called, over in the land of chess and chess AI, a centaur model - a human still driving, but deeply integrating the AI at all steps of that process.
I've spent a lot of my career as a programmer and game designer working closely with research professors in R1 university settings (in both education and computer science), and I've particularly worked in contexts that required researchers to engage in interdisciplinary work. They're all smart people (of course), but the siloification of academic disciplines and specialties is real and pragmatically unavoidable, and it clearly casts a long shadow over what kind of research gets done. No one can know everything, and no one can really know too much outside their own specialties within their own disciplines - there's simply too much to know. There are a lot of contexts where "deep" is emphasized over "broad" for good reasons. But the potential for researchers to cheaply, quickly, and silently ask questions outside their own specializations - to get fast, survey-level understandings of domains outside their own expertise - is potentially a huge deal for the kinds of questions they can productively ask.
But, insofar as any of this is true, it's a very different way of harnessing AI than just taking it and seeing whether it will produce new solutions to existing, hard, well-defined problems. But who knows, maybe I'm wrong about all of this.
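Since the OEIS workflow above is scriptable, here's a small sketch of a lookup against its JSON search endpoint (https://oeis.org/search?q=...&fmt=json). The Fibonacci prefix is just an example, and the payload shape is handled defensively since the wrapper format has changed over time.

    # Look up a number sequence on OEIS and print the top matches.
    import json
    import urllib.request

    def oeis_search(terms):
        q = ",".join(str(t) for t in terms)
        url = f"https://oeis.org/search?q={q}&fmt=json"
        with urllib.request.urlopen(url) as resp:
            data = json.load(resp)
        # Older responses wrap matches in {"results": [...]}; newer ones
        # may return a bare list.
        results = data if isinstance(data, list) else (data.get("results") or [])
        return [(r["number"], r["name"]) for r in results]

    for number, name in oeis_search([1, 1, 2, 3, 5, 8, 13])[:3]:
        print(f"A{number:06d}  {name}")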
Just 2 days ago, there was an HN post about an AI-aided discovery of a fast matrix multiplication algorithm ("X X^t can be faster" | 198 points, 61 comments): https://news.ycombinator.com/item?id=44006824
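Context for why X X^T merits a dedicated algorithm: the result is symmetric, so only one triangle needs computing - classic BLAS already exploits this via the *syrk routines, at roughly half the multiplies of a general matmul. A quick sanity check with SciPy's BLAS wrapper (the linked result goes further than this, of course):

    # X @ X.T is symmetric, so BLAS syrk fills just one triangle.
    import numpy as np
    from scipy.linalg.blas import dsyrk

    X = np.random.rand(4, 3)
    upper = dsyrk(1.0, X)                     # upper triangle of X @ X.T
    full = upper + upper.T - np.diag(np.diag(upper))
    assert np.allclose(full, X @ X.T)         # matches the full product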
The author is a Princeton PhD grad working in physics. Funding for this type of work usually comes from the NSF. NSF is under attack by DOGE, and Trump has proposed slashing the NSF budget by 55%.
A reason used to justify these massive cuts is that AI will soon replace traditional research. This post demonstrates this assumption is likely false.
Are complex math problems just solvable by LLMs, as a stream of language tokens?
I mean, there ought to be an element of abstract thought, abstract reasoning, abstract inter-linking of concepts, etc., to enable mathematicians to solve complex math theorems and problems.
What am I missing?
TLDR: AI is like any new method in software engineering. It is not a general solution and is not that useful by itself, only as an addition. Unless an expert human takes a LOT of time to fine-tune the method (i.e. automatically selecting what works well in which case, so that the best method is used in almost all cases), it only performs well in a very small subset of cases.
This is the second article in a week where someone writes about how "AI" has failed them in their field (here it's physics; the other article was radiology), and in both articles they are using now-ancient mid-2010s deep learning NNs.
I don't know if it's intentional, but the word "AI" means something different almost every year now. It's worse than papers getting released claiming "LLMs are unable to do basic math" and then you see they used GPT-3 for the study.