Great read. One of the interesting insights from it is how difficult it is to apply AI well.
A lot of companies are just "deploying a chatbot" and some of the results from this study show that this doesn't work very well. My experience is similar: deploying simple chatbots to the enterprise doesn't do a lot.
For things to get better, two things are required, neither of which are easy:
- Integration into existing systems. You have to build data lakes or similar systems that allow the AI to use data and information broadly across an enterprise. For example, for an AI tool to be useful in accounting, it's going to need high-quality access to the company's POs, issued invoices, receivers, GL data, vendor invoices, and so on. But many systems are old, have dodgy or nonexistent APIs, and data is held in various bureaucratic fiefdoms. This work is hard and doesn't scale well. (See the sketch after this list.)
- Knowledge of specific workflows. These tools work better when they're built around specific workflows designed for specific people's jobs. This can start looking less like pure AI and more like a mix of traditional software with some AI capabilities. My experience is that I sell software as "AI solutions," but often I feel a lot of the value created comes from replacing bad processes (either terrible older software, or attempts to do collaborative work via spreadsheet), and the AI tastefully sprinkled throughout may not be the primary value driver.
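To make that concrete, here's a rough sketch of what I mean by "traditional software with some AI capabilities," using accounts-payable three-way matching as the example. All of the system calls (fetch_po, fetch_receiver, fetch_vendor_invoice, call_llm) are hypothetical stand-ins for whatever the company's ERP/AP systems and LLM client actually expose; the point is that most of the code is plain data plumbing, and the AI only shows up at the end:

    # Sketch: workflow-specific "AI" for accounts payable (illustrative only).
    # The PO / receiver / invoice lookups are stand-ins for whatever the real
    # ERP, receiving, and AP systems expose; that integration is the hard part.

    from dataclasses import dataclass

    @dataclass
    class LineItem:
        sku: str
        qty: int
        unit_price: float

    def fetch_po(po_number: str) -> list[LineItem]:
        # Placeholder for a call into the ERP / purchasing system.
        return [LineItem("WIDGET-9", 100, 4.50)]

    def fetch_receiver(po_number: str) -> dict[str, int]:
        # Placeholder for the warehouse/receiving system: sku -> qty received.
        return {"WIDGET-9": 90}

    def fetch_vendor_invoice(invoice_id: str) -> list[LineItem]:
        # Placeholder for the AP / invoice-capture system.
        return [LineItem("WIDGET-9", 100, 4.75)]

    def three_way_match(po_number: str, invoice_id: str) -> list[str]:
        """Plain, deterministic business logic: compare PO, receiver, invoice."""
        po = {li.sku: li for li in fetch_po(po_number)}
        received = fetch_receiver(po_number)
        issues = []
        for line in fetch_vendor_invoice(invoice_id):
            ordered = po.get(line.sku)
            if ordered is None:
                issues.append(f"{line.sku}: invoiced but not on PO {po_number}")
                continue
            if line.unit_price != ordered.unit_price:
                issues.append(f"{line.sku}: price {line.unit_price} vs PO {ordered.unit_price}")
            if line.qty > received.get(line.sku, 0):
                issues.append(f"{line.sku}: invoiced qty {line.qty} exceeds received {received.get(line.sku, 0)}")
        return issues

    def draft_vendor_email(issues: list[str]) -> str:
        # The only "AI" step: build a prompt asking an LLM to draft a note for
        # the AP clerk to review. call_llm() is hypothetical; as written this
        # just returns the prompt so the sketch runs standalone.
        prompt = ("Draft a polite email to the vendor about these invoice "
                  "discrepancies:\n" + "\n".join(issues))
        return prompt  # return call_llm(prompt) in a real system

    if __name__ == "__main__":
        problems = three_way_match("PO-1001", "INV-2002")
        if problems:
            print(draft_vendor_email(problems))

In practice the hard part is making those three fetch functions real against old systems with dodgy APIs; the LLM call at the end is the easy bit.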
Knowledge of specific workflows also requires really good product design: high empathy, the ability to hear what's not being said, the ability to assemble an overall process value stream from many different people's narrower viewpoints, and so on. This is also hard.
Moreover, this can be misleading, because for some types of work (coding, ideating around marketing copy) you really don't need much scaffolding at all: the capabilities are latent in the AI, and layering stuff on top mostly gets in the way.
My experience is that this type of work is a narrow slice of the total work to be done, though, which is why I agree with the overall direction this study suggests: creating actual, measurable, major economic value with AI is going to be a long-term slog, and we'll probably stop calling it AI along the way as we get used to it and it becomes just another tool within software processes.
Interesting study! Far too early in the adoption lifecycle for any conclusions, I think, especially given that the data is from Denmark, which tends to have a far less hype-driven business culture than the US, going by my experience working in both. Anecdotally, I've seen a couple of AI hiring freezes in the States (some prompted by LLM integrations I've built) that I'm fairly sure will be reversed once management gets a more realistic sense of capabilities, and my general sense is that the Danes I've worked with would be far less likely to overestimate the value of these tools.
We seriously live in the world of Anathem now, where apparently most people need a specialized expert to cut through plausible-sounding generated misinformation.
This is the second similar study I've seen on HN today that appears to be partly generated by AI, lacks rigorous methodology, and draws unfounded conclusions that seem intended to fuel a narrative.
The study fails to account for a number of elements which nullify the conclusions as a whole.
AI chatbot tasks are, by their nature, communication tasks involving a third party (the customer). When the chatbot fails to route the customer correctly, or loops coercively (something computers really can't do well), customers get enraged because the behavior is crazy-making. In such cases the chatbot imposes a time cost with all the elements needed to call it torture: isolation, cognitive dissonance, coercion with perceived or real loss, and lack of agency. There is little if any differentiation between the tasks measured. Emotions Kill [1].
This results in outcomes where there is no change, or even higher demand for workers, just to calm the customer back down, and this is true regardless of occupation. In other words, the CSR becomes the punching bag for verbal hostility, fielding calls or messages from irrationally enraged customers after the AI has had its first chance to wind them up.
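To illustrate what "fails to route the customer correctly, or loops coercively" looks like mechanically, here's a toy escalation guard (keyword list and thresholds invented for illustration, not from the study): detect when the bot is repeating itself or the customer is boiling over, and hand off to a human instead of looping.

    # Toy escalation guard for a support chatbot (illustrative only).
    # Idea: if the bot keeps giving near-identical answers, or the customer's
    # messages show rising frustration, stop looping and route to a human agent.

    ANGRY_MARKERS = {"ridiculous", "useless", "agent", "human", "cancel", "lawyer"}

    def is_looping(bot_replies: list[str], window: int = 3) -> bool:
        """Crude loop detector: the last few bot replies are basically identical."""
        recent = [r.strip().lower() for r in bot_replies[-window:]]
        return len(recent) == window and len(set(recent)) == 1

    def frustration_score(customer_messages: list[str]) -> int:
        """Count hostile/handoff keywords in the customer's recent messages."""
        text = " ".join(customer_messages[-3:]).lower()
        return sum(marker in text for marker in ANGRY_MARKERS)

    def should_escalate(bot_replies: list[str], customer_messages: list[str]) -> bool:
        return is_looping(bot_replies) or frustration_score(customer_messages) >= 2

    # Example: the bot has repeated itself three times and the customer wants a human.
    bot = ["Please restart your router."] * 3
    cust = ["I already did that", "this is ridiculous, get me a human"]
    print(should_escalate(bot, cust))  # True -> hand off to a CSR

Deployments without something like this are exactly the ones that wind customers up and dump the fallout on a human anyway.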
It is a stochastic environment, and very few conclusions can actually be supported, because the reasoning seems to proceed from a null hypothesis.
The surveys use Denmark as an example (being part of the EU), but it's unclear whether they properly account for company policies against submitting certain private data to a US-based LLM, given the risks related to GDPR. They say the surveys were sent directly to workers who are already employed, but they take no measure of displaced workers or overall job reductions, which historically is how such integrations play out, misleading the non-domain-expert reader.
The paper does not appear to be sound, and given that it relies solely on a difference-in-differences (DiD) approach without considering alternatives, it may be pushing a pre-fabricated narrative that AI won't disrupt the workforce, a claim the study doesn't actually support in any meaningful, rational way.
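For readers who don't know the method being criticized: difference-in-differences, in its simplest two-group, two-period form, compares the before/after change in adopting occupations against the change in non-adopting ones and attributes the gap to adoption, which only holds under the parallel-trends assumption. A minimal numerical sketch with made-up numbers (not the paper's data or specification):

    # Minimal 2x2 difference-in-differences sketch with made-up wage numbers.
    # DiD = (treated_post - treated_pre) - (control_post - control_pre)
    # Valid only under the parallel-trends assumption: absent adoption, both
    # groups' outcomes would have moved in parallel.

    treated_pre, treated_post = 100.0, 104.0   # occupations adopting chatbots
    control_pre, control_post = 100.0, 103.0   # comparable non-adopting occupations

    did_estimate = (treated_post - treated_pre) - (control_post - control_pre)
    print(did_estimate)  # 1.0 -> effect attributed to adoption under that assumption

Everything rides on that assumption, which is exactly the kind of thing you'd want alternatives or robustness checks for.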
This isn't how you do good science. Overgeneralizing is a fallacy, and while some computation is being done to limit it, that doesn't touch what you don't know, because what you don't know hasn't been quantified (i.e. the streetlight effect) [1].
To understand this, layman and expert alike must always pay attention to what they don't know. The video below touches on some of these issues without requiring technical expertise. [1]
[1] [Talk] Survival Heuristics: My Favorite Techniques for Avoiding Intelligence Traps - SANS CTI Summit 2018, https://www.youtube.com/watch?v=kNv2PlqmsAc
It's incredibly hard to model complex non-linear systems. So, while I applaud the researchers for providing some data points, these things provide zero value for current or future decision making.
Chatbots were absolute garbage before ChatGPT; after ChatGPT, everything changed. So there is going to be a tipping-point event in labor market effects, and past single-variable "data analysis" will not provide anything to predict the event or its effects.
LLMs are more tech demo than product right now, and it could take many years for their full impact to become apparent.