Every time I see "Book me a trip to X" I immediately shut down. I have yet to see any LLM handle all the cases that a human would need to. Those sites are hard for a human to navigate, and doing a search and clicking on the first result is not "autopilot". If that's all I was going to do, I'd just do it myself.
Instead I need to read through dozens of listings, keeping track of cleaning and other fees, and weighing location and price as well as amenities, if any. [0]
I have yet to see a model do any of that (yes, I'm aware it's possible and maybe someone is doing that).
> We're just beginning. As we increase training resources, Ace will become more intelligent and capable.
Can we not? "As time goes on we will get better because magic" - I truly hate this hopium in the LLM community. LLM problems are not like normal software problems; you cannot code your way out of a hole. You can prompt or re-train. Both suck, and re-training at least has a long turnaround time and is not cheap.
I really enjoy LLMs and love trying new things that use them. This idea/product/service/whatever is just not compelling. It feels like I'd need to babysit this process to make sure it didn't do something stupid. It's the same reason I have never in my life bought something through my Amazon Echos: the upside is minimal and the downside can be massive.
[0] OpenAI's Deep Research is closer to what I'm talking about, but even that is laughably bad sometimes. It looks impressive as hell, and it impressed me, so I went to ask it an "easy" question that I could share with a friend to show them how cool it was. The "easy" question was about something I was familiar with, and the final results were lacking (to be nice). I asked it to research local bakeries and it missed a ton of places that show up in 1-2 Google searches. -- This is the problem with LLMs across the board: they are great at producing good-sounding output, but that doesn't make it right/true/complete.
Ace: Realtime Computer Autopilot
(generalagents.com) 87 points by huerne, 2 April 2025 | 18 comments
Comments