I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor

(msn.com)

Comments

chrisfosterelli 14 hours ago
Health metrics are absolutely tarnished by a lack of proper context. Unsurprisingly, it turns out that you can't reliably take a concept as broad as health and reduce it to a number. We see the same arguments over and over with body fat percentages, VO2 max estimates, BMI, lactate thresholds, resting heart rate, HRV, and more. These are all useful metrics, but it's important to consider them in the proper context that each of them deserves.

This article gave an LLM a bunch of health metrics and then asked it to reduce them to a single score, didn't tell us any of the actual metric values, and then compared that to a doctor's opinion. Why anyone would expect these to align is beyond my understanding.

The most obvious thing that jumps out to me is that I've noticed doctors generally, for better or worse, consider "health" much differently than the fitness community does. It's different toolsets and different goals. If this person's VO2 max estimate was under 30, that's objectively a poor VO2 max by most standards, and an LLM trained on the internet's entire repository of fitness discussion is likely going to give this person a bad score in terms of cardio fitness. But a doctor who sees a person come in who isn't complaining about anything in particular, moves around fine, doesn't have risk factors like age or family history, and has good metrics on a blood test is probably going to say they're in fine cardio health regardless of what their wearable says.

I'd go so far as to say this is probably the case for most people. Your average person is in really poor fitness-shape but just fine health-shape.

wawayanda 14 hours ago
A year or so ago, I fed my wife's blood work results into chatgpt and it came back with a terrifying diagnosis. Even after a lot of back and forth it stuck to its guns. We went to a specialist who performed some additional tests and explained that the condition cannot be diagnosed with just the original blood work and said that she did not have the condition. The whole thing was a borderline traumatic ordeal that I'm still pretty pissed about.
cthalupa 2 hours ago
I'll preface this by saying that I generally trust doctors. I think on the whole they are well positioned to provide massive benefit to their patients.

I will also preface this by saying I do not think any LLM is better than the average doctor, and that you are far better served going to your doctor than asking ChatGPT what your health is like on any factor.

But I'll also say that the quality of doctors varies massively, and that a good number of doctors learn what they learn in school and do not keep up with the latest advances in research, particularly those who cover broad areas, such as GPs. LLMs that search scientific literature, etc., might point you in the direction of research that the doctors are not aware of. Or hallucinate you into having some random disease that impacts 3 out of every million people and send you down a rabbit hole for months.

Unfortunately, it's difficult to resolve this without extremely good insurance or money to burn. The depth you get and the level of information that a good preventative care cardiologist has is just miles ahead of where your average family medicine practitioner is at. Statins are an excellent example - new prescriptions for atorvastatin are still insanely high despite it being a fairly poor choice in comparison to rosuvastatin or pitavastatin for a good chunk of the people on it. GPs are often behind on the latest recommendations from the NLA, AHA, etc.

There's a world where LLMs or similar can empower everyday people to talk to their doctor about their options and where they stand on health, where they don't have to hope their doc is familiar with where the science has shifted over the past 5-10 years, or cough up the money for someone who specializes in it. But that's not the world of today.

In the meantime, I do think people should be comfortable being their own advocates with their doctors. I'm lucky enough that my primary care doc is open to reading the studies I send over to him and working with me. Or at least patient enough to humor me. But it's let me get on medications that treat my symptoms without side effects and improved my quality of life (and hopefully lifespan/healthspan). There have also been things I've misinterpreted - I don't pick a fight with him if we come to opposite conclusions. He's shown good faith in agreeing with me where it makes sense, and pushing back where it hasn't, and I acknowledge he's the expert.

francisofascii 2 hours ago
> There were big swings in my resting heart rate whenever I got a new Apple Watch, suggesting the devices may not have been tracking the same way.

First of all, wrist-based HR measurements are not reliable. If you feed ChatGPT a ton of HR data that is just plain wrong, expect a bad result. Everyone who wants to track HR reliably should invest in a chest strap. The VO2 max calculation is heavily based on your pace at a given heart rate. It makes some generalizations about your running biomechanics. For example, if your "real" lab-tested VO2 max stays constant but you improve your biomechanics / running efficiency, you can run faster at the same effort, and your Apple Watch will increase your VO2 max number.
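To make that concrete, here's a toy sketch. Apple's actual algorithm is proprietary, so this just combines the standard ACSM level-running equation with a crude heart-rate-reserve fraction to show why a more economical runner gets a higher wearable-style estimate at the same heart rate, even when a lab test would read the same:

    # Toy sketch only: Apple's VO2 max model is proprietary. This combines the
    # ACSM level-running equation with a crude heart-rate-reserve fraction to
    # show how pace at a submaximal heart rate drives the estimate.

    def running_vo2_cost(speed_m_per_min: float) -> float:
        """Oxygen cost of level running (ACSM equation), in ml/kg/min."""
        return 0.2 * speed_m_per_min + 3.5

    def wearable_style_vo2max(speed_m_per_min, hr, hr_rest, hr_max):
        """Extrapolate a max from a submaximal run via heart-rate reserve."""
        effort_fraction = (hr - hr_rest) / (hr_max - hr_rest)
        return running_vo2_cost(speed_m_per_min) / effort_fraction

    # Same heart rate ("same effort"), but improved running economy -> faster pace
    # -> a higher estimate, even if lab-tested VO2 max hasn't changed.
    before = wearable_style_vo2max(speed_m_per_min=160, hr=165, hr_rest=60, hr_max=190)  # ~10:00/mile
    after  = wearable_style_vo2max(speed_m_per_min=178, hr=165, hr_rest=60, hr_max=190)  # ~9:00/mile
    print(round(before, 1), round(after, 1))  # ~43.9 vs ~48.4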

cameldrv 9 hours ago
I dunno, if the Apple Watch said he had a VO2 max of 30, that probably means he can’t run a mile in less than 12 minutes or so. He’s probably not at all healthy…
sinuhe69 8 hours ago
My general take on any AI/ML in medicine is that without proper clinical validation, they are not worth trying. Also, AI Snake Oil is worth reading.
freedomben 15 hours ago
> Despite having access to my weight, blood pressure and cholesterol, ChatGPT based much of its negative assessment on an Apple Watch measurement known as VO2 max, the maximum amount of oxygen your body can consume during exercise. Apple says it collects an “estimate” of VO2 max, but the real thing requires a treadmill and a mask. Apple says its cardio fitness measures have been validated, but independent researchers have found those estimates can run low — by an average of 13 percent.

There's plenty of blame to go around for everyone, but at least for some of it (such as the above) I think the blame rests more on Apple for falsely representing the quality of their product (and TFA seems pretty clearly to be blasting OpenAI for this, not others like Apple).

What would you expect the behavior of the AI to be? Should it always assume bad data or potentially bad data? If so, that seems like it would defeat the point of having data at all, as you could never draw any conclusions from it. Even disregarding statistical outliers, it's not at all clear what part of the data is "good" vs "unreliable", especially when the company that collected that data claims that it's good data.

zhisme 1 hour ago
Check out iatrogenesis. There's no need to let Apple Watch data turn you into someone dependent on meds, treating diseases that never existed. That's not the metric you want deciding whether you need meds or medical help at all.
hasbot 1 hour ago
Hmm, sure, maybe it's wrong now, but in several years it could be correct. So maybe I should wear a device now, so that when it does become correct and I'm even older, AI might be useful.

I'm definitely not going with Apple. Are there any minimally obtrusive trackers that provide downloadable data?

seemaze 14 hours ago
I can't wait until it starts recommending I sign up for an OpenAI personalized multi-vitamin® subscription
elzbardico 13 hours ago
LLMs are not a mythical universal machine learning model that you can feed any input and have it magically do the same thing a specialized ML model could do.

You can't feed an LLM years of time-series meteorological data, and expect it to work as a specialized weather model, you can't feed it years of medical time-series and expect it to work as a model specifically trained, and validated on this specific kind of data.

An LLM generates a stream of tokens. If you feed it a giant set of CSVs and it was not RL'd to do something useful with them, it will just try to make whatever sense of them it can and generate something that will most probably have no strong numerical relationship to your data. It will simulate an analysis; it won't actually perform one.

You may have a giant context window, but attention is sparse; the attention mechanism doesn't see your whole data at the same time. It can do some simple comparisons, like figuring out that if I say my current pressure is 210/180 I should call an ER immediately. But once I send it a time series of my twice-daily blood-pressure measurements for the last 10 years, it can't make any real sense of it.

Indeed, it would have been better for the author to ask the LLM to generate a python notebook to do some data analysis on it, and then run the notebook and share the result with the doctor.
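As a rough illustration of what that could look like (the column names date, resting_hr, and vo2max are just guesses at what an export might contain, not the actual Apple Health schema):

    # Minimal sketch of the "have the LLM write analysis code, then run it" idea.
    # Column names (date, resting_hr, vo2max) are assumptions about the exported CSV.
    import pandas as pd

    df = (pd.read_csv("health_export.csv", parse_dates=["date"])
            .set_index("date")
            .sort_index())

    # Smooth day-to-day noise (and device swaps) with a 90-day rolling median.
    trends = df[["resting_hr", "vo2max"]].rolling("90D").median()

    # A year-by-year summary a doctor can actually glance at.
    yearly = df[["resting_hr", "vo2max"]].resample("YE").agg(["mean", "std", "count"])
    print(yearly.round(1))

    trends.plot(subplots=True)

Numbers produced this way are at least reproducible, which is the property the LLM's "analysis" of raw CSV text lacks.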

alpineman 5 hours ago
My wife is a doctor and there is a general trend at the moment of everyone thinking their intelligence in one area (say programming) carries over into other areas such as medicine, particularly with new tools such as ChatGPT.

Imagine if as a dev someone came to you and told you everything that is wrong with your tech stack because they copy pasted some console errors into ChatGPT. There's a reason doctors need to spend almost a decade in training to parse this kind of info. If you do the above then please do it with respect for their profession.

spicyusername 1 hour ago
So we're feeding bad data into a system known for making answers up and expecting... what exactly, lol
dfajgljsldkjag 15 hours ago
The author is a healthy person but the computer program still gave him a failing grade of F. It is irresponsible for these companies to release broken tools that can cause so much fear in real people. They are treating serious medical advice like it is just a video game or a toy. Real users should not be the ones testing these dangerous products.
brandonb 15 hours ago
We trained a foundation model specifically for wearable data: https://www.empirical.health/blog/wearable-foundation-model-...

The basic idea was to adapt JEPA (Yann LeCun's Joint-Embedding Predictive Architecture) to multivariate time series, in order to learn a latent space of human health from purely unlabeled data. Then, we tested the model using supervised fine-tuning and evaluation on a bunch of downstream tasks, such as predicting a diagnosis of hypertension (~87% accuracy). In theory, this model could also be aligned to the latent space of an LLM--similar to how CLIP aligns a vision model to an LLM.
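For anyone curious what the JEPA objective looks like mechanically on multivariate time series, here is a stripped-down toy; the channels, layer sizes, and masking scheme are made up for illustration and are not the actual model described in the post:

    # Toy sketch of a JEPA-style objective on multivariate wearable time series:
    # encode a visible chunk, predict the *embedding* of a masked future chunk.
    # Channels, layer sizes, and the masking scheme here are illustrative only.
    import torch
    import torch.nn as nn

    class Encoder(nn.Module):
        def __init__(self, n_channels: int, d_model: int = 128):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv1d(n_channels, d_model, kernel_size=5, padding=2),
                nn.GELU(),
                nn.AdaptiveAvgPool1d(1),  # pool over time -> one vector per window
            )

        def forward(self, x):               # x: (batch, channels, time)
            return self.net(x).squeeze(-1)  # (batch, d_model)

    context_enc = Encoder(n_channels=4)     # e.g. HR, HRV, steps, sleep stage
    target_enc = Encoder(n_channels=4)      # in practice an EMA copy with stop-gradient
    predictor = nn.Linear(128, 128)

    x = torch.randn(32, 4, 288)             # a day of 5-minute samples
    context, target = x[..., :192], x[..., 192:]

    pred = predictor(context_enc(context))  # predict the future in latent space
    with torch.no_grad():
        tgt = target_enc(target)
    loss = nn.functional.mse_loss(pred, tgt)  # no labels needed
    loss.backward()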

IMO, this shows that accuracy in consumer health will require specialized models alongside standard LLMs.

gizmodo59 7 hours ago
For every sensational article saying AI was useless, there are plenty of examples where using ChatGPT to find out what else could be happening, and then having a conversation with a doctor, has helped people. I know of many such cases anecdotally, and there are many such reports online as well.

At the end of the day, it’s yet another tool that people can use to help their lives. They have to use their brain. The culture of seeing the doctor as a god doesn’t hold up anymore. So many people have had bad experiences when the entire health care industry, at least in the US, is primarily a business rather than a way of helping society get healthy.

gizajob 6 hours ago
Hard to tell who is stupider, the writer or ChatGPT.
Aachen 5 hours ago
> I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor

... and you won't believe what happened next!

Can we do away with the clickbait from MSN? The article is about LLMs misdiagnosing cardiovascular status when given fitness tracker data

siliconc0w 12 hours ago
The problem is that false positives can be incredibly expensive in money, time, pain, and anxiety. Most people cannot afford (and healthcare system cannot handle) thousands of dollars in tests to disprove every AI hunch. And tests are rarely consequence free. This is effectively a negative externality of these AI health products and society is picking up the tab.
daft_pink 10 hours ago
The problem with AI is that it isn’t good at recognizing red flags in data. I used it to find red flags in a financial report, and it finds red flags in virtually every financial report it lays eyes on.
djoldman 10 hours ago
I'm less interested in what "grade" the AI gave and much more interested in what therapy or remedy it would have suggested. That's curiously lacking here.
jdub 12 hours ago
Why do people even begin to believe that a large language model can usefully understand and interpret health data?

Sure, LLM companies and proponents bear responsibility for the positioning of LLM tools, and particularly their presentation as chat bots.

But from a systems point of view, it's hard to ignore the inequity and inconvenience of the US health system driving people to unrealistic alternatives.

(I wonder if anyone's gathering comparable stats on "Doctor LLM" interactions in different countries... there were some interesting ones that showed how "Doctor Google" was more of a problem in the US than elsewhere.)

zombot 2 hours ago
Giving your health data to an AI is sick. Unfortunately no doctor can cure you of that.
stego-tech 12 hours ago
This is not remotely surprising.

Look, AI Healthbros, I'll tell you quite clearly what I want from your statistical pattern analyzers, and you don't even have to pay me for the idea (though I wouldn't say no to a home or Enterprise IT gig at your startup):

I want an AI/ML tool to not merely analyze my medical info (ON DEVICE, no cloud sharing kthx), but also extrapolate patterns involving weather, location, screen time, and other "non-health" data.

Do I record taking tylenol when the barometric pressure drops? Start alerting me ahead of time so I can try to avoid a headache.

Does my screen time correlate to immediately decreased sleep scores? Send me a push notification or webhook I can act upon/script off of, like locking me out of my device for the night or dimming my lights.

Am I recording higher-intensity workouts in colder temperatures or inclement weather? Start tracking those metrics and maybe keep better track of balance readings during those events for improved mobility issue detection.

Got an app where I track cannabis use or alcohol consumption? Tie that to my mental health journal or biological readings to identify red flags or concerns about misuse.

Stop trying to replace people like my medical care team, and instead equip them with better insights and datasets they can more quickly act upon. "Subject has been reporting more negative moods in his mental health journal, an uptick in alcohol consumption above his baseline, and inconsistent cannabis use compared to prior patterns" equips the care team with a quick, verifiable blurb from larger datasets that can accelerate care and improve patient outcomes - without the hallucinations of generative AI.
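None of this needs generative AI either. As a rough sketch, the screen-time-vs-sleep case above is little more than a lagged correlation check once the logs are in one place (the column names here are assumptions about whatever the apps export):

    # Toy sketch of the screen-time vs. sleep idea: a simple lagged correlation.
    # Column names (date, screen_time_min, sleep_score) are assumed exports.
    import pandas as pd

    df = (pd.read_csv("daily_log.csv", parse_dates=["date"])
            .set_index("date")
            .sort_index())

    # Does today's screen time track the following night's sleep score?
    corr = df["screen_time_min"].corr(df["sleep_score"].shift(-1))
    print(f"screen time vs next-night sleep score: r = {corr:.2f}")

    # Crude rule-of-thumb nudge, not a diagnosis.
    if corr < -0.3:
        print("Heavy screen days tend to precede worse sleep; schedule a wind-down alert.")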

CqtGLRGcukpy 15 hours ago
Original article can be read at https://www.washingtonpost.com/technology/2026/01/26/chatgpt....

Paywall-free version at https://archive.ph/k4Rxt

eleveriven 4 hours ago
Right now this looks less like "AI for healthcare" and more like a very polished way to scare (or falsely reassure) people
evolighting 9 hours ago
Health data, medical records, even research data, is very scarce in the public domain. This is not just due to so-called privacy concerns, but because such data could have generated “value” (and been sold at a good price) long before the emergence of large language models.
anonzzzies 14 hours ago
My Apple Watch told me, based on VO2 max, that I'm almost dead, all the time. I went to the doctor, did a real test, and it was complete nonsense. I had the watch replaced 3 times but got the same results, so I returned it and will not try again. Scaring people with stuff you cannot actually shut off (at least you couldn't before) is not great.
elzbardico 13 hours ago
A simple understanding of transformers should be enough to make someone see that using an LLM to analyze multi-variate time series data is a really stupid endeavor.
creatonez 15 hours ago
ChatGPT Health is a completely reckless and dangerous product; they should be sued into oblivion for even naming it "health".
Barathkanna 5 hours ago
TLDR: AI didn’t diagnose anything, it turned years of messy health data into clear trends. That helped the author ask better questions and have a more useful conversation with their doctor, which is the real value here.
maxdo 14 hours ago
Typical Western coverage: “How dare they call me unhealthy.” In reality, the doctor said it needs further investigation and that some data isn’t great. They didn’t say “unhealthy”; they said “needs more investigation.” What’s wrong with that? Is the real issue just a bruised Western ego?