I suffer from an as-yet-undiagnosed chronic illness and, as everyone in my predicament knows, I have seen a staggering number of medical professionals. They ranged from arrogant jerks who didn't listen and wouldn't take me seriously, to highly empathetic and thorough people (who still failed to figure out what was wrong), by way of overzealous specialists who sent me down completely wrong paths with copious amounts of anxiety. Last month I felt particularly low and desperate and decided to dump everything (medical history, doctor notes, abnormal lab results, symptoms) into GPT-4 and ask for help (I did a bit of prompt tuning to get professional-level responses).
It was mind-blowing: it identified 2 possible explanations that were already on my radar, plus 3 more that I had never considered, one of which seems very likely and which I am currently getting tested for. It explained how each of those correlated with my symptoms and medical history, and asked why I had not had a specific marker tested (HLA-B27) that is commonly checked for this type of disease (and indeed, my doctor was equally stumped - he just thought that test had been done already and didn't double-check).
Bonus: I asked if the specific marker could be inferred from whole genome sequencing data (had my genome sequenced last year). He told me which tool I could use, helped me align my sequencing data to the correct reference genome expected by that tool, gave me step by step instructions on how to prepare the data and tool, and I'm now waiting for results of the last step (NGS data analysis is veery slow).
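For anyone curious what that preparation looks like in practice: the comment doesn't name the HLA-typing tool, so the sketch below only covers the generic prep steps (align the reads to the reference genome the tool expects, sort and index the BAM, optionally extract the MHC region on chromosome 6 that many HLA typers work from). File names, paths, thread counts, and the contig name ("chr6" vs "6") are illustrative assumptions - check your tool's documentation for the exact reference build and inputs it requires.

    import subprocess

    # Hypothetical inputs - substitute your own files and the reference your HLA tool expects
    REF = "GRCh38.fa"               # reference genome (bwa-indexed beforehand with `bwa index GRCh38.fa`)
    R1, R2 = "reads_R1.fastq.gz", "reads_R2.fastq.gz"
    BAM = "sample.sorted.bam"

    def run(cmd):
        print("+", cmd)
        subprocess.run(cmd, shell=True, check=True)

    # 1. Align reads and sort the result (this is the slow step for whole-genome data)
    run(f"bwa mem -t 8 {REF} {R1} {R2} | samtools sort -o {BAM} -")

    # 2. Index the sorted BAM so region queries work
    run(f"samtools index {BAM}")

    # 3. Optionally extract the MHC region on chr6, which is all some HLA typers need as input
    run(f"samtools view -b -o mhc.bam {BAM} chr6:28000000-34000000")
    run("samtools index mhc.bam")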
One of the defining characteristics of LLMs is that their knowledge base is shockingly broad. Anyone who has played with GPT (or similar) has experienced that the breadth of its knowledge is beyond most, if not all, humans.
Medicine, and diagnosis in particular, is difficult precisely because of the breadth of knowledge required to cover the span of possible diseases.
It makes complete sense that GPT, or something similar, would in time simply become better than doctors at diagnosis, and it is very plausible that the time is now.
This is fantastic news for humanity. Humanity gets better diagnoses, and we stop putting high-IQ people through the grinder that is medical school and residency to do a job that is not suited to human cognition.
This is not yet a victory, but progress is already good. To really trust AI in medicine, we need to conduct an incredible amount of research and tests. I don’t think that today you can blindly believe its diagnoses.
Sure. It went like this (I have seen so many doctors that I already wrote down my complete history in French to save time at new visits. I didn't bother translating the history; it didn't seem to faze ChatGPT):
"""
Here is the patient history (in french):
• 2000-2010:
◦ ...
• 2013: ...
...
Additional Notes:
- <Some additional observations and comments - patterns I noticed, family history, etc.>
Patient is particularly worried about xxx and yyy. What are some possible causes explaining these symptoms and the overall history? Give detailed reasoning to support your hypotheses, include differential diagnosis, think about rare diseases (common causes have already been considered), consider possible combinations of diseases, and don't hesitate to ask follow-up questions to improve diagnosis. Please answer in English.
Additional test results that are outside of normal ranges:
- <list of abnormal results>
Try to consider and explain as many of the blood tests in your work up, and ask for any missing information if necessary.
"""
The key was to address it as if I were a doctor asking for an opinion. If I asked it "as a patient", I got much lower-quality, dumbed-down answers.
> arrogant jerks who didn't listen and wouldn't take me seriously
Pride goeth before the fall. I wonder how many arrogant jerks will be humbled to see that they too are now inferior to a computer ("soon"). Humans will always be better at being human though, perhaps they will learn empathy is more important than they thought.
Humans will be better at being humans, for sure, but the jury is still out as to whether humans prefer to interact with other humans given a sufficiently reliable alternative. Hell is other people, after all.
> Humans will always be better at being human though
What do you mean by that? If you mean humans will always be the best at being the creature we call human, then that is true by definition.
If you mean humans will always be more compassionate, more emotionally understanding, or better suited to deliver bad news, then I am afraid that is unsubstantiated.
I meant that a human is valuable just because they are a human; this is not cold truth, but it is a moral value almost everyone shares (and I don't want to imagine a future without this value). In days past some have become arrogant because, let's say, they know more about health and medicine than everyone else; it's their source of self-worth. They may soon have to reassess their value and values.
What you are calling empathy is just patterns of language statistically optimized to be convincing to the average person. And I might sound arrogant when I despair at being surrounded by morons who "buy it", but that is IMO still better than being a sociopath who enjoys it when others are easy to manipulate with pretty words.
I personally feel you're being overly optimistic. Genes, markers and all of that might seem like high-tech medicine, but as far as I know there has not been much progress on that front, although it's been hyped a lot and gets a lot of media coverage.
I don't think it has to be about being "high tech medicine".
In this case, the existing documentation of such things combined with the events in the GP's own medical history have been fed into a machine that can identify patterns that a human doctor should have, but for whatever reason has not, identified.
I think the potential ramifications for this are huge.
5 to 10 percent in the general population. If you already have many of the symptoms, but those symptoms are not specific enough to distinguish between similar autoimmune diseases, then the prior is wildly different, so a test like this becomes much more relevant.
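To make the prior/posterior point concrete, here is a toy Bayes calculation. The numbers are illustrative ballpark figures, not clinical data: assume the marker shows up in roughly 90% of people with the disease and roughly 8% of people without it, then compare a low-prior patient against one whose symptoms already make the disease quite plausible.

    def posterior(prior, p_marker_if_disease=0.90, p_marker_if_healthy=0.08):
        # Bayes' rule: P(disease | marker present)
        hit = p_marker_if_disease * prior
        miss = p_marker_if_healthy * (1 - prior)
        return hit / (hit + miss)

    print(posterior(0.005))  # ~0.05: with a weak prior, a positive marker barely moves the needle
    print(posterior(0.30))   # ~0.83: with a strong symptomatic prior, the same result is very informative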
When you say “He told me which tool I could use”, did you mean the doctor told you, or was that a typo and you meant that ChatGPT told you which tool and walked you through it? Seemed like the latter, but was too ambiguous to assume.
Yes, it - it was ChatGPT that said all that. In my defense, my native language has no neuter nouns, and "neural network" and "language model" are both masculine - so they're a "he".
Brainstorming rare diseases is one thing; making a diagnosis and providing treatment grounded in medical science is another.
If I ask GPT4 about some arcane math concept it’ll wax lyrical about how it has connections to 20 other areas of math. But it fails at simple arithmetic.
Proof based higher math and being good at calculating the answers to arithmetical formulas are two pretty unrelated things that just happen to both be called "math".
One of my better math professors in a very good pure math undergraduate program added 7 + 9 and got 15 during a lecture, that really doesn't say anything about his ability as a mathematician though.
That’s sorta my point: diagnosing well studied diseases and providing precise treatment is different from speculating causes for rare diseases.
Who knows, OP could be a paint sniffer and that’s their root issue. Brainstorming these things requires creativity and even hallucination. But that’s not what doctors do.
I thought all math was similar in that working with it requires decent working memory. Both mental arithmetic and conceptually complex material from theory require excellent working memory, which is a function of IQ.
Does it though? When LLMs are allowed to use their outputs as a form of state, they can very much succeed up to 14 digits with >99.9% accuracy, and it holds up to 18 digits without deteriorating significantly [1].
That really isn't a good argument because you are asking it to do one-shot something that 99.999% of humans can't.
Try asking it to combine some simple formulas involving unit conversions. It does not do math. You can ask it questions that let it complete patterns more easily.
It does not have to do the math in one shot, and neither can humans. The model only needs to decompose the problem into subcomponents and solve those. If it can do so recursively via an agent approach, then by all means it can do it.
The cited paper covers this to some extent. Instead of asking the LLM to do multiplication of large integers directly, they ask it to break the task into 3-digit numbers, do the small multiplications, add the carries, and then sum everything up. It does quite well; a sketch of the idea is below.
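A minimal sketch of that decomposition in ordinary code, just to make the idea concrete (this is my paraphrase of the scheme, not the paper's exact prompting): split each number into 3-digit chunks, take the small chunk-by-chunk products, then propagate carries and reassemble.

    def chunks(n, base=1000):
        # Split n into base-1000 "digits", least significant first: 1234567 -> [567, 234, 1]
        out = []
        while n:
            out.append(n % base)
            n //= base
        return out or [0]

    def multiply(a, b, base=1000):
        # Schoolbook multiplication over 3-digit chunks: every sub-product involves numbers < 1000
        xs, ys = chunks(a, base), chunks(b, base)
        acc = [0] * (len(xs) + len(ys))
        for i, x in enumerate(xs):
            for j, y in enumerate(ys):
                acc[i + j] += x * y
        carry, result = 0, 0
        for k, v in enumerate(acc):      # propagate carries and reassemble the final number
            carry, digit = divmod(v + carry, base)
            result += digit * base ** k
        return result + carry * base ** len(acc)

    assert multiply(987654321123, 456789012345) == 987654321123 * 456789012345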
When I ask a human to do 13 digit addition, 99.999% of them will do the addition in steps, and almost nobody will immediately blurt out an answer that is also correct without doing intermediate steps in their head. Addition requires carries, and we start from least to most significant and calculate with the carries. That is what 1-shot refers to.
If we allow LLMs to do the same, instead of requiring them to produce the output in a single textual response, then they do just fine according to the cited paper.
Average humans can do multiplication in 1 step for small numbers because they have memorized the tables. So can LLMs. Humans need multiple steps for addition, and so do LLMs.
Ok. In the context of AI, 1-shot generally means that the system was trained only on 1 example (or few examples).
Regarding the number of steps it takes an LLM to get the right answer: isn't it more important that it gets the right answer, since LLMs are faster than humans anyway?
I am well aware what it means, and I used 1-shot for the same reason we humans say I gave it "a shot", meaning attempt.
LLMs get the right answer and do so faster than humans. The only real limitation here is the back and forth because of the chat interface and implementation. Ultimately, it all boils down to giving prompts that achieve the same thing as shown in the paper.
Furthermore, this is a weird boundary/goal-post: humans get stuff wrong all the time, and we created tools to make our lives easier. If we let LLMs use tools, they do even better.
If a search engine result says water is wet, they’ll tell you about it.
If not, then we should consider all the issues around water and wetness, but note that water is a great candidate for wetting things, though it is important to remember that it has severe limitations with respect to wetting things, and, at all costs some other alternatives should be considered, including list of paragraphs about tangential buzzwords such as buckets and watering cans go here.
Why does this apply to math but not to being a doctor? It can do basic math, but you say that of course it can't do math - math isn't language. The fact that it can do some basic diagnosis does not mean it's good at doctor things, or even that it's better than WebMD.
Arithmetic requires a step-by-step execution of an algorithm. LLMs don't do that implicitly. What they do is vector adjacency search in absurdly high-dimensional space. This makes them good at giving you things related to what you wrote. But it's the opposite of executing arbitrary algorithms.
Or, look at it this way: the LLM doesn't have a "voice in its head" in any form other than a back-and-forth with you. If I gave you any arithmetic problem less trivial than the times table, you won't suddenly come up with the right answer - you'll do some sequence of steps in your head. If you let an LLM voice the steps, it gets better at procedural tasks too.
Despite the article, I don’t think it would be a good doctor.
I read a report of a doctor who tried it on his case files from the ER (I'm sure it was here on HN). It called some of the cases correctly, missed a few others, and would have killed one woman. I'm sure it has its place, but use a real doctor if your symptoms are in any way concerning.
> If I ask GPT4 about some arcane math concept it’ll wax lyrical about how it has connections to 20 other areas of math. But it fails at simple arithmetic.
The only reason failing at basic arithmetic indicates something when discussing a human is because you can reasonably expect any human to be first taught arithmetic in school. Otherwise, those things are hardly related. Now, LLMs don't go to school.
Most humans fail at doing simple arithmetic in their head. At the very least I'd say GPT4 is superior to 99% of people at mental math. And because it can explain its work step by step it's easy to find where the flaw in its reasoning is and fix it. GPT-4 is capable of self-correction with the right prompts in my experience.
Literally the majority of the page is basic arithmetic, mostly Bayes. Diagnosis is a process of determining (sometimes quantitatively, sometimes qualitatively) the relative incidences of different diseases and all the possible ways they can present. Could this be X rare virus, or is it Y common virus presenting atypically?
Given that doctors are basically inefficient data analysts focused on a single domain, I imagine GPT can replace most of the need to consult a doctor until some physical action needs to be taken. I think an AI that monitors daily vitals and symptoms and reports anything that seems alarming might help people live longer and healthier lives.
But that marker is not really conclusive either way (you can have the disease without the marker, and be healthy while carrying it) - find a good rheumatologist, who will usually send you to a good radiologist, and get some MRIs that identify it quickly and with certainty.
> you can have the disease without the marker, and be healthy while carrying it
That is literally true of every marker in existence. It's not the most specific marker, no, but if you already have a strong prior for the presence of an autoimmune disease, then the presence or absence of that HLA subtype can point towards the most likely root cause (autoimmune diseases are all incredibly similar in the early stages).