wut. please unpack how LLMs are “problem makers” in “most” domains?

kdmccormick · on June 14, 2024

Spam

Phishing

Buggy generated code

Cheating in online courses

ChatBots that try to do too much and do it worse than real human service reps, like that one that wrongly assured a customer that their airline ticket was refundable

Deluge of low-value generated content taking attention and revenue away from high-value content creators

tnel77 · on June 14, 2024

Have you tried writing code or a paper that’s actually factual with ChatGPT? They are so obviously wrong in many cases that it’s often a hindrance rather than helpful when I’ve tried to use it.

I do love using ChatGPT for fun stuff like “write me a recipe for enchiladas that’s also a country western song.” My kids and I find it hilarious.

Tao3300 · on June 15, 2024

We had a remote workshop with GitHub for Copilot. The example was to have it create various functions for a game of Rock, Paper, Scissors. The extra exercise for afterward was to have it add the "Lizard and Spock" options. When I tried to have it do that, it spun its wheels for a little and told me the code it generated violated their responsible use guidelines or whatever.

In retrospect it probably detected it generated something it didn't have the IP rights to give me, but ever since then I've described the state of the art as "like talking through the intercom at a McDonald's drive-thru, but every now and then the attendant says 'sorry, can we start over? I got distracted thinking about killing you.'"

qup · on June 15, 2024

It sounds like you asked it to write the paper. It would probably be better if you actually used it as you said--to help you write the paper.

tnel77 · on June 15, 2024

I haven’t actually used it to write a paper, but I have family who teach and they have shared some of the examples they have had turned in.

aljgz · on June 14, 2024

I think people mix up "Improper/indecent/harmful/... uses of AI" with "Troubles made by AI". If we exclude my usage of Copilot in VsCode, my most common exposure to AI is one of my colleagues polluting every slack thread with a low effort, low quality content from ChatGPT that he's most probably not even read once.

But Copilot has revolutionized my coding. I have to code in many languages on a daily basis: Typescript, tsx, Css, Html, Dart, config files (like docker[compose], k8s, Ansible, json configs), c#, python. I'm only fluent in c# and ts. The fact that I do not need to remember the syntax for all the other is a big game changer. I was able to be immediately productive in a new language/framework after reading the documents. Previously it took some time before I ramped up, and then it would be lost after some inactivity. I'm not talking about important concepts, or CS fundamentals. I'm talking about specific ways things can be done in each language/framework. Copilot makes me 1000x more productive in this part. I'm still limited by my mental bandwidth, so I'm probably 2x more productive on an average day.

I also use ChatGPT, and run some models locally just to play with them, but all happen much less frequently than my colleague disrupting discussions with ChatGPT content.

benterix · on June 15, 2024

I felt similar at the beginning but then I realized the suggestions were suboptimal, and it happened like 50% of the time. Usually not completely wrong, just imitating something that was already written, but sometimes introducing subtle bugs. So in the end it actually made me less productive because I had to stop my flow and start analyzing if there is no catch in the suggestion. It was a bit tiring and in the end I decided it's easier for me to stay with the flow.

I'll give it a try next year, maybe it improves to the point where the number of suboptimal suggestions falls to 20% or so, it would be much easier then.

kiernanmcgowan · on June 14, 2024

Sure - I guess I should say "domain" is an incorrect word, when "use case" is a better phrasing.

LLMs have a tendency to hallucinate at a rate that makes them untrustworthy at scale w/o a human in the loop. The more open ended the prompt, the higher the hallucination rate. Here I mean minor things, like swapping a negative, that can fundamentally change a result.

Thus, any place that we trust computer to perform reliable logic, we cannot trust an LLM because it's error rate is too high.

Methods such as RAG can box in the LLM to keep them on track, but this error rate means that they can never be mission critical, a-la business logic, and keeps them to being a toy.

Where LLMs are game changers are ETL pipelines / data scrapers. I used to work at Clearbit where we built thousands of lines of code just to extract the address of a company's HQ or if a company is owed by another org. LLMs just do that... for free. With LLMs data extraction from free form text is now a solved problem, and thats god damn mindblowing for me.