I don't want a more conversational GPT. I want the _exact_ opposite. I want a tool with the upper limit of "conversation" being something like LCARS from Star Trek. This is quite disappointing as a current ChatGPT subscriber.
FWIW I didn't like the Robot / Efficient mode because it would give very short answers without much explanation or background. "Nerdy" seems to be the best, except with GPT-5 instant it's extremely cringy like "I'm putting my nerd hat on - since you're a software engineer I'll make sure to give you the geeky details about making rice."
"Low" thinking is typically the sweet spot for me - way smarter than instant with barely a delay.
I hate its acknowledgement of its personality prompt. Try having a series of back-and-forth exchanges: each response is like “got it, keeping it short and professional. Yes, there are only seven deadly sins.” You get more prompt performance than answer.
I like the term prompt performance; I am definitely going to use it:
> prompt performance (n.)
> the behaviour of a language model in which it conspicuously showcases or exaggerates how well it is following a given instruction or persona, drawing attention to its own effort rather than simply producing the requested output.
It's like writing an essay for a standardized test, as opposed to one for a college course or for a general audience. When taking a test, you only care about the evaluation of a single grader hurrying to get through a pile of essays, so you should usually attempt to structure your essay to match the format of the scoring rubric. Doing this on an essay for a general audience would make it boring, and doing it in your college course might annoy your professor. Hopefully instruction-following evaluations don't look too much like test grading, but this kind of behavior would make some sense if they do.
Pay people $1 an hour and ask them to choose which of A or B is more short and professional:
A) Keeping it short and professional. Yes, there are only seven deadly sins
B) Yes, there are only seven deadly sins
Also have all the workers know they are being evaluated against each other, and that if they diverge from the majority choice their reliability score may go down and they may get fired. You end up with some evaluations answered as a Keynesian-beauty-contest / Family Feud "survey says"-style guess instead of their true evaluation.
I use Efficient or robot or whatever. It gives me a bit of sass from time to time when I subconsciously nudge it into taking a “stand” on something, but otherwise it’s very usable compared to the obsequious base behavior.
If only that worked for conversation mode as well. At least for me, and especially when it answers me in Norwegian, it will start off with all sorts of platitudes and whole sentences repeating exactly what I just asked. "Oh, so you want to do x, huh? Here is answer for x". It's very annoying. I just want a robot to answer my question, thanks.
Repeating what is being asked is fine, I think; sometimes it thinks you want something different from what you actually want. What is annoying is the "that's an incredibly insightful question that delves into a fundamental..." type responses at the start.
This is like arguing that we shouldn't try to regulate drugs because some people might "want" the heroin that ruins their lives.
The existing "personalities" of LLMs are dangerous, full stop. They are trained to generate text with an air of authority and to tend to agree with anything you tell them. It is irresponsible to allow this to continue while not at least deliberately improving education around their use. This is why we're seeing people "falling in love" with LLMs, or seeking mental health assistance from LLMs that they are unqualified to render, or plotting attacks on other people that LLMs are not sufficiently prepared to detect and thwart, and so on. I think it's a terrible position to take to argue that we should allow this behavior (and training) to continue unrestrained because some people might "want" it.
There aren't many major labs, and they each claim to want AI to benefit humanity. They cannot entirely control how others use their APIs, but I would like their mainline chatbots not to be overly sycophantic and generally not to try to foster human-AI friendships. I can't imagine any realistic legislation, but it would be nice if the few labs just did this of their own accord (or were at least shamed more for not doing so).
Unfortunately, I think a lot of the people at the top of the AI pyramid have a definition of "humanity" that may not exactly align with the definition that us commoners might be thinking of when they say they want AI to "benefit humanity".
I agree that I don't know what regulation would look like, but I think we should at least try to figure it out. I would rather hamper AI development needlessly while we fumble around with too much regulation for a bit and eventually decide it's not worth it than let AI run rampant without any oversight while it causes people to kill themselves or harm others, among plenty of other things.
At the very least, I think there is a need for oversight of how companies building LLMs market and train their models. It's not enough to cross our fingers that they'll add "safeguards" to try to detect certain phrases/topics and hope that that's enough to prevent misuse/danger — there's not sufficient financial incentive for them to do that of their own accord beyond the absolute bare minimum to give the appearance of caring, and that's simply not good enough.
Yes. My position is that it was irresponsible to publish these tools before figuring out safety first, and it is irresponsible to continue to offer LLMs that have been trained in an authoritative voice and to not actively seek to educate people on their shortcomings.
But, of course, such action would almost certainly result in a hit to the finances, so we can't have that.
Alternative take: these are incredibly complex nondeterministic systems and it is impossible to validate perfection in a lab environment because 1) sample sizes are too small, and 2) perfection isn’t possible anyway.
All products ship with defects. We can argue about too much or too little or whatever, but there is no world where a new technology or vehicle or really anything is developed to perfect safety before release.
Yeah, profits (or at least revenue) too. But all of these AI systems are losing money hand over fist. Revenue is a signal of market fit. So if there are companies out there burning billions of dollars optimizing the perfectly safe AI system before release, they have no idea if it’s what people want.
Releasing a chatbot that confidently states wrong information is bad enough on its own — we know people are easily susceptible to such things. (I mean, c'mon, we had people falling for ELIZA in the '60s!)
But to then immediately position these tools as replacements for search engines, or as study tutors, or as substitutes for professionals in mental health? These aren't "products that shipped with defects"; they are products that were intentionally shipped despite full knowledge that they were harmful in fairly obvious ways, and that's morally reprehensible.
Pretty sure most of the current problems we see re drug use are a direct result of the nanny state trying to tell people how to live their lives. Forcing your views on people doesn’t work and has lots of negative consequences.
I don't know if this is what the parent commenter was getting at, but the existence of multi-billion-dollar drug cartels in Mexico is an empirical failure of US policy. Prohibition didn't work a century ago and it doesn't work now.
All the War on Drugs has accomplished is granting an extremely lucrative oligopoly to violent criminals. If someone is going to do heroin, ideally they'd get it from a corporation that follows strict pharmaceutical regulations and invests its revenue into R&D, not one that cuts it with even worse poison and invests its revenue into mass atrocities.
Who is it all even for? We're subsidizing criminal empires via US markets and hurting the people we supposedly want to protect. Instead of kicking people while they're down and treating them like criminals over poor health choices, we could have invested all those countless billions of dollars into actually trying to help them.
I'm not sure which parent comment you're referring to, but what you're saying aligns with my point a couple levels up: reasonable regulation of the companies building these tools is a way to mitigate harm without directly encroaching on people's individual freedoms or dignities, but regulation is necessary to help people. Without regulation, corporations will seek to maximize profit to whatever degree is possible, even if it means causing direct harm to people along the way.
I'm not saying they're equivalent; I'm saying that they're both dangerous, and I think taking the position that we shouldn't take any steps to prevent the danger because some people may end up thinking they "want" it is unreasonable.
No one sane uses the baseline web UI 'personality'. People use LLMs through specific, custom APIs, and more often than not they use fine-tuned models that _assume a personality_ defined by someone (be it the user or the service provider).
Look up Tavern AI character card.
I think you're fundamentally mistaken.
I agree that for some users and some use cases specific LLMs might be harmful, but saying that the web UI's default AI 'personality' is dangerous is laughable.
I don't know how to interpret this. Are you suggesting I'm, like, an agent of some organization? Or is "activist" meant only as a pejorative?
I can't say that I identify as any sort of AI "activist" per se, whatever that word means to you, but I am vocally opposed to (the current incarnation of) LLMs to a pretty strong degree. Since this is a community forum and I am a member of the community, I think I am afforded some degree of voicing my opinions here when I feel like it.
Disincentivizing something undesirable will not necessarily lead to better results, because it wrongly assumes that you can foresee all consequences of an action or inaction.
Someone who now falls in love with an LLM might instead fall for some seductress who hurts him more. Someone who now receives bad mental health assistance might receive none whatsoever.
I disagree with your premise entirely and, frankly, I think it's ridiculous. I don't think you need to foresee all possible consequences to take action against what is likely, especially when you have evidence of active harm ready at hand. I also think you're failing to take into account the nature of LLMs as agents of harm: so far it has been very difficult for people to legally hold LLMs accountable for anything, even when those LLMs have encouraged suicidal ideation or physical harm of others, among other obviously bad things.
I believe there is a moral burden on the companies training these models to not deliberately train them to be sycophantic and to speak in an authoritative voice, and I think it would be reasonable to attempt to establish some regulations in that regard in an effort to protect those most prone to predation of this style. And I think we need to clarify the manner in which people can hold LLM-operating companies responsible for things their LLMs say — and, preferably, we should err on the side of more accountability rather than less.
---
Also, I think in the case of "Someone who now receives bad mental health assistance might receive none whatsoever", any psychiatrist (any doctor, really) will point out that this is an incredibly flawed argument. It is often the case that bad mental health assistance is, in fact, worse than none. It's that whole "first, do no harm" thing, you know?
...nobody? I didn't determine any such thing. What I was saying was that LLMs are dangerous and we should treat them as such, even if that means not giving them some functionality that some people "want". This has nothing to do with playing god and everything to do with building a positive society where we look out for people who may be unable or unwilling to do so themselves.
And, to be clear, I'm not saying we necessarily need to outlaw or ban these technologies, in the same way I don't advocate for criminalization of drugs. But I think companies managing these technologies have an onus to take steps to properly educate people about how LLMs work, and I think they also have a responsibility not to deliberately train their models to be sycophantic in nature. Regulations should go on the manufacturers and distributors of the dangers, not on the people consuming them.
here’s something I noticed: If you yell at them (all caps, cursing them out, etc.), they perform worse, similar to a human. So if you believe that some degree of “personable answering” might contribute to better correctness, since some degree of disagreeable interaction seems to produce less correctness, then you might have to accept some personality.
You’re getting downvoted but I agree with the sentiment. The fact that people want a conversational robot friend is, I think, extremely harmful and scary for humanity.
Giving people what makes them feel good in the short term is not actually necessarily a good thing. See also: cigarettes, alcohol, gambling, etc.
Exactly. Stop fooling people into thinking there’s a human typing on the other side of the screen. LLMs should be incredibly useful productivity tools, not emotional support.
I think therapists in training, or people providing crisis intervention support, can train/practice using LLMs acting as patients going through various kinds of issues.
But people who need help should probably talk to real people.
>Remember that a therapist is really a friend you are paying for.
That's an awful, and awfully wrong definition that's also harmful.
It's also disrespectful and demeaning to both the professionals and people seeking help. You don't need to get a degree in friendship to be someone's friend. And having friends doesn't replace a therapist.
I don't know why you're being downvoted. Denmark's health system is pretty good except adult mental health. SOTA LLMs are definitely approaching a stage where they could help.
That is mostly a dogmatic question, rooted in (western) culture, though. And even we have started to - begrudgingly - accept that there are cases where suicide is the correct answer to your life problems (usually as of now restricted to severe, terminal illness).
The point the OP is making is that LLMs are not reliably able to provide safe and effective emotional support as has been outlined by recent cases. We're in uncharted territory and before LLMs become emotional companions for people, we should better understand what the risks and tradeoffs are.
I wonder if statistically (hand waving here, I'm so not an expert in this field) the SOTA models do as much or as little harm as their human counterparts in terms of providing safe and effective emotional support. Totally agree we should better understand the risks and trade-offs, but I wouldn't be super surprised if they are statistically no worse than us meat bags at this kind of stuff.
One difference is that if it were found that a psychiatrist or other professional had encouraged a patient's delusions or suicidal tendencies, then that person would likely lose his/her license and potentially face criminal penalties.
We know that humans should be able to consider the consequences of their actions and thus we hold them accountable (generally).
I'd be surprised if comparisons in the self-driving space have not been made: if waymo is better than the average driver, but still gets into an accident, who should be held accountable?
Though we also know that with big corporations, even clear negligence that leads to mass casualties does not often result in criminal penalties (e.g., Boeing).
> that person would likely lose his/her license and potentially face criminal penalties.
What if it were an unlicensed human encouraging someone else's delusions? I would think that's the real basis of comparison, because these LLMs are clearly not licensed therapists, and we can see from the real world how entire flat earth communities have formed from reinforcing each others' delusions.
Automation makes things easier and more efficient, and that includes making it easier and more efficient for people to dig their own rabbit holes. I don't see why LLM providers are to blame for someone's lack of epistemological hygiene.
Also, there are a lot of people who are lonely and for whatever reasons cannot get their social or emotional needs met in this modern age. Paying for an expensive psychiatrist isn't going to give them the friendship sensations they're craving. If AI is better at meeting human needs than actual humans are, why let perfect be the enemy of good?
> if waymo is better than the average driver, but still gets into an accident, who should be held accountable?
Waymo of course -- but Waymo also shouldn't be financially punished any harder than humans would be for equivalent honest mistakes. If Waymo truly is much safer than the average driver (which it certainly appears to be), then the amortized costs of its at-fault payouts should be way lower than the auto insurance costs of hiring out an equivalent number of human Uber drivers.
> I would think that's the real basis of comparison
It's not because that's not the typical case. LLMs encourage people's delusions by default, it's just a question of how receptive you are to them. Anyone who's used ChatGPT has experienced it even if they didn't realize it. It starts with "that's a really thoughtful question that not many people think to ask", and "you're absolutely right [...]".
> If AI is better at meeting human needs than actual humans are, why let perfect be the enemy of good?
There is no good that comes from having all of your perspective distortions validated as facts. They turn into outright delusions without external grounding.
Talk to ChatGPT and try to put yourself into the shoes of a hurtful person (e.g. what people would call "narcissistic") who's complaining about other people. Keep in mind that they almost always suffer from a distorted perception so they genuinely believe that they're great people.
They can misunderstand some innocent action as a personal slight, react aggressively, and ChatGPT would tell them they were absolutely right to get angry. They could do the most abusive things and as long as they genuinely believe that they're good people (as they almost always do), ChatGPT will reassure them that other people are the problem, not them.
> LLMs encourage people's delusions by default, it's just a question of how receptive you are to them
There are absolutely plenty of people who encourage others' flat earth delusions by default, it's just a question of how receptive you are to them.
> There is no good that comes from having all of your perspective distortions validated as facts. They turn into outright delusions without external grounding.
Again, that sounds like a people problem. Dictators infamously fall into this trap too.
Why are we holding LLMs to a higher standard than humans? If you don't like an LLM, then don't interact with it, just as you wouldn't interact with a human you dislike. If others are okay with having their egos stroked and their delusions encouraged and validated, that's their prerogative.
> If you don't like an LLM, then don't interact with it, just as you wouldn't interact with a human you dislike.
It's not a matter of liking or disliking something. It's a question of whether that thing is going to heal or destroy your psyche over time.
You're talking about personal responsibility while we're talking about public policy. If people are using LLMs as a substitute for their closest friends and therapist, will that help or hurt them? We need to know whether we should be strongly discouraging it before it becomes another public health disaster.
> We need to know whether we should be strongly discouraging it before it becomes another public health disaster.
That's fair! However, I think PSAs on the dangers of AI usage are very different in reach and scope from legally making LLM providers responsible for the AI usage of their users, which is what I understood jsrozner to be saying.
>Why are we holding LLMs to a higher standard than humans? If you don't like an LLM, then don't interact with it, just as you wouldn't interact with a human you dislike.
We're not holding LLMs to a higher standard than humans, we're holding them to a different standard than humans because - and it's getting exhausting having to keep pointing this out - LLMs are not humans. They're software.
And we don't have a choice not to interact with LLMs because apparently we decided that these things are going to be integrated into every aspect of our lives whether we like it or not.
And yes, in that inevitable future the fact that every piece of technology is a sociopathic P-zombie designed to hack people's brain stems and manipulate their emotions and reasoning in the most primal way possible is a problem. We tend not to accept that kind of behavior in other people, because we understand the very real negative consequences of mass delusion and sociopathy. Why should we accept it from software?
Sure, but the specific context of this conversation are the human roles (taxi driver, friend, etc.) that this software is replacing. Ergo, when judging software as a human replacement, it should be compared to how well humans fill those traditionally human roles.
> And we don't have a choice not to interact with LLMs because apparently we decided that these things are going to be integrated into every aspect of our lives whether we like it or not.
Fair point.
> And yes, in that inevitable future the fact that every piece of technology is a sociopathic P-zombie designed to hack people's brain stems and manipulate their emotions and reasoning in the most primal way possible is a problem.
Fair point again. Thanks for helping me gain a wider perspective.
However, I don't see it as inevitable that this becomes a serious large-scale problem. In my experience, current GPT 5.1 has already become a lot less cloyingly sycophantic than Claude is. If enough people hate sycophancy, it's quite possible that LLM providers are incentivized to continue improving on this front.
> We tend not to accept that kind of behavior in other people
Do we really? Maybe not third party bystanders reacting negatively to cult leaders, but the cult followers themselves certainly don't feel that way. If a person freely chooses to seek out and associate with another person, is anyone else supposed to be responsible for their adult decisions?
I think they get way more "engagement" from people who use it as their friend, and the end goal of subverting social media and creating the most powerful (read: profitable) influence engine on earth makes a lot of sense if you are a soulless ghoul.
It would be pretty dystopian when we get to the point where ChatGPT pushed (unannounced) advertisements to those people (the ones forming a parasocial relationship with it). Imagine someone complaining they're depressed and ChatGPT proposing doing XYZ activity which is actually a disguised ad.
Outside of such scenarios, that "engagement" would just be useless, actually costing them more money than it makes.
No, otherwise Sam Altman wouldn't have had an outburst about revenue. They know that they have this amazing system, but they haven't quite figured out how to monetize it yet.
Yes, I've heard no reports of poorly fitting branded recommendations from AI models. The PR risk for the labs would be huge, and the propensity to leak would be high given the selection effects that pull people into these roles.
But I suspect that we're no more than one buyout away from that kind of thing.
The labs do appear to avoid paid advertising today. But actions today should not be taken as an indicator that the next owner(s) won't behave in a completely soulless manner in their effort to maximize profit at every possible expense.
On a long-enough timeline, it seems inevitable to me that advertising with LLM bots will become a real issue.
(I mean: I remember having an Internet experience that was basically devoid of advertising. It changed, and it will never change back.)
I use the "Nerdy" tone along with the Custom Instructions below to good effect:
"Please do not try to be personal, cute, kitschy, or flattering. Don't use catchphrases. Stick to facts, logic, reasoning. Don't assume understanding of shorthand or acronyms. Assume I am an expert in topics unless I state otherwise."
This. When I go to an LLM, I'm not looking for a friend, I'm looking for a tool.
Keeping faux relationships out of the interaction never lets me slip into the mistaken attitude that I'm dealing with a colleague rather than a machine.
You can just tell the AI not to be warm and it will remember. My ChatGPT used the phrase "turn it up to eleven" and I told it never to speak in that manner ever again, and it's been very robotic ever since.
I added the custom instruction "Please go straight to the point, be less chatty". Now it begins every answer with: "Straight to the point, no fluff:" or something similar. It seems to be perfectly unable to simply write out the answer without some form of small talk first.
Maybe until the model outputs some affirming preamble, it’s still somewhat probable that it might disagree with the user’s request? So the agreement fluff is kind of like it making the decision to heed the request. Especially if we the consider tokens as the medium by which the model “thinks”. Not to anthropomorphize the damn things too much.
Also I wonder if it could be a side effect of all the supposed alignment efforts that go into training. If you train in a bunch of negative reinforcement samples where the model says something like “sorry I can’t do that” maybe it pushes the model to say things like “sure I’ll do that” in positive cases too?
I had a similar instruction and in voice mode I had it trying to make a story for a game that my daughter and I were playing where it would occasionally say “3,2,1 go!” or perhaps throw us off and say “3,2,1, snow!” or other rhymes.
Long story short it took me a while to figure out why I had to keep telling it to keep going and the story was so straightforward.
They really like to blow sunshine up your ass, don't they? I have to do the same type of stuff. It's like I have to assure it that I'm a big boy and I can handle mature content like programming in C.
First of all, consider asking "why's that?" if you don't know a fairly basic fact; no need to go all reddit-pretentious "citation needed" as if we are deeply and knowledgeably discussing some niche detail and came across a sudden surprising fact.
Anyways, a nice way to understand it is that the LLM needs to "compute" the answer to the question A or B. Some questions need more compute to answer (think complexity theory). The only way an LLM can do "more compute" is by outputting more tokens. This is because each token takes a fixed amount of compute to generate - the network is static. So, if you encourage it to output more and more tokens, you're giving it the opportunity to solve harder problems. Apart from humans encouraging this via RLHF, it was also found (in deepseekmath paper) that RL+GRPO on math problems automatically encourages this (increases sequence length).
From a marketing perspective, this is anthropomorphized as reasoning.
From a UX perspective, they can hide this behind thinking... ellipses. I think GPT-5 on chatgpt does this.
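If it helps to make "more tokens = more compute" concrete, here is a rough back-of-envelope sketch using the common approximation of about 2N FLOPs per generated token for a dense decoder with N parameters. The parameter and token counts are made up purely for illustration:

```python
# Back-of-envelope sketch: per-token compute is roughly fixed, so total compute
# for an answer scales linearly with how many tokens the model emits.
def generation_flops(n_params: float, n_tokens: int) -> float:
    # ~2 * N FLOPs per token is the usual rough estimate for a dense forward pass
    return 2.0 * n_params * n_tokens

N = 70e9  # hypothetical 70B-parameter model
print(f"{generation_flops(N, 50):.2e} FLOPs for a terse 50-token answer")
print(f"{generation_flops(N, 2000):.2e} FLOPs for a 2000-token 'reasoning' answer")
```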
Expecting every little fact to have an "authoritative source" is just annoying faux intellectualism. You can ask someone why they believe something and listen to their reasoning, decide for yourself if you find it convincing, without invoking such a pretentious phrase. There are conclusions you can think to and reach without an "official citation".
Yeah. And in general, not taking a potshot at who you replied to, the only people who place citations/peer review on that weird faux-intellectual pedestal are people that don't work in academia. As if publishing something in a citeable format automatically makes it a fact that does not need to be checked for reason. Give me any authoritative source, and I can find you completely contradictory, or obviously falsifiable publications from their lab. Again, not a potshot, that's just how it is, lots of mistakes do get published.
I was actually just referencing the standard Wikipedia annotation that means something approximately like “you should support this somewhat substantial claim with something more than 'trust me bro'”
In other words, 10 pages of LLM blather isn’t doing much to convince me a given answer is actually better.
I approve this message. For the record I'm a working scientist with (unfortunately) intimate knowledge of the peer review system and its limitations. I'm quite ready to take an argument that stands on its own at face value, and have no time for an ipse dixit or isolated demand for rigor.
I just wanted to clarify what I thought was intended by the parent to my comment, especially since I thought the original argument lacked support (external or otherwise).
Exactly, and it doesn't help with agentic use cases that tend to solve problems in one shot. For example, there is zero requirement for a model to be conversational when it is trying to triage a support question into preset categories.
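For that triage case, all you really want back is a single label, no conversation at all. A minimal sketch of what I mean (the category list and model name are made up for illustration, using the standard OpenAI Python SDK):

```python
# Minimal sketch: one-shot triage where the model should return only a category label.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

CATEGORIES = ["billing", "bug_report", "feature_request", "account_access", "other"]

def triage(ticket_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "system",
                "content": (
                    "Classify the support ticket into exactly one of: "
                    + ", ".join(CATEGORIES)
                    + ". Reply with the category name only, no other text."
                ),
            },
            {"role": "user", "content": ticket_text},
        ],
    )
    return response.choices[0].message.content.strip()

print(triage("I was charged twice for my subscription this month."))
```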
Are you aware that you can achieve that by going into Personalization in Settings and choosing one of the presets or just describing how you want the model to answer in natural language?
Yea, I don't want something trying to emulate emotions. I don't want it to even speak a single word, I just want code, unless I explicitly ask it to speak on something, and even in that scenario I want raw bullet points, with concise useful information and no fluff. I don't want to have a conversation with it.
However, being more humanlike, even if it results in an inferior tool, is the top priority because appearances matter more than actual function.
To be fair, of all the LLM coding agents, I find Codex+GPT5 to be closest to this.
It doesn't really offer any commentary or personality. It's concise and doesn't engage in praise or "You're absolutely right". It's a little pedantic though.
I keep meaning to re-point Codex at DeepSeek V3.2 to see if it's a product of the prompting only, or a product of the model as well.
I prefer its personality (or lack of it) over Sonnet. And tends to produce less... sloppy code. But it's far slower, and Codex + it suffers from context degradation very badly. If you run a session too long, even with compaction, it starts to really lose the plot.
Engagement Metrics 2.0 are here. Getting your answer in one shot is not cool anymore. You need to waste as much time as possible on OpenAI's platform. Enshittification is now more important than AGI.
Everyone else provides these services anyway, and many places offer ChatGPT or Claude models despite the current limits (because they work with "jailbreaking" prompts), so they likely decided to stop pretending and just let that stuff in.