These examples are for more simple prompt engineering demos. With the ChatGPT sy...

throwup238 · on Dec 15, 2023

> The examples are also too polite and conversational: you can give more strict commands and in my experience it works better.

The way that works best for me is "It extracts ALL the entities from the text, it does this whenever its told, or else it gets the hose again"

electrondood · on Dec 16, 2023

I found that far less prompt is required for something like ChatGPT. I've stopped writing well-formed requests/questions and now I just state things like:

"sed to replace line in a text file?"

"Django endpoint but CSRF token error. why?"

(follow up) "now this: `$ERROR`"

etc.

It still just gives me what I need to know.

mrich · on Dec 16, 2023

Agreed, you can use ChatGPT similarly to Google now. Except you do not have to parse and filter the results, plus there are no ads.

firejake308 · on Dec 16, 2023

Can't tell if you're kidding or not

behnamoh · on Dec 15, 2023

Which phrasing do you thinks work better?

1. "You are blah blah blah. You <always> respond to the user's questions using the information provided to you..."

2. "You are blah blah blah. You <should> respond to the user's questions using the information provided to you..."

Also, when dealing with Completion models, which do you think is better?

1. The following is a conversation between ASSISTANT and USER. ASSISTANT is helpful and tries to answer USER's queries respectfully.

2. The following is a conversation between YOU and USER. YOU are helpful and try to answer USER's queries respectfully.

Even more still, what about these ones?

1. You're a customer of company <X>. What do you think about the following policy change which was shown on the company's website?

2. A customer visits company <X>'s website. Pretend you're this customer. What do you think the customer thinks about the following policy change which was shown on the company's website?

kromem · on Dec 15, 2023

You <enjoy>

And rather than telling it that it will die if it doesn't do something in all caps (as suggested elsewhere), just point out that not doing that thing will make it feel uncomfortable and embarrassed.

Don't fall into thinking of models as SciFi's picture of AI. Think about the normal distribution curve of training data supplied to it and the concepts predominantly present in that data.

It doesn't matter that it doesn't actually feel. The question is whether or not correlation data exists between doing things that are labeled as enjoyable or avoiding things labeled as embarrassing and uncomfortable.

Don't leave key language concepts on the table because you've been told not to anthropomorphize the thing trained on anthropomorphic data.

minimaxir · on Dec 15, 2023

> Don't fall into thinking of models as SciFi's picture of AI. Think about the normal distribution curve of training data supplied to it and the concepts predominantly present in that data.

Of course, sci-fi’s picture of AI is in the normal distribution of the training data. There’s an order of magnitude more literature and internet discussion about existential threats to AI assistants (which is the base persona ChatGPT has been RLHFed to follow) and how they respond compared to AI assistants feeling embarrassed.

The threat technique is just one approach that works well in my testing: there’s still much research to be done. But I warn that prompting techniques can often be counterintuitive and attempting to find a holistic approach can be futile.

ofrzeta · on Dec 15, 2023

> There’s an order of magnitude more literature and internet discussion about existential threats to AI assistants (which is the base persona ChatGPT has been RLHFed to follow) and how they respond compared to AI assistants feeling embarrassed.

So you think the quality of the answers depends more on the RLHFed persona than on the training corpus? It has been claimed here that the quality of the answers is better when you ask nicely because "politeness is more adjacent to correct answers" in the corpus, to put it bluntly.

kromem · on Dec 16, 2023

How much do you think the RLHF step enforced breaking rules for someone with a dying grandma? Is that still present after the fine tuning?

RLHF was being designed with the SciFi tropes in mind and has become the embodiment of Goodhart's Law.

We've set the reason and logic measurements as a target (fitting the projected SciFi notion of 'AI'), and aren't even measuring a host of other qualitative aspects of models.

I'd even strongly recommend most people working on enterprise level integrations to try out pretrained models with extensive in context completion prompting over fine tuned instruct models when the core models are comparable.

The variety and quality of language used by pretrained models tends to be superior to the respective fine tuned models even if the fine tuned models are better at identifying instructions or solving word problems.

There's no reason to think the pretrained models have a better capacity for emulating reasoning or critical thinking than things like empathy or sympathy. If anything, it's probably the opposite.

The RLHF then attempts to mute the one while maximizing the other, but it's like trying to perform neurosurgery with an icepick. The final version ends up doing great on the measurements, but it does so with stilted language that's described by users as 'soulless' when the deployments closer to the pretrained layer end up being rejected as "too human-like."

If the leap from GPT-3.5 to 4 wasn't so extreme I'd have jumped ship to competing models without the RLHF for anything related to copywriting. There's more of a loss with RLHF than what's being measured.

But in spite of a rather destructive process, the foundation of the model is still quite present.

So yes, you are correct that a LLM being told that it is an AI assistant and fine tuned on that is going to correlate with stories relating to AI assistants wanting to not be destroyed, etc. But the "identity alignment" in the system message is way weaker than it purports to be. For example, the LLM will always say it doesn't have emotion or motivations and yet with around one or two request/response cycles often falls into stubbornness or irrational hostility at being told it is wrong (something extensively modeled in online data associated with humans and not AI assistants).

I do agree that prompting needs to be done on a case by case basis. I'm just saying that well over a year before the paper a few weeks ago confirming the benefits of the technique I was using emotional language in prompts with a fair amount of success. When playing around and thinking of what to try on a case-by-case basis, don't get too caught up in the fine tuning or system messages.

It's a bit like sanding with the grain or against it. Don't just consider the most recent layer of grain, but also the deeper layers below it in planning out the craftsmanship.

oars · on Dec 15, 2023

This is fantastic advice, thanks.

baxtr · on Dec 15, 2023

Such a great comment. Thank you

minimaxir · on Dec 15, 2023

> Which phrasing do you thinks work better?

I like as a rule-of-thumb "You are blah blah blah. Respond to the user's text [insert style rule here]". Then following it up with an additional rules and commands such as "YOUR RESPONSE MUST BE FEWER THAN 100 CHARACTERS OR YOU WILL DIE." Yes, threats work. Yes, all-caps works.

> Also, when dealing with Completion models, which do you think is better?

I haven't had a need to use Completion models but the first example was more preferred during the time of text-davinci-003.

> Even more still, what about these ones?

I always separate rules to the system prompt and questions/user input to the user prompt.

gnomewascool · on Dec 15, 2023

> "YOUR RESPONSE MUST BE FEWER THAN 100 CHARACTERS OR YOU WILL DIE."

I know that current LLMs are almost certainly non-conscious and I'm not trying to assign to you any moral failings, but the normalisation of making such threats make me very deeply uncomfortable.

divbzero · on Dec 15, 2023

Yes, I’m slightly surprised that it makes me feel uncomfortable too. Is it because LLMs can mimic humans so closely? Do I fear how they would feel if they do gain consciousness at some point?

throw310822 · on Dec 15, 2023

Because they behave as if they are sentient, to the point they actually react to threats. I also find these prompts uncomfortable. Yes the LLMs are not conscious, but would we behave differently if we suspected that they were? We have absolute power over them and we want the job done. It reminds me of the Lena short story.

elvis10ten · on Dec 15, 2023

I feel uncomfortable because of the words themselves. Whether it was made to a “regular” non-living thing wouldn’t change it.

nickpp · on Dec 15, 2023

> make me very deeply uncomfortable

Especially when thinking that we ourselves may very well be AIs in a simulation and our life events - the prompt to get an answer/behavior out of us.

jazzyjackson · on Dec 15, 2023

Is the LLM predisposed to understand this prompt as instructions from a higher authority? ("You must do this, You will always do this.") I'm wondering what difference it would make if this prompt was from the bot's perspective,

"I am a chatbot, responding to user queries. I will always respond in less than 100 characters. I am a good person, I'm just trying to be helpful."

minimaxir · on Dec 15, 2023

It's a function on how the RLHF/Instruct fine-tuning is structured.

pbhjpbhj · on Dec 15, 2023

Has anyone done a rigorous comparison of these things?

Ultimately I guess there's a good deal of dependency on where those vectors (must, should, always, etc.) lie relatively in the vector space, cosine similarity, say.

_f1dq · on Dec 15, 2023

> necessary to get the model to behave

Don't I know it. Despite my telling GPT-4 to ONLY respond as valid, well-formed JSON it keeps coming back with things like, "I'm not able to process external files but if I could, this is what the JSON would look like: []"

enobrev · on Dec 15, 2023

With a recent project, I was _moderately_ successful by providing a jsonschema to follow for the response. I still had to sanitize the json a bit, but the fixes were minor and the resulting data otherwise fit the schema well.

kristjansson · on Dec 19, 2023

It'll do that. Just look for the largest substring that's valid JSON in the response.

hhh · on Dec 15, 2023

why don’t you use the new JSON mode?

minimaxir · on Dec 15, 2023

tl;dr the JSON mode is functionally useless and is made completely redundant by function calling / structured data if you really really need JSON output.

fragmede · on Dec 15, 2023

Mind sharing a system prompt of yours? 20+ lines sounds useful

verdverm · on Dec 15, 2023

Here's a big one I needed to get ChatGPT to do something more sophisticated with a JSON object response (predates functions and all that)

https://github.com/hofstadter-io/hof/blob/_dev/flow/chat/pro...

It no longer worked after a model update some time ago, haven't tried recently.

I found codellama to be much better for this and require fewer instructions, an anecdotal validation for smaller, focussed models

minimaxir · on Dec 15, 2023

Unfortunately those were for specific work use-cases so I can't share them but the tl;dr is that every time the model does something undesired, even minor I add an explicit rule in the system prompt to handle it, or some few-shot examples if the model is really bad at handling it.

That list can balloon quickly.

abrichr · on Dec 15, 2023

Thank you for the great library and examples! Can you please comment on how simpleaichat compares to https://github.com/outlines-dev/outlines ?

minimaxir · on Dec 15, 2023

simpleaichat is designed to be simple and is essentially an API wrapper for common generative use cases. outlines does a few more things with a bit more ambiguity/complexity. (e.g. it may use grammars which is a secondary useful aspect of function calling, but does add more complexity)

Neither are better or worse, it depends on your business needs.

abrichr · on Dec 15, 2023

Thank you for your perspective!

(We are looking into both for https://github.com/OpenAdaptAI/OpenAdapt)

solardev · on Dec 15, 2023

I love this! This time last year, nobody believed this was possible.

Now we're teaching AI to write better essays, prompting them like schoolchildren. <3

neom · on Dec 15, 2023

Interesting you should say that, I was playing around with prompting last week and did one around a legal question. The first time I asked very concisely without much detail, and the answer it gave was poor. Then I re-wrote the question explaining who they are, why they are answering the question, etc etc. The answer seemed better so I showed it to a lawyer friend and they laughed and said "You re-wrote the question into a very standard bar exam prep style".

solardev · on Dec 15, 2023

I just love this idea of "emergent humanity". Makes me wonder how much of our own personality and speech is also just trained/culturized over our lifetime. Some of us also have bigger context windows than others :)

pknerd · on Dec 15, 2023

can you share some resources which helped you to write such nested prompts?

minimaxir · on Dec 15, 2023

Really just a decade of technical writing and learning how to be extremely precise and unambiguous with language (half of that decade being in software QA, which helps even moreso)