I was having a look at the model mentioned, specifically `casperhansen/llama-3-70b-instruct-awq`.
When checking this model, I found out [1] that it's apparently based on llama-2?
```
Llama 3 70B Instruct AWQ Parameters and Internals
LLM Name       Llama 3 70B Instruct AWQ
Base Model(s)  Llama 2 70B Instruct (quantumaikr/llama-2-70B-instruct)
Model Size     70b
```
I added a question [2] on Hugging Face to learn more about this.
Could anyone explain to me what this means? Does it mean the model was trained on version 2 and wrongly named version 3? Or is it something less well-intentioned?
“Excellent query, good sir! <said slowly enough to let the LLM catch up>…”
And more seriously, it seems like the LLM could be used to pre-generate lots of filler prefixes corresponding to the RAG'd documents being sent to the model.
While it wouldn't work if you're GPU-bound, multiple prompts could be run in parallel with different pieces of context, and the model could then choose the most appropriate response (which could be done in parallel too).
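To make the parallel-context idea concrete, here's a minimal sketch, assuming an OpenAI-compatible local server (e.g. vLLM or llama.cpp); the base_url, the model name, and the naive pick_best() ranking are all placeholder assumptions, not anything from the thread:

```
import asyncio

from openai import AsyncOpenAI

# Assumption: an OpenAI-compatible server (e.g. vLLM or llama.cpp) is
# running locally; base_url and model name are placeholders.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="unused")

async def answer_with_context(question: str, context: str) -> str:
    resp = await client.chat.completions.create(
        model="llama-3-70b-instruct",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

def pick_best(answers: list[str]) -> str:
    # Naive stand-in for "choose the most appropriate response";
    # in practice this could be a reranker or another (parallel) LLM call.
    return max(answers, key=len)

async def best_answer(question: str, chunks: list[str]) -> str:
    # One request per retrieved context chunk, all in flight at once.
    answers = await asyncio.gather(
        *(answer_with_context(question, c) for c in chunks)
    )
    return pick_best(answers)
```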
If there are many common services for which you can precompute the embeddings, then with a little record-keeping and analysis you could figure out some likely questions or requests and pre-generate the responses. That way you could just run a similarity search on the question or command you say and skip the LLM entirely. It would be interesting to try using the LLM to predict some of these based on information available ahead of time: calendar events, weather, recent prompt history, recently played media, today's headlines, recent browser history, etc. It'd be your own recommendation algorithm.
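A minimal sketch of that similarity-search shortcut, assuming sentence-transformers for the embeddings; the model name, the example requests, and the 0.85 threshold are arbitrary assumptions:

```
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Pre-generated request -> response pairs; in practice these would come
# from analyzing past prompts, as described above.
canned = {
    "turn off the living room lights": "Okay, living room lights are off.",
    "what's the weather today": "Let me check the forecast for you.",
}
keys = list(canned)
key_vecs = encoder.encode(keys, normalize_embeddings=True)

def cached_response(utterance: str, threshold: float = 0.85) -> str | None:
    # With normalized embeddings, the dot product is cosine similarity.
    vec = encoder.encode([utterance], normalize_embeddings=True)[0]
    sims = key_vecs @ vec
    best = int(np.argmax(sims))
    # Return None to signal "fall back to the LLM".
    return canned[keys[best]] if sims[best] >= threshold else None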
That's a great idea! I've been looking into that (I'm merely logging all prompts to a JSON file for now, so that I can analyze them later; see the sketch after this comment).
Skipping the LLM would be tough, though, because there are so many devices in my house, not to mention it would take away from the personality of the assistant.
However, a recommendation algorithm would actually work great, since I could augment the LLM prompt with it regardless of what's asked.
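For reference, logging like the above can be as small as appending JSON lines; this is only a sketch of what such a setup might look like, and the field names are an assumption:

```
import json
import time

# Append one JSON object per interaction (JSONL), so the log stays easy
# to analyze line by line later.
def log_prompt(prompt: str, response: str, path: str = "prompts.jsonl") -> None:
    record = {"ts": time.time(), "prompt": prompt, "response": response}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```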
Llama3 is very keen to be nice. I kind of wonder if that's due to better results on the Chatbot Arena (probably not; just a conspiracy theory I like). But with enough context available, you can definitely tweak the response in many ways. Give it an example or two, tell it to be an emotionally detached HAL, and you'll get what you want.
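For example, the persona tweak can be as simple as a system prompt plus a sample exchange; the wording here is purely illustrative:

```
# Illustrative only: steering the persona with a system prompt plus one
# example turn to anchor the tone, as suggested above.
messages = [
    {
        "role": "system",
        "content": (
            "You are a home assistant. Answer in one short, emotionally "
            "detached sentence, in the manner of HAL 9000. No pleasantries."
        ),
    },
    # One example exchange to anchor the tone:
    {"role": "user", "content": "Turn on the kitchen lights."},
    {"role": "assistant", "content": "Kitchen lights are on."},
]
```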
[1] https://llm.extractum.io/model/casperhansen%2Fllama-3-70b-in...
[2] https://huggingface.co/casperhansen/llama-3-70b-instruct-awq...