In my humble opinion, it absolutely is theft that humanity has decided it's okay to steal everyone's historical work in the spirit of reaching some next level. The sad part is that most, if not ALL, of these companies are trying their damnedest to replace their most expensive human counterparts while saying the opposite on public forums, then dunking on competitors doing the same thing. However, I don't think it will matter, or be a race companies are still trying to win, in about 5 years, once it's widely understood that AI produces GENERIC results for everything. I think that will bring UP everyone's desire for REAL human-made things, spawned from HUMAN creativity. I can imagine a world soon where there is a demand for human-spawned creativity and fully human-made things, because THAT'S what will be rare then, and that's what will solve that GENERIC feeling we all get when we're reading, looking at, or listening to something our subconscious tells us isn't human.
Now, I could honestly also argue, and be concerned, that human creativity stopped mattering about 10 years ago, because it now seems that humanity's MOST VALUABLE asset is the almighty AD. People now mostly make content JUST TO GET TO the ads, so it has already lost its soul, leaving me, EVEN NOW, trying to find some TRULY REAL, SOUL-MADE music/art/code/etc., which I find extraordinarily hard in today's world.
I also find it kind of funny, and ironic, that we are going to burn up our planet using the most supposedly advanced piece of technology we have ever created in order to produce MORE ADS, which, you watch, will be the MAIN thing this is used for after it has replaced everyone it can.
If we are going to burn up the planet for power, we should at least require that its results go into things that help what humanity we have left, rather than into figuring out how to grow forever.
.... AND BTW, this message was brought to you by Nord VPN, please like and subscribe.... Just kidding guys.
I've been thinking about how far we've come with large language models (LLMs) and the challenge of making them almost perfect. It feels a lot like trying to get a spaceship to travel at the speed of light.
We’ve made impressive progress, getting these models to be quite accurate. But pushing from 90% to 99.9999999% accuracy? That takes an insane amount of data and computing power. It's like needing exponentially more energy as you get closer to light speed.
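To put rough numbers on the analogy, here's a toy Python sketch. The Lorentz factor is real physics; the "cost" curve is purely an illustrative assumption that resources scale like 1/(1 - accuracy), i.e. each extra nine multiplies the bill by ~10x:

```python
import math

def lorentz_gamma(v_over_c):
    """Relativistic energy scales with gamma = 1/sqrt(1 - (v/c)^2),
    which diverges as v approaches c."""
    return 1.0 / math.sqrt(1.0 - v_over_c ** 2)

def toy_cost(accuracy):
    """Illustrative assumption only: training cost grows like
    1/(1 - accuracy), diverging as accuracy approaches 1."""
    return 1.0 / (1.0 - accuracy)

for v, acc in [(0.9, 0.9), (0.99, 0.99), (0.999999, 0.999999)]:
    print(f"v/c={v}: gamma={lorentz_gamma(v):.1f}   "
          f"acc={acc}: relative cost={toy_cost(acc):.0f}x")
```

Both curves blow up near the limit, which is the shape of the argument: the last fraction of a percent costs more than everything before it.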
And just like we can’t actually reach the speed of light, there might be a practical limit to how accurate LLMs can get. Language is incredibly complex and full of ambiguities. The closer we aim for perfection, the harder it becomes. Each tiny improvement requires significantly more resources, and the gains become marginal.
To get LLMs to near-perfect accuracy, we'd need an infinite amount of data and computing power, which isn't feasible. So while LLMs are amazing and have come a long way, getting them to be nearly perfect is probably impossible—like reaching the speed of light.
Regardless, I try to appreciate the progress we've made while being realistic about the challenges ahead. What do you think? Is this a fair analogy?
"Data" isn't an inexhaustible resource, and it also isn't fungible in the way energy is. Of the thousands of languages in the world, a fair chunk don't even have writing systems, and some have very few speakers left. Many are lost forever. Now ask the best LLM trained on "all the data" to translate some fragment of an isolate language not in its training set and not very related to existing languages. You can't improve on that task by adding more sentences in English or by combining with learning on other modalities.
> Now ask the best LLM trained on "all the data" to translate some fragment of some isolate language not in its training set and not very related to existing languages.
If you give it the dictionary and grammar book as in-context instructions, it can do pretty well.
“Gemini v1.5 learns to translate from English to Kalamang purely in context, following a full linguistic manual at inference time. Kalamang is a language spoken by fewer than 200 speakers in western New Guinea. Gemini has never seen this language during training and is only provided with 500 pages of linguistic documentation, a dictionary, and ~400 parallel sentences in context. It basically acquires a sophisticated new skill in the neural activations, instead of gradient finetuning.”
Synthetic data might be the answer if you're fine with any data, but I haven't come across many synthetic datasets that are of high quality, and if you want high-quality output from an LLM, I'm not sure Tiny Stories et al. can provide that.
> Once, there was a girl who wanted to write a story. She thought and thought about what she could write about. She felt it was too boring to just write about trees and flowers. Suddenly, an idea came to her. She decided to write about her waist. She started to write about how her waist was round, and how it jiggled when she danced. Her story was so fun and exciting! She wrote about how she liked to put a belt around her waist and how it made her feel smarter. She even wrote a rhyme about her waist: "My waist is round and jiggly, And when I dance, it's so wiggly." The girl was so proud of the story she wrote. She was no longer bored - writing about her waist was much more fun!
Hardly a high-quality "story", and an LLM trained on data like that won't produce high-quality output no matter how much you train it.
Edit: Another example from Tiny Stories, just because how fun they end up being:
> One day, a little boy named Jack was playing in his room. He decided to go and sit on his favourite chest. When he sat down, he noticed something unusual. The chest smelled smelly! Jack had never noticed a smelly smell before and he couldn't work out what it was. Jack's Mum heard him say 'That chest smells smelly', so she came into his room to see what was happening. When she saw the chest, she knew what was wrong. Jack's little puppy had been using the chest as a bed! His Mum scooped the naughty puppy up in her arms and took him outside. When the puppy was outside, the smelly smell went away. Jack was so relieved! He sat back down on the chest, and said 'Ahhh, much better!'
Do people really expect to be able to train on this and get high quality output? "Garbage in, garbage out", or however that goes...
>This raises the question of whether the emergence of the ability to produce coherent English text only occurs at larger scales (with hundreds of millions of parameters or more) and complex architectures (with many layers of global attention).
>In this work, we introduce TinyStories, a synthetic dataset of short stories that only contain words that a typical 3 to 4-year-olds usually understand, generated by GPT-3.5 and GPT-4. We show that TinyStories can be used to train and evaluate LMs that are much smaller than the state-of-the-art models (below 10 million total parameters), or have much simpler architectures (with only one transformer block), yet still produce fluent and consistent stories with several paragraphs that are diverse and have almost perfect grammar, and demonstrate reasoning capabilities.
The point of TinyStories isn't to serve as an example of a sophisticated model, but rather to show that the emergent ability of producing coherent language can happen at smaller scales, and from a synthetic data set, no less. TinyStories is essentially the language model equivalent of a young child, and it's producing coherent language -- it's not producing grammatically correct nonsense like the famous "colorless green ideas sleep furiously" phrase from Chomsky.
>but I haven't come across many synthetic datasets that are of high quality
I'm not really sure what your personal experience has to do with the viability of synthetic data; it's already been proven to be a useful resource. For example, Meta directly stated this upon the release of their Llama 3 model:
>We found that previous generations of Llama are good at identifying high-quality data, so we used Llama 2 to help build the text-quality classifiers that are powering Llama 3. We also leveraged synthetic data to train in areas such as coding, reasoning, and long context. For example, we used synthetic data to create longer documents to train on.
It's grammatically correct. The point is correct grammar despite the content being semantic nonsense, and it's still not established how small a model can get while achieving that. GPT-2's grammar was atrocious.
But it still might be worth it. A 90%-accurate model will successfully complete a task consisting of 10 subtasks only 0.9^10 ≈ 35% of the time, while a 99%-accurate model will do so 0.99^10 ≈ 90% of the time, making the former useless but the latter quite useful.
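The compounding math above is easy to check, assuming the subtasks fail independently:

```python
def chain_success(per_step_accuracy, n_steps):
    """Probability that every one of n independent subtasks succeeds."""
    return per_step_accuracy ** n_steps

print(f"90% model, 10 steps: {chain_success(0.90, 10):.0%}")  # 35%
print(f"99% model, 10 steps: {chain_success(0.99, 10):.0%}")  # 90%
```

The gap widens fast with task length: at 50 subtasks the 90% model succeeds about 0.5% of the time while the 99% model is still around 60%.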
Yes, but a 90%-accurate model that's 10x faster than a 99%-accurate one can be run 3x to achieve higher accuracy while still outperforming the 99% model, for most things. For the math to be in the big model's favor, there would need to be problems it could solve >90% of the time where the smaller model was <50%.
So far experiments say yes, with an asterisk. Taking ensembles of weak models and combining them has been shown to produce arbitrarily strong predictors/generators, but there are still a lot of challenges in learning how to scale the techniques to large language models. Current results have shown that an ensemble of GPT-3.5-level models can reach near state of the art by combining ~6-10 shots of the prompt, but the ensemble technique used was very rudimentary and I expect that much better results could be had with tuning.
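A minimal sketch of why repeated shots help, using simple majority voting, with the big (and unrealistic) assumption that errors are independent across shots; real LLM samples share systematic errors, which is exactly why scaling the technique is hard:

```python
from math import comb

def majority_correct(p, shots):
    """P(majority of `shots` independent attempts is correct),
    assuming each attempt is right with probability p and
    errors are independent across attempts."""
    need = shots // 2 + 1  # votes needed for a majority
    return sum(comb(shots, k) * p**k * (1 - p)**(shots - k)
               for k in range(need, shots + 1))

for shots in (1, 3, 9):
    print(f"{shots} shots at p=0.90 -> {majority_correct(0.90, shots):.3f}")
```

Under independence, 3 shots at 90% already vote their way to ~97%, and 9 shots to ~99.9%. The asterisk is that correlated errors cap this far below the independent-errors curve.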
Yes and no. We don't need an insane amount of data to make these models accurate: if you have a small set of data that includes the benchmark questions, they'll be "quite accurate" under examination.
The problem is not the amount of data, it's the quality of the data, full stop. Beyond that, there's something called the "No Free Lunch Theorem" that says that a fixed parameter model can't be good at everything, so trying to make a model smarter at one thing is going to make it dumber at another thing.
We'd be much better off training smaller models for specific domains and training an agent that can use tools deepmind style.
> The problem is not the amount of data, it's the quality of the data, full stop. Beyond that, there's something called the "No Free Lunch Theorem" that says that a fixed parameter model can't be good at everything, so trying to make a model smarter at one thing is going to make it dumber at another thing.
My understanding is NFL only applies if the target function is chosen from a uniform distribution of all possible functions — i.e. the "everything" that NFL says you can't predict is more like "given this sequence from a PRNG (but we're not telling you which PRNG), infer the seed and the function" and less like "learn all the things a human could learn if only they had the time".
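This uniform-over-all-functions caveat can be shown concretely in miniature. Below is a toy sketch over all 16 boolean functions on two bits: any learner trained on three points, however clever, averages exactly 50% on the held-out fourth point, because for every consistent function that labels it 0 there is one that labels it 1:

```python
from itertools import product

# Domain: the 4 inputs over two bits; hold out the last point.
points = list(product([0, 1], repeat=2))
train_pts, test_pt = points[:3], points[3]

def learner_majority(train_labels):
    """A 'clever' learner: predict the majority training label (ties -> 0)."""
    return 1 if sum(train_labels) >= 2 else 0

def learner_zero(train_labels):
    """A trivial learner: always predict 0."""
    return 0

def average_accuracy(learner):
    """Held-out accuracy averaged over ALL 16 boolean target functions."""
    correct = 0
    for labels in product([0, 1], repeat=4):  # one labeling per function
        f = dict(zip(points, labels))
        pred = learner([f[p] for p in train_pts])
        correct += (pred == f[test_pt])
    return correct / 16

print(average_accuracy(learner_majority))  # 0.5
print(average_accuracy(learner_zero))      # 0.5
```

The theorem bites only because the average is over *all* functions, including pure noise; real-world targets are nothing like uniform, which is why learning works at all.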
I think you are probably right, but if humans are at 99.9% (which seems very unlikely), I don't think it will be long before you can trust a model more than a human expert.
Really, though, I think this line of thinking is better to revisit in 5 or so years. LLMs are still very new, and seemingly every day new optimizations and strategies are being found. Let's at least hit a plateau before assessing limitations.
>I don't think it will be long before you can trust a model more than a human expert.
You will never be able to trust an LLM more than a human expert, because the human expert will use the best available tools (for example, LLMs), will understand what the client wants, and will put the data in the right context. At best a human expert and an LLM will be indistinguishable, but I really doubt it. And I think it will take a long time.
> because the human expert will use the best available tools (for example, LLMs), will understand what the client wants, and will put the data in the right context
You're not wrong, but when that happens, does it still count as "a human expert" doing it? A chess grandmaster is capable of using Stockfish, but it's not their victory when they do.
There’s also a rumor that models these days employ a large “safety” parachute behind their engines all the time. Some of these get so big that models become dumber right before your eyes.
I've said this before but (as a noob) I don't think cramming all human knowledge into a model is the correct approach. It should be trained enough to understand language so that it can then go search the web or query a database for answers.
The more certain the domain, the more that is possible. If you have a document database that you trust, great. For example a support desk's knowledge base. And especially if you have an escape valve: "Did this solve your problem? If not, let's escalate this to a human."
But if you are searching the Internet, you'll find multiple answers — probably contradictory — and the next step is to ask the model to judge among them. Now you want all the intelligence you can muster. Unless you really trust the search engine, in which case yeah a small model seems great.
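The "judge among contradictory results" step is where the intelligence goes, but even a crude stand-in makes the shape of it (and the escape valve) concrete. This is a hypothetical sketch, not a real support-desk API; it just trusts the answer most retrieved sources agree on and escalates to a human otherwise:

```python
from collections import Counter

def judge_by_vote(retrieved_answers, min_agreement=0.5):
    """Crude stand-in for the judging step: accept the answer that
    more than `min_agreement` of retrieved sources agree on;
    otherwise return None, meaning 'escalate to a human'."""
    counts = Counter(retrieved_answers)
    answer, votes = counts.most_common(1)[0]
    if votes / len(retrieved_answers) > min_agreement:
        return answer
    return None  # no consensus -> escalate

# Hypothetical search results for one support query:
hits = ["reboot the router", "reboot the router", "reinstall the OS"]
print(judge_by_vote(hits))        # "reboot the router" (2/3 agree)
print(judge_by_vote(["a", "b"]))  # None -> escalate to a human
```

Counting agreeing sources is of course far weaker than having a model weigh evidence, which is the point of the comment above: the less you trust the sources, the smarter the judge has to be.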
Do we know that reasoning ability and inbuilt knowledge are coupled? It seems to me that having the reasoning ability sufficient to judge between search engine results might want a significantly different type of training than collecting facts.
Past 99%, what does "more accurate" mean? I think it will vary from person to person and use case to use case, which is why I personally don't foresee a world where an LLM or any form of AI/ML is ever perfectly accurate.
I'm struggling to think of any medium that has ever reached 100% accuracy, so targeting that for an ML algorithm seems foolhardy.
I agree with this, because it does seem that if the training information is NOT 100% accurate, the model can never return 100% accurate results. Which, I guess, as humans, we don't either, though as a committee one MAY argue we could. I'm torn lol.
Yeah, LLMs are just a nontrivial stepping stone. Humans don't need to consume the entire set of the world's knowledge, repeated from thousands of different mouths coming from different angles, to be able to learn to output human-like thought processes.
At some point we'll discover a new algorithm/architecture that can actually continuously learn from its environment with limited information and still produce amazing results like us.
Well, let's not forget that the large amount of information they ingest also leads to a superhuman breadth of knowledge, though I guess for certain kinds of agents that is not really needed anyway.
GPT4ALL-Python3-Inference is a Python CLI tool for querying various GPT local LLM models, offering detailed logging, customizable arguments, and SQLite-stored responses.
CTEE is a transparent Bash session recorder/transcriber for Linux and macOS systems. It allows users to record and replay CLI sessions, take notes in-terminal, and produce a PDF of stdin and stdout after a session ends. The overall project is useful for learning, teaching, troubleshooting, documenting, and sharing Bash CLI activities.
Record and replay Bash CLI sessions transparently and easily on Linux and macOS systems. CTEE records stdin, stdout, and timestamps, cleaning and placing each within a single sqlite3 database per CTEE session for sharing and/or later recall. Additionally, CTEE produces a final HTML report of the Bash session activity in a very readable form, so that activity can be validated exactly as the user saw it at the time of execution.
CTEE is a suite of tools designed to enhance the command-line interface on Linux and macOS systems. It allows users to record and replay CLI sessions, take notes, manipulate timing, and perform other related tasks. The overall project is useful for learning, teaching, troubleshooting, documenting, and sharing Bash CLI activities.