I looked into this. If I am remembering correctly, the price was higher. It is just easier to connect a mini PC to an HDMI port and bypass all of the built-in TV functionality.
There has long been speculation that a smart TV could connect to an open wireless access point or, more realistically, refuse to operate without internet access, perhaps after a certain number of power-on hours.
It’s not a process monitor, really, but to me the AWS Lightsail monitor tab feels like this. The “sustainable” line hits me right in the OCD, keeping me grinding on the workload's CPU usage to keep extra spend at zero.
I have this as well, but run a heavily locked-down and isolated BIND server, with NSD and Unbound for external authoritative and internal caching DNS, respectively.
It's easy to feed an RBL to Unbound to do Pi-hole-type work. I use pf to transparently redirect all external DNS requests to my local Unbound server, but I still get the BIND automation around things like DNSSEC, DHCP DDNS, and ACME cert renewals.
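A rough sketch of what that setup can look like; the interface name, address, and blocked zones are placeholders, and the pf rule below uses the OpenBSD-style rdr-to form (FreeBSD's pf still uses the older rdr syntax):

    # unbound.conf: blocklist entries generated from an RBL feed
    server:
        local-zone: "ads.example.test." always_nxdomain
        local-zone: "tracker.example.test." always_nxdomain

    # pf.conf: catch DNS queries headed anywhere else and force them to the local resolver
    int_if = "em1"             # LAN interface (placeholder)
    dns_host = "192.168.1.1"   # box running Unbound (placeholder)
    pass in quick on $int_if inet proto { tcp, udp } \
        from any to ! $dns_host port 53 rdr-to $dns_host port 53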
It's also easy to run the 120b on CPU if you have the resources. I had it running on my home LLM CPU inference box in only as long as it took to download the GGUFs, git pull, and rebuild llama-server.
I had it running at 40 t/s with zero effort and 50 t/s with a bit of tweaking.
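Roughly the workflow, for anyone who wants to reproduce it; the GGUF filename, thread count, and context size below are placeholders, not recommendations:

    # update and rebuild llama.cpp's server
    git pull
    cmake -B build
    cmake --build build --config Release -j

    # serve the downloaded GGUF (filename is hypothetical)
    ./build/bin/llama-server -m gpt-oss-120b.gguf \
        --threads 64 --ctx-size 8192 --host 0.0.0.0 --port 8080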
It's just too bad that even the 120b isn't really worth running compared to the other models that are out there.
It really is amazing what ggerganov and the llama.cpp team have done to democratize LLMs for individuals that can't afford a massive GPU farm worth more than the average annual salary.
2xEPYC Genoa w/768GB of DDR5-4800 and an A5000 24GB card.
I built it in January 2024 for about $6k and have thoroughly enjoyed running every new model as it gets released. Some of the best money I’ve ever spent.
I've seen some mentions of pure-CPU setups being successful for large models, using old EPYC/Xeon workstations off eBay with 40+ cores. Interesting approach!
Wow, that's not bad. It's strange; for me it is much, much slower on a Radeon Pro VII (also 16GB, with a memory bandwidth of 1TB/s!) and a Ryzen 5 5600, also with 64GB. It's basically unworkably slow. Also, when I check ollama ps I only see 100% CPU; the GPU is not being used at all :( It's also counterproductive because the model is just too large for 64GB.
I wonder what makes it work so well on yours! My CPU isn't much slower and my GPU is probably faster.
AMD basically decided they wanted to focus on HPC and data center customers rather than consumers, and so GPGPU driver support for consumer cards has been non-existent or terrible[1].
The Radeon Pro VII is not a consumer card, though, and works well with ROCm. It even has datacenter-"grade" HBM2 memory that most Nvidia cards don't have. Official support has since been dropped, but ROCm of course still works fine. It's nearly as fast in Ollama as my 4090 (which I don't use for AI regularly, but I play with it sometimes).
I generally download the safetensors and make my own GGUFs, usually at Q8_0.
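In case it's useful, this is roughly that workflow with llama.cpp's own tooling; the model path and file names are placeholders:

    # HF safetensors -> GGUF at f16, then quantize to Q8_0
    python convert_hf_to_gguf.py /models/some-model \
        --outtype f16 --outfile some-model-f16.gguf
    ./build/bin/llama-quantize some-model-f16.gguf some-model-Q8_0.gguf Q8_0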
Is there any measurable benefit to your dynamic quants at that quant level?
I looked at your dynamic quant 2.0 page, but all the charts and graphs appear to cut off at Q4.
There definitely is a benefit to dynamically selecting layers to be at different bit rates - I wrote about the difference between naively quantizing and selectively quantizing: https://unsloth.ai/blog/deepseekr1-dynamic
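Our actual pipeline does more than this, but as a rough illustration of the general idea with stock llama.cpp, llama-quantize can already hold a few sensitive tensors at higher precision while the bulk of the weights drop lower (the tensor choices and file names here are just examples):

    # quantize most tensors to Q4_K_M but keep embeddings and the output tensor at Q8_0
    ./build/bin/llama-quantize \
        --token-embedding-type q8_0 \
        --output-tensor-type q8_0 \
        model-f16.gguf model-selective.gguf Q4_K_M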
Thanks Daniel. I know you upload them, but I was hoping for some solid numbers on your dynamic q8 vs a naive quant. There doesn't seem to be anything on either of those links to show improvement at those quant levels.
My gut feeling is that there's not enough benefit to outweigh the risk of putting a middleman in the chain of custody from the original model to my NVMe.
However, I can't know for sure without more testing than I have the time or inclination for, which is why I was hoping there had been some analysis you could point me to.
Trees are largely carbon. I have heard a number of weak “yeah, but…” arguments that try to diminish the fact, but a central, common-sense thesis remains.
If we are truly worried about climate change and are unable to curb our consumption, then we should plant as many trees as we can and aggressively shift as much of our long-lived infrastructure as possible to wood products.
There are good reasons to green up our cities, but [edit: capturing] global CO2 isn't one of them.
Living things typically don't store carbon long-term unless you take extra steps, like burying them in bogs. Even if we were to collectively invest in sequestration, it'd be more effective with trees that are lower-maintenance, more densely and conveniently situated, and planted where residents won't complain that a tree needs to be kept longer or removed sooner. Perhaps we'd choose something else entirely, like algae.
Even if it's not typical, when circumstances are right they can store a lot of carbon in a hurry.
My garage is on the same level as my basement, so there's a 5' retaining wall on either side of it. Leaves blow around and get trapped in the corners. Once I didn't bother cleaning them up for several years, and when I finally did, I had to move several hundred pounds of new soil into my back yard because of how many leaves had decayed there. Small trees were growing in it.
Similar story with the drainage on the side of my house. Not long after I moved in, a heavy rain filled my basement with water. I had to rent a machine to dig a trench on either side so that the back yard would stop becoming a pond when it rained. I'm sure this wasn't a problem in the '60s when the house was built, but over time the decaying leaves from my neighbor's tree raised the ground level by something like 1.5 ft and spoiled the original slope (I eventually found the original grade; there was a whole brick patio down there).
We may have to be a bit more intentional than "plant a bunch of trees" to get this effect, but I think it's worth exploiting.
I'm a generally pro-tree person, but I do caution against this gung-ho sentiment because it tends to lead people down the path of 1) seeing a forest as just the trees and 2) seeing it as a single species of tree. That's how you get monocultures, and the lack of biological diversity in monocultures threatens the entire fake forest you worked so hard to plant.
So, plant trees, yeah, but smartly: in areas initially protected from the animals that will eat the saplings, with more than one kind of tree, and introducing other vegetation over time. All of the extra complexity will slow the work down and get people questioning why it's taking so long to get a forest, but at least you'll get something resembling a forest that will be able to sustain itself without human intervention long after we're dead.
Are you sure? There are currently 3 trillion trees on Earth, and they only absorb about 20% of greenhouse gas emissions (~9.5 GT of CO2) per year [1]. Apparently not every tree absorbs as much CO2 as your assumption implies. Adding 1 trillion more trees would have a negligible effect.
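Back-of-the-envelope from those figures: ~9.5 GT spread across 3 trillion trees works out to roughly 3 kg of CO2 per tree per year, so an extra trillion trees at that average rate would absorb on the order of 3 GT per year, a single-digit percentage of total emissions.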
It's not just the absorption, as any stroll anywhere near a forest should tell you. Trees somehow cool areas dramatically, not just through shade, and change local systems substantially. Anyhow, if you want papers, there was one discussed here just recently. [1]
It's expected that planting a trillion trees (amounting to global land coverage of ~8%), which is analogous to pre-industrial times, would reduce overall heating by some 25% (!!) by itself. This also opens the door to yet another poorly understood feedback system - CO2 increases greenery, which increases trees, which decreases temperatures far more than previously expected.
I'm totally on board with planting trees, but as a climate solution the accounting doesn't make sense. We're burning a hundred million barrels of oil a day or so. If you try to compensate with forests, you'll quickly start to wonder where you're going to fit them all, and where the water is coming from.
It's almost always going to be vastly easier to reduce emissions than to try to re-absorb them.
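To put rough numbers on it: burning a barrel of crude releases something like 0.4 tonnes of CO2, so a hundred million barrels a day is on the order of 15 GT of CO2 a year from oil alone. At the commonly cited ~20 kg of CO2 absorbed per mature tree per year, offsetting just the oil would take somewhere in the neighborhood of 750 billion additional mature trees.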
In my opinion, GPT-SoVITS is the best if you can put in the effort. I'm still using v2 since the output is so good.
It's also the best multilingual one in my testing on Japanese inputs.