Hard disagree. The difference in performance is not something you'll notice if you actually use these cards. In AI benchmarks, the RTX 3090 beats the RTX 4080 SUPER, despite the latter's newer tensor cores (including native FP8 support). The memory bandwidth gap plays a major role: 936 GB/s on the 3090 vs 736 GB/s on the 4080 SUPER. On top of that, the 3090 is not only the last NVIDIA consumer card to support SLI (via NVLink).
It's also unbeatable in price to performance, as the next-best 24 GB card would be the 4090, which, even used, is almost triple the price these days while only offering about 25-30% more performance in real-world AI workloads.
You can basically get an NVLink-bridged dual-3090 setup for less money than a single used 4090, with about the same or even more performance and double the available VRAM.
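A quick back-of-envelope sketch of that claim, using only the rough ratios stated above (roughly 3x the price and ~1.3x the performance for a used 4090); the ~90% multi-GPU scaling factor is an assumption for illustration, not a measurement:

```python
# Back-of-envelope price/performance comparison using the rough ratios
# claimed above. All numbers are normalised assumptions, not measured data.

price_3090 = 1.0          # normalised: one used 3090 = 1 unit of money
price_4090 = 3.0          # "almost triple the price"
perf_3090 = 1.0           # normalised single-card performance
perf_4090 = 1.3           # "about 25-30% more performance", upper bound

setups = {
    "dual 3090 (2x 24 GB)": {
        "price": 2 * price_3090,
        "perf": 2 * perf_3090 * 0.9,   # assume ~90% multi-GPU scaling
    },
    "single 4090 (24 GB)": {
        "price": price_4090,
        "perf": perf_4090,
    },
}

for name, s in setups.items():
    print(f"{name}: perf/price = {s['perf'] / s['price']:.2f}")
```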
If you run fp32, maybe, but no sane person does that. The tensor performance of the 3090 is also abysmal. If you run bf16 or fp8, stay away from obsolete cards. It's barely usable for LLMs and borderline garbage tier for video and image gen.
> The tensor performance of the 3090 is also abysmal.
I for one compared my 50-series card's performance to my 3090 and didn't see "abysmal performance" on the older card at all. In fact, in actual real-world use (quantised models only, no one runs big fp32 models locally), the difference in performance isn't very noticeable at all. But I'm sure you'll be able to provide actual numbers (TTFT, TPS) to prove me wrong. I don't use diffusion models, so there might be a substantial difference there (I doubt it, though), but for LLMs I can tell you for a fact that you're just wrong.
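For anyone who wants numbers of their own, here's a minimal sketch for measuring TTFT and TPS against an OpenAI-compatible local endpoint (e.g. a llama.cpp or vLLM server). The URL, model name and prompt are placeholders, and stream chunks are only a rough proxy for tokens:

```python
# Minimal TTFT/TPS measurement sketch against an OpenAI-compatible
# streaming endpoint. Adjust URL, model name and prompt to your setup.
import json
import time

import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint
payload = {
    "model": "local-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Explain NVLink in one paragraph."}],
    "stream": True,
    "max_tokens": 256,
}

start = time.perf_counter()
first_token_at = None
chunks = 0

with requests.post(URL, json=payload, stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        delta = json.loads(data)["choices"][0].get("delta", {})
        if delta.get("content"):
            if first_token_at is None:
                first_token_at = time.perf_counter()  # time to first token
            chunks += 1

end = time.perf_counter()
if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.3f} s")
    print(f"TPS (approx, counting stream chunks): {chunks / (end - first_token_at):.1f}")
```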
To be clear, we are not discussing small toy models, but to be fair I also don't use consumer cards. Benchmarks are out there (Phoronix, RunPod, Hugging Face, or Nvidia's own presentations) and they show at least 2x at high precision and nearly 4x at low precision, which is comparable to the uplift I see on my 6000 cards. If you don't see the performance uplift everyone else sees, there is something wrong with your setup, and I don't have the time to debug it.
> To be clear, we are not discussing small toy models, but to be fair I also don't use consumer cards.
> If you don't see the performance uplift everyone else sees, there is something wrong with your setup, and I don't have the time to debug it.
Read these two statements and think about what might be the issue. I only run what you call "toy models" (good enough for my purposes), so of course your experience is fundamentally different from mine. Spending five figures on hardware just to run models locally is usually a bad investment. Repurposing old hardware, OTOH, is perfectly fine for playing with local models and optimising them for specific applications and workflows.
I would consider this if someone were able to demonstrate a way to distinguish these phenomena from altered states of mind (i.e. hallucinations). We know and can demonstrate that the human psyche can easily be manipulated in various ways (psychological manipulation, drugs, magnetic fields, sleep deprivation, stress, etc.) to cause such experiences.
Some actual evidence for "past life regressions" and "astral projection" would be nice...
PLR is real; read the works of Michael Newton and others. Over 8,000 PLRs from people of all kinds of ages and backgrounds describe the same things happening once we pass to the other side.
Definitely not hallucinations. Actually scary how people still think that instead of exploring for themselves.
Newton was a hypnotherapist. I'm sorry to say this, but hypnosis is precisely the kind of altered state of mind that makes a person highly susceptible to both deliberate and unintentional suggestion. This has been well documented and researched for decades at this point.
The fact that to this day not a single so-called "PLR" has uncovered hitherto unknown yet verifiable information (e.g. archaeological sites like sunken cities, or translations of ancient scripts) points to suggestion (even if unintentional) rather than paranormal phenomena.
Time and space aren't well defined, but current models do put a lower limit on both: the Planck length and the Planck time (~1.6×10^−35 m and ~5.4×10^−44 s, respectively).
Below these limits, physical descriptions of the world lose meaning, i.e. shorter time spans or distances don't result in measurable changes and our models break down. That doesn't mean these limits are "real" in the sense that space and time are indeed quantised, but experiments and observations end at these limits.
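For reference, the standard definitions in terms of ħ, G, and c give roughly those values:

```latex
\[
  \ell_P = \sqrt{\frac{\hbar G}{c^3}} \approx 1.6 \times 10^{-35}\,\mathrm{m},
  \qquad
  t_P = \frac{\ell_P}{c} = \sqrt{\frac{\hbar G}{c^5}} \approx 5.4 \times 10^{-44}\,\mathrm{s}
\]
```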
> Starlink is already a small data center! It has power, radiators, and compute!
It is not. This is like saying your phone is already a small data centre. While technically true, we're not talking about the same scale here: a Starlink satellite's compute power is a tiny fraction of that of a modern data-centre GPU/TPU, and most of its power budget goes into communication (i.e. its purpose!).
There is currently no support for:
- Paid tiers with guaranteed quotas and rate limits
- Bring-your-own-key or bring-your-own-endpoint for additional rate limits
- Organizational tiers (self-serve or via contract)
So basically just another case of vendor lock-in. No matter whether the IDE is any good - this kills it for me.
The problem is that we're reverting to the stone age by throwing unnecessary resources at problems that have a simple and effective solution: open, standardised, and accessible APIs.
We wouldn't need an expensive (compute-wise) AI agent to do things like making appointments, especially if you'd end up with bots talking to bots anyway. The digital equivalent of always up-to-date yellow pages would solve many of these issues: super simple and "dumb" but reliable programs could perform such tasks.
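To illustrate how "dumb" such a program could be, here's a minimal sketch against a hypothetical, standardised booking API; the base URL, paths, and JSON fields are all invented for the example, since no such standard currently exists:

```python
# Minimal sketch of a "dumb" booking client against a hypothetical,
# standardised appointment API. Endpoint, paths and fields are invented.
import requests

BASE = "https://api.example-dentist.test/v1"  # hypothetical provider endpoint

# 1. Ask the provider for open slots in a date range.
slots = requests.get(
    f"{BASE}/slots",
    params={"from": "2024-07-01", "to": "2024-07-07", "service": "checkup"},
    timeout=10,
).json()

# 2. Pick the first slot that fits and book it -- no LLM needed anywhere.
if slots:
    booking = requests.post(
        f"{BASE}/bookings",
        json={"slot_id": slots[0]["id"], "name": "Jane Doe", "email": "jane@example.org"},
        timeout=10,
    ).json()
    print("Booked:", booking)
else:
    print("No free slots in that range.")
```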
Scheduling across multiple calendars doesn't require "AI" either - it's a comparatively simple optimisation problem that can be solved with computationally cheap, well-known algorithms. It seems more and more to me that AI - and LLMs in particular - are the hammer, and now literally everything looks like a nail...
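For example, finding a common free slot across several calendars is just an interval-merging sweep. A minimal sketch (busy intervals are hard-coded hours within one day for simplicity):

```python
# Find common free slots across several calendars by merging everyone's busy
# intervals and scanning the gaps. Just a sort and a linear sweep, no ML.

def free_slots(busy_calendars, day_start=9, day_end=17, min_len=1):
    """Return gaps of at least `min_len` hours that are free in *all* calendars."""
    # Flatten and sort all busy intervals, then merge overlapping ones.
    busy = sorted(iv for cal in busy_calendars for iv in cal)
    merged = []
    for start, end in busy:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))  # overlap: extend
        else:
            merged.append((start, end))
    # Scan the gaps between merged busy blocks.
    slots, cursor = [], day_start
    for start, end in merged:
        if start - cursor >= min_len:
            slots.append((cursor, start))
        cursor = max(cursor, end)
    if day_end - cursor >= min_len:
        slots.append((cursor, day_end))
    return slots

# Example: three people's busy intervals (hours of the day).
alice = [(9, 10.5), (13, 14)]
bob   = [(10, 11), (15, 16)]
carol = [(9, 9.5), (13.5, 15)]
print(free_slots([alice, bob, carol]))  # -> [(11, 13), (16, 17)]
```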
> With many of the good usecases of AI the end user doesn't know that ai exists and so it doesn't feel like there is AI present.
This! The best technology is the kind you don't notice and that doesn't get in the way. A prominent example is the failure of the first generation of smartphones: they only took off once someone (Apple) managed to properly hide the OS and its details from the user. We need the same for AI - chat is simply not a good interface for every use case.