In the podcast interview that Groq CEO Jonathan Ross did[1], he talked about the creation of the original TPUs (which he built at Google). Apparently it was originally an FPGA he built in his 20% time because he sat near the team that was having inference speed issues.
They got it working, then Jeff Dean did the math and they decided to do an ASIC.
Now of course Google should spin off the TPU team as a separate company. It's the only credible competition NVidia has, and the software support is second only to NVidia.
The way I see it, NVidia only has a few advantages, ordered from most important to least:
1. Reserved fab space.
2. Highly integrated software.
3. Hardware architecture that exists today.
4. Customer relationships.
but all of these aspects are weak in one way or another:
1. Fab space is tight, and NVidia can strangle its consumer GPU market if it means selling more AI chips at a higher price. This advantage is gone if a competitor makes big bets years in advance, or if another company with a lot of fab space (Intel?) is willing to change priorities.
2. Life is good when your proprietary software is the industry standard. Whether this actually matters will depend heavily on the use case.
3. A benefit now, but not for long. It's my estimation that the hardware design for TPUs is fundamentally much simpler than for GPUs. No need for raytracing, texture samplers, or rasterization. Mostly just needs lots of matrix multiplication and memory. Others moving into the space will be able to catch up quickly.
4. Useful to stay in the conversation, but in a field hungry for any advantage, the hardware vendor with the highest FLOPS (or equivalent) per dollar is going to win enough customers to saturate their manufacturing ability.
So overall, I give them a few years, and then the competition is going to get real quite fast.
It seems you have not worked with ML workloads, but base your comment on "internet wisdom", or worse, business analysts (I am sorry if that's inaccurate).
On GPUs, ML "just works" (inference and training), and they are always an order of magnitude faster than whatever CPU you have.
TPUs work very well for some model architectures (old ones that they were optimized and designed for), and on some novel others they can actually be slower than a CPU (because of gathers and similar) - this was my experience working on ML stuff as an ML Researcher at Google till 2022; maybe it got better, but I doubt it. Older TPUs were ok only for inference of those specific models and useless for training. And for anything new I tried (a fundamental part of research...) - the compiler would sometimes just break with an internal error, most of the time just produce terrible and slow code, and bugs filed against it would stay open for years.
A GPU is so much more than a matrix multiplier - it's a fully general, programmable processor. With excellent compilers, but most importantly - low-level access, so you don't have to rely on proprietary compiler engineers (like the TPU ones) and anyone can develop something like Flash Attention. And as a side note: while a Transformer might be mostly matrix multiplication, many other models are not.
If you had worked with ML, you'd know that this is not true. It's actually more like the opposite. It also has nothing to do with the chips themselves. Things don't magically work "because GPU", they work because manufacturers spend the time getting their drivers and ecosystems right. That's why for example no one is using AMD GPUs for ML, despite them offering more compute per dollar on paper. Getting the software stack to the point of Nvidia/CUDA, where things really do "just work", is an enormous undertaking. And as someone who has been researching ML for more than a decade now, I can tell you Nvidia also didn't get these things right in the beginning. That's the reason why they have no real competition today (and still won't for quite some time).
> That's why for example no one is using AMD GPUs for ML
You're right, they are behind, but to say that nobody is using them is not accurate. AMD HPC clusters are being used [0] and [1] for AI/ML.
The larger issue is that, until recently, AMD has only been building HPC clusters. Now, with the release of the MI300x, we have Azure and Oracle coming online with them. Disclosure: my business is also building an MI300x supercomputer, with the express goal of enabling more access for developers.
>AMD HPC clusters are being used [0] and [1] for AI/ML.
Funny how you can immediately tell when the business people made these decisions and not the tech people. This is exactly what I would have expected from an organization like the Navy. On paper it does sound great and the Navy bean counters probably loved this. But they are in for a rough awakening.
The best I can say is that my thoughts and prayers go to the ML engineers who will actually have to deal with this. Those companies literally couldn't pay me enough to put up with it. They will likely only attract people who care about the salary and the position instead of getting things done. I've seen it with other colleagues before. These numbers of yours are completely worthless without someone who is willing to put in 5 times the work for the same or worse results.
People choose jobs and tools for a variety of reasons. I don't feel the need to cast judgement on them over it.
The numbers I gave aren't worthless, nor does it take 5x the amount of work. I also don't think that going with a single source for hardware for all of AI is very smart either, especially given the fact that there are serious supply shortages from that single vendor. No fortune 100 would put all their eggs in one basket and even if it was 5x the work, it is worth it.
Hey, this is a good comment. I've only toyed with ML stuff, but I've done a lot with GPUs. I hope you find my "step back" perspective as valuable as I find your up-close one.
My chief mistake in the above comment was using "TPU", as that's Google's branding. I probably should've used "AI focused co-processor". I'm not talking exclusively about Google's foray into the space, especially as I haven't used TPUs.
My list of things to ditch on GPUs doesn't include cores. My point there is that there's a bunch of components that are needed for graphics programming that are entirely pointless for AI workloads, both inside the core's ALU and as larger board components. The hardware components needed for AI seem relatively well understood at this point (though that's possible to change with some other innovation).
Put another way, my point is this: Historically, the high end GPU market was mostly limited to scientific computing, enthusiast gaming, and some varied professional workloads. Nvidia has long been king here, but with relatively little attempt by others at competition. ML was added to that list in the last decade, but with some few exceptions (Google's TPU), the people who could move into the space haven't. Then chatGPT happened, investment in AI has gone crazy, and suddenly Nvidia is one of the most valuable companies in the world.
However, the list of companies who have proven they can make all the essential components (in my list in the grandparent) isn't large, but it's also not just Nvidia. Basically every computing device with a screen has some measure of GPU components, and now everyone is paying attention to AI. So I think within a few years Nvidia's market leadership will be challenged, and they certainly won't be the only supplier of top-of-the-line AI co-processors by the end of the decade. Whether first-mover advantage will keep them in first place, time will tell.
It's been talked to death but non-CUDA implementations have their challenges regardless of use case. That's what first-mover advantage and > 15 years of investment by Nvidia in their overall ecosystem will do for you.
But support for production serving of inference workloads outside of CUDA is universally dismal. This is where I spend most of my time and compared to CUDA anything else is non-existent or a non-starter unless you're all-in on packaged API driven Google/Amazon/etc tooling utilizing their TPUs (or whatever). The most significant vendor/cloud lock-in I think I've ever seen.
Efficient and high-scale serving of inference workloads is THE thing you need to do to serve customers and actually have a chance at ever making any money. It's shocking to me that Nvidia/CUDA has a complete stranglehold on this obvious use case.
A great summary of how unserious NVIDIA's competitors are is how long it took AMD's flagship consumer/retail GPU, the 7900 XT[X], to gain ROCm support.
NVidia's biggest advantage is that AMD is unwilling to pay for top notch software engineers (and unwilling to pay the corresponding increase in hardware engineer salaries this would entail). If you check online you'll see NVidia pays both hardware and software engineers significantly more than AMD does. This is a cultural/management problem, which AMD's unlikely to overcome in the near-term future. Apple so far seems like the only other hardware company that doesn't underpay its engineers, but Apple's unlikely to release a discrete/stand-alone GPU any time soon.
Don’t underestimate CUDA as the moat. It’s been a decade of sheer dominance with multiple attempts to loosen its grip that haven’t been super fruitful.
I’ll also add that their second moat is Mellanox. They have state of the art interconnect and networking that puts them ahead of the competition that are currently focusing just on the single unit.
This moat is going to get paralleled over the next few years. First off, Mellanox is unobtainium, with 52+ week lead times.
GigaIO has a PCIe fabric solution that is a fraction of the cost of Mellanox and available today. This enables up to 64 GPUs to appear on a single system.
We're also seeing the Ultra Ethernet stuff come online as well, but that'll have to wait for PCIe 6.
I’ve spent the last month deep in GPU driver/compiler world and -
AMD or Apple (Metal) or someone (I haven’t tried Intel’s stuff) just needs to have a single guide to installing a driver and compiler that doesn’t segfault if you look at it wrong, and they would sweep the R&D mindshare.
It is insane how bad CUDA is; it’s even more insane how bad their competitors are.
If you work in hardware and are interested in solving this, lemme say this:
There are billions of dollars waiting for the first person to get this right. The only reason I haven’t jumped on this myself is a lack of familiarity with drivers.
These have always been NVIDIA's "few" advantages and yet they've still dominated for years. It's their relentless pace of innovation that is their advantage. They resemble Intel of old, and despite Intel's same "few" advantages, Intel is still dominant in the PC space (even with recent missteps).
They've dominated for years, but now all big tech companies are using their products at a scale not seen before, and all have a vested interest in cutting their margins by introducing some real competition.
Nvidia will do well in the future, but perhaps not well enough to justify their stock price.
Nvidia's datacenter AI chips don't have raytracing or rasterization. Heck, for all we know the new Blackwell chip is almost exclusively tensor cores. They gave no numbers for regular CUDA perf.
> Now of course Google should spin off the TPU team as a separate company.
Given the size of the market and its near-monopoly situation, I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that the TPU is a relatively scarce computing resource even inside Google, and it's very likely that Google has a hard time meeting its internal demand...
> I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that the TPU is a relatively scarce computing resource even inside Google, and it's very likely that Google has a hard time meeting its internal demand...
Yes.
But imagine how the company would do: they have a guaranteed market at Google, say, for 3 years, and while yes, maybe Google takes 100% of the production in the first 12 months, it's not a bad position to start from.
Plus there are other products which they could ship that might not always need to be built on the latest process. I imagine there would be demand for inference only earlier generation TPUs that can run LLMs fast if the power usage is low enough.
If AMD fixes or open sources their proprietary firmware blob[0]. Geohot streamed all weekend on Twitch, reverse engineering the AMD firmware. It was quite entertaining learning about how that low level hardware firmware works[1] and his rants about AMD of course.
Geohot is wrangling with unsupported consumer hardware.
The datacenter stuff is on a different architecture and driver stack.
The number one supercomputer on the Top500 list (Frontier at ORNL) is based on AMD GPUs, and AMD is probably more invested in supporting that.
I work with Frontier and ORNL/OLCF. They have had and continue to have issues with AMD/ROCm but yes, they do of course get excellent support from AMD. The entire team at OLCF is incredible as well (obviously) and they do amazing work.
Frontier certainly has some unique quirks but the documentation is online[0] and most of these quirks are inherent to the kinds of fundamental issues you'll see on any system in the space (SLURM, etc).
However, most of the issues are fundamentally ROCm and you'll run into them on any MIxxx anywhere. I run into them frequently with supported and unsupported consumer gear all the way up.
I mean, that's kinda nvidia's whole shtick: anyone can play around synthesizing cat pictures on their gaming GPU and if they make a breakthrough, the same software will transfer to X million dollar supercomputers.
Subscriber only videos, so nobody can confirm that he did that, nor archive whatever valuable information he released. At least not without paying some money in the next 7-14 days before they're deleted.
Geohot doesn't know what he's talking about and I'm kinda ashamed to see this lazy thinking leak onto HN. There was an article a couple weeks back on AMD open sourcing drivers in the Linux kernel tree that you should look into.
Firmware crashes => days long "open source it and I'll fix it. no? why does AMD hate its customers?"
I got an appointment and have exactly one minute till I have to leave, apologies for brevity: they can't open source the full driver because then they'd have to release HDMI spec stuff that the consortium says they can't. (I don't support any of that, my only intent is to communicate George isn't really locked in here when he starts casting aspersions or claiming AMD doesn't care)
But they're far behind in adoption in the AI space, while TPUs have both adoption (inside Google and on top) and a very strong software offering (Jax and TF)
There's also Amazon's AWS "Trainium" chips, which is what Anthropic will be using going forward.
If you're talking about training LLMs, involving 10's of thousands of processors, then the specifics of one processor vs another isn't the most important thing - it's the overall architecture and infrastructure in place to manage it.
Speaking of which, mega props to Groq, they really are awesome, so many startups launch with bullshit and promises, but Groq came to the scene with something awesome already working, which is reason enough to love them. I really respect this company and I say that extremely never-often.
I wouldn't call it awesome. It's just a big chip with lots of cache. You need hundreds of them to sufficiently load any decent model. At which point the cost has skyrocketed.
How is it that Google invented the TPU and Google Research came up with the paper behind LLMs, yet NVDA and AI startup companies have captured ~100% of the value?
There's an old joke explanation about Xerox and PARC, about the difficulty of "pitching a 'paperless office' to a photocopier company".
In Google's case, an example analogy would be pitching making something like ChatGPT widely available, when that would disrupt revenue from search engine paid placements, and from ads on sites that people wouldn't need to visit. (So maybe someone says, better to phase it in subtly, as needed for competitiveness, but in non-disruptive ways.)
I doubt it's as simple as that, but would be funny if that was it.
This (innovator's dilemma / too afraid of disrupting your own ads business model) is the most common explanation folks are giving for this, but seems to be some sort of post-rationalization of why such a large company full of competent researchers/engineers would drop the ball this hard.
My read (having seen some of this on the inside), is that it was a mix of being too worried about safety issues (OMG, the chatbot occasionally says something offensive!) and being too complacent (too comfortable with incremental changes in Search, no appetite for launching an entirely new type of product / doing something really out there). There are many ways to monetize a chatbot, OpenAI for example is raking billions in subscription fees.
Google gets much more scrutiny than smaller companies, so it's understandable to be worried. Pretty much any small mistake of theirs turns into clickbait on here and the other tech news sites, and you get hundreds of comments about how evil Big Tech is. Of course it's their own fault that their PR hews negative so frequently, but still, it's understandable why they were so shy.
Sydney when initially released was much less censored and the vast majority of responses online were positive, "this is hilarious/cool", not "OMG Sydney should be banned!".
It's understandable that people at Google are worried because it's likely very unpleasant to see critical articles and tweets about something you did. But that isn't really bad for Google's business in any of the ways that losing to someone on AI would be.
Google is constantly being sued for nearly everything they do. They create a Chrome Incognito mode like Firefox's private browsing mode and they get sued. They start restricting App permissions on Android, sued. Adding a feature where Google maps lets you select the location of your next appointment as a destination in a single click, sued (that's leveraging your calendar monopoly to improve your map app).
Google has its hands in so many fields that any change they make that disrupts the status quo brings down antitrust investigations and lawsuits.
That's the reason why Firefox and Safari dropping support for 3rd party cookies gets a yawn from regulators while Google gets pinned between the CMA wanting to slow down or stop 3rd party cookies deprecation to prevent disrupting the ads market and the ICO wanting Google to drop support yesterday.
This is not about bad press or people feeling bad about news articles. Google has been hit by billion dollar fines in the past and has become hesitant to do anything.
Where smaller companies can take the "Elon Musk" route and just pay fines and settle lawsuits as just the cost of doing business, Google has become an unwieldy juggernaut unable to move out of fear of people complaining and taking another pound of flesh. To be clear, I don't agree with a strategy of ignoring inconvenient regulations, but Google's excess of caution has severely limited their ability to innovate. But given previous judgements against Google, I can't exactly say that they're wrong to do so. Even Google can only pay so many multi-billion dollar fines before they have to close shop, and I can't exactly say the world would be better off if that happened.
That's true for google, sure. But what about individual workers and managers at google?
You can push things forward hard, battle the many stakeholders all of whom want their thing at the top of the search results page, get a load of extra headcount to make a robust and scalable user-facing system, join an on-call rota and get called at 2am, engage in a bunch of ethically questionable behaviour skirting the border between fair use and copyright infringement, hire and manage loads of data labellers in low-income countries who get paid a pittance, battle the internal doubters who think Google Assistant shows chatbots are a joke and users don't want it, and battle the internal fearmongers who think your ML system is going to call black people monkeys, and at the end of it maybe it's great or maybe it ends up an embarrassment that gets withdrawn, like Tay.
Or you can publish some academic papers. Maybe do some work improving the automatic transcription for youtube, or translation for google translate. Finish work at 3pm on a Friday, and have plenty of time to enjoy your $400k salary.
>There are many ways to monetize a chatbot, OpenAI for example is raking billions in subscription fees.
Compared to Google, OpenAI's billions are peanuts, while costing a fortune to generate. GPT-4 doesn't seem profitable (if it were, would they need to throttle it?)
There could be an opposite avenue: ad-free Google Premium subscription with AI chat as a crown jewel. An ultimate opportunity to diversify from ad revenue.
The low operating margin of serving a GPT-4 scale model sounds like a compelling explanation for why Google stayed out of it.
But then why did Microsoft put its money behind it? Alphabet's revenue is around $300bn, and Microsoft's is around $210bn which is lower but it is the same order of magnitude.
Monetizing a chatbot is one thing. Beating revenues every year when you are already making 300b a year is a whole different ball game
There must be tens of execs who understand this, but their payout depends on keeping the status quo.
The answer is far weirder - they had a chat bot, and no one even discussed it in the context of search replacements. They didn’t want to release it because they just didn’t think it should be a product. Only after OpenAI actually disrupted search did they start releasing Gemini/Bard which takes advantage of search.
LaMDA was also briefly available for public testing, but then rapidly withdrawn due to unhinged responses.
One advantage that OpenAI had over Google was having developed RLHF as a way to "align" the model's output to be more acceptable.
Part of Google's dropping the ball at that time period (but catching up now with Gemini) may also have been just not knowing what to do with it. It certainly wasn't apparent pre-ChatGPT that there'd be any huge public demand for something like this, or that people would find so many uses for it in API form, and especially so with LaMDA's behavioral issues.
My take as someone who worked in Cloud, closely with the AI product teams on GTM strategy, is that it was primarily the former: Google was always extremely risk averse when it came to AI, to the point that until Andrew Moore was pushed out, Google Cloud didn't refer to anything as AI. It was ML-only, hence the BigQuery ML, Video Intelligence ML, NLP API, and so many other "ML" product names. There was strong sentiment internally that the technology wasn't mature enough to legitimately call it "AI", and that any models adequately complex to be non-trivially explainable were a no-go. Part of this was just general conservatism around product launches within Google, but it was significantly driven by EU regulation, too. Having just come off massive GDPR projects and staring down the barrel of DMA, Google didn't want to do anything that expanded the risk surface, whether it was in Cloud, Ads, Mobile or anything else.
Their hand was forced when ChatGPT was launched ... and we're seeing how that's going.
They're like a hyperactive dog chasing its own tail. How many projects did they create only to shut them a bit later? All because there's always some nonsense to chase. Meanwhile the AI train has left the station without them and their search is now an ad infested hot piece of garbage. Don't even get me started on their customer/dev support or how aging things like Google Translate api got absolutely KILLED by GPT-4 like apis overnight.
Google has stage 4 leadership incompetency and can't be helped. The only humane option is euthanasia.
Yes, this! Google Docs is basically basic. But imagine if, years ago, Google had added built-in LLM-based auto-complete and refactoring and summation tools to documents and presentations etc, years ago...
The story I like to tell for the Newton is that it was launched before the technology was ready yet. Like the Sega Game Gear. Old video phones. All those tablets that launched before the iPad.
They’re good ideas, but they shipped a few years too early, and the technology to make them work well at a good price point wasn’t available until later. Like, the Sega Game Gear had a cool active matrix LCD screen, but it took six AA batteries and the batteries only lasted like four hours.
The Palm Pilot V had a dockable cell phone modem, but the connectivity wasn't integrated into the OS. It worked but only as a demonstration. Then Palm released a model with integrated data, but the BlackBerry came out the same year. You can be first and still if someone comes along with a much more compelling product, that's the end of you.
Google has a few years left as a search company, but their enshittification of results has doomed them to replacement by LLMs. They seem to have forgotten Google pushed out their predecessors by having the best search results. Targeted advertisements don't qualify.
Vastly depends on the game played and the settings. In a plane (so airplane mode, with Bluetooth headset) I played Hitman Absolution for 3 hours and still had 50%+ of the battery left. It was on minimal brightness because it was dark and didn't need more, but still.
Yeah, no need to take a (semi)joke literally and go all technical to debunk it. Though without optimizations, battery life on the deck was lucky to hit 2h at first before valve brought in updates and people learned they had to cap resolution and FPS to increase battery life.
Man, I remember my last semester of college taking a history of photography course that was only offered every 3-4 years by a pretty legendary professor. The day before the first day of class (or super close), Eastman Kodak declared bankruptcy after what? 110 years?
He scrapped his day 1 lecture and threw together a talk - with photos of course - about Kodak and how an intrepid engineer developed the first digital camera, which the company then foolishly hid because it would compete with their film line.
I think the TPU is simple. They do sell it (via cloud), but they focus on themselves first. When there was no shortage of compute, it was an also-ran in the ML hardware market. Now it’s trendy.
ChatGPT v Google is a far crazier history. Not only did Google invent Transformers, not only did Google open-source PaLM and BERT, but they even built chat-tuned LLM chatbots and let employees talk with them. This isn't a case where they were avoiding disruption or protecting search - they genuinely didn't see its potential. Worse, they got so much negative publicity over it that they considered it an AI safety issue to release. If that guy hadn't gone to the press and claimed LaMDA was sentient, then they may have entirely open sourced it like PaLM. This would likely mean that GPT-3 was open sourced and maybe never chat tuned either.
GPT-2 was freely available and OpenAI showed off GPT-3 freely as a parlor trick before ChatGPT came out. ChatGPT was originally the same - fun text generation as chat not a full product.
TLDR - TPUs probably didn't have a lot of value until NVidia became scarce, and they actively invented the original ChatGPT but "AI safety" concerns caused them to lock it down.
Pretty sure it is because if ChatGPT-likes updated as frequently as Google's website index, they would render search engines like Google obsolete and thus make their revenue nonexistent.
Ah, but part of the reason for CUDA's success is that the open source developer who wants to run unit tests or profile their kernel can pick up a $200 card. That PhD student with a $2000 budget can pick up a card. Academic lab with $20,000 for a beefy server, or tiny cluster? nvidia will take their money.
And that's all fixed capital expenditure - there's no risk a code bug or typo by an inexperienced student will lead to a huge bill.
Also, if you're looking for an alternative to CUDA because you dislike vendor lock-in, switching to something only available in GCP would be an absurd choice.
I'm really shocked at how dependent companies have become on the cloud offerings. Want a GPU? Those are expensive, lets just rent on Amazon and then complain about operational costs!
I've noticed this at companies. Yeah, the cloud is expensive, but you have a data center, and a few servers with RTX 3090s aren't expensive. A lot of research workloads can run on simple, cheap hardware.
Probably not many. However, 4090s would be a different situation. There are plenty of guides on running LLMs, stable diffusion, etc. on local hardware.
The H100s would be for businesses looking to get into this space.
This article really connected a lot of abstract pieces together into how they flow through silicon. I really enjoyed seeing the simple CISC instructions and how they basically map on to LLM inference steps.
This is probably a dumb question that just shows my ignorance, but I keep hearing on the consumer end of things that the M1-M4 chips are good at some AI.
The most important for me these days would be Photoshop, Resolve, etc., and I have seen those run a lot faster on Apple's new proprietary chips than on my older machines.
That may not translate well at all to what this chip can do or what an H100 can do. But does it translate at all?
Of course Apple is not selling their proprietary chips either, so for it to be practical Apple would have to release some form of external server stuffed with their GPUs and AI chips.
I’m also not quite an expert, but have benchmarked an M1 and various GPUs.
The M* chips have unified memory and (especially Pro/Max/Ultra) have very high memory bandwidth even compared to e.g. a 1080 (an M1 Ultra has memory bandwidth between a 2080 and a 3090).
At small batch sizes (including 1, like most local tasks), inference is bottlenecked by memory bandwidth, not compute ability. This is why people say the M* chips are good for ML.
However H100s are used primarily for training (at enormous batch sizes) and require lots of interconnect to train large models. At that scale, arithmetic intensity is very high, and the M* chips aren’t very competitive (even if they could be networked) - they pick a different part of the Pareto power/efficiency curve than H100s which guzzle up power.
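To make the batch-size point concrete, here's a rough roofline-style sketch. All of the bandwidth and FLOP figures below are illustrative assumptions for the sake of the example, not vendor specs:

```python
# Back-of-the-envelope roofline check: at batch size 1 a dense layer reads each
# weight once and does ~2 FLOPs with it, so it's memory-bound on any modern chip.

def arithmetic_intensity(batch, d_in, d_out, bytes_per_weight=2):
    """FLOPs per byte moved for y = x @ W with fp16 weights (weight traffic dominates)."""
    flops = 2 * batch * d_in * d_out
    bytes_moved = d_in * d_out * bytes_per_weight
    return flops / bytes_moved

# Assumed machine balance points (peak FLOP/s divided by memory bandwidth).
machines = {
    "unified-memory SoC": 30e12 / 800e9,      # ~37 FLOP/byte
    "datacenter GPU":     1000e12 / 3.35e12,  # ~300 FLOP/byte
}

for batch in (1, 16, 512):
    ai = arithmetic_intensity(batch, 4096, 4096)
    for name, balance in machines.items():
        bound = "memory-bound" if ai < balance else "compute-bound"
        print(f"batch={batch:4d}  {name}: intensity {ai:6.1f} vs balance {balance:5.1f} -> {bound}")
```

At batch 1 the intensity is about 1 FLOP per byte of weights, far below either balance point, so bandwidth is what you're paying for; only at large, training-scale batches does raw compute become the limit.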
What Google really needs to do is get into the 2nm EUV space and go sub-2nm. When they have the lithography (or whatever tech ASML has that prints on the chips) then you have something really dangerous. Probably a hardcore Google X moonshot type project. Or maybe they have $500M sitting around to just buy one of the machines. If their TPUs are really that good - maybe it is a good business - especially if they can integrate all the way to having their own fab with their own tech.
This is frankly infeasible. Between the decades of trade secrets they would first need to discover, the tens- or maybe hundreds- of billions in capital needed to build their very first leading edge fab, the decade or two it would take for any such business to mature to the extent it would be functional, and the completely inconsequential volumes of devices they'd produce, they would probably be lighting half a trillion dollars on fire just to get a few years behind where the leading edge sits today, ten or more years from now. The only reason leading edge fabs are profitable today is because of decades of talent and engineering focused on producing general purpose computing devices for a wide variety of applications and customers, often with those very same customers driving innovation independently in critical focus areas (e.g. Micron with chip-on-chip HDI yield improvements, Xilinx with interdie communication fabric and multi chip substrate design). TPUs will never generate the required volumes, or attract the necessary customers, to achieve remotely profitable economies of scale, particularly when Google also has to set an attractive price against their competitors.
If Google has a compelling-enough business case, existing fabs will happily allocate space for their hardware. TPU is not remotely compelling enough.
I listened to a talk by Jim Keller from Tenstorrent about their different approach to making AI cores - 5 RISC-V cores: one core for loading data, one for uploading data, and the rest dedicated to performing matrix operations.
He did mention Google's TPU and the fact it was like programming a VLIW and they had about 500 people dedicated to their compiler.
Quote from the OP: "The TPU v1 uses a CISC (Complex Instruction Set Computer) design with around only about 20 instructions."
chuckle CISC/RISC has gone from astute observation, to research program, to revolutionary technology, to marketing buzzwords....and finally to being just completely meaningless sounds.
Idk maybe it's just me, but what I was taught in comp architecture was that cisc vs risc has more to do with the complexity of the instructions, not the raw count. So TPU having a smaller number of instructions can still be a cisc if the instructions are fairly complex.
Granted the last time I took any comp architecture was a grad course like 15 years ago, so my memory is pretty fuzzy (also we spent most of that semester dicking around with Itanium stuff that is beyond useless now)
Right. CISC vs RISC has always been about simplifying the underlying micro-instructions and register set usage. It's definitely CISC if you have a large complex operation on multiple memory direct locations (albeit the lines between RISC and CISC being blurred, as all such polar philosophies do, when real-life performance optimizations come into play)
Guys....what are the instructions? The on-chip memory they are talking about is essentially...a big register set. So we have load from main memory into registers, store from registers into main memory, multiply matrices--source and dest are stored in registers....
We have a 20 instruction, load-store cpu....how is this not RISC? At least RISC how we used the term in 1995?
Its design follows the old idea that an ISA should be designed for assembly programmers; that instructions should implement complex or higher-level functions intended for a programmer to use directly.
RISC rejected that notion (among other things) and focused on designing ISAs for a compiler to target when compiling high level languages, without wasting silicon on instructions a compiler cannot easily use. For the TPU, a compiler cannot easily take a 256x256 matrix multiply written in a high-level language like C and emit a Matrix_Multiply instruction.
I don't think it makes any sense to talk about all on-chip memory as a register set. In practice most uses of REP MOVS these days don't leave L3$ but because it's an instruction that runs for a highly variable amount of time while transferring data between different locations we consider it very CISCy. And the TPU also has instructions to transfer data over PCIe to and from the TPU's local DDR3 memory as well, which isn't on the chip and I hope you would agree that it's not like a register at that point.
If every instruction was always one 256 element unit maybe you could make the analogy stick. But it's working with 256*N element operations.
Given what seems to be an enormous demand for fab space, when Microsoft or Google create a proprietary chip and need it produced, how do they get to the front of the line?
Are they simple enough that "older, outdated, less in-demand" fabs can produce them?
I know Apple and Nvidia have a lock on a lot of fab space?
> However, although tensors describe the relationship between arbitrary higher-dimensional arrays, in practice the TPU hardware that we will consider is designed to perform calculations associated with one and two-dimensional arrays. Or, more specifically, vector and matrix operations.
I still don’t understand why the term “tensor” is used if it’s only vectors and matrices.
It says:
tensors describe the relationship between high-d arrays
It does not say:
tensors “only” describe the relationship between high-d arrays
The term “tensor” is used because it covers all cases: scalars, vectors, matrices, and higher-dimensional arrays.
Tensors are still a generalization of vectors and matrices.
Note the context: In ML and computer science, they are considered a generalization. From a strict pure math standpoint they can be considered different.
As frustrating as it seems one is not really more right and context is the decider. There are lots of definitions across STEM fields that change based on the context or field they’re applied to.
The word tensor has become more ambiguous over time.
Before 1900, the use of the word tensor was consistent with its etymology, because it was used only for symmetric matrices, which correspond to affine transformations that stretch or compress a body in certain directions.
The square matrix that corresponds to a general affine transformation can be decomposed into the product of a tensor (a symmetric matrix which stretches) and a versor (a rotation matrix, which is antisymmetric and which rotates).
When Ricci-Curbastro and Levi-Civita published the first theory of what are now called tensors, they did not define any new word for the concept of a multidimensional array with certain rules of transformation when the coordinate system is changed, which is now called a tensor.
When Einstein published the Theory of General Relativity during WWI, in which he used what is now called tensor theory, for an unknown reason and without any explanation for this choice he began to use the word "tensor" with the current meaning, in contrast with all previous physics publications.
Because Einstein became extremely popular immediately after WWI, his usage of the word "tensor" spread everywhere, including in mathematics (and including in the American translations of the works of Ricci and Levi-Civita, where the word tensor was introduced everywhere, despite the fact that it did not exist in the original).
Nevertheless, for many years the word "tensor" could not be used for arbitrary multi-dimensional arrays, but only for those which observe the tensor transformation rules with respect to coordinate changes.
The use of the word "tensor" as a synonym for the word "array", like in ML/AI, is a recent phenomenon.
Previously, e.g. in all early computer literature, the word "array" (or "table" in COBOL literature) was used to cover all cases, from scalars, vectors and matrices to arrays with an arbitrary number of dimensions, so no new words are necessary.
Famously whether free helium is a molecule or not depends on whether you're talking to a physicist or a chemist.
But yeah, people in different countries speak different languages and the same sound, like "no" can mean a negation in English but a possessive in Japanese. And as different fields establish their jargons they often redefine words in different ways. It's just something you have to be aware of.
(I think) technically, all of these mathematical objects are tensors of different ranks:
0. Scalar numbers are tensors of rank 0.
1. Vectors (eg velocity, acceleration in intro high school physics) are tensors of rank 1.
2. Matrices that you learn in intro linear algebra are tensors of rank 2. Nested arrays 1 level deep, aka a 2d array.
3. Tensors of rank 3 or higher are what people usually mean by "tensors". I explain these as 'nested arrays' to people with programming backgrounds: nested arrays of arrays with 3 dimensions or higher.
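If it helps, a quick numpy check of the rank-as-number-of-axes idea (this is the ML/array sense of "rank", i.e. ndim, not matrix rank):

```python
import numpy as np

scalar = np.array(3.0)              # rank 0
vector = np.array([1.0, 2.0, 3.0])  # rank 1
matrix = np.eye(3)                  # rank 2
cube   = np.zeros((2, 3, 4))        # rank 3: an array of matrices

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")
```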
It's branding (see: TensorFlow); also, pretty much anything (linear) you would do with an arbitrarily ranked tensor can be expressed in terms of vector ops and matmuls
At the end of the day all the arrays are 1 dimensional and thinking of them as 2 dimensional is just an indexing convenience. A matrix multiply is a bunch of vector dot products in a row. Higher tensor contractions can be built out of lower-dimensional ones, so I don't think it's really fair to say the hardware doesn't support it.
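For example, here's a sketch of that reduction with numpy; the shapes and the einsum spelling are just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5, 6))   # a "3D tensor": batch of 8 matrices, each 5x6
B = rng.standard_normal((6, 7))      # a shared 6x7 matrix

# The 3D contraction "bij,jk->bik" ...
direct = np.einsum("bij,jk->bik", A, B)

# ... is just a reshaped 2D matmul: fold the batch axis into the rows,
flat = A.reshape(8 * 5, 6) @ B       # (40, 6) @ (6, 7) -> (40, 7)
rebuilt = flat.reshape(8, 5, 7)      # then unfold the batch axis again.

print(np.allclose(direct, rebuilt))  # True
```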
I’d say it’s more like calling an ALU that can perform unary and binary operations (so 1 or 2 inputs) an “array processing unit” because it’s like it can process 1- and 2-element arrays. ;)
I do not know which is the real origin of the fashion to use the word tensor in the context of AI/ML.
Nevertheless, I have always interpreted it as a reference to the fact that the optimal method of multiplying matrices is to decompose the matrix multiplication into tensor products of vectors.
The other 2 alternative methods, i.e. decomposing the matrix multiplication into scalar products of vectors or into AXPY operations on pairs of vectors, have a much worse ratio between computation operations and transfer operations.
Unfortunately, most people learn in school the much less useful definition of the matrix multiplication based on scalar products of vectors, instead of its definition based on tensor products of vectors, which is the one needed in practice.
The 3 possible methods for multiplying matrices correspond to the 6 possible orders for the 3 indices of the 3 nested loops that compute a matrix product.
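If it helps to see them side by side, here's a toy numpy sketch of those three formulations as different orderings of the same triple loop; which one is actually preferable comes down to the memory-traffic argument above, since the arithmetic is identical:

```python
import numpy as np

def matmul_dot(A, B):
    """i,j outer loops: each C[i, j] is a scalar (dot) product of a row of A and a column of B."""
    m, n = A.shape[0], B.shape[1]
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, :] @ B[:, j]
    return C

def matmul_axpy(A, B):
    """Each column of C is built up by AXPY updates: a linear combination of the columns of A."""
    m, k, n = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((m, n))
    for j in range(n):
        for p in range(k):
            C[:, j] += B[p, j] * A[:, p]
    return C

def matmul_outer(A, B):
    """k outermost: C is a sum of rank-1 outer (tensor) products of a column of A and a row of B."""
    m, k, n = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((m, n))
    for p in range(k):
        C += np.outer(A[:, p], B[p, :])
    return C

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((3, 5))
ref = A @ B
print(all(np.allclose(f(A, B), ref) for f in (matmul_dot, matmul_axpy, matmul_outer)))  # True
```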
The Einsum notation makes it desirable to formulate your model/layer as multi-dimensional arrays connected by (loosely) named axes, without worrying too much about breaking it down to primitives yourself. Once you get used to it, the terseness is liberating.
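A small numpy example of the style being described, writing attention-like contractions with labelled axes instead of hand-rolled transposes and reshapes (the axis names and shapes here are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
batch, heads, seq, dim = 2, 4, 16, 32
q = rng.standard_normal((batch, heads, seq, dim))
k = rng.standard_normal((batch, heads, seq, dim))
v = rng.standard_normal((batch, heads, seq, dim))

# Scores: contract query and key over the feature axis "d".
scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(dim)

# Softmax over the key axis, then contract the weights with the values.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = np.einsum("bhqk,bhkd->bhqd", weights, v)

print(out.shape)  # (2, 4, 16, 32)
```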
I was confused as hell for a long time when I first got into ML, until I figured out how to think about tensors in a visual way.
You're right: fundamentally ML is about vector and matrix operations (1D and 2D). So then why are most ML programs 3D, 4D, and in a transformer sometimes up to 6D (?!)
One reasonable guess is that the third dimension is time. Actually not. It turns out that time is pretty rare in ML, and it's only (relatively) recently that it's been introduced into e.g. video models.
Another guess is that it's to represent "time" as in, think of how transformers work: they generate a token, then another given the previous, then a third given the first two, etc. That's a certain way of describing "time". But it turns out that transformers don't do this as a 3D or 4D dimension. It only needs to be 2D, because tokens are 1D -- if you're representing tokens over time, you get a 2D output. So even with a cutting edge model like transformers, you still only need plain old 2D matrix operations. The attention layer creates a mask, which ends up being 2D.
So then why do models get to 3D and above? Usually batching. You get a certain efficiency boost when you pack a bunch of operations together. And if you pack a bunch of 2D operations together, that third dimension is the batch dimension.
For images, you typically end up with 4D, with the convention N,C,H,W, which stands for "Batch, Channel, Height, Width". It can also be N,H,W,C, which is the same thing but packed in memory as red green blue, red green blue, etc. instead of all the red pixels first, then all the green pixels, then all the blue pixels. This matters in various subtle ways.
I have no idea why the batch dimension is called N, but it's probably "number of images".
"Vector" wouldn't quite cover all of this, and although "tensor" is confusing, it's fine. It's the ham sandwich of naming conventions: flexible, satisfying to some, and you can make them in a bunch of different varieties.
Under the hood, TPUs actually flatten 3D tensors down into 2D matrix multiplications. I was surprised by this, but it makes total sense. The native size for a TPU is 8x128 -- you can think of it a bit like the native width of a CPU, except it's 2D. So if you have a 3x4x256 tensor, it actually gets flattened out to 12x256, then the XLA black box magic figures out how to split that across a certain number of 8x128 vector registers. Note they're called "vector registers" rather than "tensor registers", which is interesting. See https://cloud.google.com/tpu/docs/performance-guide
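A rough sketch of just the shape bookkeeping in that 3x4x256 example; the actual 8x128 register tiling is XLA's job per the linked guide, so the tile count below is only an assumption about padding up to full tiles:

```python
import math
import numpy as np

x = np.arange(3 * 4 * 256).reshape(3, 4, 256)

# Collapse all leading ("batch-like") axes so the op becomes a plain 2D problem.
flat = x.reshape(-1, x.shape[-1])   # (12, 256)

# Roughly how many native 8x128 tiles that 2D shape would occupy, padded to full tiles.
sublanes, lanes = 8, 128
tiles = math.ceil(flat.shape[0] / sublanes) * math.ceil(flat.shape[1] / lanes)

print(flat.shape, tiles)            # (12, 256) 4  -> ceil(12/8) * ceil(256/128)
```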
You'd hate particle physics then. "Spin" and "action" and so on are terrible names, but scientists live with them, because convention.
Convention dominates most of what we do. I'm not sure there's a good way around this. Most conventions suck, but they were established back before there was a clear idea of what the best long-term convention should be.
At least in physics you can understand how the terms came about historically, where at some point they made sense. But "tensor" here, as noted in sibling comments, seems to have been chosen primarily for marketing reasons.
It comes from the maths, where tensors are generalisations of matrices/vectors. They got cribbed, because the ML stuff directly used a bunch of the underlying maths. It’s a novel term, it sounds cool, not surprised it also then got promoted up into a marketing term.
> tensors are generalisations of matrices/vectors.
Is that what they are though? Because that really is not my understanding. Tensors are mappings which not all matrices and vectors are. Maybe the matrices in ML layers are all mappings, but a matrix in general is not, nor is a vector always a mapping. So tensors aren't generalizations of matrices and vectors.
> Tensors are mappings which not all matrices and vectors are.
A tensor in Physics is an object that follows some rules when changing reference frame. Their matrix representation is just one way of writing them. It’s the same with vectors: a list with their components is a representation of a vector, not the vector itself. We can think about it that way: the velocity of an object does not depend on the reference frame. Changing the axes does not make the object change its trajectory, but it does change the numerical values of the components of the velocity vector.
> So tensors aren’t generalizations of matrices and vectors.
Indeed. Tensors in ML have pretty much nothing to do with tensors in Maths or Physics. It is very unfortunate that they settled on the same name just because it sounds cool and sciency.
I think to be a tensor, all the bases should be independent. The way I think of it is you use a tensor to describe the rotation of an asteroid around all its major axes (inertia tensor?)
just because an image is 2-D doesn’t mean that the model can’t use higher dimensional representations in subsequent layers.
For an image, you could imagine a network learning to push the image through a filter bank that does oriented local frequency decomposition and turns it into 4D {height}x{width}x{spatial freq}X{orientation} before dealing with color channels or image batches
For whatever reason, I have held a mental image of a Tensor as a Tesseract/HyperCube where the connections are like the Elastic workout bands where they have differing tensile resistances, and they pull on one another to create their encapsulated info-cluster - but I have no clue if thats truly an accurate depiction, but it works in my head....
I'm reluctant to tell people "no, don't think of it that way," especially if it works for you, because I don't know the best way to think of things. I only know what works well for me. But for me, it'd be ~impossible to use your mental model to do anything useful. That doesn't mean it's bad, just that I don't understand what you mean.
The most straightforward mental model I've ever found for ML is, think of it as 2D matrix operations, like high school linear algebra. Matrix-matrix, matrix-vector, vector-matrix, and vector-vector will get you through 95% of what comes up in practice. In fact I'm having trouble thinking of something that doesn't work that way, because even if you have an RGB image that you multiply against a 2D matrix (i.e. HxWxC multiplied by a mask) the matrix is still only going to apply to 2 of the dimensions (height and width), since that's the only thing that makes sense. That's why there's all kinds of flattening and rearranging everywhere in practice -- everyone is trying to get a format like N,C,H,W down to a 2D matrix representation.
People like to talk up the higher level maths in ML, but highschool linear algebra (or for the gamedevs in the audience, the stuff you'd normally do in a rendering engine) really will carry you most of the way through your ML journey without loss of generality. The higher level maths usually happens when you start understanding how differentiation works, which you don't even need to understand until way later after you're doing useful things already.
>One reasonable guess is that the third dimension is time. Actually not. It turns out that time is pretty rare in ML, and it's only (relatively) recently that it's been introduced into e.g. video models.
WRT to ML - may time be better thought of where a thing lives in relation to other things that occurred within the same temporal window?
so "all the shit that happened in 1999 also has an expression within this cluster of events from 1999" - but the same information appears in any location where it is relationally contextual to the other neighbors, such as the SUBJECT of the information? Is this accurate to say why its 'quantum' because the information will show up depending on where the Observation (query) for it is occurring?
Something similar happens on Wikipedia, where topics that use math inevitably get explained in the highest level math possible. It makes topics harder to understand than they need to be.
As a helpful Wiki editor just trying to make sure that we don't lead people astray, I've made some small changes to clarify your statement:
In the virtual compendium of Wikipedia, an extensive repository of human knowledge, there is a discernible proclivity for the hermeneutics of mathematically-infused topics to be articulated through the prism of esoteric and sophisticated mathematical constructs, often employing a panoply of arcane lexemes and syntactic structures of Greek and Latin etymology. This phenomenon, redolent of an academic periphrasis, tends to transmute the exegesis of such subjects into a crucible of abstruse and high-order mathematical discourse. Consequently, this modus operandi obfuscates the intrinsic didactic intent, thereby precipitating an epistemological chasm that challenges the layperson's erudition and obviates the pedagogical utility of the exposition.
Sigh...learning about TPUs a decade ago made me invest heavily in $GOOG for the coming AI revolution...got that one 100% wrong. +400% over 10 years isn't bad but I can't help but feel shortchanged seeing nvidia/etc
yeah, but nvda is up like 500% in 2 years, so if you’re naive enough to think you can time the market, you’d have fomo over having invested in the “wrong” thing.
Seeing the difference between GPT2 and GPT3 made me run to NVDA immediately. One of the few bets in my life I've ever been confident about. I think NVDA was a pretty reasonable bet on AI like 5+, maybe 10 years ago when deep learning was ramping up.
It wasn't just that, it was also all the deep learning stuff. Atari games playing themselves, deep style and variants. There was some interesting image generation happening. AlphaGo was 2015, etc. that was really when things started accelerating imo.
The potential usefulness of things like TPUs actually made me invest in Broadcom, which helped Google design them and could potentially help Amazon or whoever else design their equivalents. But I'm also long NVidia and a half dozen other companies with AI exposure, while still keeping most of my money in index funds.
Not the whole design - the core processing part (the systolic array, i.e. the matrix multiplier) was designed by Google, but Broadcom designed all the high-speed chip I/O and mapped the design onto TSMC's tools/rules.
[1] https://open.spotify.com/episode/0V9kRgNS7Ds6zh3GjdXUAQ?si=q...