In the podcast interview that Groq CEO Jonathan Ross did[1], he talked about the creation of the original TPUs (which he built at Google). Apparently it was originally an FPGA he built in his 20% time because he sat near the team that was having inference speed issues.
They got it working, then Jeff Dean did the math and they decided to do an ASIC.
Now of course Google should spin off the TPU team as a separate company. It's the only credible competition NVidia has, and the software support is second only to NVidia.
The way I see it, NVidia only has a few advantages, ordered from most important to least:
1. Reserved fab space.
2. Highly integrated software.
3. Hardware architecture that exists today.
4. Customer relationships.
but all of these aspects are weak in one way or another:
1. Fab space is tight, and NVidia can strangle its consumer GPU market if it means selling more AI chips at a higher price. This advantage is gone if a competitor makes big bets years in advance, or if another company with a lot of fab space (Intel?) is willing to change priorities.
2. Life is good when your proprietary software is the industry standard. Whether this actually matters will depend heavily on the use case.
3. A benefit now, but not for long. It's my estimation that the hardware design for TPUs is fundamentally much simpler than for GPUs. No need for raytracing, texture samplers, or rasterization. Mostly just needs lots of matrix multiplication and memory. Others moving into the space will be able to catch up quickly.
4. Useful to stay in the conversation, but in a field hungry for any advantage, the hardware vendor with the highest FLOPS (or equivalent) per dollar is going to win enough customers to saturate their manufacturing ability.
So overall, I give them a few years, and then the competition is going to get real quite fast.
It seems you have not worked with ML workloads, but base your comment on "internet wisdom", or worse, business analysts (I am sorry if that's inaccurate).
On GPUs, ML "just works" (inference and training), and they are always an order of magnitude faster than whatever CPU you have.
TPUs work very well for some model architectures (old ones that they were optimized and designed for), and on some novel others they can actually be slower than a CPU (because of gathers and similar) - this was my experience working on ML stuff as an ML Researcher at Google till 2022; maybe it got better, but I doubt it. Older TPUs were ok only for inference of those specific models and useless for training. And for anything new I tried (a fundamental part of research...) - the compiler would sometimes just break with an internal error, most of the time just produce terrible and slow code, and bugs filed against it would stay open for years.
A GPU is so much more than a matrix multiplier - it's a fully general, programmable processor. With excellent compilers, but most importantly - low-level access, so you don't have to rely on proprietary compiler engineers (like the TPU ones) and anyone can develop something like Flash Attention. And as a side note: while a Transformer might be mostly matrix multiplication, many other models are not.
If you had worked with ML, you'd know that this is not true. It's actually more like the opposite. It also has nothing to do with the chips themselves. Things don't magically work "because GPU", they work because manufacturers spend the time getting their drivers and ecosystems right. That's why for example no one is using AMD GPUs for ML, despite them offering more compute per dollar on paper. Getting the software stack to the point of Nvidia/CUDA, where things really do "just work", is an enormous undertaking. And as someone who has been researching ML for more than a decade now, I can tell you Nvidia also didn't get these things right in the beginning. That's the reason why they have no real competition today (and still won't for quite some time).
> That's why for example no one is using AMD GPUs for ML
You're right, they are behind, but to say that nobody is using them is not accurate. AMD HPC clusters are being used [0] and [1] for AI/ML.
The larger issue is that, until recently, AMD has only been building HPC clusters. Now, with the release of the MI300x, we have Azure and Oracle coming online with them. Disclosure: my business is also building an MI300x supercomputer, with the express goal of enabling more access for developers.
>AMD HPC clusters are being used [0] and [1] for AI/ML.
Funny how you can immediately tell when the business people made these decisions and not the tech people. This is exactly what I would have expected from an organization like the Navy. On paper it does sound great and the Navy bean counters probably loved this. But they are in for a rough awakening.
The best I can say is that my thoughts and prayers go to the ML engineers who will actually have to deal with this. Those companies literally couldn't pay me enough to put up with it. They will likely only attract people who care about the salary and the position instead of getting things done. I've seen it with other colleagues before. These numbers of yours are completely worthless without someone who is willing to put in 5 times the work for the same or worse results.
People choose jobs and tools for a variety of reasons. I don't feel the need to cast judgement on them over it.
The numbers I gave aren't worthless, nor does it take 5x the amount of work. I also don't think that going with a single source for hardware for all of AI is very smart either, especially given the fact that there are serious supply shortages from that single vendor. No fortune 100 would put all their eggs in one basket and even if it was 5x the work, it is worth it.
Hey, this is a good comment. I've only toyed with ML stuff, but I've done a lot with GPUs. I hope you find my "step back" perspective as valuable as I find your up-close one.
My chief mistake in the above comment was using "TPU", as that's Google's branding. I probably should've used "AI focused co-processor". I'm not talking exclusively about Google's foray into the space, especially as I haven't used TPUs.
My list of things to ditch on GPUs doesn't include cores. My point there is that there's a bunch of components that are needed for graphics programming that are entirely pointless for AI workloads, both inside the core's ALU and as larger board components. The hardware components needed for AI seem relatively well understood at this point (though that's possible to change with some other innovation).
Put another way, my point is this: Historically, the high end GPU market was mostly limited to scientific computing, enthusiast gaming, and some varied professional workloads. Nvidia has long been king here, but with relatively little attempt by others at competition. ML was added to that list in the last decade, but with some few exceptions (Google's TPU), the people who could move into the space haven't. Then chatGPT happened, investment in AI has gone crazy, and suddenly Nvidia is one of the most valuable companies in the world.
However, the list of companies who have proven they can make all the essential components (in my list in the grandparent) isn't large, but it's also not just Nvidia. Basically every computing device with a screen has some measure of GPU components, and now everyone is paying attention to AI. So I think within a few years Nvidia's market leadership will be challenged, and they certainly won't be the only supplier of top-of-the-line AI co-processors by the end of the decade. Whether first-mover advantage will keep them in first place, time will tell.
It's been talked to death but non-CUDA implementations have their challenges regardless of use case. That's what first-mover advantage and > 15 years of investment by Nvidia in their overall ecosystem will do for you.
But support for production serving of inference workloads outside of CUDA is universally dismal. This is where I spend most of my time and compared to CUDA anything else is non-existent or a non-starter unless you're all-in on packaged API driven Google/Amazon/etc tooling utilizing their TPUs (or whatever). The most significant vendor/cloud lock-in I think I've ever seen.
Efficient and high-scale serving of inference workloads is THE thing you need to do to serve customers and actually have a chance at ever making any money. It's shocking to me that Nvidia/CUDA has a complete stranglehold on this obvious use case.
A great summary of how unserious NVIDIA's competitors are is how long it took AMD's flagship consumer/retail GPU, the 7900 XT[X], to gain ROCm support.
NVidia's biggest advantage is that AMD is unwilling to pay for top notch software engineers (and unwilling to pay the corresponding increase in hardware engineer salaries this would entail). If you check online you'll see NVidia pays both hardware and software engineers significantly more than AMD does. This is a cultural/management problem, which AMD's unlikely to overcome in the near-term future. Apple so far seems like the only other hardware company that doesn't underpay its engineers, but Apple's unlikely to release a discrete/stand-alone GPU any time soon.
Don’t underestimate CUDA as the moat. It’s been a decade of sheer dominance with multiple attempts to loosen its grip that haven’t been super fruitful.
I’ll also add that their second moat is Mellanox. They have state of the art interconnect and networking that puts them ahead of the competition that are currently focusing just on the single unit.
This moat is going to get paralleled over the next few years. First off, Mellanox is unobtainium, with 52+ week lead times.
GigaIO has a PCIe fabric solution that is a fraction of the cost of Mellanox and available today. This enables up to 64 GPUs to appear on a single system.
We're also seeing the Ultra Ethernet stuff come online as well, but that'll have to wait for PCIe 6.
I’ve spent the last month deep in GPU driver/compiler world and -
AMD or Apple (Metal) or someone (I haven’t tried Intel’s stuff) just needs to have a single guide to installing a driver and compiler that doesn’t segfault if you look at it wrong, and they would sweep the R&D mindshare.
It is insane how bad CUDA is; it’s even more insane how bad their competitors are.
If you work in hardware and are interested in solving this, lemme say this:
There are billions of dollars waiting for the first person to get this right. The only reason I haven’t jumped on this myself is a lack of familiarity with drivers.
These have always been NVIDIA's "few" advantages and yet they've still dominated for years. It's their relentless pace of innovation that is their advantage. They resemble Intel of old, and despite Intel's same "few" advantages, Intel is still dominant in the PC space (even with recent missteps).
They've dominated for years, but now all big tech companies are using their products at a scale not seen before, and all have a vested interest in cutting their margins by introducing some real competition.
Nvidia will do well in the future, but perhaps not well enough to justify their stock price.
Nvidia's datacenter AI chips don't have raytracing or rasterization. Heck, for all we know the new Blackwell chip is almost exclusively tensor cores. They gave no numbers for regular CUDA perf.
> Now of course Google should spin off the TPU team as a separate company.
Given the size of the market and its near-monopoly situation, I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that the TPU is a relatively scarce computing resource even inside Google, and it's very likely that Google has a hard time meeting its internal demand...
> I strongly think this has the potential to (almost immediately) surpass the Pixel hardware business. But the problem here is that the TPU is a relatively scarce computing resource even inside Google, and it's very likely that Google has a hard time meeting its internal demand...
Yes.
But imagine how the company would do: they have a guaranteed market at Google, say, for 3 years, and while yes, maybe Google takes 100% of the production in the first 12 months, it's not a bad position to start from.
Plus there are other products which they could ship that might not always need to be built on the latest process. I imagine there would be demand for inference only earlier generation TPUs that can run LLMs fast if the power usage is low enough.
If AMD fixes or open sources their proprietary firmware blob[0]. Geohot streamed all weekend on Twitch, reverse engineering the AMD firmware. It was quite entertaining learning about how that low level hardware firmware works[1] and his rants about AMD of course.
Geohot is wrangling with unsupported consumer hardware.
The datacenter stuff is on a different architecture and driver stack.
The number one supercomputer on the Top500 list (Frontier at ORNL) is based on AMD GPUs, and AMD is probably more invested in supporting that.
I work with Frontier and ORNL/OLCF. They have had and continue to have issues with AMD/ROCm but yes, they do of course get excellent support from AMD. The entire team at OLCF is incredible as well (obviously) and they do amazing work.
Frontier certainly has some unique quirks but the documentation is online[0] and most of these quirks are inherent to the kinds of fundamental issues you'll see on any system in the space (SLURM, etc).
However, most of the issues are fundamentally ROCm and you'll run into them on any MIxxx anywhere. I run into them frequently with supported and unsupported consumer gear all the way up.
I mean, that's kinda nvidia's whole shtick: anyone can play around synthesizing cat pictures on their gaming GPU and if they make a breakthrough, the same software will transfer to X million dollar supercomputers.
Subscriber only videos, so nobody can confirm that he did that, nor archive whatever valuable information he released. At least not without paying some money in the next 7-14 days before they're deleted.
Geohot doesn't know what he's talking about and I'm kinda ashamed to see this lazy thinking leak onto HN. There was an article a couple weeks back on AMD open sourcing drivers in the Linux kernel tree that you should look into.
Firmware crashes => days long "open source it and I'll fix it. no? why does AMD hate its customers?"
I got an appointment and have exactly one minute till I have to leave, apologies for brevity: they can't open source the full driver because then they'd have to release HDMI spec stuff that the consortium says they can't. (I don't support any of that, my only intent is to communicate George isn't really locked in here when he starts casting aspersions or claiming AMD doesn't care)
But they're far behind in adoption in the AI space, while TPUs have both adoption (inside Google and on top) and a very strong software offering (Jax and TF)
There's also Amazon's AWS "Trainium" chips, which is what Anthropic will be using going forward.
If you're talking about training LLMs, involving 10's of thousands of processors, then the specifics of one processor vs another isn't the most important thing - it's the overall architecture and infrastructure in place to manage it.
Speaking of which, mega props to Groq, they really are awesome, so many startups launch with bullshit and promises, but Groq came to the scene with something awesome already working, which is reason enough to love them. I really respect this company and I say that extremely never-often.
I wouldn't call it awesome. It's just a big chip with lots of cache. You need hundreds of them to sufficiently load any decent model. At which point the cost has skyrocketed.
How is it that Google invented the TPU and Google Research came up with the paper behind LLMs, yet NVDA and AI startup companies have captured ~100% of the value?
There's an old joke explanation about Xerox and PARC, about the difficulty of "pitching a 'paperless office' to a photocopier company".
In Google's case, an example analogy would be pitching making something like ChatGPT widely available, when that would disrupt revenue from search engine paid placements, and from ads on sites that people wouldn't need to visit. (So maybe someone says, better to phase it in subtly, as needed for competitiveness, but in non-disruptive ways.)
I doubt it's as simple as that, but would be funny if that was it.
This (innovator's dilemma / too afraid of disrupting your own ads business model) is the most common explanation folks are giving for this, but seems to be some sort of post-rationalization of why such a large company full of competent researchers/engineers would drop the ball this hard.
My read (having seen some of this on the inside), is that it was a mix of being too worried about safety issues (OMG, the chatbot occasionally says something offensive!) and being too complacent (too comfortable with incremental changes in Search, no appetite for launching an entirely new type of product / doing something really out there). There are many ways to monetize a chatbot, OpenAI for example is raking billions in subscription fees.
Google gets much more scrutiny than smaller companies, so it's understandable to be worried. Pretty much any small mistake of theirs turns into clickbait on here and the other tech news sites, and you get hundreds of comments about how evil Big Tech is. Of course it's their own fault that their PR hews negative so frequently, but still, it's understandable why they were so shy.
Sydney when initially released was much less censored and the vast majority of responses online were positive, "this is hilarious/cool", not "OMG Sydney should be banned!".
It's understandable that people at Google are worried because it's likely very unpleasant to see critical articles and tweets about something you did. But that isn't really bad for Google's business in any of the ways that losing to someone on AI would be.
Google is constantly being sued for nearly everything they do. They create a Chrome Incognito mode like Firefox's private browsing mode and they get sued. They start restricting App permissions on Android, sued. Adding a feature where Google maps lets you select the location of your next appointment as a destination in a single click, sued (that's leveraging your calendar monopoly to improve your map app).
Google has its hands in so many fields that any change they make that disrupts the status quo brings down antitrust investigations and lawsuits.
That's the reason why Firefox and Safari dropping support for 3rd party cookies gets a yawn from regulators while Google gets pinned between the CMA wanting to slow down or stop 3rd party cookies deprecation to prevent disrupting the ads market and the ICO wanting Google to drop support yesterday.
This is not about bad press or people feeling bad about news articles. Google has been hit by billion dollar fines in the past and has become hesitant to do anything.
Where smaller companies can take the "Elon Musk" route and just pay fines and settle lawsuits as just the cost of doing business, Google has become an unwieldy juggernaut unable to move out of fear of people complaining and taking another pound of flesh. To be clear, I don't agree with a strategy of ignoring inconvenient regulations, but Google's excess of caution has severely limited their ability to innovate. But given previous judgements against Google, I can't exactly say that they're wrong to do so. Even Google can only pay so many multi-billion dollar fines before they have to close shop, and I can't exactly say the world would be better off if that happened.
That's true for google, sure. But what about individual workers and managers at google?
You can push things forward hard, battle the many stakeholders all of whom want their thing at the top of the search results page, get a load of extra headcount to make a robust and scalable user-facing system, join an on-call rota and get called at 2am, engage in a bunch of ethically questionable behaviour skirting the border between fair use and copyright infringement, hire and manage loads of data labellers in low-income countries who get paid a pittance, battle the internal doubters who think Google Assistant shows chatbots are a joke and users don't want it, and battle the internal fearmongers who think your ML system is going to call black people monkeys, and at the end of it maybe it's great or maybe it ends up an embarrassment that gets withdrawn, like Tay.
Or you can publish some academic papers. Maybe do some work improving the automatic transcription for youtube, or translation for google translate. Finish work at 3pm on a Friday, and have plenty of time to enjoy your $400k salary.
>There are many ways to monetize a chatbot, OpenAI for example is raking billions in subscription fees.
Compared to Google, OpenAI's billions are peanuts, while costing a fortune to generate. GPT-4 doesn't seem profitable (if it were, would they need to throttle it?)
There could be an opposite avenue: ad-free Google Premium subscription with AI chat as a crown jewel. An ultimate opportunity to diversify from ad revenue.
The low operating margin of serving a GPT-4 scale model sounds like a compelling explanation for why Google stayed out of it.
But then why did Microsoft put its money behind it? Alphabet's revenue is around $300bn, and Microsoft's is around $210bn which is lower but it is the same order of magnitude.
Monetizing a chatbot is one thing. Beating revenues every year when you are already making 300b a year is a whole different ball game
There must be tens of execs who understand this, but their payout depends on keeping the status quo.
The answer is far weirder - they had a chat bot, and no one even discussed it in the context of search replacements. They didn’t want to release it because they just didn’t think it should be a product. Only after OpenAI actually disrupted search did they start releasing Gemini/Bard which takes advantage of search.
LaMDA was also briefly available for public testing, but then rapidly withdrawn due to unhinged responses.
One advantage that OpenAI had over Google was having developed RLHF as a way to "align" the model's output to be more acceptable.
Part of Google's dropping the ball at that time period (but catching up now with Gemini) may also have been just not knowing what to do with it. It certainly wasn't apparent pre-ChatGPT that there'd be any huge public demand for something like this, or that people would find so many uses for it in API form, and especially so with LaMDA's behavioral issues.
My take as someone who worked in Cloud, closely with the AI product teams on GTM strategy, is that it was primarily the former: Google was always extremely risk averse when it came to AI, to the point that until Andrew Moore was pushed out, Google Cloud didn't refer to anything as AI. It was ML-only, hence the BigQuery ML, Video Intelligence ML, NLP API, and so many other "ML" product names. There was strong sentiment internally that the technology wasn't mature enough to legitimately call it "AI", and that any models adequately complex to be non-trivially explainable were a no-go. Part of this was just general conservatism around product launches within Google, but it was significantly driven by EU regulation, too. Having just come off massive GDPR projects and staring down the barrel of DMA, Google didn't want to do anything that expanded the risk surface, whether it was in Cloud, Ads, Mobile or anything else.
Their hand was forced when ChatGPT was launched ... and we're seeing how that's going.
They're like a hyperactive dog chasing its own tail. How many projects did they create only to shut them a bit later? All because there's always some nonsense to chase. Meanwhile the AI train has left the station without them and their search is now an ad infested hot piece of garbage. Don't even get me started on their customer/dev support or how aging things like Google Translate api got absolutely KILLED by GPT-4 like apis overnight.
Google has stage 4 leadership incompetency and can't be helped. The only humane option is euthanasia.
Yes, this! Google Docs is basically basic. But imagine if, years ago, Google had added built-in LLM-based auto-complete and refactoring and summation tools to documents and presentations etc, years ago...
The story I like to tell for the Newton is that it was launched before the technology was ready yet. Like the Sega Game Gear. Old video phones. All those tablets that launched before the iPad.
They’re good ideas, but they shipped a few years too early, and the technology to make them work well at a good price point wasn’t available until later. Like, the Sega Game Gear had a cool active matrix LCD screen, but it took six AA batteries and the batteries only lasted like four hours.
The Palm Pilot V had a dockable cell phone modem, but the connectivity wasn't integrated into the OS. It worked but only as a demonstration. Then Palm released a model with integrated data, but the BlackBerry came out the same year. You can be first and still if someone comes along with a much more compelling product, that's the end of you.
Google has a few years left as a search company, but their enshittification of results has doomed them to replacement by LLMs. They seem to have forgotten Google pushed out their predecessors by having the best search results. Targeted advertisements don't qualify.
Vastly depends on the game played and the settings. In a plane (so airplane mode, with Bluetooth headset) I played Hitman Absolution for 3 hours and still had 50%+ of the battery left. It was on minimal brightness because it was dark and didn't need more, but still.
Yeah, no need to take a (semi)joke literally and go all technical to debunk it. Though without optimizations, battery life on the deck was lucky to hit 2h at first before valve brought in updates and people learned they had to cap resolution and FPS to increase battery life.
Man, I remember my last semester of college taking a history of photography course that was only offered every 3-4 years by a pretty legendary professor. The day before the first day of class (or super close), Eastman Kodak declared bankruptcy after what? 110 years?
He scrapped his day 1 lecture and threw together a talk - with photos of course - about Kodak and how an intrepid engineer developed the first digital camera, which the company then foolishly hid because it would compete with their film line.
I think the TPU is simple. They do sell it (via cloud), but they focus on themselves first. When there was no shortage of compute, it was an also-ran in the ML hardware market. Now it’s trendy.
ChatGPT v Google is a far crazier history. Not only did Google invent Transformers, not only did Google open-source PaLM and BERT, but they even built chat-tuned LLM chatbots and let employees talk with them. This isn't a case where they were avoiding disruption or protecting search - they genuinely didn't see its potential. Worse, they got so much negative publicity over it that they considered it an AI safety issue to release. If that guy hadn't gone to the press and claimed LaMDA was sentient, then they may have entirely open sourced it like PaLM. This would likely mean that GPT-3 was open sourced and maybe never chat tuned either.
GPT-2 was freely available and OpenAI showed off GPT-3 freely as a parlor trick before ChatGPT came out. ChatGPT was originally the same - fun text generation as chat not a full product.
TLDR - TPUs probably didn't have a lot of value until NVidia became scarce, and they actively invented the original ChatGPT but "AI safety" concerns caused them to lock it down.
Pretty sure it is because if ChatGPT-likes updated as frequently as Google's website index, they would render search engines like Google obsolete and thus make their revenue nonexistent.
Ah, but part of the reason for CUDA's success is that the open source developer who wants to run unit tests or profile their kernel can pick up a $200 card. That PhD student with a $2000 budget can pick up a card. Academic lab with $20,000 for a beefy server, or tiny cluster? nvidia will take their money.
And that's all fixed capital expenditure - there's no risk a code bug or typo by an inexperienced student will lead to a huge bill.
Also, if you're looking for an alternative to CUDA because you dislike vendor lock-in, switching to something only available in GCP would be an absurd choice.
I'm really shocked at how dependent companies have become on the cloud offerings. Want a GPU? Those are expensive, lets just rent on Amazon and then complain about operational costs!
I've noticed this at companies. Yeah, the cloud is expensive, but you have a data center, and a few servers with RTX 3090s aren't expensive. A lot of research workloads can run on simple, cheap hardware.
Probably not many. However, 4090s would be a different situation. There are plenty of guides on running LLMs, stable diffusion, etc. on local hardware.
The H100s would be for businesses looking to get into this space.
This article really connected a lot of abstract pieces together into how they flow through silicon. I really enjoyed seeing the simple CISC instructions and how they basically map on to LLM inference steps.
This is probably a dumb question that just shows my ignorance, but I keep hearing on the consumer end of things that the M1-M4 chips are good at some AI.
The most important for me these days would be Photoshop, Resolve, etc., and I have seen those run a lot faster on Apple's new proprietary chips than on my older machines.
That may not translate well at all to what this chip can do or what an H100 can do. But does it translate at all?
Of course Apple is not selling their proprietary chips either, so for it to be practical Apple would have to release some form of external server stuffed with their GPUs and AI chips.
I’m also not quite an expert, but have benchmarked an M1 and various GPUs.
The M* chips have unified memory and (especially Pro/Max/Ultra) have very high memory bandwidth even compared to e.g. a 1080 (an M1 Ultra has memory bandwidth between a 2080 and a 3090).
At small batch sizes (including 1, like most local tasks), inference is bottlenecked by memory bandwidth, not compute ability. This is why people say the M* chips are good for ML.
However H100s are used primarily for training (at enormous batch sizes) and require lots of interconnect to train large models. At that scale, arithmetic intensity is very high, and the M* chips aren’t very competitive (even if they could be networked) - they pick a different part of the Pareto power/efficiency curve than H100s which guzzle up power.
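To make the batch-size point concrete, here's a rough roofline-style sketch. All of the bandwidth and FLOP figures below are illustrative assumptions for the sake of the example, not vendor specs:

```python
# Back-of-the-envelope roofline check: at batch size 1 a dense layer reads each
# weight once and does ~2 FLOPs with it, so it's memory-bound on any modern chip.

def arithmetic_intensity(batch, d_in, d_out, bytes_per_weight=2):
    """FLOPs per byte moved for y = x @ W with fp16 weights (weight traffic dominates)."""
    flops = 2 * batch * d_in * d_out
    bytes_moved = d_in * d_out * bytes_per_weight
    return flops / bytes_moved

# Assumed machine balance points (peak FLOP/s divided by memory bandwidth).
machines = {
    "unified-memory SoC": 30e12 / 800e9,      # ~37 FLOP/byte
    "datacenter GPU":     1000e12 / 3.35e12,  # ~300 FLOP/byte
}

for batch in (1, 16, 512):
    ai = arithmetic_intensity(batch, 4096, 4096)
    for name, balance in machines.items():
        bound = "memory-bound" if ai < balance else "compute-bound"
        print(f"batch={batch:4d}  {name}: intensity {ai:6.1f} vs balance {balance:5.1f} -> {bound}")
```

At batch 1 the intensity is about 1 FLOP per byte of weights, far below either balance point, so bandwidth is what you're paying for; only at large, training-scale batches does raw compute become the limit.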
What Google really needs to do is get into the 2nm EUV space and go sub-2nm. When they have the lithography (or whatever tech ASML has that prints on the chips) then you have something really dangerous. Probably a hardcore Google X moonshot type project. Or maybe they have $500M sitting around to just buy one of the machines. If their TPUs are really that good - maybe it is a good business - especially if they can integrate all the way to having their own fab with their own tech.
This is frankly infeasible. Between the decades of trade secrets they would first need to discover, the tens- or maybe hundreds- of billions in capital needed to build their very first leading edge fab, the decade or two it would take for any such business to mature to the extent it would be functional, and the completely inconsequential volumes of devices they'd produce, they would probably be lighting half a trillion dollars on fire just to get a few years behind where the leading edge sits today, ten or more years from now. The only reason leading edge fabs are profitable today is because of decades of talent and engineering focused on producing general purpose computing devices for a wide variety of applications and customers, often with those very same customers driving innovation independently in critical focus areas (e.g. Micron with chip-on-chip HDI yield improvements, Xilinx with interdie communication fabric and multi chip substrate design). TPUs will never generate the required volumes, or attract the necessary customers, to achieve remotely profitable economies of scale, particularly when Google also has to set an attractive price against their competitors.
If Google has a compelling-enough business case, existing fabs will happily allocate space for their hardware. TPU is not remotely compelling enough.
I listened to a talk by Jim Keller from Tenstorrent about their different approach to making AI cores - 5 RISC-V cores: one core for loading data, one for uploading data, and the rest dedicated to performing matrix operations.
He did mention Google's TPU and the fact it was like programming a VLIW and they had about 500 people dedicated to their compiler.
Quote from the OP: "The TPU v1 uses a CISC (Complex Instruction Set Computer) design with around only about 20 instructions."
chuckle CISC/RISC has gone from astute observation, to research program, to revolutionary technology, to marketing buzzwords....and finally to being just completely meaningless sounds.
Idk maybe it's just me, but what I was taught in comp architecture was that cisc vs risc has more to do with the complexity of the instructions, not the raw count. So TPU having a smaller number of instructions can still be a cisc if the instructions are fairly complex.
Granted the last time I took any comp architecture was a grad course like 15 years ago, so my memory is pretty fuzzy (also we spent most of that semester dicking around with Itanium stuff that is beyond useless now)
Right. CISC vs RISC has always been about simplifying the underlying micro-instructions and register set usage. It's definitely CISC if you have a large complex operation on multiple memory direct locations (albeit the lines between RISC and CISC being blurred, as all such polar philosophies do, when real-life performance optimizations come into play)
Guys....what are the instructions? The on-chip memory they are talking about is essentially...a big register set. So we have load from main memory into registers, store from registers into main memory, multiply matrices--source and dest are stored in registers....
We have a 20 instruction, load-store cpu....how is this not RISC? At least RISC how we used the term in 1995?
Its design follows the old idea that an ISA should be designed for assembly programmers; that instructions should implement complex or higher-level functions intended for a programmer to use directly.
RISC rejected that notion (among other things) and focused on designing ISAs for a compiler to target when compiling high level languages, without wasting silicon on instructions a compiler cannot easily use. For the TPU, a compiler cannot easily take a 256x256 matrix multiply written in a high-level language like C and emit a Matrix_Multiply instruction.
I don't think it makes any sense to talk about all on-chip memory as a register set. In practice most uses of REP MOVS these days don't leave L3$ but because it's an instruction that runs for a highly variable amount of time while transferring data between different locations we consider it very CISCy. And the TPU also has instructions to transfer data over PCIe to and from the TPU's local DDR3 memory as well, which isn't on the chip and I hope you would agree that it's not like a register at that point.
If every instruction was always one 256 element unit maybe you could make the analogy stick. But it's working with 256*N element operations.
Given what seems to be an enormous demand for fab space, when Microsoft or Google create a proprietary chip and need it produced, how do they get to the front of the line?
Are they simple enough that "older, outdated, less in-demand" fabs can produce them?
I know Apple and Nvidia have a lock on a lot of fab space?
> However, although tensors describe the relationship between arbitrary higher-dimensional arrays, in practice the TPU hardware that we will consider is designed to perform calculations associated with one and two-dimensional arrays. Or, more specifically, vector and matrix operations.
I still don’t understand why the term “tensor” is used if it’s only vectors and matrices.
It says:
tensors describe the relationship between high-d arrays
It does not say:
tensors “only” describe the relationship between high-d arrays
The term “tensor” is used because it covers all cases: scalars, vectors, matrices, and higher-dimensional arrays.
Tensors are still a generalization of vectors and matrices.
Note the context: In ML and computer science, they are considered a generalization. From a strict pure math standpoint they can be considered different.
As frustrating as it seems one is not really more right and context is the decider. There are lots of definitions across STEM fields that change based on the context or field they’re applied to.
The word tensor has become more ambiguous over time.
Before 1900, the use of the word tensor was consistent with its etymology, because it was used only for symmetric matrices, which correspond to affine transformations that stretch or compress a body in certain directions.
The square matrix that corresponds to a general affine transformation can be decomposed into the product of a tensor (a symmetric matrix which stretches) and a versor (a rotation matrix, which is antisymmetric and which rotates).
When Ricci-Curbastro and Levi-Civita published the first theory of what are now called tensors, they did not define any new word for the concept of a multidimensional array with certain rules of transformation when the coordinate system is changed, which is now called a tensor.
When Einstein published the Theory of General Relativity during WWI, in which he used what is now called tensor theory, for an unknown reason and without any explanation for this choice he began to use the word "tensor" with the current meaning, in contrast with all previous physics publications.
Because Einstein became extremely popular immediately after WWI, his usage of the word "tensor" spread everywhere, including in mathematics (and including in the American translations of the works of Ricci and Levi-Civita, where the word tensor was introduced everywhere, despite the fact that it did not exist in the original).
Nevertheless, for many years the word "tensor" could not be used for arbitrary multi-dimensional arrays, but only for those which observe the tensor transformation rules with respect to coordinate changes.
The use of the word "tensor" as a synonym for the word "array", like in ML/AI, is a recent phenomenon.
Previously, e.g. in all early computer literature, the word "array" (or "table" in COBOL literature) was used to cover all cases, from scalars, vectors and matrices to arrays with an arbitrary number of dimensions, so no new words are necessary.
Famously whether free helium is a molecule or not depends on whether you're talking to a physicist or a chemist.
But yeah, people in different countries speak different languages and the same sound, like "no" can mean a negation in English but a possessive in Japanese. And as different fields establish their jargons they often redefine words in different ways. It's just something you have to be aware of.
(I think) technically, all of these mathematical objects are tensors of different ranks:
0. Scalar numbers are tensors of rank 0.
1. Vectors (eg velocity, acceleration in intro high school physics) are tensors of rank 1.
2. Matrices that you learn in intro linear algebra are tensors of rank 2. Nested arrays 1 level deep, aka a 2d array.
3. Tensors of rank 3 or higher are what people usually mean by "tensors". I explain these as 'nested arrays' to people with programming backgrounds: nested arrays of arrays with 3 dimensions or higher.
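If it helps, a quick numpy check of the rank-as-number-of-axes idea (this is the ML/array sense of "rank", i.e. ndim, not matrix rank):

```python
import numpy as np

scalar = np.array(3.0)              # rank 0
vector = np.array([1.0, 2.0, 3.0])  # rank 1
matrix = np.eye(3)                  # rank 2
cube   = np.zeros((2, 3, 4))        # rank 3: an array of matrices

for name, t in [("scalar", scalar), ("vector", vector), ("matrix", matrix), ("cube", cube)]:
    print(f"{name}: ndim={t.ndim}, shape={t.shape}")
```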
It's branding (see: TensorFlow); also, pretty much anything (linear) you would do with an arbitrarily ranked tensor can be expressed in terms of vector ops and matmuls
At the end of the day all the arrays are 1 dimensional and thinking of them as 2 dimensional is just an indexing convenience. A matrix multiply is a bunch of vector dot products in a row. Higher tensor contractions can be built out of lower-dimensional ones, so I don't think it's really fair to say the hardware doesn't support it.
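For example, here's a sketch of that reduction with numpy; the shapes and the einsum spelling are just for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5, 6))   # a "3D tensor": batch of 8 matrices, each 5x6
B = rng.standard_normal((6, 7))      # a shared 6x7 matrix

# The 3D contraction "bij,jk->bik" ...
direct = np.einsum("bij,jk->bik", A, B)

# ... is just a reshaped 2D matmul: fold the batch axis into the rows,
flat = A.reshape(8 * 5, 6) @ B       # (40, 6) @ (6, 7) -> (40, 7)
rebuilt = flat.reshape(8, 5, 7)      # then unfold the batch axis again.

print(np.allclose(direct, rebuilt))  # True
```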
I’d say it’s more like calling an ALU that can perform unary and binary operations (so 1 or 2 inputs) an “array processing unit” because it’s like it can process 1- and 2-element arrays. ;)
I do not know which is the real origin of the fashion to use the word tensor in the context of AI/ML.
Nevertheless, I have always interpreted it as a reference to the fact that the optimal method of multiplying matrices is to decompose the matrix multiplication into tensor products of vectors.
The other 2 alternative methods, i.e. decomposing the matrix multiplication into scalar products of vectors or into AXPY operations on pairs of vectors, have a much worse ratio between computation operations and transfer operations.
Unfortunately, most people learn in school the much less useful definition of the matrix multiplication based on scalar products of vectors, instead of its definition based on tensor products of vectors, which is the one needed in practice.
The 3 possible methods for multiplying matrices correspond to the 6 possible orders for the 3 indices of the 3 nested loops that compute a matrix product.
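If it helps to see them side by side, here's a toy numpy sketch of those three formulations as different orderings of the same triple loop; which one is actually preferable comes down to the memory-traffic argument above, since the arithmetic is identical:

```python
import numpy as np

def matmul_dot(A, B):
    """i,j outer loops: each C[i, j] is a scalar (dot) product of a row of A and a column of B."""
    m, n = A.shape[0], B.shape[1]
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            C[i, j] = A[i, :] @ B[:, j]
    return C

def matmul_axpy(A, B):
    """Each column of C is built up by AXPY updates: a linear combination of the columns of A."""
    m, k, n = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((m, n))
    for j in range(n):
        for p in range(k):
            C[:, j] += B[p, j] * A[:, p]
    return C

def matmul_outer(A, B):
    """k outermost: C is a sum of rank-1 outer (tensor) products of a column of A and a row of B."""
    m, k, n = A.shape[0], A.shape[1], B.shape[1]
    C = np.zeros((m, n))
    for p in range(k):
        C += np.outer(A[:, p], B[p, :])
    return C

rng = np.random.default_rng(1)
A, B = rng.standard_normal((4, 3)), rng.standard_normal((3, 5))
ref = A @ B
print(all(np.allclose(f(A, B), ref) for f in (matmul_dot, matmul_axpy, matmul_outer)))  # True
```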
The Einsum notation makes it desirable to formulate your model/layer as multi-dimensional arrays connected by (loosely) named axes, without worrying too much about breaking it down to primitives yourself. Once you get used to it, the terseness is liberating.
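A small numpy example of the style being described, writing attention-like contractions with labelled axes instead of hand-rolled transposes and reshapes (the axis names and shapes here are just illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
batch, heads, seq, dim = 2, 4, 16, 32
q = rng.standard_normal((batch, heads, seq, dim))
k = rng.standard_normal((batch, heads, seq, dim))
v = rng.standard_normal((batch, heads, seq, dim))

# Scores: contract query and key over the feature axis "d".
scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(dim)

# Softmax over the key axis, then contract the weights with the values.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
out = np.einsum("bhqk,bhkd->bhqd", weights, v)

print(out.shape)  # (2, 4, 16, 32)
```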
I was confused as hell for a long time when I first got into ML, until I figured out how to think about tensors in a visual way.
You're right: fundamentally ML is about vector and matrix operations (1D and 2D). So then why are most ML programs 3D, 4D, and in a transformer sometimes up to 6D (?!)
One reasonable guess is that the third dimension is time. Actually not. It turns out that time is pretty rare in ML, and it's only (relatively) recently that it's been introduced into e.g. video models.
Another guess is that it's to represent "time" as in, think of how transformers work: they generate a token, then another given the previous, then a third given the first two, etc. That's a certain way of describing "time". But it turns out that transformers don't do this as a 3D or 4D dimension. It only needs to be 2D, because tokens are 1D -- if you're representing tokens over time, you get a 2D output. So even with a cutting edge model like transformers, you still only need plain old 2D matrix operations. The attention layer creates a mask, which ends up being 2D.
So then why do models get to 3D and above? Usually batching. You get a certain efficiency boost when you pack a bunch of operations together. And if you pack a bunch of 2D operations together, that third dimension is the batch dimension.
For images, you typically end up with 4D, with the convention N,C,H,W, which stands for "Batch, Channel, Height, Width". It can also be N,H,W,C, which is the same thing but packed in memory as red green blue, red green blue, etc. instead of all the red pixels first, then all the green pixels, then all the blue pixels. This matters in various subtle ways.
I have no idea why the batch dimension is called N, but it's probably "number of images".
"Vector" wouldn't quite cover all of this, and although "tensor" is confusing, it's fine. It's the ham sandwich of naming conventions: flexible, satisfying to some, and you can make them in a bunch of different varieties.
Under the hood, TPUs actually flatten 3D tensors down into 2D matrix multiplications. I was surprised by this, but it makes total sense. The native size for a TPU is 8x128 -- you can think of it a bit like the native width of a CPU, except it's 2D. So if you have a 3x4x256 tensor, it actually gets flattened out to 12x256, then the XLA black box magic figures out how to split that across a certain number of 8x128 vector registers. Note they're called "vector registers" rather than "tensor registers", which is interesting. See https://cloud.google.com/tpu/docs/performance-guide
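A rough sketch of just the shape bookkeeping in that 3x4x256 example; the actual 8x128 register tiling is XLA's job per the linked guide, so the tile count below is only an assumption about padding up to full tiles:

```python
import math
import numpy as np

x = np.arange(3 * 4 * 256).reshape(3, 4, 256)

# Collapse all leading ("batch-like") axes so the op becomes a plain 2D problem.
flat = x.reshape(-1, x.shape[-1])   # (12, 256)

# Roughly how many native 8x128 tiles that 2D shape would occupy, padded to full tiles.
sublanes, lanes = 8, 128
tiles = math.ceil(flat.shape[0] / sublanes) * math.ceil(flat.shape[1] / lanes)

print(flat.shape, tiles)            # (12, 256) 4  -> ceil(12/8) * ceil(256/128)
```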
You'd hate particle physics then. "Spin" and "action" and so on are terrible names, but scientists live with them, because convention.
Convention dominates most of what we do. I'm not sure there's a good way around this. Most conventions suck, but they were established back before there was a clear idea of what the best long-term convention should be.
At least in physics you can understand how the terms came about historically, where at some point they made sense. But "tensor" here, as noted in sibling comments, seems to have been chosen primarily for marketing reasons.
It comes from the maths, where tensors are generalisations of matrices/vectors. They got cribbed, because the ML stuff directly used a bunch of the underlying maths. It’s a novel term, it sounds cool, not surprised it also then got promoted up into a marketing term.
> tensors are generalisations of matrices/vectors.
Is that what they are though? Because that really is not my understanding. Tensors are mappings which not all matrices and vectors are. Maybe the matrices in ML layers are all mappings, but a matrix in general is not, nor is a vector always a mapping. So tensors aren't generalizations of matrices and vectors.
> Tensors are mappings which not all matrices and vectors are.
A tensor in Physics is an object that follows some rules when changing reference frame. Their matrix representation is just one way of writing them. It’s the same with vectors: a list with their components is a representation of a vector, not the vector itself. We can think about it that way: the velocity of an object does not depend on the reference frame. Changing the axes does not make the object change its trajectory, but it does change the numerical values of the components of the velocity vector.
> So tensors aren’t generalizations of matrices and vectors.
Indeed. Tensors in ML have pretty much nothing to do with tensors in Maths or Physics. It is very unfortunate that they settled on the same name just because it sounds cool and sciency.
I think to be a tensor, all the bases should be independent. The way I think of it is you use a tensor to describe the rotation of an asteroid around all its major axes (inertia tensor?)
just because an image is 2-D doesn’t mean that the model can’t use higher dimensional representations in subsequent layers.
For an image, you could imagine a network learning to push the image through a filter bank that does oriented local frequency decomposition and turns it into 4D {height}x{width}x{spatial freq}X{orientation} before dealing with color channels or image batches
For whatever reason, I have held a mental image of a Tensor as a Tesseract/HyperCube where the connections are like the Elastic workout bands where they have differing tensile resistances, and they pull on one another to create their encapsulated info-cluster - but I have no clue if thats truly an accurate depiction, but it works in my head....
I'm reluctant to tell people "no, don't think of it that way," especially if it works for you, because I don't know the best way to think of things. I only know what works well for me. But for me, it'd be ~impossible to use your mental model to do anything useful. That doesn't mean it's bad, just that I don't understand what you mean.
The most straightforward mental model I've ever found for ML is, think of it as 2D matrix operations, like high school linear algebra. Matrix-matrix, matrix-vector, vector-matrix, and vector-vector will get you through 95% of what comes up in practice. In fact I'm having trouble thinking of something that doesn't work that way, because even if you have an RGB image that you multiply against a 2D matrix (i.e. HxWxC multiplied by a mask) the matrix is still only going to apply to 2 of the dimensions (height and width), since that's the only thing that makes sense. That's why there's all kinds of flattening and rearranging everywhere in practice -- everyone is trying to get a format like N,C,H,W down to a 2D matrix representation.
People like to talk up the higher level maths in ML, but highschool linear algebra (or for the gamedevs in the audience, the stuff you'd normally do in a rendering engine) really will carry you most of the way through your ML journey without loss of generality. The higher level maths usually happens when you start understanding how differentiation works, which you don't even need to understand until way later after you're doing useful things already.
>One reasonable guess is that the third dimension is time. Actually not. It turns out that time is pretty rare in ML, and it's only (relatively) recently that it's been introduced into e.g. video models.
WRT to ML - may time be better thought of where a thing lives in relation to other things that occurred within the same temporal window?
so "all the shit that happened in 1999 also has an expression within this cluster of events from 1999" - but the same information appears in any location where it is relationally contextual to the other neighbors, such as the SUBJECT of the information? Is this accurate to say why its 'quantum' because the information will show up depending on where the Observation (query) for it is occurring?
Something similar happens on Wikipedia, where topics that use math inevitably get explained in the highest level math possible. It makes topics harder to understand than they need to be.
As a helpful Wiki editor just trying to make sure that we don't lead people astray, I've made some small changes to clarify your statement:
In the virtual compendium of Wikipedia, an extensive repository of human knowledge, there is a discernible proclivity for the hermeneutics of mathematically-infused topics to be articulated through the prism of esoteric and sophisticated mathematical constructs, often employing a panoply of arcane lexemes and syntactic structures of Greek and Latin etymology. This phenomenon, redolent of an academic periphrasis, tends to transmute the exegesis of such subjects into a crucible of abstruse and high-order mathematical discourse. Consequently, this modus operandi obfuscates the intrinsic didactic intent, thereby precipitating an epistemological chasm that challenges the layperson's erudition and obviates the pedagogical utility of the exposition.
Sigh...learning about TPUs a decade ago made me invest heavily in $GOOG for the coming AI revolution...got that one 100% wrong. +400% over 10 years isn't bad but I can't help but feel shortchanged seeing nvidia/etc
yeah, but nvda is up like 500% in 2 years, so if you’re naive enough to think you can time the market, you’d have fomo over having invested in the “wrong” thing.
Seeing the difference between GPT2 and GPT3 made me run to NVDA immediately. One of the few bets in my life I've ever been confident about. I think NVDA was a pretty reasonable bet on AI like 5+, maybe 10 years ago when deep learning was ramping up.
It wasn't just that, it was also all the deep learning stuff. Atari games playing themselves, deep style and variants. There was some interesting image generation happening. AlphaGo was 2015, etc. that was really when things started accelerating imo.
The potential usefulness of things like TPUs actually made me invest in Broadcom, which helped Google design them and could potentially help Amazon or whoever else design their equivalents. But I'm also long NVidia and a half dozen other companies with AI exposure, while still keeping most of my money in index funds.
Not the whole design - the core processing part (the systolic array, i.e. the matrix multiplier) was designed by Google, but Broadcom designed all the high-speed chip I/O and mapped the design onto TSMC's tools/rules.
[1] https://open.spotify.com/episode/0V9kRgNS7Ds6zh3GjdXUAQ?si=q...