Intel Gen12/Xe Graphics Have AV1 Accelerated Decode – Linux Support Lands (phoronix.com)
99 points by gardaani on July 25, 2020 | 30 comments


Well, the title would be more accurate as "GPGPU Accelerated Decode".

It gives the false impression that AV1 is being hardware accelerated in the Intel Xe GPU, which usually means a dedicated decoding block that handles the codec at the lowest possible energy usage.


That's not GPGPU/compute though and Gen 12+ does have hardware accelerated decoding for AV1:

https://github.com/intel/media-driver/blob/master/README.md#...

"AV1 hardware decoding is supported from Gen12+ platforms."


"With some 33k lines of new code, hardware AV1 decode acceleration is in place for Intel Gen12 graphics" - that would be some tight GPU code!

GPGPU implementation of AV1 would be a useful thing though. Anyone know of implementations?


It's going to be interesting to see whether Intel can compete in this market. GPUs are quite hard and, compared to CPUs, haven't stagnated much.


If you factor out chip size and clock frequency, they do actually stagnate. They add new features, but legacy rendering performance is improved by single-digit percentage points, I believe.

If you look at A100 compared to V100 for, e.g., FP32 FMA performance (not tensor): 14.1 TFLOPS -> 19.5 TFLOPS = +38%, which is not that great for 2x the transistors (16nm -> 7nm), +35% SMs and 250W -> 400W. Note that NVIDIA uses boost clocks for all A100 numbers and seems not to have published any base clock so far, so there is a chance that actual sustained A100 performance is lower.
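Spelling that out with the same spec-sheet numbers (boost clocks and all): 19.5 / 14.1 ≈ 1.38, so +38% FP32 throughput for +60% power (250W -> 400W), i.e. roughly 0.049 TFLOPS/W vs 0.056 TFLOPS/W, which is a step backwards per watt. Per SM (108 vs 80) it's about 0.18 TFLOPS either way, essentially flat.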

Turing GPUs have rather large dies: TU102 is 754mm² vs GP102's 471mm². So comparing them as-is isn't quite fair.

On the CPU front Intel used to use rather small dies for consumers (and even used die shrinks to just cram more chips onto a wafer -> more $$$), but now that AMD is forcing their hand, they are giving in. Of course, a lot of this area goes into extra cores, not single-threaded performance (diminishing returns there).


Well, from the user's perspective, chip size and clock are crucial factors, as GPU workloads tend to be parallelizable. Besides, chip size and clock don't come for free, so I don't think they should be factored out in the first place.

AFAIK, with each generation, Nvidia has increased raw performance by 20-30%, which is significant.

On the other hand, the wall of physical limits is getting closer (which may restrict the chip size), but until then, GPUs are faring very well.

GPU functionality also has a very different nature (as you point out), but this can play well for the user. Realism depends on many features, which have plenty of headroom for improvement (in the sense of hardware support); see ray tracing, which is supposedly going to be significantly faster on Ampere.


> legacy rendering performance is improved by single-digit percentage points, I believe.

I feel like one would expect this, given that the rendering engines of any given generation are architected under specific assumptions about the optimal relative "shape" of their graphics pipeline.

The clearest example being old game consoles. You could write a SNES emulator for the SuperFX coprocessor that used the host's GPU to render the polygons, but the rendering would not go any faster than it does on the SNES, because the draw commands are just being trickled out as the physics-engine work necessary to update their positions gets done in fits and starts per scan-line. That trickling-out was necessary on a SNES, because there wasn't time to run all that logic during VBLANK; but in the modern era, we have the opposite problem: the logic could all be completed during VBLANK (with tons of room to spare), but instead is being "dragged out" across 512 HBLANKs, such that the GPU only gets the full picture of the completed scene at the last moment. (Despite the recent source-code leak, rewriting StarFox to make it render at 60FPS or more will not be a trivial process.)

The same thing is true, to a varying extent, of all legacy renderers. They're written for graphical pipelines that just don't match the one we have now. The one we have now is "wider" — more parallel — in so many places, but if it's just being used to recapitulate a long, serial set of fixed-function legacy pipeline stages, that width doesn't help it any.


Video encoding/decoding circuitry (which the post is about) is not a part of the GPU, so that's a bit tangential.


We have definitely moved on from the stagnation era of CPU development, if the advancements of AMD are anything to go by.


Not to take away from their success, but their recent progress has been more like them catching up to Intel. It will be interesting to see if they can truly pull ahead, though.


Intel is already behind judging by core count in every segment. You can't buy a 16-core consumer Intel CPU, and you can't buy a 64-core workstation or server Intel CPU.


Or an 8-core laptop CPU. AMD has a win across the board right now.


Intel does have 8-core i9 laptop CPUs in the H bracket (45W). https://www.intel.com/content/www/us/en/products/processors/... released Q2'19.

What they don't have a SKU for quite yet is an 8C16T U-class (15W) part like the AMD Ryzen 7 4800U. https://www.amd.com/en/products/apu/amd-ryzen-7-4800u


I stand corrected. I only shop for laptops with the U class parts.


Entirely fair. My usage is on the other end--I wouldn't be familiar with the 15W offerings if I hadn't been tracking Ryzen mobile development or shopping for a new machine for my father.



Well, Threadripper 3990X CPU beats Xeon Platinum 8280 by a large margin. I can't imagine choosing the latter over the former for any reason, really.


Depends on how you look at it: in pure horsepower they have already passed Intel, seeing how they can just put 64 cores on a CPU. The single-thread gap has also been closing rapidly, and with Zen 3 on the horizon they could pass Intel in this segment too, especially since Intel has been delaying 10nm forever.


Except most devs don't have any idea what to do with them, and they end up mostly managing OS processes, containers, GC and JIT background processing, and VM instances.


Maybe for 64 cores that's true, but for mainstream usage getting 8 cores in a laptop instead of 4 is a massive gain. That's effectively what has happened in the current laptop lines, with AMD having twice the cores at the same power usage and single-thread performance. Here's the comparison of the top-spec AMD and Intel CPUs available in the just-released T14s:

https://www.cpubenchmark.net/compare/Intel-i7-10610U-vs-AMD-...

If your workload is running a single app, that may go underused, but particularly in home-office mode I'm constantly doing video calls, keeping 4 or 5 heavy browser tabs open, running Office for documents, etc. It doesn't matter if each of those is single-threaded; I'd be filling up the 8 cores and probably taking advantage of the 16 threads from SMT as well.


Except mainstream usage is mostly about Core i3 and i5 laptops, yes even for coding.

Those are the typical units that external contractors get from the customer's IT department, when it isn't some kind of cloud-based VM.

And regular consumers don't even know what they own; they just go with whatever the guy at the store or some relative gave as advice.


All Intel options on the T14s are 4-core. The only other AMD option is still 6-core. That slower AMD part is still twice as fast as the fastest Intel chip:

https://www.cpubenchmark.net/compare/Intel-i7-10610U-vs-AMD-...

AMD wins on price as well. That people don't usually make good buying choices is not an argument about CPU performance.


It is, because at the end of the day that dictates what stays on the market and what goes home.

I thought we had learned by now that it isn't the best tech that wins.


Your contention was that the extra performance provided by AMD was not usable, not that it wouldn't win in the market. So we were indeed discussing who had the best tech, even if that's not all that matters for commercial success.


Because those things go hand in hand. The large majority of developers aren't doing HPC or fintech, don't work at FAANG, and aren't even posting on HN and Reddit; rather, they are the so-called "dark matter developers" whose applications must run on the Core i3 and i5 machines used by consumers at large.

As such, most developers only bother to use what they already know and make very little effort to add any form of parallelism or concurrency to their applications.

Android and UWP have, from the start, made architectural decisions that forbid synchronous code, because both companies came to the conclusion that if it were available, developers would keep writing single-threaded code as they have been doing for years, so they took that option out of the platform.


You've just switched to a totally different discussion. I don't agree with it either, because I see everyone around me in home office (not even developers, just workers in a normal corporate environment) who can definitely use that extra performance. But that wasn't the point being discussed.


I see them too, using Core i3 and i5 machines rented from our customers' IT departments to integrate into their networks and development stacks.

I have yet to see anyone using one of those AMD CPUs that get such high praise on HN.


They've only now started to roll out. Our corporate IT is now putting them in the default config for everyone's laptop.


What do you mean? Those are all great uses for the increased core count. Running more things, more easily and faster, is the general goal.


Kind of; watch how much time they spend idle because there isn't enough work to keep them really busy.



