Wow, I've been out of the GPU game for a year or two now, and it's clear how the market has shifted (or how NVIDIA wants it to move). Back in the day we'd keep asking NVIDIA and AMD for half-precision support; it looks like not only have they done that, but there's 8-bit integer support too! I was about to say "well, who the heck would be able to use 8-bit integers for much?" when I saw in TFA: "offering an 8-bit vector dot product with 32-bit accumulate."
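For anyone wondering what that buys you, here's a minimal reference sketch of the semantics (my own illustration; dp4a_ref is a made-up name, though I believe the underlying Pascal instruction is exposed in CUDA as the __dp4a intrinsic): four int8 products get summed into a 32-bit accumulator, which is exactly the inner loop of a quantized neural-net layer.

    import numpy as np

    def dp4a_ref(a, b, acc):
        """Reference semantics of an 8-bit dot product with 32-bit accumulate:
        four int8 pairs are multiplied and summed into an int32 accumulator,
        so the products widen to 32 bits instead of overflowing at 8 bits."""
        a = np.asarray(a, dtype=np.int8).astype(np.int32)
        b = np.asarray(b, dtype=np.int8).astype(np.int32)
        return int(acc) + int(np.dot(a, b))

    # A quantized inference layer chains many of these into one 32-bit sum:
    print(dp4a_ref([127, -128, 5, 1], [127, 127, -2, 0], 0))  # 16129 - 16256 - 10 + 0 = -137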
In case it's not clear from the title (which used to read "...47 INT TOPS"), that's 47 [8-bit integer] tera-operations-per-second. Anandtech says it will "... offer a major boost in inferencing performance, the kind of performance boost in a single generation that we rarely see in the first place, and likely won’t see again." No kidding!
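Back-of-the-envelope on where the 47 comes from, assuming the launch specs (3840 CUDA cores, ~1.53 GHz boost; those numbers are mine, not from the comment) and one DP4A per core per clock counted as 4 multiplies + 4 adds:

    cores = 3840            # P40 CUDA core count (assumed from launch specs)
    boost_ghz = 1.531       # boost clock in GHz (assumed from launch specs)
    int_ops_per_clock = 8   # one DP4A per core per clock = 4 multiplies + 4 accumulates

    tops = cores * boost_ghz * int_ops_per_clock / 1000
    print(f"{tops:.1f} INT8 TOPS")  # ~47.0, i.e. 4x the ~11.8 TFLOPS FP32 rate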
I've read a lot about fp16 being good enough for training, but the thing people don't mention is that just swapping fp32 for fp16 will make things fail, because your deep learning framework doesn't implement everything you use for fp16. And after you fix that, training will probably diverge, because standard practices aren't designed for such a limited range.
Which isn't to say that the Deep Learning stacks won't get there eventually, but at the moment it's not as easy as flipping a switch.
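To make the "limited range" point concrete, here's a small numpy sketch (my own illustration, not tied to any particular framework): gradient values that are perfectly representable in fp32 flush to zero in fp16, and one common mitigation, loss scaling, just multiplies by a constant before the cast and divides it back out in fp32.

    import numpy as np

    grad_fp32 = np.array([1e-4, 2e-8, -1e-9], dtype=np.float32)

    # Naive cast: fp16's smallest subnormal is ~6e-8, so the small entries
    # underflow to (signed) zero and those gradients are simply lost.
    print(grad_fp32.astype(np.float16))

    # Loss scaling (a common mitigation): scale up before the cast,
    # then unscale back in fp32 afterwards.
    scale = 1024.0
    scaled = (grad_fp32 * scale).astype(np.float16)
    print(scaled.astype(np.float32) / scale)  # the small entries survive, approximately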
Fair enough; when I said "most people" I meant companies that are not AmaGooFaceSoft and their international equivalents. Though I can't quite tell if you're doing GPU batch predictions and storing them, or doing them in real time with Spark Streaming.
Unrelated question though: any chance you'll do a blog post/paper about how DSSTNE does automatic model parallelism and gets good sparse performance compared to cuSPARSE etc.?
I wish there were a website where I could type in my device, get its FP32 FLOPS, and then see where that device would have placed on past TOP500 lists.
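A toy sketch of what that lookup could look like, with placeholder entry thresholds (not real TOP500 data; note the real lists rank double-precision LINPACK Rmax, so peak FP32 FLOPS is only a very rough comparison):

    # Placeholder #500 entry thresholds in GFLOPS, purely to show the shape of
    # the lookup; real values would come from the published TOP500 lists.
    entry_threshold_gflops = {2000: 50, 2005: 1200, 2010: 25000, 2015: 165000}

    def last_year_on_list(device_gflops):
        """Return the most recent (placeholder) year whose #500 cutoff the
        device would still have cleared, or None if it never would have."""
        years = [y for y, cutoff in entry_threshold_gflops.items()
                 if device_gflops >= cutoff]
        return max(years) if years else None

    print(last_year_on_list(12000))  # ~12 TFLOPS FP32 peak vs the placeholders -> 2005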
IIRC that was the kind of ad-hoc data discovery and munging that Wolfram Language was demoed as being good for. I thought W-L had an online IDE but I can't find one now.
For some reason my brain skipped "Nvidia" and I assumed this was about a budget/low-end version of the Tesla Model S. They already have the Tesla 60, 60D, 70, 70D, 75, 75D, 85, P85D, P90D, and now P100D. Why not add the P40?
Nvidia needs a "Ludicrous Mode" overclock setting for these cards. Push the button for 2.5 seconds of super-high frame rate! With some cool down time required.
>Nvidia needs a "Ludicrous Mode" overclock setting for these cards. Push the button for 2.5 seconds of super-high frame rate! With some cool down time required.
This actually is a thing with GPU Boost 3.0. The card boosts automatically if temperatures are low, though lately there have been issues with the GPU clock frequency oscillating because there wasn't enough hysteresis.
(This, by the way, means that GPUs are thermally constrained rather than timing-constrained.)
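For what "enough hysteresis" means here: the controller needs separate boost/drop thresholds (or a dwell time) so the clock doesn't flap around a single temperature limit. A minimal sketch of the idea (the threshold and clock numbers are made up for illustration, and this is not how GPU Boost is actually implemented):

    def boost_controller(temps_c, base_mhz=1303, boost_mhz=1531,
                         boost_below_c=78, drop_above_c=84):
        """Toy thermal boost controller with hysteresis: boost only when the
        temperature falls below one threshold and drop only when it rises
        above a higher one, so readings near a single limit don't make the
        clock oscillate."""
        clock, trace = base_mhz, []
        for t in temps_c:
            if clock == base_mhz and t < boost_below_c:
                clock = boost_mhz
            elif clock == boost_mhz and t > drop_above_c:
                clock = base_mhz
            trace.append(clock)
        return trace

    # Readings hovering between the two thresholds leave the clock alone
    # instead of toggling it every sample.
    print(boost_controller([75, 79, 81, 83, 82, 85, 80, 77]))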
Are there any IaaS providers that offer access to GPU hardware? I've always wanted to play around with this stuff (as I'm completely ignorant of GPU tech these days), but I'm not interested in buying hardware.
EDIT... apparently this was not the case when I last googled it years ago. I wish I could delete this comment :(
I don't know why, but to my knowledge no public cloud offered even the Maxwell GPUs.
As cool as these cards are, I really hope they become available on AWS soon. The current AWS GPU instances are so weak we're contemplating buying a physical desktop setup.