Hacker News | mikepapadim's comments

Yes, when you use the PTX backend it supports Tensor Cores. It also has an implementation of flash attention. You can also write your own kernels; have a look here: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/... https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...


The TornadoVM GitHub repo has no mention of tensor cores or WMMA instructions. The only mention of tensor cores is from 2024 and states that they are not used: https://github.com/beehive-lab/TornadoVM/discussions/393



I believe these are SIMD instructions. Tensor cores require the MMA family of instructions. Ask me how I know. :)

https://github.com/m4rs-mt/ILGPU/compare/master...lostmsu:IL...

Good article: https://alexarmbr.github.io/2024/08/10/How-To-Write-A-Fast-M...



Highlights:

- Simpler execution via Java argfiles
- Improved performance for FP16/Int8 LLM inference on Nvidia GPUs
- Extended reduced-precision type support for GPUs (Int8, FP16)
- Zero-copy object support through Project Panama
- Support for compressed oops on modern JVMs
- New cross-platform SDK distribution (soon on SDKMAN! https://lnkd.in/d8pGHYy5)
- Official TornadoVM dependencies now published on Maven Central (https://lnkd.in/dDRZj8ru)



  Location: UK
  Remote: Yes
  Willing to relocate: No
  Technologies: Java, C++, Python, CUDA, OpenCL, Docker, ONNXRT, Git, Apache TVM, Compilers, GPUs
  Résumé/CV: https://github.com/mikepapadim && https://www.linkedin.com/in/michalis-papadimitriou/
  Email: mpapadimitriou92 [ΑΤ] gmail.com


https://github.com/beehive-lab/GPULlama3.java

We took Llama3.java and ported it to TornadoVM to enable GPU code generation. The first beta version runs on Nvidia GPUs, reaching a bit more than 100 tokens/sec for a 3B model in FP16.

All the inference code offloaded to the GPU is written in pure Java, using the TornadoVM APIs to express the computation.
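To give a flavor of what "pure Java" means here: TornadoVM kernels are ordinary static Java methods over flat arrays, which the runtime JIT-compiles to OpenCL or PTX. Below is a minimal sketch, in that style, of a matrix-vector multiply like those at the heart of LLM inference. It runs plainly on the CPU; the class and method names are illustrative, and the TornadoVM-specific pieces (the `@Parallel` loop annotation and the task-graph wiring) are only noted in comments, not used:

```java
public class MatVecSketch {

    // y = W * x, with W stored row-major as a flat array.
    // Under TornadoVM the outer loop would carry @Parallel so each
    // row maps to a GPU work-item; the method body stays plain Java.
    static void matVec(float[] w, float[] x, float[] y, int rows, int cols) {
        for (int i = 0; i < rows; i++) {        // @Parallel in TornadoVM
            float acc = 0f;
            for (int j = 0; j < cols; j++) {
                acc += w[i * cols + j] * x[j];
            }
            y[i] = acc;
        }
    }

    public static void main(String[] args) {
        float[] w = {1f, 2f,
                     3f, 4f};                   // 2x2 matrix, row-major
        float[] x = {1f, 1f};
        float[] y = new float[2];
        matVec(w, x, y, 2, 2);
        System.out.println(y[0] + " " + y[1]);  // 3.0 7.0
    }
}
```

In TornadoVM the same method would be registered in a task graph, with the arrays marked for device transfer, and the runtime would generate the OpenCL or PTX kernel from the bytecode.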

Runs Llama3 and Mistral models in GGUF format.

It is fully open-sourced, so give it a try. It currently runs on Nvidia GPUs (OpenCL & PTX), Apple Silicon GPUs (OpenCL), and Intel GPUs and integrated graphics (OpenCL).


Java to OpenCL and PTX inference of Llama3 through TornadoVM


Llama Deck is a command-line tool for quickly managing and experimenting with multiple llama inference implementations. It helps you filter and download different llama implementations and llama2-like transformer-based LLM models. We also provide Docker images based on some of the implementations, which can be easily deployed and run through our tool.


A comprehensive analysis of the memory behavior of 30 DaCapo and Renaissance Java applications, using a dual profiling methodology with NUMAProfiler and PerfUtil in MaxineVM, identifying various memory pressures and JVM impacts.


How is this related to the $RAD coin?


From a technical standpoint, Radicle (P2P git protocol) is not related to $RAD.

$RAD is the token of the organization that has been funding Radicle over the years.


If the RAD token has nothing to do with their product, why does it have value? Did/do they have some other product that uses the token?


There is governance value in the token. Whoever holds that token can vote on Radworks governance proposals.

