Highlights
Simpler execution via Java argfiles
Improved performance for FP16/INT8 LLM inference on #Nvidia GPUs
Extended reduced-precision type support for GPUs (INT8, FP16)
Zero-copy object support through Project Panama
Support for compressed oops on modern JVMs
New cross-platform SDK distribution (soon on #SDKMAN! https://lnkd.in/d8pGHYy5)
Official TornadoVM dependencies now published on Maven Central.
(https://lnkd.in/dDRZj8ru)
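To illustrate the argfile-based launch: Java argfiles are a standard JDK launcher feature (`java @file`) that bundle long lists of JVM flags into a single file. The file name and flags below are hypothetical placeholders, not the exact set TornadoVM generates:

```shell
# tornado-args.txt (illustrative flags only; the real file is produced
# by the TornadoVM setup and contains its required module/JVM options):
#   --module-path /path/to/tornadovm/modules
#   --enable-preview
#   -XX:+UseCompressedOops
#
# The launcher expands @file arguments in place, so running a TornadoVM
# program collapses to one short command:
java @tornado-args.txt -cp myapp.jar MyApp
```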
We took Llama3.java and ported it to TornadoVM to enable GPU code generation. The first beta version runs on Nvidia GPUs, reaching a bit more than 100 tokens/sec for a 3B model in FP16.
All the inference code offloaded to the GPU is written in pure Java, using the TornadoVM APIs to express the computation.
Runs Llama3 and Mistral models in GGUF format.
It is fully open-source, so give it a try. It currently runs on Nvidia GPUs (OpenCL & PTX), Apple Silicon GPUs (OpenCL), and Intel GPUs and integrated graphics (OpenCL).
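To give a flavor of what expressing the computation in pure Java looks like, here is a minimal sketch of a matrix-vector multiply (the core operation of transformer inference) written in the flat-array, static-method style that TornadoVM kernels use. The TaskGraph wiring in the comments assumes the TornadoVM API and is illustrative only; the snippet itself is plain, runnable Java:

```java
// Sketch: a matrix-vector kernel in the plain static-method style that a
// TornadoVM-like runtime can compile to GPU code. With TornadoVM on the
// classpath, the outer loop would carry its @Parallel annotation and the
// method would be registered in a task graph, roughly (not verified
// against a specific TornadoVM version):
//
//   TaskGraph tg = new TaskGraph("llama")
//       .transferToDevice(DataTransferMode.FIRST_EXECUTION, w, x)
//       .task("matvec", MatVecSketch::matVec, w, x, y, rows, cols)
//       .transferToHost(DataTransferMode.EVERY_EXECUTION, y);
public class MatVecSketch {
    // y = W * x, with W stored row-major in a flat float array.
    static void matVec(float[] w, float[] x, float[] y, int rows, int cols) {
        for (int i = 0; i < rows; i++) { // parallel dimension on the GPU
            float sum = 0f;
            for (int j = 0; j < cols; j++) {
                sum += w[i * cols + j] * x[j];
            }
            y[i] = sum;
        }
    }

    public static void main(String[] args) {
        // 2x3 matrix times a length-3 vector.
        float[] w = {1, 2, 3,
                     4, 5, 6};
        float[] x = {1, 1, 1};
        float[] y = new float[2];
        matVec(w, x, y, 2, 3);
        System.out.println(y[0] + " " + y[1]); // prints: 6.0 15.0
    }
}
```

The key design point is that the kernel stays ordinary Java over primitive arrays, so the same method runs on the JVM or, when wrapped in a task graph, on the GPU.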
Llama Deck is a command-line tool for quickly managing and experimenting with multiple versions of llama inference implementations. It helps you filter and download different llama implementations and llama2-style transformer-based LLM models. We also provide Docker images for some implementations, which can be easily deployed and run through the tool.
A comprehensive analysis of the memory behavior of 30 DaCapo and Renaissance Java applications, using a dual profiling methodology with NUMAProfiler and PerfUtil in MaxineVM, identifying various memory pressures and their impact on the JVM.