
Interesting - do you need to take any special measures to get OSS genAI models to work on this architecture? Can you use inference engines like Ollama and vLLM off the shelf (as Docker containers) with just the Radeon 8060S GPU? What token rates do you achieve?

(edit: corrected mistake w.r.t. the system's GPU)
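For context, a minimal sketch of what the off-the-shelf Docker path could look like with Ollama, assuming the official ollama/ollama:rocm image and the standard AMD device pass-through; the 8060S (gfx1151) may additionally need an HSA_OVERRIDE_GFX_VERSION environment variable on ROCm builds that don't recognize it yet:

    # Pass the AMD GPU devices through to the container
    docker run -d --device /dev/kfd --device /dev/dri \
        -v ollama:/root/.ollama -p 11434:11434 \
        --name ollama ollama/ollama:rocm

    # Run a model interactively; --verbose prints eval rate (tokens/s)
    docker exec -it ollama ollama run llama3.1:8b --verbose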



I just use llama.cpp. It worked out of the box.
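For reference, a sketch of that out-of-the-box path, assuming llama.cpp's HIP (ROCm) backend and a gfx1151 target for the 8060S; model.gguf is a placeholder path, exact CMake flags vary by ROCm version, and the Vulkan backend (-DGGML_VULKAN=ON) is an alternative that avoids ROCm entirely:

    # Build llama.cpp against ROCm (gfx1151 = Strix Halo / Radeon 8060S)
    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    cmake -B build -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1151
    cmake --build build --config Release -j

    # Offload all layers to the GPU; llama-bench reports tokens/s
    ./build/bin/llama-bench -m model.gguf -ngl 99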



