A reason I like it is that I have an "older" AMD GPU that's no longer supported by ROCm (sort of AMD's version of CUDA), which means that running locally I'm either trying to figure out older ROCm builds to use my GPU and running into dependency issues, or falling back to my CPU, which isn't great either. But with WebGPU I'm able to run these models on my GPU, which has been much faster than using the .cpp builds.
It's also fairly easy to route a Flask server to these models with websockets, so with that I've been able to run Python, pass data to the model running on the GPU, and get the response back into my program. Again, there's probably a better way, but it's cool to have my own personal API for an LLM.
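Roughly, the relay looks something like the sketch below. This is just a minimal illustration of the idea, assuming flask-sock on the Python side and a browser tab running a WebGPU model (e.g. WebLLM) connected to the websocket; the /ws and /generate route names are my own placeholders, not anything standard.

```python
# Minimal sketch: Flask bridges local Python scripts to a browser tab
# that runs the model on the GPU via WebGPU and answers each prompt.
import queue
from flask import Flask, request, jsonify
from flask_sock import Sock

app = Flask(__name__)
sock = Sock(app)

prompts = queue.Queue()   # prompts waiting to be sent to the browser
replies = queue.Queue()   # completions coming back from the browser

@sock.route('/ws')
def model_bridge(ws):
    # The browser tab hosting the WebGPU model keeps this connection open.
    while True:
        prompt = prompts.get()     # wait for a prompt from /generate
        ws.send(prompt)            # hand it to the in-browser model
        replies.put(ws.receive())  # block until the model sends its answer

@app.route('/generate', methods=['POST'])
def generate():
    # Any local Python script can POST a prompt here and get the
    # completion back -- a tiny personal LLM API.
    prompts.put(request.json['prompt'])
    return jsonify({'completion': replies.get()})

if __name__ == '__main__':
    app.run(port=5000)
```

Then from any other script it's just a requests.post to localhost:5000/generate with a prompt, and the browser does the actual inference on the GPU.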