
We'll work on adding this to the repo, but I'll try to summarize here.

Why should I use this? If you're looking to deploy computer vision models to production (on edge devices or self-hosting in the cloud), Roboflow Inference is the easiest way to get up and running quickly & to iteratively improve your model's performance over time.

It's also in very active use with many real-world customers in production. That means we've ironed out (and are continuing to find and fix) many of the bugs and edge-cases that you'd invariably encounter building something on your own or using something less mature.
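
To make "up and running quickly" concrete, the basic flow looks roughly like the sketch below (this assumes the `inference` pip package and its get_model/infer entry points; the model ID and image path are placeholders, so check the README for the exact API):

    # minimal sketch only: assumes the `inference` package's
    # get_model / infer entry points (see the README for the exact API)
    from inference import get_model

    # placeholder model ID; any Roboflow model ID should work here
    model = get_model(model_id="yolov8n-640")

    # placeholder image path; run the model and print the predictions
    results = model.infer("path/to/image.jpg")
    print(results)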

When should you _not_ use this? For everything else.

* If you're prototyping & hacking on core model primitives, there are likely too many layers of abstraction; wait to productionize the model until you're happy with the underlying architecture.

* If you're not doing computer vision, you're probably better off with something focused on the specific needs of LLMs or other types of models.

* If you need the absolute highest speed possible, you may want to use lower-level tools like DeepStream and Triton (for now). Out of the box we sacrifice a bit of speed for a better interface & end-user experience. We do have a TensorRT provider that trades back some of that convenience for speed (see the sketch after this list) & we're working on additional optimizations (including potentially integrating with DeepStream/Triton behind the scenes).
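
To illustrate that trade-off: "TensorRT provider" here is a provider in the ONNX Runtime execution-provider sense. As a generic illustration (plain onnxruntime, not our exact integration; "model.onnx" is a placeholder), switching providers looks like:

    # generic onnxruntime sketch: prefer TensorRT, fall back to CUDA, then CPU
    import onnxruntime as ort

    session = ort.InferenceSession(
        "model.onnx",  # placeholder model path
        providers=[
            "TensorrtExecutionProvider",  # fastest, but adds engine-build & setup cost
            "CUDAExecutionProvider",
            "CPUExecutionProvider",
        ],
    )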

Edit: @zerojames updated the README.


