Inference is all about reducing the friction associated with deployment: set up a Docker container (which works on a range of architectures, with TensorRT acceleration), load your model (or use SAM or CLIP, which come out of the box), and start sending HTTP requests to get predictions.
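To make that concrete, here's a minimal sketch of the request side. The image name, port, and route are my assumptions about a typical local setup (a CPU server container listening on port 9001 with a POST route per model id), so check the docs for the exact values:

```python
# Minimal sketch: send one image to a locally running Inference container and
# print the predictions. Assumes the server was started with something like
#   docker run -p 9001:9001 roboflow/roboflow-inference-server-cpu
# and exposes a POST route per model id -- verify the image name, port, and
# route against the docs for your deployment.
import base64
import requests

API_KEY = "YOUR_ROBOFLOW_API_KEY"   # placeholder
MODEL_ID = "your-project/1"         # placeholder: project slug + version number

with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = requests.post(
    f"http://localhost:9001/{MODEL_ID}",
    params={"api_key": API_KEY},
    data=image_b64,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
resp.raise_for_status()
print(resp.json())  # typically a dict with a "predictions" list
```

The point is that the client side stays this small: no model loading, no framework code, just an HTTP call against whatever hardware the container happens to be running on.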
I'd love to know more about how we can improve the docs (I'm a big documentarian; feedback on docs is always sincerely appreciated).
We'll work on adding this to the repo, but I'll try to summarize here.
Why should I use this? If you're looking to deploy computer vision models to production (on edge devices or self-hosting in the cloud), Roboflow Inference is the easiest way to get up and running quickly & to iteratively improve your model's performance over time.
It's also in very active use by many real-world customers in production. That means we've ironed out (and are continuing to find and fix) many of the bugs and edge-cases that you'd invariably encounter building something on your own or using something less mature.
When should you _not_ use this? For just about everything else:
* If you're prototyping & hacking on core model primitives, there are likely too many layers of abstraction here & you should wait to productionize the model until you're happy with the underlying architecture.
* If you're not doing computer vision, you're probably better off with something focused on the specific needs of LLMs or other types of models.
* If you need the absolute highest speed possible you may want to use lower-level tools like DeepStream and Triton (for now). Out of the box we sacrifice a bit of speed to get a better interface & end-user experience. We do have a TensorRT provider that sacrifices some of that convenience for speed & are working on additional optimizations (including potentially integrating with DeepStream/Triton behind the scenes).
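To give a feel for what the TensorRT provider path involves, here's a hedged sketch of checking which ONNX Runtime execution providers are available in your environment. The provider names are standard ONNX Runtime; how Inference actually selects among them is configuration I'm glossing over here:

```python
# Check which ONNX Runtime execution providers are available. In a
# TensorRT-accelerated GPU container you'd expect "TensorrtExecutionProvider"
# to appear ahead of "CUDAExecutionProvider"; on CPU-only machines you'll
# only see "CPUExecutionProvider".
import onnxruntime as ort

available = ort.get_available_providers()
print(available)

# Prefer TensorRT when present, then CUDA, then CPU. This ordering is just an
# illustration of the speed/convenience trade-off above, not necessarily how
# Inference wires it internally.
preferred = [
    p for p in (
        "TensorrtExecutionProvider",
        "CUDAExecutionProvider",
        "CPUExecutionProvider",
    )
    if p in available
]
print("Would run with:", preferred)
```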