Show HN: Pip install inference, open source computer vision deployment (github.com/roboflow)
71 points by zerojames on Aug 23, 2023 | 16 comments
Deploying vision models is time-consuming and tedious. Setting up dependencies. Fixing conflicts. Configuring TRT acceleration. Flashing (and re-flashing) NVIDIA Jetsons. A streamlined, developer-friendly solution for inference is needed.

We, the Roboflow team, have been hard at work open sourcing Inference, our vision deployment solution. It is designed with developers in mind, offering an HTTP-based interface: run models on your hardware without having to write architecture-specific inference code. Here's a demo showing how to go from a model to GPU inference on a video of a football game in ~10 minutes:

https://www.youtube.com/watch?v=at-yuwIMiN4

Inference powers millions of daily API calls for global sports broadcasts, one of the world’s largest railways, a leading electric car manufacturer, and multiple other Fortune 500 companies, along with countless hackers’ hobby and research projects. Inference works in Docker and supports CPU (ARM and x86), NVIDIA GPU, and TRT. Inference manages dependencies and the environment. All you need to do is make HTTP requests to the server.
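
As a minimal sketch, hitting a locally running server from Python might look something like this (the port, route, and parameters here are illustrative assumptions, not the documented API -- check the docs for the exact endpoints):

    # Hypothetical request to a local Inference server running in Docker.
    # Assumptions: server listening on port 9001, model addressed as
    # "project/version", API key as a query parameter, base64 image in the body.
    import base64
    import requests

    with open("frame.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        "http://localhost:9001/soccer-players/1",     # hypothetical model ID/version
        params={"api_key": "YOUR_ROBOFLOW_API_KEY"},
        data=image_b64,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # predictions: class, confidence, box coordinates, etc.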

YOLOv5, YOLOv8, YOLACT, CLIP, SAM, and other popular vision models are supported (some models need to be hosted on Roboflow first, see the docs; we're working on bring-your-own-weights support!).

Try it out and tell us what you think!



Looks interesting.

Missing a trick by not having some images in the README though. Especially since this post, rather than the README, is where I found out it can run CLIP.


Thanks for the suggestion! Definitely agree, we’ve seen that work extremely well for Supervision[1] and Autodistill[2], some of our other open source projects.

There’s still a lot of polish like this we need to do; we’ve spent most of our effort cleaning up the code and documentation to prep for open sourcing the repo.

Next step is improving the usability of the pip pathway (the native Python interface was just added; the http server was all we had for internal use). Then we’re going to focus on improving the content and expanding the models it supports.

[1] https://github.com/roboflow/supervision

[2] https://github.com/autodistill/autodistill


I have added a video showing inference to the README -- thank you for the feedback!


Thanks. The video blew my mind.


How does this compare to Nvidia Triton? It's also a dockerized inference server capable of running ONNX, TRT, and many other compiled models. Automatic request queuing, HTTP & GRPC interfaces, configurable delays for dynamic batching, multiple models per GPU, integrations for metrics and tracing, etc. etc.

Is Inference looking to compete directly with Triton, or in a different niche (CV)? Does it make a different set of tradeoffs?


Inference operates at a higher layer of abstraction (tasks & architectures vs weights), targets a different audience (software engineers vs deep machine learning experts), and is focused on computer vision specifically (which informs features like its data integrations & interfaces).

I could even imagine a future version of Inference using Triton as an option for its inference engine behind the scenes.

Edit: I'd also be remiss not to mention that NVIDIA have been great partners to Roboflow and we also have integrations with Tao Toolkit[1] & Omniverse[2].

[1] https://blog.roboflow.com/nvidia-tao-toolkit-roboflow/

[2] https://developer.nvidia.com/blog/how-to-train-a-defect-dete...


Love & recommend Roboflow. Thanks for releasing this, y'all!


Looks great! Congrats on the launch.


Interested, but after spending 2 min reading it I am still not sure what I am looking at exactly.


Thank you for your comment!

Inference is all about reducing the friction associated with deployment. Set up a Docker container (which works on a range of architectures, with TRT acceleration), load your model (or use SAM or CLIP, which come out of the box), then you can start sending HTTP requests to get predictions.
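
As a rough sketch, querying the bundled CLIP model through the same server could look something like this (the route name and payload shape are assumptions for illustration; the real endpoints are in the docs):

    # Hypothetical CLIP comparison against a locally running Inference server.
    # The /clip/compare route and payload fields below are assumptions.
    import base64
    import requests

    with open("photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        "http://localhost:9001/clip/compare",   # assumed route
        json={
            "api_key": "YOUR_ROBOFLOW_API_KEY",
            "subject": {"type": "base64", "value": image_b64},
            "prompt": ["a football match", "an empty stadium"],
        },
        timeout=30,
    )
    print(resp.json())  # similarity score per prompt; highest is the best match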

I'd love to know more about how we can improve the docs (I'm a big documentarian; feedback on docs is always sincerely appreciated).


Genuine feedback: I guess your readme.md describes the HOW well, but does not touch the WHY. Why should I use this? Why not?


We'll work on adding this to the repo, but I'll try to summarize here.

Why should I use this? If you're looking to deploy computer vision models to production (on edge devices or self-hosting in the cloud), Roboflow Inference is the easiest way to get up and running quickly & to iteratively improve your model's performance over time.

It's also in very active use with many real-world customers in production. That means we've ironed out (and are continuing to find and fix) many of the bugs and edge-cases that you'd invariably encounter building something on your own or using something less mature.

When should you _not_ use this? For everything else.

* If you're prototyping & hacking on core model primitives, there are likely too many layers of abstraction & you should wait to productionize the model until you're happy with the underlying architecture.

* If you're not doing computer vision you're probably better off with something focused on the specific needs of LLMs or other types of models.

* If you need the absolute highest speed possible you may want to use lower-level tools like DeepStream and Triton (for now). Out of the box we sacrifice a bit of speed to get a better interface & end-user experience. We do have a TensorRT provider that sacrifices some of that convenience for speed & are working on additional optimizations (including potentially integrating with DeepStream/Triton behind the scenes).

Edit: @zerojames updated the README.


It’s an easy to use inference server for computer vision models.

The end result is a Docker container that serves a standardized API as a microservice that your application uses to get predictions from computer vision models (though there is also a native Python interface).
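
For example, the application side might be nothing more than a thin wrapper like this (hypothetical client code; the endpoint path and response fields are assumptions, not the documented API):

    # Hypothetical application-side client: the app depends on this wrapper,
    # not on any particular model architecture or weights.
    import base64
    from dataclasses import dataclass

    import requests

    @dataclass
    class Detection:
        class_name: str
        confidence: float
        x: float
        y: float
        width: float
        height: float

    class VisionClient:
        def __init__(self, base_url: str, model_id: str, api_key: str):
            self.base_url = base_url
            self.model_id = model_id
            self.api_key = api_key

        def detect(self, image_path: str) -> list[Detection]:
            with open(image_path, "rb") as f:
                image_b64 = base64.b64encode(f.read()).decode("utf-8")
            resp = requests.post(
                f"{self.base_url}/{self.model_id}",     # assumed endpoint shape
                params={"api_key": self.api_key},
                data=image_b64,
                headers={"Content-Type": "application/x-www-form-urlencoded"},
                timeout=30,
            )
            resp.raise_for_status()
            return [
                Detection(p["class"], p["confidence"], p["x"], p["y"],
                          p["width"], p["height"])
                for p in resp.json().get("predictions", [])
            ]

Swapping models then only means changing model_id; the application code stays the same.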

It’s backed by a bunch of component pieces:

* a server (so you don’t have to reimplement things like image processing & prediction visualization on every project)

* standardized APIs for computer vision tasks (so switching out the model weights and architecture can be done independently of your application code)

* model architecture implementations (which implement the tensor parsing glue between images & predictions) for supervised models that you've fine-tuned to perform custom tasks

* foundation model implementations (like CLIP & SAM) that tend to chain well with fine-tuned models

* reusable utils to make adding support for new models easier

* a model registry (so your code can be independent from your model weights & you don't have to re-build and re-deploy every time you want to iterate on your model weights)

* data management integrations (so you can collect more images of edge cases to improve your dataset & model the more it sees in the wild)

* ecosystem (there are tens of thousands of fine-tuned models shared by users that you can use off the shelf via Roboflow Universe[1], it plays nicely with our other open source tools like Autodistill[2] and Supervision[3], and there is a ton of content around fine-tuning models you can use with it[4])

Additionally, since it's focused on computer vision, it has specific CV-focused features (like direct camera stream input) and makes some different tradeoffs than other more general ML solutions (namely, optimized for small, fast models that run at the edge & need support for running on many different devices like NVIDIA Jetsons and Raspberry Pis in addition to beefy cloud servers).

[1] https://universe.roboflow.com

[2] https://github.com/autodistill/autodistill

[3] https://github.com/roboflow/supervision

[4] https://github.com/roboflow/notebooks


This is sick


Is it possible to run this without supplying a Roboflow API key? Why the requirement?


Yes, but not easily right now. We’ll be adding more supported ways to load models soon, but right now things largely assume using our model registry (and the API key is how it validates who is authorized to use which models).

The “why” is because this has been an internal project for years that we just open sourced, and while it was internal this wasn’t needed. But we’ll add it soon! (PRs also welcome; we’re happy to talk through how that might be architected if you start an issue/discussion on the repo.)



