Show HN: Pip install inference, open source computer vision deployment (github.com/roboflow)
71 points by zerojames on Aug 23, 2023 | 16 comments
Deploying vision models is time-consuming and tedious. Setting up dependencies. Fixing conflicts. Configuring TRT acceleration. Flashing (and re-flashing) NVIDIA Jetsons. A streamlined, developer-friendly solution for inference is needed.

We, the Roboflow team, have been hard at work open sourcing Inference, our vision deployment solution. It is designed with developers in mind, offering an HTTP-based interface: run models on your hardware without having to write architecture-specific inference code. Here's a demo showing how to go from a model to GPU inference on a video of a football game in ~10 minutes:

https://www.youtube.com/watch?v=at-yuwIMiN4

Inference powers millions of daily API calls for global sports broadcasts, one of the world’s largest railways, a leading electric car manufacturer, and multiple other Fortune 500 companies, along with countless hackers’ hobby and research projects. Inference works in Docker and supports CPU (ARM and x86), NVIDIA GPU, and TRT. Inference manages dependencies and the environment. All you need to do is make HTTP requests to the server.
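
As a minimal sketch, hitting a locally running server from Python might look something like this (the port, route, and parameters here are illustrative assumptions, not the documented API -- check the docs for the exact endpoints):

    # Hypothetical request to a local Inference server running in Docker.
    # Assumptions: server listening on port 9001, model addressed as
    # "project/version", API key as a query parameter, base64 image in the body.
    import base64
    import requests

    with open("frame.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        "http://localhost:9001/soccer-players/1",     # hypothetical model ID/version
        params={"api_key": "YOUR_ROBOFLOW_API_KEY"},
        data=image_b64,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        timeout=30,
    )
    resp.raise_for_status()
    print(resp.json())  # predictions: class, confidence, box coordinates, etc.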

YOLOv5, YOLOv8, YOLACT, CLIP, SAM, and other popular vision models are supported (some models need to be hosted on Roboflow first, see the docs; we're working on bring-your-own-weights support!).

Try it out and tell us what you think!



Looks interesting.

Missing a trick by not having some images in the README though. Especially since this post, rather than the README, is where I found out it can run CLIP.


Thanks for the suggestion! Definitely agree, we’ve seen that work extremely well for Supervision[1] and Autodistill[2], some of our other open source projects.

There’s still a lot of polish like this we need to do; we’ve spent most of our effort cleaning up the code and documentation to prep for open sourcing the repo.

Next step is improving the usability of the pip pathway (the native Python interface was just added; the http server was all we had for internal use). Then we’re going to focus on improving the content and expanding the models it supports.

[1] https://github.com/roboflow/supervision

[2] https://github.com/autodistill/autodistill


I have added a video showing inference to the README -- thank you for the feedback!


Thanks. The video blew my mind.


How does this compare to Nvidia Triton? It's also a dockerized inference server capable of running ONNX, TRT, and many other compiled models. Automatic request queuing, HTTP & GRPC interfaces, configurable delays for dynamic batching, multiple models per GPU, integrations for metrics and tracing, etc. etc.

Is Inference looking to compete directly with Triton, or in a different niche (CV)? Does it make a different set of tradeoffs?


Inference operates at a higher layer of abstraction (tasks & architectures vs weights), targets a different audience (software engineers vs deep machine learning experts), and is focused on computer vision specifically (which informs features like its data integrations & interfaces).

I could even imagine a future version of Inference using Triton as an option for its inference engine behind the scenes.

Edit: I'd also be remiss not to mention that NVIDIA have been great partners to Roboflow and we also have integrations with Tao Toolkit[1] & Omniverse[2].

[1] https://blog.roboflow.com/nvidia-tao-toolkit-roboflow/

[2] https://developer.nvidia.com/blog/how-to-train-a-defect-dete...


Love & recommend Roboflow. Thanks for releasing this, y'all!


Looks great! Congrats on the launch.


Interested, but after spending 2 min reading it I am still not sure what I am looking at exactly.


Thank you for your comment!

Inference is all about reducing the friction associated with deployment. Set up a Docker container (which works on a range of architectures, with TRT acceleration), load your model (or use SAM or CLIP, which come out of the box), then you can start sending HTTP requests to get predictions.
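
As a rough sketch, querying the bundled CLIP model through the same server could look something like this (the route name and payload shape are assumptions for illustration; the real endpoints are in the docs):

    # Hypothetical CLIP comparison against a locally running Inference server.
    # The /clip/compare route and payload fields below are assumptions.
    import base64
    import requests

    with open("photo.jpg", "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    resp = requests.post(
        "http://localhost:9001/clip/compare",   # assumed route
        json={
            "api_key": "YOUR_ROBOFLOW_API_KEY",
            "subject": {"type": "base64", "value": image_b64},
            "prompt": ["a football match", "an empty stadium"],
        },
        timeout=30,
    )
    print(resp.json())  # similarity score per prompt; highest is the best match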

I'd love to know more about how we can improve the docs (I'm a big documentarian; feedback on docs is always sincerely appreciated).


Genuine feedback: I guess your readme.md describes the HOW well, but does not touch the WHY. Why should I use this? Why not?


We'll work on adding this to the repo, but I'll try to summarize here.

Why should I use this? If you're looking to deploy computer vision models to production (on edge devices or self-hosting in the cloud), Roboflow Inference is the easiest way to get up and running quickly & to iteratively improve your model's performance over time.

It's also in very active use with many real-world customers in production. That means we've ironed out (and are continuing to find and fix) many of the bugs and edge-cases that you'd invariably encounter building something on your own or using something less mature.

When should you _not_ use this? For everything else.

* If you're prototyping & hacking on core model primitives, there are likely too many layers of abstraction & you should wait to productionize the model until you're happy with the underlying architecture.

* If you're not doing computer vision you're probably better off with something focused on the specific needs of LLMs or other types of models.

* If you need the absolute highest speed possible you may want to use lower-level tools like DeepStream and Triton (for now). Out of the box we sacrifice a bit of speed to get a better interface & end-user experience. We do have a TensorRT provider that sacrifices some of that convenience for speed & are working on additional optimizations (including potentially integrating with DeepStream/Triton behind the scenes).

Edit: @zerojames updated the README.


It’s an easy to use inference server for computer vision models.

The end result is a Docker container that serves a standardized API as a microservice that your application uses to get predictions from computer vision models (though there is also a native Python interface).
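
For example, the application side might be nothing more than a thin wrapper like this (hypothetical client code; the endpoint path and response fields are assumptions, not the documented API):

    # Hypothetical application-side client: the app depends on this wrapper,
    # not on any particular model architecture or weights.
    import base64
    from dataclasses import dataclass

    import requests

    @dataclass
    class Detection:
        class_name: str
        confidence: float
        x: float
        y: float
        width: float
        height: float

    class VisionClient:
        def __init__(self, base_url: str, model_id: str, api_key: str):
            self.base_url = base_url
            self.model_id = model_id
            self.api_key = api_key

        def detect(self, image_path: str) -> list[Detection]:
            with open(image_path, "rb") as f:
                image_b64 = base64.b64encode(f.read()).decode("utf-8")
            resp = requests.post(
                f"{self.base_url}/{self.model_id}",     # assumed endpoint shape
                params={"api_key": self.api_key},
                data=image_b64,
                headers={"Content-Type": "application/x-www-form-urlencoded"},
                timeout=30,
            )
            resp.raise_for_status()
            return [
                Detection(p["class"], p["confidence"], p["x"], p["y"],
                          p["width"], p["height"])
                for p in resp.json().get("predictions", [])
            ]

Swapping models then only means changing model_id; the application code stays the same.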

It’s backed by a bunch of component pieces:

* a server (so you don’t have to reimplement things like image processing & prediction visualization on every project)

* standardized APIs for computer vision tasks (so switching out the model weights and architecture can be done independently of your application code)

* model architecture implementations (which implement the tensor parsing glue between images & predictions) for supervised models that you've fine-tuned to perform custom tasks

* foundation model implementations (like CLIP & SAM) that tend to chain well with fine-tuned models

* reusable utils to make adding support for new models easier

* a model registry (so your code can be independent from your model weights & you don't have to re-build and re-deploy every time you want to iterate on your model weights)

* data management integrations (so you can collect more images of edge cases to improve your dataset & model the more it sees in the wild)

* ecosystem (there are tens of thousands of fine-tuned models shared by users that you can use off the shelf via Roboflow Universe[1], it plays nicely with our other open source tools like Autodistill[2] and Supervision[3], and there is a ton of content around fine-tuning models you can use with it[4])

Additionally, since it's focused on computer vision, it has specific CV-focused features (like direct camera stream input) and makes some different tradeoffs than other more general ML solutions (namely, optimized for small, fast models that run at the edge & need support for running on many different devices like NVIDIA Jetsons and Raspberry Pis in addition to beefy cloud servers).

[1] https://universe.roboflow.com

[2] https://github.com/autodistill/autodistill

[3] https://github.com/roboflow/supervision

[4] https://github.com/roboflow/notebooks


This is sick


Is it possible to run this without supplying a Roboflow API key? Why the requirement?


Yes, but not easily right now. We’ll be adding more supported ways to load models soon, but right now things largely assume using our model registry (and the API key is how it validates who is authorized to use which models).

The “why” is because this has been an internal project for years that we just open sourced, and while it was internal this wasn’t needed. But we’ll add it soon! (PRs also welcome; we’re happy to talk through how that might be architected if you start an issue/discussion on the repo.)



