
Super cool to see you here.

I've also looked at Ray for running data pipelines before (at much, much smaller scales) for the reasons you suggest (unstructured data, mixed CPU/GPU compute).

One thing I've wanted is an incremental computation framework (e.g., salsa [1]) built on Ray so that I can write jobs that transparently reuse intermediate results from an object store if their dependencies haven't changed.

Do you know if anyone has thought of building something like this?

[1] https://github.com/salsa-rs/salsa
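
To make the idea concrete, here is a minimal sketch of what I mean. It is plain Python, not Ray and not salsa: the `store` dict is a hypothetical stand-in for an object store, and `memoized`, `cache_key`, `tokenize`, `count`, and `pipeline` are names made up for illustration. Each step's result is keyed by the step name plus a hash of its inputs, so a downstream step is skipped whenever its inputs are unchanged, even if something further upstream had to rerun.

```python
import hashlib

# Hypothetical stand-in for an object store: cache key -> stored result.
store = {}
# How many times each step actually executed (for illustration only).
calls = {}


def cache_key(fn, args):
    """Key a result by the step's name and its inputs.

    Assumes inputs have a stable repr; changes to the step's code are
    NOT tracked in this sketch.
    """
    h = hashlib.sha256()
    h.update(fn.__qualname__.encode())
    h.update(repr(args).encode())
    return h.hexdigest()


def memoized(fn):
    """Run fn only if no result is stored for these exact inputs."""
    def wrapper(*args):
        key = cache_key(fn, args)
        if key not in store:
            calls[fn.__name__] = calls.get(fn.__name__, 0) + 1
            store[key] = fn(*args)
        return store[key]
    return wrapper


@memoized
def tokenize(text):
    return text.split()


@memoized
def count(tokens):
    return len(tokens)


def pipeline(text):
    # Convert to a tuple so the intermediate result is immutable.
    return count(tuple(tokenize(text)))
```

Calling `pipeline("alpha beta")` twice runs each step once. Calling it again with `"alpha  beta"` (two spaces) reruns `tokenize`, but because the tokens come out the same, `count` is served from the store. In a Ray version I would expect the steps to be remote tasks and the store to be something persistent, but that part is an assumption on my side.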



I asked one of the core devs the same question at a recent event, and he (1) said that some people in finance have done related things and (2) suggested using the Ray Slack to connect with developers and power users who might have helpful advice.

I agree this is a very interesting area to consider Ray for. Lots of projects and products provide core components that could be used, but there's no widely used library. It feels like one is overdue.


Other folks have built data processing libraries on top of Ray: Modin and Daft come to mind.

But I'm not aware of anything exactly like what you're referring to!



