I'm Yann, one of the founders of Koyeb. Koyeb is a platform for developers and businesses to run serverless data processing apps in minutes.
We provide an easy-to-use platform to build production-grade workflows for all your data, including image, video, audio, and document processing.
To provide a little bit of context: we previously developed Scaleway (https://scaleway.com/), a European cloud service provider, and initially started Koyeb around multi-cloud object storage (https://news.ycombinator.com/item?id=21005524).
We are now going a step further: we are trying to also provide an easy way to process data and to orchestrate distributed processing from various sources.
Currently, we provide an S3-compliant API to push your data. You can implement processing workflows using ready-to-use integrations (https://www.koyeb.com/catalog) and store the results on the cloud storage provider of your choice (e.g. GCP, Azure Blob, AWS S3, Vultr, DigitalOcean, Wasabi, Scaleway, or even Minio servers).
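Concretely, pushing a file works with any standard S3 client; for example with boto3 (the endpoint, bucket, and credentials below are placeholders, not our actual values):

    import boto3

    # Placeholder endpoint and credentials -- substitute your own.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://s3.your-koyeb-store.example.com",
        aws_access_key_id="ACCESS_KEY",
        aws_secret_access_key="SECRET_KEY",
    )

    # Upload a local video so a workflow can pick it up for processing.
    s3.upload_file("clip.mp4", "my-bucket", "incoming/clip.mp4")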
We're working on adding support for Docker containers and custom functions to let our users combine catalog integrations with their own code in workflows. We will also add support for new data sources to send, ingest, and import data from different services.
We of course take care of all the infrastructure management and scaling of the platform.
The platform is in its early access phase and I'd love to hear what you think, your impressions and feedback. Thanks a lot!
I wasn't able to find a reference for the YAML schema faster than I can open the "add comment" page, so apologies if this is addressed by some docs somewhere:
Given that `steps:` is a list, isn't having `after: video-clipping` redundant, since it already comes after the video-clipping step?
The `after` attribute is there to let you implement your processing logic. The steps list contains the processing actions, but execution is not sequential: a workflow can have multiple processing branches, and each branch can run a series of steps on the result of a specific step.
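For example, a branching workflow could look roughly like this (sketching from memory; field names other than `steps` and `after` are illustrative and may differ from the actual schema):

    steps:
      - name: video-clipping
        integration: video/clip
      - name: thumbnail              # branch 1: consumes the clipping output
        integration: image/thumbnail
        after: video-clipping
      - name: transcription          # branch 2: also consumes the clipping output
        integration: audio/transcribe
        after: video-clipping

Both `thumbnail` and `transcription` declare `after: video-clipping`, so the list order doesn't matter: `after` is what defines the graph, and the two branches can run in parallel.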
I'm curious as to how developing on it locally works vs. a staging server vs. prod. I.e., how do you make sure the workflows stay synced and are easy to reason about?
We’re currently working on a deep Git integration where you basically push your updated workflow configuration along with your function code or Docker tag references.
We have some tooling to develop individual catalog integrations locally and to test that the integration with object storage works as expected. We plan to publicly release this tooling so all users can test their functions locally before using them in a workflow.
Regarding workflow environments: right now, you have to create one processing Stack for each environment, i.e. dev, staging, and prod. Later on, we want to spawn environments for each Git branch.
We deal with both multi-cloud processing and storage with a managed platform, whereas the Serverless framework simply allows you to configure and deploy functions on the main cloud service providers.
We allow you to build, deploy, and run (i.e. we operate the infrastructure) processing workflows using ready-to-use integrations, containers, or custom functions. We also provide a multi-cloud storage layer: you can use and push data stored on multiple cloud storage providers through a simple S3 interface instead of having to deal with each object storage provider's implementation.
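To illustrate the storage part (the endpoint here is hypothetical, and credentials are assumed to come from the environment): one S3 client pointed at the storage layer replaces one SDK per provider.

    import boto3

    # One S3 client for the multi-cloud storage layer, instead of juggling
    # azure-storage-blob, google-cloud-storage, boto3, etc. separately.
    store = boto3.client("s3", endpoint_url="https://storage.example-koyeb.com")

    # Results are listed the same way regardless of which provider
    # actually holds the bytes.
    for obj in store.list_objects_v2(Bucket="results")["Contents"]:
        print(obj["Key"], obj["Size"])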
We also plan to be compatible with the serverless framework for the custom functions part.
YAML is the new XML. Monstrous configurations used by some magic runtime to glue pieces together.
Dependency injection frameworks went down that road already and decided against it in the end. I wish CI, data pipelines, config management, IaC tools, and all the other modern adopters of the YAML-fits-everything approach learned from the past.
What we need is an easy way to create mini languages, but apparently that is not something industry/academia is aiming at, so this skill is nonexistent among common engineers, who often come up with the ideas for new products.
HashiCorp is doing fine with config files in HCL or JSON. What’s your specific complaint here? To me it seems your mini languages would compile into something very similar to those YAML files anyway. It’s all graph nodes in the end. But I do agree a language would be nicer when it comes to testing.
Secondly, unless you're selling compiled software for others to configure, there really is no difference between config in config files and config in code, except that the former does not let you build such meaningful and useful abstractions. You're presumably delivering it all into production via CI/CD anyway, so what's the difference between editing a config file and editing some actual source code, besides the lack of real type safety and IDE assistance?
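To make it concrete, a few lines of code can generate the same kind of YAML while giving you loops, functions, and type checks (the schema here is made up for the sake of the example):

    import yaml  # PyPI package: PyYAML

    # Build the step graph programmatically -- an abstraction a hand-written
    # YAML file can't give you.
    steps = [{"name": "video-clipping", "integration": "video/clip"}]
    for branch in ("thumbnail", "transcription"):
        steps.append({"name": branch, "after": "video-clipping"})

    print(yaml.safe_dump({"steps": steps}, sort_keys=False))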
Pulumi have figured this out.
(And I suspect that if your end users need to configure your software's use of Spring's Dependency Injection in XML config then you are probably doing it wrong.)
Aside from having an S3-compatible layer, that's where the similarities end. Minio is self-hosted and doesn't provide the additional workflow processing, etc.
Minio is an awesome object storage solution with great features such as the S3 gateway, but from what I know they don’t have a processing layer.
At Koyeb we also have an S3-compatible layer to let users send, process, and store data on any cloud (or on your own edge Minio). As @bdcravens said, it's where the similarities end; plus, Koyeb is entirely managed.