Hacker News | mfdupuis's comments

I think this is the challenge and the dissonance. For something to truly run autonomously, you need to give it so many constraints that it almost loses its usefulness. I've tried using AI (or at least looked into what I could use it for) to automate marketing tasks, and I just don't think I can seriously set up a workflow in n8n or AgentKit that would produce sufficiently good results without me jumping in. That said, AI is incredibly helpful in this semi-autonomous mode with the right parameters, which is the parent comment's point.


Observability is a good guess, but I'd venture to guess that the conversations going on internally are about how to capture value across the entire stack. I wouldn't be surprised if we hear about them acquiring either a database/warehouse company and/or an analytics solution. Or vice versa, them getting acquired by a bigger player that wants to offer more connectors and data modeling functionality.


This is a really great comparison to draw. It made me think that this feeling of going from mastering a craft to working on large-scale systems is probably how someone passionate about cars felt when they went from building cars one by one, knowing how the whole machine worked, to taking a job on an assembly line.

Fortunately I think anything pertaining to vibe coding/engineering/analytics is still more enjoyable and less grim than working on an assembly line, but the human feelings remain nonetheless.


I think the sentiment in this post is shared by most folks: we don't need yet another workflow tool. That said, I don't think no-code and low-code (or even full-code) are mutually exclusive. Today these tools speak to an audience that's technical enough to get around in them, but not so technical that they know how to deploy, monitor, and maintain the resulting workflows. What will happen over time is that any complexity that does exist in these low-code tools will be abstracted away by AI, but you'll still be able to dig into the deeper layers if you're technical. This might be that very last point you're making?

> How can we make code generation models better at writing LLM powered workflows/agents

For context, the approach we've taken at Fabi is that you can build the workflows manually or ask the AI to build them out entirely, and you can pick up where the AI left off at any point and edit the code yourself. We're not a generic workflow solution like n8n or AgentKit (we're 100% focused on data analysis), but the same approach should be easily applicable to those tools.


> The best way to avoid the red ocean is to build for an obscure and complex niche.

This seems like the counter-argument... You need to build something incredibly different. You need to message differently, and you need to distribute differently.

If the argument is that convincing the world you're different is harder than ever, I buy that. There's so much fluff and noise out there that it's harder than ever to break through and cut through the skepticism. But that makes it more important than ever to be different.


Are you working towards this? What examples do you have in mind? (I agree with your thesis btw)


Disclosure: I'm a founder in the data space[1].

Have you thought about how you would handle much larger datasets? Or is the idea that since this is a spreadsheet, the 10M cell limit is plenty sufficient?

I find WASM really interesting, but I can't wrap my head around how this scales in the enterprise. But I figure it probably just comes down to the use cases and personas you're targeting.

[1] https://www.fabi.ai/


I am also very deeply invested in this question. The go-to path for very large datasets seems to be text-to-SQL (ClickHouse, Snowflake, etc.). But all these juicy Python data-science libraries require code execution on the much smaller data payloads that come back from the SQL results. Feel free to reach out; what you're trying to achieve seems very similar to what I'm trying to do in a completely different industry/use case.
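The split described above (heavy aggregation in the warehouse, Python data science on the small result set) can be sketched with stdlib sqlite3 standing in for a warehouse like ClickHouse or Snowflake. This is a hedged illustration of the pattern, not any specific product's API:

```python
import sqlite3
import statistics

def aggregate_then_analyze():
    # In-memory SQLite stands in for a large warehouse.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (user_id INTEGER, latency_ms REAL)")
    con.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(i % 100, float(50 + i % 400)) for i in range(10_000)],
    )
    # Step 1: the warehouse does the heavy lifting over all 10k rows;
    # only a 100-row payload comes back to Python.
    rows = con.execute(
        "SELECT user_id, AVG(latency_ms) FROM events GROUP BY user_id"
    ).fetchall()
    # Step 2: Python-side analysis runs on the small result set.
    per_user_means = [avg for _, avg in rows]
    return {
        "groups": len(rows),
        "overall_mean": statistics.fmean(per_user_means),
        "stdev": statistics.stdev(per_user_means),
    }
```

In a real stack, step 1 would be the text-to-SQL query against the warehouse and step 2 would be pandas/scikit-learn code on the returned frame; the shape of the hand-off is the same.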


Fabi.ai | https://www.fabi.ai/ | Senior Front End Engineer | Full-time | Hybrid SF or Remote (US)

We're looking for a senior front end engineer to join our mighty and growing team.

We're transforming the way data analysis is done in the enterprise. We already have some amazing customers and are growing rapidly.

This person should have extensive React and TypeScript experience and be able to operate with minimal design supervision (we're a small team and we expect this person to have a sharp eye).

Full job description: https://www.linkedin.com/jobs/view/4093878394


This feels like a good opportunity for a startup. I've seen a lot of startups crop up around Snowflake cost management, I wonder what's in the AWS space.



Love DuckDB. Definitely a great place to start.

> A common pattern I’ve seen over the years have been folks in engineering leadership positions that are not super comfortable with extracting and interpreting data from stores

I think this extends beyond just engineering, and I wish more data teams made the raw data (or at least some clean subset) more readily available for folks across the organization to explore. I've been part of orgs where I had access to read-only replicas and quickly got comfortable querying and analyzing data on my own, and I've been part of other orgs where everything had to go through the data team and I had to be spoon-fed all the data and insights.
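The read-only-replica access pattern described above can be mimicked with stdlib sqlite3's URI mode. This is a hedged sketch of the idea (explorers get full query access but no write access), not any particular org's stack:

```python
import os
import sqlite3
import tempfile

def demo_read_only_access():
    # The "primary": a writable database the data team owns.
    path = os.path.join(tempfile.mkdtemp(), "analytics.db")
    primary = sqlite3.connect(path)
    primary.execute("CREATE TABLE signups (day TEXT, n INTEGER)")
    primary.execute("INSERT INTO signups VALUES ('2024-01-01', 42)")
    primary.commit()
    primary.close()

    # The "replica": anyone in the org can explore it freely...
    replica = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
    total = replica.execute("SELECT SUM(n) FROM signups").fetchone()[0]

    # ...but writes are rejected, so curiosity can't break production.
    try:
        replica.execute("INSERT INTO signups VALUES ('2024-01-02', 1)")
        writable = True
    except sqlite3.OperationalError:
        writable = False
    return total, writable
```

In practice the "replica" would be a Postgres read replica or a warehouse with a read-only role, but the safety property is the same: self-serve exploration with no blast radius.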


Totally agree. In my last job I was able to create my own ETL jobs as a PM to get data for my own analyses and figured out a fairly minor configuration change could save us $10M per year. It was from one of many random ETL jobs I created myself out of curiosity that, if I had been forced to rely on other people, I may not ever have created.


If you'd just had a business controller, you'd have x*$10M saved and more time for your PM role.

Yes, calling BS on leadership running their own SQL. Bring strategy and tactics, find good people, create clear roles and expectations, and certainly don't get lost in running naive scripts you've written because you think you can do every role better than the people actually occupying them.


Agreed, if you have the budget for it. There are often times where living off the land is necessary.


I know nothing about working in small firms, so that is probably very true: the smaller the firm, the more you do yourself. But if a company can save $10M, it can afford a set of financials.


This is actually one of the more interesting LLM observability platforms I've seen. Beyond addressing scaling issues, where do you see yourself going next?


Positioning/roadmap differs between the different projects in the space.

We summarized what we strongly believe in here: https://langfuse.com/why. TL;DR: open APIs, self-hostable, LLM/cloud/model/framework-agnostic, API-first, unopinionated building blocks for sophisticated teams, and simple yet scalable instrumentation that is incrementally adoptable.

Regarding roadmap, this is the near-term view: https://langfuse.com/roadmap

We work closely with the community, and the roadmap can change frequently based on feedback. GitHub Discussions is very active, so feel free to join the conversation if you want to suggest or contribute a feature: https://langfuse.com/ideas


What are other potential platforms?


This is a good long list of projects, although it's not narrowly scoped to tracing/evals/prompt management: https://github.com/tensorchord/Awesome-LLMOps?tab=readme-ov-...


One missing from the list below is Agenta (https://github.com/agenta-ai/agenta).

We're OSS and OTel-compliant, with a stronger focus on evals and on enabling collaboration between subject-matter experts and devs.


A bunch of them: Langsmith, Lunary, Phoenix Arize, Portkey, Datadog, and Helicone.

We also picked Langfuse - more details here: https://www.nonbios.ai/post/the-nonbios-llm-observability-pi...


Thanks, this post was insightful. I laughed at the reason you rejected Arize Phoenix; I had similar thoughts while going through their site! =)

> "Another notable feature of Langfuse is the use of a model as a judge ... this is not enabled in the free version/self-hosted version"

I think you can add LLM-as-judge to the self-hosted version of Langfuse by defining your own evaluation pipeline: https://langfuse.com/docs/scores/external-evaluation-pipelin...
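A minimal sketch of what such an external evaluation pipeline might look like. The fetch/judge/score hooks are injected so the loop stays testable offline; the judge model call is stubbed, and the score parser assumes a hypothetical "N/5" reply format, so treat all the names here as illustrative rather than exact SDK signatures:

```python
import re

def parse_judge_score(judge_reply: str) -> float:
    """Extract a 1-5 rating from a judge model's free-text reply,
    normalized to the 0..1 range."""
    m = re.search(r"([1-5])\s*/\s*5", judge_reply)
    if not m:
        raise ValueError(f"no score found in: {judge_reply!r}")
    return (int(m.group(1)) - 1) / 4

def run_eval_pipeline(traces, judge, record_score):
    """Fetch-judge-score loop: `traces` would come from the observability
    platform's API, `judge` is an LLM call returning free text, and
    `record_score` writes the score back to the platform."""
    for trace in traces:
        reply = judge(trace["output"])
        record_score(trace["id"], "llm_judge", parse_judge_score(reply))
```

With a real SDK, `traces` and `record_score` would be backed by the platform's fetch and scoring endpoints; check the external-evaluation docs linked above for the actual calls rather than trusting these names.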


Thanks for the pointer!

We are actually toying with building out a prompt evaluation platform and were considering extending Langfuse. Maybe we'll just use this instead.


Thanks for sharing your blog post. We had a similar journey: I installed and tried both Langfuse and Phoenix and ended up choosing Langfuse due to some versioning conflicts in the Python dependency. I'm curious whether your thoughts change after v3. I also liked that it only depended on Postgres, but the scalable version requires other dependencies.

The thing I liked about Phoenix is that it uses OpenTelemetry. In the end we’re building our Agents SDK in a way that the observability platform can be swapped (https://github.com/zetaalphavector/platform/tree/master/agen...) and the abstraction is OpenTelemetry-inspired.
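The swappable-backend idea can be sketched as a tiny OpenTelemetry-inspired interface. These names are illustrative, not the linked SDK's actual API: the agent code depends only on a one-method `Tracer` protocol, so swapping observability platforms means swapping the implementation, not the agent:

```python
from contextlib import contextmanager
from typing import Protocol

class Tracer(Protocol):
    def span(self, name: str): ...

class InMemoryTracer:
    """Stand-in backend; a Langfuse- or OTel-backed tracer would implement
    the same surface, so no agent code changes when platforms swap."""
    def __init__(self):
        self.spans = []

    @contextmanager
    def span(self, name: str):
        self.spans.append(name)  # a real backend would also record timing/attrs
        yield

def run_agent(tracer: Tracer) -> str:
    # Agent logic only knows about the abstract Tracer protocol.
    with tracer.span("agent.run"):
        with tracer.span("llm.call"):
            answer = "stubbed model output"
    return answer
```

The design choice mirrors OTel's own split between the API (what instrumented code imports) and the SDK/exporter (what actually ships the data), which is what makes the platform swappable.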


As you mentioned, this was a significant trade-off. We faced two choices:

(1) Stick with a single Docker container and Postgres. This option is simple to self-host, operate, and iterate on, but it suffers from poor performance at scale, especially for the analytical queries that become crucial as a project grows. Additionally, as more features emerged, we needed a queue and benefited from caching and asynchronous processing, which required splitting into a second container and adding Redis. Those features would have been blocked had we stuck with this setup.

(2) Switch to a scalable setup with a robust infrastructure that enables us to develop features that interest the majority of our community. We have chosen this path and prioritized templates and Helm charts to simplify self-hosting. Please let us know if you have any questions or feedback as we transition to v3. We aim to make this process as easy as possible.

Regarding OTel, we are considering adding a collector to Langfuse as the OTel semantic conventions are maturing nicely. The needs of the Langfuse community are evolving rapidly, and starting with our own instrumentation allowed us to move quickly while the semantic conventions were still undeveloped. We are tracking this here and would greatly appreciate your feedback, upvotes, or comments on this thread: https://github.com/orgs/langfuse/discussions/2509


So we are still on v2.7, which works pretty well for us. Haven't tried v3 yet, and not looking to upgrade. I think the next big feature set we are looking for is a prompt evaluation system.

But we are coming around to the view that it is a big enough problem to warrant a dedicated SaaS, rather than piggybacking on an observability SaaS. At NonBioS we have very complex requirements, so we might just end up building it from the ground up.


"Langsmith appeared popular, but we had encountered challenges with Langchain from the same company, finding it overly complex for previous NonBioS tooling. We rewrote our systems to remove dependencies on Langchain and chose not to proceed with Langsmith as it seemed strongly coupled with Langchain."

I've never really used Langchain, but I set up Langsmith with my own project quite quickly. It's very similar to setting up Langfuse: activated with a wrapper around the OpenAI library. (Though I haven't looked into the metadata and tracing yet.)

Functionally the two seem very similar. I'm looking at both and am having a hard time figuring out differences.


We launched Laminar a couple of months ago: https://www.lmnr.ai. Extremely fast, great DX, and written in Rust. Definitely worth a look.


Congrats on the Launch!


apologies for hijacking your launch (congrats btw!)


thanks Marc :)


I'm a maintainer of Opik, an open source LLM evaluation and observability platform. We only launched a few months ago, but we're growing rapidly: https://github.com/comet-ml/opik

