
My trial co-founder and I built Unpitched as a 5-week run to test whether we'd work well together. The twist: we're experienced software engineers, yet AI wrote ~90% of the 24k lines of code in Unpitched.

The product analyzes customer interview transcripts to catch when founders slip into "pitch mode" instead of learning. It's based on principles from the book The Mom Test - essentially a digital coach that flags your mistakes and gives you personalized advice on how to do better.

Why this project for our trial:

- Real problem we'd witnessed (founders talking too much in user interviews)
- Tight scope, but a production-grade requirement
- A chance to push AI-accelerated development to its limits

Tech: Next.js 15, Supabase, Trigger.dev, GPT-4.1 via Vercel AI SDK. We used Cursor, Claude Code, V0, and (briefly) Grok for development.

Key learning: AI development requires adopting new working patterns. You can think of AI as a chaotic software engineering intern: you need to be highly intentional in guiding it to do the right thing. Just like with human teams, bad managers get bad output from their people, and the same applies to managing AI.

If you're an experienced software engineer, you have a lot of implicit assumptions about how to build software, how to rate the importance of tasks, and so on. You need to transfer these to the AI, and we think we've found early patterns for doing this well.

For example, we used the "walking skeleton" and "tracer bullet" concepts to structure the project planning we did with AI. We found that the basic pattern of think-research-brainstorm-plan before writing any code dramatically improves the quality of AI coding as the project gets more complex. E.g. we'd plan error handling with the AI first, save the plan as a doc, then use that doc as context for implementation - this kept the AI consistent across the codebase.
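To make this concrete, here's the kind of convention an error-handling plan doc might pin down. This is a hypothetical TypeScript sketch, not our actual code - the point is that once the convention is written down, the AI never has to guess between throwing and returning errors:

  // Hypothetical convention from an error-handling plan doc:
  // every service function returns a Result instead of throwing.
  type Result<T> =
    | { ok: true; value: T }
    | { ok: false; error: { code: string; message: string } };

  async function fetchTranscript(id: string): Promise<Result<string>> {
    const res = await fetch(`/api/transcripts/${id}`);
    if (!res.ok) {
      return { ok: false, error: { code: "FETCH_FAILED", message: `HTTP ${res.status}` } };
    }
    return { ok: true, value: await res.text() };
  }

With the convention captured once in a doc, every AI-generated module can be checked against it instead of re-litigating the design in each prompt.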

We shared details of this approach at Warsaw AI Tinkerers (over 200 attendees) a couple of weeks ago.

The co-founder trial worked - we built a working mini-product in 5 weeks, learned how each of us approaches the alien technology that is modern AI, and uncovered many interesting personal quirks in each other (everyone has them).

You can check out Unpitched at https://unpitched.app. Sadly, we require sign-up, as the underlying LLM calls are a little expensive.

We wrote more about how we approached the co-founder trial process at https://unpitched.app/about. Let us know if you have any questions about our trial, share your own stories of looking for co-founders, or give us any feedback on the app!

PS. Shoutout to the Circleback team (YC W24) - the only note-taking app we found with working webhooks that we could integrate with Unpitched.

-- gkk & ykka


If you think of open source as a protocol through which an ecosystem of companies loosely collaborates, then it's a big deal. E.g. Groq can work on inference without complicated negotiations with Meta. Ditto for Hugging Face and smaller startups.

I agree with you on open source in the original, home tinkerer sense.


Hi hansonkd,

I'm working on Hotseat - a legal Q&A service where we put regulations in the hot seat and let people ask sophisticated questions. My experience aligns with your comment that vanilla GPT often performs poorly when answering questions about documents. However, if you combine focused effort on squeezing performance out of GPT with careful product design, you can go pretty far.

I wonder if you've written about the specific failure modes you've seen in answering questions from documents? I'd love to check whether Hotseat handles them well.

If you're curious, I've written about some of the design choices we've made on our way to creating a compelling product experience: https://gkk.dev/posts/the-anatomy-of-hotseats-ai/


Thanks for the response. I will check it out.

Specific failure modes can be something as simple as extracting beneficiary information from a Trust document. Sometimes it works, but a lot of the time it doesn't, even with startups whose AI products are specifically about extracting information from documents. For example, the model will produce an incomplete list of beneficiaries, or if there are contingent beneficiaries, it won't know what to do. Not even a hard question about the contingency - just making a simple list with percentages of the distribution if no one dies.

Further, trying to get an AI to describe the contingency is a crapshoot.

While I expect these options to get better and better, I have fun trying them out and seeing what basic thing will break. :)


Thanks for the response! I'm not familiar with Trust documents, but I asked ChatGPT about them: https://chat.openai.com/share/c9d86363-b64a-4e44-9fd4-1d5b18...

If the example is representative, I see two problems: a simple extraction of information that is laid out in the open (the list of beneficiaries), and reasoning to interpret the contingent-beneficiaries section and connect it to facts from other parts of the document. Is that correct?

If that's the case, then Hotseat is miles ahead when it comes to analyzing regulations (from the civil law tradition, which is different from the US one) and dealing with the categories of problems you mentioned.


Your post is very interesting. Thanks for sharing.

If your focus is narrow enough, vanilla GPT can still provide good enough results. We narrow down the scope for GPT and ask it to answer binary questions. With that, we get good results.
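For the curious, a minimal sketch of what the binary-question pattern could look like (hypothetical code; callLLM stands in for whatever model API you use):

  // Hypothetical sketch: constrain the model to a YES/NO verdict
  // over a narrow excerpt, then parse the one-word answer.
  declare function callLLM(prompt: string): Promise<string>;

  async function answersBinary(excerpt: string, question: string): Promise<boolean> {
    const prompt =
      `Based ONLY on the excerpt below, answer the question with YES or NO.\n\n` +
      `Excerpt:\n${excerpt}\n\nQuestion: ${question}\nAnswer:`;
    const raw = await callLLM(prompt);
    return raw.trim().toUpperCase().startsWith("YES");
  }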

Your approach is better for supporting broader questions. We support those as well, and there the results aren't as good.


Thanks for reading it! I agree that binary questions are easy enough for vanilla GPT to answer. If your problem space fits them - great. Sadly, the space I'm in doesn't have an easy mode!


Is the licensing requirement actually in the bill? I've seen confusion around the distinction between foundational and high-risk models - they're not the same.

(On the larger point of the AI Act leaving much to be desired, I agree.)


It takes 90-120s to compute an answer. I just checked: the bot died midway through computing answers earlier in the day and picked up from a later point in the queue. I fixed it, and you should get an answer soon.

Re email: the submission form has a second step where you can opt in to leave your email address and get notified. Did it not show for you?


(author here)

One of the most non-obvious discoveries we made was that for such long documents, turning them into Markdown (with marked headings), as opposed to plain text, made a night-and-day difference in the LLM's reasoning performance. I have my guesses as to why this could be the case, but I'm curious to hear your hypothesis and whether you've seen similar effects in the wild?
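For context, a rough sketch of the kind of transformation I mean (hypothetical and simplified - the real conversion was more involved): promote the document's structural markers to Markdown headings so the model can anchor its reasoning to them.

  // Hypothetical sketch: promote structural markers like "TITLE III"
  // or "Article 5" to Markdown headings before prompting the LLM.
  function toMarkdown(plainText: string): string {
    return plainText
      .split("\n")
      .map((line) => {
        const t = line.trim();
        if (/^TITLE\s+[IVXLC]+/.test(t)) return `# ${t}`;
        if (/^Article\s+\d+/.test(t)) return `## ${t}`;
        return line;
      })
      .join("\n");
  }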


Hi HN,

Today, we're launching Hotseat AI: an AI-powered Q&A service for the 226-page EU AI Act[0][1]. Hotseat AI is a collaborative FAQ where anyone[2] can ask a question and the bot will answer. Questions and answers are public, to build a high-quality community reference on AI regulation.

Hotseat is not your typical "chat-with-document" app. It started as one, and an earlier iteration of the project relied on embedding-based retrieval. We quickly found that embeddings fall short of connecting a user's question to the relevant chunks of the regulation. Today's version doesn't use embeddings at all and is built on a bespoke pipeline of models. GPT-4 is at the heart of Hotseat, and we rely heavily on function calling. We also use chain-of-thought and step-by-step reasoning to increase the LLM's working memory. We perform whole-document reasoning first to make a plan for answering the question, then execute that multi-step plan. All combined, Hotseat gets nuanced questions right.
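In very rough strokes, the plan-then-execute shape looks like the sketch below. This is a simplified illustration, not our production pipeline; callLLM and lookupSection are hypothetical stand-ins for our GPT-4 calls (with function calling) and section retrieval:

  // Simplified sketch of plan-then-execute whole-document reasoning.
  declare function callLLM(prompt: string): Promise<string>;
  declare function lookupSection(ref: string): string; // verbatim text of a section

  async function answer(question: string, actOutline: string): Promise<string> {
    // 1) Whole-document reasoning: plan which sections to consult.
    const plan = await callLLM(
      `Outline of the EU AI Act:\n${actOutline}\n\n` +
      `List, one per line, the sections needed to answer: "${question}"`
    );
    // 2) Execute the plan step by step, carrying notes forward
    // (externalized chain-of-thought as extra working memory).
    let notes = "";
    for (const ref of plan.split("\n").filter((l) => l.trim())) {
      const section = lookupSection(ref);
      notes += await callLLM(
        `Section ${ref}:\n${section}\n\nNotes so far:\n${notes}\n\n` +
        `State what this section contributes to answering: "${question}"`
      ) + "\n";
    }
    // 3) Synthesize the final answer from the accumulated notes.
    return callLLM(`Using only these notes:\n${notes}\nAnswer: "${question}"`);
  }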

My overarching lesson from this project is that to squeeze the most out of current LLMs, you need to focus on retrieval and build on top of it.

Our answers include a "legal trace": a series of AI Act quotes and explanatory comments. We "pin down" the LLM to reduce hallucinations by forcing direct quotes. This response format also reduces the chance of the LLM taking a wrong turn in its reasoning.
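One nice property of forcing verbatim quotes: a hallucinated quote can be caught mechanically with a substring check against the source text. A hypothetical sketch of the trace shape and the check:

  // Hypothetical shape of a "legal trace" entry: a verbatim quote
  // from the Act plus the model's explanatory comment.
  interface TraceEntry {
    quote: string;
    comment: string;
  }

  // Reject any trace whose quotes don't appear verbatim in the Act.
  function validateTrace(trace: TraceEntry[], actText: string): boolean {
    return trace.every((entry) => actText.includes(entry.quote));
  }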

AI regulation is a hotly debated topic, and Hotseat can help folks poke at it with questions without plunging into legalese - plain language works great!

To wrap up, I'm wondering if this is the seed of a viable business. Would you find ‘directly ask the regulation’ useful, especially as a non-lawyer like a startup founder or engineer? We had to cut a few corners to get Hotseat AI out, but it's unclear how much they matter in practice. Let me know if Hotseat is useful to you, or try to poke holes in it.

[0]: we're on the far end of "focus on one thing"

[1]: the latest AI Act version

[2]: I'll be doing light moderation to prevent spam and keep the quality high



I'd guess Anthropic considers these 2nd-tier markets, so it's not a question of whether it's too difficult but of whether it's a priority at the moment.


Hey HN,

Author of the project here. Feel free to ask me any questions!


  --color-fg-default
You should change this to a darker color; it's impossible to read the text (AI comments).

(Yeah, this is an issue only in dark mode, which I just checked.)


I didn't test the site in dark mode, and clearly it needs fixing. Thanks for catching and reporting this!

(The site is a little hacked together when it comes to styling - I'm pushing Streamlit's markdown support to the limit, and it seems its built-in dark mode was the first thing to give in.)

