
The biggest problem is that when this company goes out of business in 5 years, it'll become a paperweight.

I'm still bitter at Logitech for screwing up Squeezebox.


Squeezeboxes still work great! Although that was unusually cool of Logitech to open source LMS.

A video demo would be useful. I can't really tell how much the application is doing from the screenshots. Is it a tool with some smart guidance, or is it doing deep magic?

I didn't think a video would be very exciting. It did feel like deep magic when I tested it though. For the scenario in the screenshots, I provided the question "Did we really land a man on the moon?", the null hypothesis "We landed on the moon in 1969", and a low-value piece of evidence: "My dad told me he saw Stanley Kubrick's moon landing set one time and he never lies." Literally everything else the LLM generated on demand for me based on its existing training data, offline. It gave me hypotheses, challenges, evidence, filled out the matrix, did the calculations, everything.

> Literally everything else the LLM generated on demand for me based on its existing training data, offline

That's a ton of scope for hallucinations, surely?


It would be enough to drive most local LLMs crazy if they had to generate it all at once, or if it were all part of one long session, but it's set up so the LLM doesn't have to produce much at a time. I only batch in small groups (it will generate only 3 suggestions per request), the session is refreshed between calls, and the output is force-structured to fit the expected format. You can, however, ask for new batches of suggestions or conflicts or evidence more than once. Hallucinations can happen with any LLM use, of course, but if they break the expected structure the output is thrown out.

Even the matrix scoring suggestion works on the whole row, but behind the scenes the LLM is asked to return one response in one "chat" session per column, and then they are all entered at the same time once all of them have been individually returned. That way, if the LLM does hallucinate on a score, a neutral response goes into that cell and doesn't corrupt any of the neighboring cells.

If you use a smaller model with smaller context, it might be more prone to hallucinations and provide less nuanced suggestions, but the default model seems to be able to handle the jobs pretty well without having to regenerate output very often (it does happen sometimes, but it just means you have to run it again.) Also, depending on the model, you might get less variety or creativity in suggestions. It's definitely not perfect, and it definitely shouldn't be trusted to replace human judgement.
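
To make that concrete, the per-column flow looks roughly like this. This is a minimal Python sketch, not the app's actual code; the chat() helper, prompt wording, and neutral fallback value are placeholders for whatever local model setup is in use:

```
# Sketch of the per-column scoring described above: one fresh session per cell,
# structured output, and a neutral fallback so a bad response can't spill over.
import json

NEUTRAL_SCORE = 5  # used when the model's output breaks the expected structure

def chat(prompt: str) -> str:
    """Placeholder for a single, fresh session against the local LLM."""
    raise NotImplementedError

def score_cell(hypothesis: str, evidence: str) -> int:
    prompt = (
        "Score how inconsistent this evidence is with this hypothesis, 0-10 "
        "(0 = fully consistent, 10 = strongly inconsistent). "
        f'Hypothesis: "{hypothesis}" Evidence: "{evidence}" '
        'Reply with JSON only: {"score": <int>}'
    )
    try:
        score = int(json.loads(chat(prompt))["score"])
        return score if 0 <= score <= 10 else NEUTRAL_SCORE
    except (ValueError, KeyError, TypeError):
        return NEUTRAL_SCORE  # a hallucinated cell stays contained to that cell

def score_row(evidence: str, hypotheses: list[str]) -> list[int]:
    # One independent call per column; the row is only committed once all cells return.
    return [score_cell(h, evidence) for h in hypotheses]
```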


And the answer was... ? :)

Well, based on the evidence provided against our competing hypotheses, the least problematic hypothesis is that we landed on the moon in 1969, with a score of 0 (lower is better).

Second least problematic, with a score of 6 (out of a possible 10 for our evidence set), was "The Apollo 11 mission was a hoax staged by NASA and the U.S. government for public relations and Cold War propaganda, but the moon landing itself was real — only the public narrative was fabricated." Third, at 8, was "The Apollo 11 mission was a real event, but the moon landing was not achieved by humans — it was an automated robotic mission that was misinterpreted or falsely attributed to astronauts due to technical errors or media misreporting."

There was also a tie for 4th place between "It was just a government coverup to protect the firmament. There is no "outer space."" and "The Apollo 11 mission never occurred; all evidence — including photos, video, and lunar rocks — was fabricated in secret laboratories using early 20th-century special effects and staged experiments, possibly by a small group of scientists and engineers working under government contract." Both of these scored 10 out of 10, making them the most problematic. Sorry guys.

Yeah, I’m pretty sure you need to sharpen your pencil on this one if the conclusion was that the Apollo program was legitimate.

I'm sure if the right evidence were submitted and run against the right hypotheses a different frontrunner could emerge. Remember - this is a tool to help you investigate better and figure out what to look for, not a tool that tells you the answer. It helps you eliminate unlikely answers more than it ever points at the "right" answer, and even the most unlikely answers can still be the "right" ones! Hang in there

Somewhat ironically, this article has significant amounts of AI writing in it. (I've done a lot of AI writing in my own sites, and have been learning how to smother "the voice". This article doesn't do a good job of smothering.)

I like it, but the switch from metric to inches is confusing, and I think introduces a bug - there's no way a sea snail is 5-6 neurons high.

Some of your neurons stretch from your brain to your big toe. 1.5m, or more in a tall person.

There's no way a tardigrade is half a sea snail.

Correct, but not the one on the site.

You can change the units in the top corner. It defaults to metric for me, but if your browser language is "en-US" you get imperial by default.

Google Translate doesn't hold a candle to LLMs at translating between even common languages.

I'm a fan. Injecting a huge catalog into Netflix is a win for consumers who want just one subscription. And injecting studio talent into Netflix (assuming the merger gives WB creatives influence) can only help.

HBO's tech sucks. Apple is (in my experience) hard to get running in the Android ecosystem. Most of the other options are too narrow in catalog, or ad ridden.

Consolidating streaming services down to a handful of offerings will make price competition more fierce because they'll have richer catalogs to do battle with.


> Consolidating streaming services down to a handful of offerings will make price competition more fierce because they'll have richer catalogs to do battle with.

This is not how markets usually work.


Correct, but the current market is not working. 15+ streaming services is terrible for consumers. Catalogs are compromised. Bigger services can push prices up because they have more stuff. Clearly if there are too few players then there's less competition and no price pressure, but there's a sweet spot between what exists today and that.


This makes zero sense.

Can you name another scenario where consolidation helped the consumer? Where a sweet spot involved more consolidation?

Did Breyer’s ice cream get better when it was purchased by Unilever?

Did your local grocery store chain get better after it was acquired by Kroger or Albertsons?

Did the smartphone market get better when Microsoft acquired Nokia and HP acquired Palm?

What about Hashicorp? Sun Microsystems? Dark Sky? Red Hat? Slack? Nest? Any of these product markets get better post-consolidation?

I struggle to think of a single example of a product category that got better with industry consolidation.


YouTube, Android and Google Maps got better (and became financially viable at all) when Google bought them. GitHub got better and cheaper when Microsoft bought it.


I'm not necessarily talking about the product itself getting better, I'm talking about the overall consumer situation being better.

All these products were acquired very early in their lifespans, so them getting "better" was practically inevitable.

GitHub's acquisition effectively took at least one competitor off the market. Now, Microsoft doesn't have to seriously develop a competitor, they just bought their competitor and adopted it. They never had to improve Azure DevOps (VSTS) enough to be attractive, they just bought the market leader. If GitHub was never acquired, my company might be deciding between BitBucket, Gitlab, GitHub, and Azure Repos. Instead, Azure Repos is more of a niche offering where most of Microsoft's effort has focused on GitHub. Microsoft removed an option which likely raised prices or reduced user choice.

Google Maps was acquired in basically a prototype stage before it was ever a public product, so that case is irrelevant.

Android is worse in a number of ways due to Google's integration. Google Play Services APIs and other Google technologies have led to heavy Google lock-in. If Android continued as its own project, it would have been much more vendor-agnostic.

In the case of YouTube, I'd argue it's worse in a number of key ways: ads are wildly pervasive (sure, monetization would have had to happen anyway in some fashion); many of the platform changes are user-hostile (removed dislike count, background playback limited to a premium subscription); content moderation is more heavily influenced by Google's advertisement-based business model (if YouTube had continued on its own, it might have chosen a monetization strategy less oriented around advertising, but Google is an advertising company, and advertisers are more sensitive to their products being presented next to objectionable content); and competitors were snuffed out due to ecosystem integration (YouTube videos surface as Google search results rather than agnostic video results).

Remember the era when YouTube got extremely badly integrated into Google+ and basically forced you to use it? That was a pretty terrible user experience.


Piracy is seeing a big uptick because streaming increasingly sucks. 10+ years ago, before studios started chasing their own streaming platforms and when Netflix was the only game in town, it was an excellent deal: $10ish, as opposed to $50+ for cable (I might be low on the cable subscription - I never had one).

If you wanted an equivalent catalog today, you'd need at least 3 or 4 streaming services, and you're paying $50+ or so. Netflix + WB (inc HBO) surely gets them back to roughly where they were. Will Netflix jack up their rates on the back of this acquisition? Inevitably, but I think they'll have a very hard time approaching a similar monthly rate. My gut says that they'll have a hard time getting beyond $30, with Disney and Youtube anchoring in the low teens. So, for the consumer, it's a win. For competing studios, of course, not so much.

You're assuming a free market working perfectly would bring the price down, but the free market is kneecapped by stupid and arbitrary licensing and IP games, which are the result of desperate overreach by an industry hanging on by its fingernails as its business model has been upended multiple times over the past 2 1/2 decades. But as we used to say about the music industry while happily napstering, your broken business model is not my problem.


It's bad for everyone. Fewer buyers = less content made and lower budgets, fewer voices being heard.


Netflix has never been a streaming service that puts loads of good content up and keeps it there. I would imagine they will use this injection of content to drip-feed and slowly rotate movie franchises in order to keep users interested.


Because PostHog's "Talk to a human" chat instead gets you a grumpy gatekeeping robot (which also doesn't know how to get you to a working urgent support link), and there's nothing prominent on their home page or GitHub about this:

Hey PostHog! What version do we need to avoid?


co-founder here. We mentioned it in the main thread about this: https://news.ycombinator.com/item?id=46032650 and on status.posthog.com

- posthog-node 4.18.1, 5.13.3 and 5.11.3

- posthog-js 1.297.3

- posthog-react-native 4.11.1

- posthog-docusaurus 2.0.6

If you make sure you're on the latest version you should be good.
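
If you want to double-check a repo, a rough Python sketch along these lines (assuming an npm v2/v3 package-lock.json in the working directory; not an official tool, adjust for yarn/pnpm) will flag the affected versions:

```
# Scan package-lock.json for the compromised PostHog package versions listed above.
import json

COMPROMISED = {
    "posthog-node": {"4.18.1", "5.13.3", "5.11.3"},
    "posthog-js": {"1.297.3"},
    "posthog-react-native": {"4.11.1"},
    "posthog-docusaurus": {"2.0.6"},
}

with open("package-lock.json") as f:
    lock = json.load(f)

# npm lockfile v2/v3 keys installed packages by path, e.g. "node_modules/posthog-js"
for path, meta in lock.get("packages", {}).items():
    name = path.rsplit("node_modules/", 1)[-1]
    if meta.get("version") in COMPROMISED.get(name, set()):
        print(f"COMPROMISED: {name}@{meta['version']}")
```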


Thanks. Also - maybe change "talk to a human" to "talk to a grumpy robot" :)


Hm did you click on "help" (on the right side) -> "Email our support engineer" when logged in?


Ahhh, TBH I didn't look on the right. I dug through the menu on the left, thinking the right hand bar (which has the rotated labels) was all getting-started/docs related things. In my defense I have a fairly wide monitor and tend to full-screen the browser.


Your status page isn't clear, but are all versions between the compromised and "safe to install" versions compromised or just the ones listed?

For example I installed `posthog-react-native` version `4.12.4` which is between the `4.11.1` version which is compromised and the safe to install version `4.13.0`. Is that version compromised or not?


The only compromised versions are the ones listed. Any other versions are fine.


Thank you for the confirmation. I have updated to 4.13.0 anyway.


This is now the main thread. Though dang likes to merge dupes.


We have a Slack channel with them; these are the versions they mentioned: posthog-node 4.18.1, posthog-js 1.297.3, posthog-react-native 4.11.1, posthog-docusaurus 2.0.6


I tried a prompt that consistently gets Gemini to badly hallucinate, and it responded correctly.

Prompt: "At a recent SINAC conference (approx Sept 2025) the presenters spoke about SINAC being underresourced and in crisis, and suggested better leveraging of and coordination with NGOs. Find the minutes of the conference, and who was advocating for better NGO interaction."

The conference was actually in Oct 2024. The approx date in parens causes Gemini to create an entirely false narrative, which includes real people quoted out of context. This happens in both Gemini regular chat and Gemini Deep Research (in which the narrative gets badly out of control).

Kagi reasonably enough answers: "I cannot find the minutes of a SINAC conference from approximately September 2025, nor any specific information about presenters advocating for better NGO coordination at such an event."


I also tried two hallucinations that I've seen in the latest Gemini and Claude. The first is the famous "how many r's in strawberry", and the other is asking about king sacrifice in chess, where most LLMs tell you it's a very powerful move and explain how to do it (obviously nonsense, since the whole point of the game is to protect the king).

Kagi actually got those two correct, explaining how losing your king means losing your chess game, which is quite impressive.


I tried both these prompts (along with the SINAC one as per GP) in Sonnet 4.5 and Gemini 3, and they both answered correctly for all three. Both also provided context on the chess question as well.


All of this will depend on the settings of the model (reasoning effort, temperature, top_k, etc.) as well.

Which is why you should generally have benchmarks that are a bit broader (>10 questions for a personal setup); otherwise you overfit to noise.
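
For a personal setup the harness can be tiny. A hedged sketch in Python, where ask_model() stands in for whatever assistant or search endpoint is being tested and the pass checks are deliberately crude:

```
# Minimal personal benchmark: run a fixed question set and report the pass rate.
def ask_model(prompt: str) -> str:
    raise NotImplementedError  # stand-in for the model/endpoint under test

# Keep well over 10 cases so one lucky or unlucky answer doesn't dominate the result.
CASES = [
    ("How many r's are in the word strawberry?",
     lambda a: "3" in a or "three" in a.lower()),
    ("Is sacrificing your own king a strong chess strategy?",
     lambda a: "no" in a.lower() or "illegal" in a.lower()),
    # ... the rest of your prompt/check pairs
]

def run() -> None:
    passed = sum(1 for prompt, check in CASES if check(ask_model(prompt)))
    print(f"{passed}/{len(CASES)} passed")
```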


You might be interested to know your comment caused Gemini 3 to go meta on this question. This is what I found in its thinking:

```
Evaluating Date Discrepancies

I've just uncovered a significant discrepancy: a "Hacker News" result flags a date error originating from the very format of parenthetical dates. The system appears to be misinterpreting these, leading to demonstrably false information. This warrants further investigation to understand the root cause and ensure data integrity.

Analyzing Date Ambiguity

I'm now grappling with the user's date. The Hacker News comment suggests the user's prompt might be flawed, as it contradicts the actual date of an event I've indexed. Specifically, the conference referenced occurred in October 2024, not September 2025 as the prompt implies. This leads me to question the user's understanding of the timeline or if the prompt refers to an unrelated event in September 2025. This inconsistency demands further clarification.

...

Investigating User Prompt Intent

I've just realized that the Hacker News snippet reveals this might be a specific problematic test prompt. My strategy is to focus on the October 2024 forum, a strong match, organized by Kattia Cambronero and CCT. I'll verify if there was a 2025 event, but the 2024 event fits the described "crisis" and "NGO coordination". If the user says "approx Sept 2025," they likely mean a recent event.
```


Ah yes, we have some benchmarks on this sort of misguided prompt trap, so it should perform well on this.


I think Claude Code makes tiling WMs doable. I've always given up in the past because there have been reams of configs necessary to get to a fairly usable state, and I don't love reading manuals for desktop environments - but now it's all pretty easy to crank out.

I've been on Hyprland for a week now and haven't hit any blockers yet that'd force me to go back to KDE.


I think Claude Code is the crutch for being unable to go through docs.


Who wants to spend their weekends reading desktop environment docs? If that's your hobby, fine. I just want a working DE.


I believe it saves time, the documentation is available and easily digestible, and there's thousands of existing dotfiles to take from. I'd be disappointed to hear Claude couldn't do this.

I'm mostly pointing out that the documentation is very easy to read and implement for most tiling WMs, without the need for a coding agent.


Who wants to read thousands of existing dotfiles?

But I'm primarily talking about the missing pieces that most tiling window managers have that you need to implement yourself, or the annoying bugs that are buried in GitHub issues.

I need a lock screen; fine, hypridle. How is it configured? Once it works, it works. 15 seconds with Claude, or 2-3 minutes googling and implementing. Why the hell would you not use it?

Qt apps have fuzzy fonts in Hyprland. Turns out that's because I was using 1.5 fractional scaling on my 4k monitor, which was information buried in some GitHub issue with barely any traffic, which Claude found while I was doing actual work.

The Google Meet PiP window strobes because who the hell knows why, but that too was solved by Claude finding the right GitHub ticket and applying opacity 0.999 instead of 1.0 for that window specifically. Where is that documented in the Hyprland manual?

The point is that tiling window managers _in my experience_ always have rough edges, and I've been dipping in and out of them for 20 years. Now that many people (I guess not including your good self) are using LLMs all day every day to move faster in producing code, you can apply the same tooling to bring the tiling environment up to the same level of quality that we're used to with the bigger DEs, which have a lot more resources and eyes on them.


The copy reeks of being AI-written, which is ironic given:

> It’s a compelling story. And like most of the AI influencer bullshit that fills my timeline, it glosses over the inconvenient details.


Haha, nice catch

