The running costs are very low. Since posting it today we've burned 30 cents on DeepSeek inference. The Postgres instance, though, costs me $40 a month on Railway, mostly due to RAM usage during HNSW incremental updates.
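For context, HNSW in Postgres here presumably means the pgvector extension. A minimal, illustrative setup is below; the table name, column names and embedding dimension are made up (not the project's actual schema), but the index build and its incremental updates are where the RAM goes:

```python
# Illustrative pgvector/HNSW setup in Postgres (hypothetical names/dimension).
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost:5432/hn")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

# HNSW index builds and incremental updates are memory-hungry; this is the
# usual knob people raise before building or rebuilding the index.
cur.execute("SET maintenance_work_mem = '2GB'")

cur.execute("""
    CREATE TABLE IF NOT EXISTS comments (
        id        bigint PRIMARY KEY,
        body      text,
        embedding vector(1024)
    )
""")
cur.execute("""
    CREATE INDEX IF NOT EXISTS comments_embedding_hnsw
    ON comments USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64)
""")
conn.commit()
```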
That's cool! Some immediate UI feedback after the search button is clicked would be nice; I had to press it several times before I noticed any feedback. Maybe just disable it once clicked, my 2 cents.
Daily updates I do on my M4 MacBook Air: it takes about 5 minutes to process roughly 10k fresh comments. The historic backfill was done on an Nvidia GPU rented on vast.ai for a few dollars; if I recall correctly it took about an hour. It's mentioned in the README.md on GitHub.
GDPR still holds, so I don’t see why not if that’s what your request is under.
However, it's out there, and you have no idea where, so there's not really a moral or feasible way to get rid of it everywhere. (Please don't nuke the world just to clean your rep.)
The law (at least, in the EU) grants a legal right to privacy, and the motivation behind it is really none of anyone’s business.
Maybe commenters face threats to their safety. Maybe commenters never imagined that AI companies would one day profit off their non-commercial conversations, and wouldn't have put the data out there had that been disclosed ahead of time.
Corporations have an unlimited right to bully and threaten in order to take down embarrassing content and hide their mistakes, and they have far greater leverage over copyright enforcement than individuals do. But when individuals do something much less egregious, trying to take down their own content (which they aren't even paid for), suddenly it's immoral.
This community financially benefits YCombinator and its portfolio companies. Without our contributions, readership, and comments, their ability to hire and recruit founders is diminished. They don’t provide a delete button for profit-motivated reasons, and privacy laws like GDPR guard against that.
(As you might guess, I am personally quite against HN's policy forbidding most forms of content deletion. Their approach of handling deletions through manual modifications by the moderation team makes no sense; every other social media platform lets you delete your content.)
Finally someone mentioned it. I'm surprised all the "tech enthusiasts" here turn a blind eye when it's their own community, but if it's someone else's then it's atrocious.
Apparently Persian and Russian accents are close, which is surprising to say the least. I know people keep getting confused about how Portuguese from Portugal and Russian sound close, but Persian is new to me.
Idea: Farsi and Russian both have a simple set of vowel sounds and no diphthongs, making it hard/obvious when attempting to speak English, which is rife with diphthongs and many different vowel sounds.
While Persian has only two diphthongs and 6-8 vowels, other languages of Iran are full of them (e.g. Southern Kurdish speakers can pronounce 12+1 vowels and 11 diphthongs). I'd find it funny if all Iranians spoke English with the Persian accent.
I've found that building my side projects to be "scalable" is a practical side effect of choosing the most cost-effective hosting.
When a project has little to no traffic, the on-demand pricing of serverless is unbeatable. A static site on S3 or a backend on Lambda with DynamoDB will cost nothing under the AWS free tier. A dedicated server, even a cheap one, is an immediate and fixed $8-10/month liability.
The cost to run a monolith on a VPS only becomes competitive once you have enough users to burn through the very generous free tiers, which for many side projects is a long way off. The primary driver here is minimizing cost and operational overhead from day one.
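For what it's worth, a minimal sketch of what "a backend on Lambda with DynamoDB" can look like is below. The table name and key schema are hypothetical; the point is just that at side-project traffic this sits comfortably inside the free tier:

```python
# Minimal Lambda handler backed by DynamoDB (hypothetical table/key names).
import json
import boto3

table = boto3.resource("dynamodb").Table("side-project-items")

def handler(event, context):
    # API Gateway sends pathParameters as null when there are none.
    params = event.get("pathParameters") or {}
    item_id = params.get("id", "")

    resp = table.get_item(Key={"id": item_id})
    item = resp.get("Item")
    if item is None:
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}
    # default=str handles DynamoDB's Decimal values.
    return {"statusCode": 200, "body": json.dumps(item, default=str)}
```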
> A dedicated server, even a cheap one, is an immediate and fixed $8-10/month liability.
Personally, I am more worried about the liability of the infinitely-scalable service potentially sending a huge bill after the fact. This $8-10 "liability" is predictable, like a Netflix subscription.
Data all-rounder with 10 years building everything from low-latency Go microservices to training ML models to large-scale AWS data pipelines. Looking for a senior, autonomous role at a small company/startup.
> It was uncomfortable at first. I had to learn to let go of reading every line of PR code. I still read the tests pretty carefully, but the specs became our source of truth for what was being built and why.
This is exactly right. Our role is shifting from writing implementation details to defining and verifying behavior.
I recently needed to add recursive uploads to a complex S3-to-SFTP Python operator that had a dozen path manipulation flags. My process was:
* Extract the existing behavior into a clear spec (i.e., get the unit tests passing).
* Expand that spec to cover the new recursive functionality.
* Hand the problem and the tests to a coding agent.
I quickly realized I didn't need to understand the old code at all. My entire focus was on whether the new code was faithful to the spec. This is the future: our value will be in demonstrating correctness through verification, while the code itself becomes an implementation detail handled by an agent.
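Roughly, the "spec as tests" step looked like the sketch below. This is illustrative only: the real operator and its flags aren't shown here, so the stand-in function and names are made up. The stand-in is what the agent rewrites; the tests are what I actually reviewed:

```python
# Illustrative spec-as-tests for an S3-to-SFTP sync (made-up names, not the
# real operator). The tests encode the old behavior (flat) and the new one
# (recursive); the implementation is a toy stand-in for the agent to replace.
import posixpath
from unittest import mock

def sync_prefix_to_sftp(keys, sftp, prefix, dest, recursive=False):
    """Toy stand-in: upload S3 keys under `prefix` to an SFTP directory."""
    for key in keys:
        rel = key[len(prefix):]
        if not recursive and "/" in rel:
            continue  # old behavior: ignore nested keys entirely
        target = posixpath.join(dest, rel if recursive else posixpath.basename(rel))
        sftp.put(key, target)

def test_flat_upload_preserves_existing_behavior():
    sftp = mock.Mock()
    sync_prefix_to_sftp(["reports/a.csv", "reports/2024/b.csv"], sftp,
                        prefix="reports/", dest="/in", recursive=False)
    # Spec for the old behavior: only top-level keys, flattened into dest.
    sftp.put.assert_called_once_with("reports/a.csv", "/in/a.csv")

def test_recursive_upload_recreates_directory_tree():
    sftp = mock.Mock()
    sync_prefix_to_sftp(["reports/2024/q1/a.csv"], sftp,
                        prefix="reports/", dest="/in", recursive=True)
    # Spec for the new behavior: nested keys keep their relative paths.
    sftp.put.assert_called_once_with("reports/2024/q1/a.csv", "/in/2024/q1/a.csv")
```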
> Our role is shifting from writing implementation details to defining and verifying behavior.
I could argue that our main job was always that: defining and verifying behavior. As in, it was a large part of the job. Time spent writing implementation details has always been on a downward trend via higher-level languages, compilers, and other abstractions.
> My entire focus was on whether the new code was faithful to the spec
This may be true, but see Hyrum's Law: the observed behavior of a heavily-used system becomes its de facto public interface and specification, with all its quirks and implementation errors. It may be important to keep testing that the clients using the code are also faithful to the spec, and to detect and handle discrepancies.
Claude Plays Pokemon showed that too. AI is bad at deciding when something is "working" - it will go in circles forever. But an AI combined with a human to occasionally course correct is a powerful combo.
If you actually define every inch of behavior, you are pretty much writing code. If there's any line in the PR that you can't instantly grok the meaning of, you probably haven't defined the full breadth of the behavior.
You're not wrong, but it's a "dysfunction" that many successful tech companies have learned to leverage.
The reality is, most engineers spend far less than half their time writing new code. This is where the 80/20 principle comes into play. It's common for 80% of a company's revenue to come from 20% of its features. That core, revenue-generating code is often mature and requires more maintenance than new code. Its stability allows the company to afford what you call "dysfunction": having a large portion of engineers work on speculative features and "big bets" that might never see the light of day.
So, while it looks like a bug from a pure "coding hours" perspective, for many businesses, it's a strategic feature!
I suspect a lot of that organizational dysfunction is related to a couple of things that might be changed by adjusting individual developer coding productivity:
1) aligning the work of multiple developers
2) ensuring that developer attention is focused only on the right problems
3) updating stakeholders on progress of code buildout
4) preventing too much code being produced because of the maintenance burden
If agentic tooling reduces the cost of code ownership and allows individual developers to make more changes across a broader scope of a codebase more quickly, all of this organizational overhead also needs to be revisited.
I live next to an abandoned building from the Spanish property boom. It's now occupied illegally. The hype is over, yet the consequence is staring at me every day. I'm sure it'll eventually be knocked down or repurposed, but it would have been better had the misallocation never happened.
I bought a flat during the Spanish property boom. It sat empty for a while, with ~80% of flats in the area vacant; then I had a squatter, since kicked out. Now most of the property is occupied, and the Spanish government is bringing in huge restrictions to ease the property shortage. These things go in cycles. The boom and bust isn't very efficient, but there you go.
The crux of the article is asking whether such a large investment is justified; downplaying it by saying it's only X% of GDP compared to Y doesn't address that question.
Source is at https://github.com/afiodorov/hn-search