Thanks for the clarification. Bit ironic given the talk’s subject. It is quite a bit of effort, but there’s something to be said for going through and manually writing up the transcript like a journalist. Sometimes you can’t beat human effort ;)
That’s a great approach! That’s what I would have conveyed if I had been a bit more articulate. I assume journalists do exactly that. It takes away some of the laborious work while retaining accuracy.
nice catch! the original transcript kept saying dogs instead of docs. that's the only thing i fixed (until your r's find now) after laughing at it for a while
For context: I was in the audience when Karpathy gave this amazing talk on software 3.0. YC has said the official video will take a few weeks to release, by which time, as Karpathy himself said, the talk will be deprecated.
Do the talk's predictions about the future of the industry project beyond a few weeks? If so, I'd expect the salient points of the talk to remain valid. Hmm...
not quite, i compiled the slides within a few hours of the talk yesterday, well before your transcript was available. the slides are my main output/contribution. a full slides+transcript post would be too long for substack. i've linked your transcript prominently for people to find, and used it to fix the slide ordering, because twitter people took terrible notes for the purpose of exact talk reconstruction.
i expect YC to prioritize publishing this talk, so the half-life of any of this work is probably measured in days anyway.
100% of our podcast is published for free, but we still have ~1000 people who choose to support our work with a subscription (it does help pay for editors, equipment, and travel). I always feel bad that we don't have much content for them, so i figured i'd put just the slide compilation up for subscribers. i'm trying to find nice ways to ramp up value for our subs over time, mostly by showing "work in progress" things like this that i had to do anyway to summarize/internalize the talk properly, which again is what we published entirely free/no subscription required.
Looks like you are putting a derivative behind a paywall though, no? Wouldn't quid pro quo be to let pudiklubi publish your work too? Some kind of open license?
This is wild. I've been creating my own dataset of trending articles and ironically this is how I came across your post. I'm doing a similar project for my uni thesis.
I set out with similar hypotheses and goals to yours (on a slightly different scale though, haha), but I've been completely stuck on the interactive map part. I'm definitely getting a lot of pointers from how you handled this!
Maybe one key difference in approach is that I've put more emphasis on trying to extract key topics as keywords.
For example:
article (title): "Useful Uses of cat"
keywords: ['Software design', 'Contraction', 'Code changes', 'Modularity', 'Ease of extension']
My hypothesis is that this will be a faster search solution than using the embeddings, but potentially not as accurate. I'm not far enough along to really prove this yet, though.
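A minimal sketch of the trade-off in that hypothesis, assuming keywords are stored per article and embeddings are plain float lists (all titles, vectors, and function names here are hypothetical, not from either project): an inverted index answers a keyword query with one dictionary lookup per term, while a brute-force embedding search has to score every article.

```python
import math
from collections import defaultdict

# Hypothetical toy data: article title -> extracted keywords
ARTICLES = {
    "Useful Uses of cat": ["Software design", "Contraction", "Code changes",
                           "Modularity", "Ease of extension"],
    "Hypothetical article B": ["Modularity", "Testing"],
}


def build_keyword_index(articles):
    """Inverted index: map each lowercased keyword to the titles tagged with it."""
    index = defaultdict(set)
    for title, keywords in articles.items():
        for kw in keywords:
            index[kw.lower()].add(title)
    return index


def keyword_search(index, query_terms):
    """Titles tagged with ALL query terms; one dict lookup per term (exact match only)."""
    sets = [index.get(term.lower(), set()) for term in query_terms]
    return set.intersection(*sets) if sets else set()


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def embedding_search(embeddings, query_vec, k=1):
    """Brute-force nearest neighbours: scores every article, O(n * d) per query."""
    ranked = sorted(embeddings.items(),
                    key=lambda item: cosine(item[1], query_vec),
                    reverse=True)
    return [title for title, _ in ranked[:k]]
```

For example, `keyword_search(build_keyword_index(ARTICLES), ["modularity", "code changes"])` returns only "Useful Uses of cat". The accuracy cost shows up with near-synonyms: a query for "modular design" matches nothing in the keyword index, whereas an embedding search would still rank semantically close articles highly.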