Hacker News | thicknavyrain's comments

This is so cool. I used to work on urban heat island analysis and now work in natural catastrophe modelling, and in both cases knowing the average heights/volumes of buildings is very handy, but that information is surprisingly difficult to retrieve. Even a coarse estimate available at annual resolution has some really awesome use cases. Very excited to see this.


I know it's a reductive take to point to a single mistake and act like the whole project might be a bit futile (maybe it's a rarity) but this example in their sample is really quite awful if the idea is to give AI better epistemics:

    {
        "causal_relation": {
            "cause": {
                "concept": "vaccines"
            },
            "effect": {
                "concept": "autism"
            }
        }
    },
... seriously? Then again, they do say these are just "causal beliefs" expressed on the internet, but it seems like some stronger filtering of which beliefs to adopt ought to be exercised for a downstream use case.


The precision dataset includes the sentences that led to this. Some of them:

>> "Even though the article was fraudulent and was retracted, 1 in 4 parents still believe vaccines can cause autism."

>> On 28 February 1998 Horton published a controversial paper by Dr. Andrew Wakefield and 12 co-authors with the title "Ileal-lymphoid-nodular hyperplasia, non-specific colitis, and pervasive developmental disorder in children" suggesting that vaccines could cause autism.

>> He was opposed by vaccine critics, many of whom believe vaccines cause autism, a belief that has been rejected by major medical journals and professional societies.

None of the ones I've seen actually says that vaccines cause autism.


Oh, ouch, yeah. We already know that misinformation tends to get amplified, the last thing we need is a starting point full of harmful misinformation. There are lots of "causal beliefs" on the internet that should have no place in any kind of general dataset.


It's even worse than that, because the way they extract the causal link is just a regex, so

"vaccines > autism"

because

"Even though the article was fraudulent and was retracted, 1 in 4 parents still believe vaccines can cause autism."

I think this could be solved much better by using even a modestly powerful LLM to do the causal extraction... The website claims "an estimated extraction precision of 83%" but I doubt this is even a remotely sensible estimate.
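To make the failure mode concrete, here is a minimal sketch. The pattern below is made up for illustration and is not the project's actual regex; it only shows how any surface-level "X cause(s) Y" match ignores the surrounding "parents still believe" framing:

```python
import re

# Hypothetical pattern (an assumption, NOT the project's real regex):
# "<word> [can|could|may] cause(s) <word>".
CAUSAL_PATTERN = re.compile(
    r"\b(\w+) (?:can |could |may )?causes? (\w+)\b", re.IGNORECASE
)

def extract_causal_pairs(sentence):
    """Return (cause, effect) pairs matched by the naive pattern."""
    return [
        (m.group(1).lower(), m.group(2).lower())
        for m in CAUSAL_PATTERN.finditer(sentence)
    ]

# Sentence quoted from the dataset, as cited in this thread.
sentence = (
    "Even though the article was fraudulent and was retracted, "
    "1 in 4 parents still believe vaccines can cause autism."
)
pairs = extract_causal_pairs(sentence)
print(pairs)
```

The match comes back as a plain cause/effect pair even though the sentence reports a belief and calls the underlying article fraudulent. That context is exactly what a pattern match throws away.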


I'm a popular science writer with eight years' experience doing exactly this (SciShow, Crash Course, Veritasium and recent winner of the Wellcome Collection Non Fiction Awards) without AI. Done right, coverage of even a pre-print has reached hundreds of thousands or millions of people. But I've experimented with every SOTA model since 2022 with the most detailed and specific prompting I can think of (including multiple examples of transcripts of work already in the public domain) to see if it can replicate good quality science communication.

The content is usually reasonably strong but the tone is always off and it never quite understands what it is a reader/viewer needs to really get to grips with the topic if they don't already have a prior foundational understanding (though I notice this about a lot of other media outlets with professional science communicators too). It also has poor editorial thinking around what bits are most likely to be interesting and cohesive when considered as part of the whole piece.

But I'm still reasonably convinced as AI improves it ought to be able to replace me with the right workflow/context/prompting. I think there will always be a demand for my (and many other writers') talents as they are so it doesn't really bother me, but it'd be great to extend the work to all the many scientific discoveries that don't get the same attention. If anyone is serious about developing something like this, I'd be interested in partnering with them as someone with domain expertise on science communication and familiar with prompt engineering (email in bio).


That's super cool, I love the SciShow videos.

I think you're right about the editorial thinking and the "what do people find interesting" parts. But that doesn't have to be solved directly by AI; it's easy enough to sidestep the problem and provide a nice interface for the human-in-the-loop part. I'd imagine having a nice starting point would save you a ton of time, depending on how much you have to rewrite for tone.


That's true, it could just turn the writer's role into more of an editorial role. The main time-saving I have so far is being able to upload papers and get it to fact check for me. The editorial guidelines at SciShow are stricter than any academic journal I've published in: any non-trivial statement has to be supported by a direct, findable quote in (most-of-the-time) peer-reviewed scientific literature. I once had to find a citation for the idea that heat + fuel + oxygen generates a fire! (for this video: https://www.youtube.com/watch?v=BEcaE0e0CZg)

LLMs make that much easier. As I collect primary sources during my drafting/writing phase, I can type up any non-trivial claims I'm making in my script in a separate document, share that with the LLM and say "Quoting directly from the set of attached PDFs, identifying which document, and on which page the quote comes from, find content which directly supports each of these assertions" and it generally does a great job. I still have to check each of those quotes for accuracy, but the help in _finding_ those quotes in order to pass a stringent fact checking procedure is huge if I didn't scribble down the supporting quotes during my research phase. This is also, by the way, stricter than the fact checking process for most non-fiction publishing.
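The manual check of each returned quote could be partly automated. A rough sketch, assuming the text of each PDF page has already been extracted into a dictionary; the file name, page number and quotes below are invented placeholders, not real sources:

```python
import re

def normalize(text):
    """Collapse whitespace and lowercase, so line breaks in extracted PDF text do not block a match."""
    return re.sub(r"\s+", " ", text).strip().lower()

def verify_quotes(pages, quotes):
    """Check that each quote appears verbatim on the page the LLM cited.

    pages:  {(document_name, page_number): extracted_text}
    quotes: list of {"doc": ..., "page": ..., "quote": ...}
    Returns one boolean per quote, in order.
    """
    results = []
    for q in quotes:
        page_text = pages.get((q["doc"], q["page"]), "")
        results.append(normalize(q["quote"]) in normalize(page_text))
    return results

# Invented example data (assumption for illustration only).
pages = {
    ("fire_basics.pdf", 3): "Combustion requires three\nelements: heat, fuel and oxygen.",
}
quotes = [
    {"doc": "fire_basics.pdf", "page": 3, "quote": "heat, fuel and oxygen"},
    {"doc": "fire_basics.pdf", "page": 3, "quote": "fire needs only heat"},
]
results = verify_quotes(pages, quotes)
print(results)
```

This only confirms that a quote exists where the LLM said it does. Whether the quote genuinely supports the claim still needs a human read.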


>SciShow are stricter than any academic journal I've published in

Now there's a testimonial. I look forward to browsing the source links with each video!


Feels like there might be an accuracy issue as well. Although that might make it perfectly suited to replacing whoever writes university press releases...


>I've experimented with every SOTA model since 2022

>The content is usually reasonably strong but the tone is always off and it never quite understands what it is a reader/viewer needs

A SOTA model fine-tuned with your choice of transcripts could probably get you most of the way there. There might be a customized, open-weight model already on Huggingface that meets your needs.


One question is whether the audience is discerning enough to care about the issues you're raising. This seems like it could be a variation of why the umpteenth Marvel movie beats out indie masterpieces at the box office. The audience for high quality becomes increasingly niche as the market's relatively low bar for quality is satisfied.


> as AI improves it ought to be able to replace me with the right workflow/context/prompting

The bitter lesson may very well come for us all...


Urban heat island analysis. The physical volume of buildings is an essential input parameter for estimating the impact of the built environment, and of possible interventions (e.g. greening, reducing traffic), on local temperature rises. It is notoriously difficult to obtain that data at fine spatial resolution. This would be a game changer. The same is true to a lesser degree for air pollution modelling, where building volume is a significant input for land use regression models.


Just want to say, I recognised your name immediately when I saw it. Saw a talk you gave at the Institute of Education in Bloomsbury back in 2010 when I was a teenager and it is still, to this day, one of the best popular mathematics talks I've ever witnessed (obviously helped by the immense juggling talent).

I've gone on to do a PhD in Physics and write lots of popular science for some big YouTube channels (SciShow, Veritasium) and among some of the more long term influences on my career I definitely count your talk as one of them!


That's very kind ... thank you ... and excellent that you are carrying on the task of making science accessible.

Keep it up!

PS: I see from the data you've put on the site formerly (and to be honest, still) known as Twitter that I pass reasonably close to you sporadically and irregularly, but every month or so on average. If you'd like to meet for coffee[0] and cake at some point, my contact details are in my profile.


Well said, and not to mention the importance of common knowledge as a driving impetus for enacting a change. "Everyone knows Hollywood is full of abuse" was true for decades but when the Weinstein allegations finally came out into the open, some (if not enough) action finally started happening against it. Saying the obvious thing loudly and openly is a coordinating mechanism.


Land is a Big Deal is a great place to start (and his three articles summarising Georgism for SSC, also on his substack named after Henry George's book "Progress and Poverty", which contain much of the same content).


The Identity Trap by Yascha Mounk, a mostly even handed, sensible and necessary read for right now.


Skyfield is very cool. For my wedding, I wanted to give my groomspeople personalised thank you gifts. The idea I came up with was to find or ask for precise dates significant to them each (their own anniversaries, birthdays, special events...) and convert that into an abstraction of the relative positions of the planets on that day using Skyfield: https://imgur.com/l7G0att (and with the moon's orbit if they wanted: https://imgur.com/CYqfLR4).

I then got them engraved onto cufflinks https://imgur.com/bhxJVGo

They were super happy with the result, and we all looked great on the day. I wonder if there's a market for these, but it feels a little niche. I'm glad Skyfield exists to help projects like these.


Very nice. I love this kind of thing.

I make multispectral Sun images (https://imgur.com/UMQhcw6) for people occasionally based on a time & date, I started messing around with the idea for my son, and then as a gift for a friend and she then helped me get a page up https://www.theremarkablz.com/thesun

It's often for sciency people who have just had a kid, because with the SDO images you can get really close to the specific time of birth.


That is a very cool gift.

While not exactly the same, I got my wife a framed print of the stars over our wedding day as an anniversary gift a few years after we were married. It may be a little niche, but I think there is a market for that kind of thing.


It's remarkable how many problems seemingly come back to Land Value Taxation.


maybe the only real technical problem is "whose is that?"


If nobody is to be found to tax then you can just take the land. Super lax version of use it or lose it.

