rfurmani · 2025-07-21T17:59:38 1753120778

Dollars of compute at market rate is what I'd like to see, to check whether calling this tool would cost $100 or $100,000

rfurmani · 2025-07-21T17:56:57 1753120617

Sounds like it did not:

> This year, our advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit

simonw · 2025-07-21T18:26:44 1753122404

I interpreted that bit as meaning they did not manually alter the problem statement before feeding it to the model - they gave it the exact problem text issued by IMO.

It is not clear to me from that paragraph if the model was allowed to call tools on its own or not.

jonahx · 2025-07-21T20:16:22 1753128982

As a side question, do you think using tools like Lean will become a staple of these "deep reasoning" LLM flavors?

It seems that LLMs excel (relative to other paradigms) in the kind of "loose" creative thinking humans do, but are also prone to the same kinds of mistakes humans make (hallucinations, etc). Just as Lean and other formal systems can help humans find subtle errors in their own thinking, they could do the same for LLMs.

simonw · 2025-07-21T21:30:55 1753133455

I was surprised to see them not using tools for it, that feels like a more reliable way to get useful results for this kind of thing.

I get the impression not using tools is part of the point though - to help demonstrate how much mathematical "reasoning" you can get out of just a model on its own.

edanm · 2025-07-22T13:26:05 1753190765

Yes, I'm similarly surprised. Intuitively I'd think that it's much better to train on using Lean, since it's much easier to do RL on it (Lean gives you an objective metric on whether you achieved your objective). It also seems more useful in some ways.

But all the model providers are putting emphasis on the "this is only using natural language" angle, which I think is interesting both from a "this is easier for humans to actually use" perspective, but also comes from a place of "look how general the model is".

modeless · 2025-07-21T18:24:52 1753122292

Yes, that quote is contained in my comment. But I don't think it unambiguously excludes tool use in the internal chain of thought.

I don't think tool use would detract from the achievement, necessarily. I'm just interested to know.

KoolKat23 · 2025-07-21T18:49:27 1753123767

End to end in natural language would imply no tool use, I'd imagine. Unless it called another tool which converted it but that would be a real stretch (smoke and mirrors).

rfurmani · 2025-06-24T15:29:26 1750778966

Only for that particular board, in general it will be very complex and depend on the shape of the colored regions

rfurmani · 2025-06-15T03:40:49 1749958849

I've had a couple of bad experiences with Lyft recently, including one time the driver must have clicked that they picked me up while a block away, because I could see the Lyft driving to the destination without me. I tried to get a refund since I was obviously waiting at my start location the whole time, but the system claimed the drive went from start to finish (even though I wasn't in the car), so no refund.

z2 · 2025-06-15T04:22:36 1749961356

Same thing happened to me, and the support system automatically decided nothing was wrong whatsoever despite my phone certainly sending a very different location from the driver. And the madness was I couldn't even book another ride as I was technically in one.

So I ended up getting it resolved via the security panic button which did put me through to a real person who was empathetic to the issue.

SOLAR_FIELDS · 2025-06-15T06:36:15 1749969375

Is this some sort of a scam? The driver cannot even mark the ride as completed without being in the area right? So they have to drive it anyway. I can’t imagine they would be on the platform for long if this happened on a regular basis. I would say it’s probably an accident but how could this behavior be accidental? Someone might accidentally say that they picked you up, but they couldn’t accidentally then drive an empty car to the destination.

dgoldstein0 · 2025-06-15T07:01:29 1749970889

Maybe they picked up the wrong person and neither of them realized?

SOLAR_FIELDS · 2025-06-16T03:47:47 1750045667

Entirely possible, people do get into wrong rideshare vehicles. Especially late night after people have been drinking. A decent driver will confirm the name when you’re in a place with a lot of pickups happening but if the language barrier is strong that might not happen.

firesteelrain · 2025-06-15T12:44:42 1749991482

My experience in DC is GPS can be spotty due to the buildings and the app glitches when it says you are in one spot but you are not there.

Also DC has rules for certain streets on what side of road you are allowed to be picked up on.

Thorentis · 2025-06-15T07:01:46 1749970906

Has anybody tried "driving" for one of these companies using GPS spoofing? You could fake the location of your phone. I suppose it'd only work a few times before the number of reports gets you banned, but I wonder whether on a large enough (and automated) scale it would be profitable for scammers.

eisa01 · 2025-06-15T09:13:15 1749978795

I had a driver commit GPS spoofing on me: I was standing outside and there was no car to be seen anywhere even though the app showed the driver was there and had been "driving" to it

I tried to report a security incident to Uber, but not sure what happened. It would likely be easier to complain today, as now all taxis (which Uber technically is in Norway) need to be part of a Taxi dispatch central

MisterBastahrd · 2025-06-15T09:22:27 1749979347

Given that they track you every inch of your route, it'd be a pain in the butt to attempt to fake it.

I've gotten a refund on food before because my driver picked up my food and then went to spend half an hour in a gas station before returning to their route even though my home was 2 minutes away.

carcinklko · 2025-06-15T09:41:04 1749980464

>Given that they track you every inch of your route, it'd be a pain in the butt to attempt to fake it.

A pain for a single app developer when no such app exists, but a spoofing app will dutifully draw any number of trips of any length for anyone.

aianus · 2025-06-15T07:10:06 1749971406

I had to go in person to verify my documents to drive for Uber

JumpCrisscross · 2025-06-15T14:52:14 1749999134

> ended up getting it resolved via the security panic button which did put me through to a real person who was empathetic to the issue

For both Uber and Lyft this is what I do. Which is wild since the only other company I auto-escalate-to-cancellation with is Comcast.

Waymo isn’t winning because it’s automated. It’s winning because the major players left the premium segment of the market for grabs.

harmmonica · 2025-06-15T19:30:42 1750015842

Can it be both? Maybe semantics, but a lot of folks are taking Waymo because there's no human driver. "No human driver" may now be considered "premium," but saying that automation is not a significant factor doesn't quite ring true. As a single point of reference, the automation is a big part of what makes it attractive to me as a rider, both because there's no human driver (not super critical to my experience, but I prefer being in the car solo) and, more importantly, because of the driving behavior; it just feels like a better driver than most drivers on the road and that's due to the automation.

RajT88 · 2025-06-15T16:33:40 1750005220

Comcast gives you the illusion of being able to talk to a human being if you are persistent enough.

What ends up happening is at some point they send you a link to talk to their support bot and tell you they are hanging up on you.

Threatening cancelation is the only way. The only reason they will not care is because of their captive markets. This is what you get with no competition.

jonny_eh · 2025-06-15T06:04:18 1749967458

Uber lets you enable a PIN for each ride. The driver can't say they picked you up until they punch in the random 4 digit PIN the app gave you for the ride.
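For anyone curious what that check amounts to, here is a minimal Python sketch. It is not Uber's implementation; the `Ride` class and `new_ride_pin` helper are hypothetical names made up for illustration. The only idea taken from the comment above is that a fresh random 4-digit PIN is issued per ride and the ride cannot be marked as started without it.

```python
import hmac
import secrets


def new_ride_pin(digits: int = 4) -> str:
    """Return a fresh random numeric PIN, zero-padded to `digits` characters."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


class Ride:
    """Hypothetical ride record, not a real ride-hailing data model."""

    def __init__(self) -> None:
        self.pin = new_ride_pin()  # shown only to the rider
        self.started = False

    def start(self, entered_pin: str) -> bool:
        """Mark the ride as started only if the driver typed the rider's PIN."""
        # compare_digest avoids leaking how many leading digits matched.
        if hmac.compare_digest(entered_pin, self.pin):
            self.started = True
        return self.started
```

With something like this in place, a driver a block away has no way to flip the ride to "picked up" on their own, which is the failure described at the top of the thread.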

harvey9 · 2025-06-15T09:54:44 1749981284

This is good, but why can't these firms determine when your phone and the drivers phone are far apart?

michaelt · 2025-06-15T10:39:57 1749983997

It's not unusual to call a taxi for another person. Or to make a multi-stop journey where some people get out before others. You can even send a parcel across town in a taxi.

Checking phone proximity might be helpful in some cases, but it's not a silver bullet.
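A rough Python sketch of what a proximity check could look like, under stated assumptions: the 150 m threshold, the `booked_for_self` flag, and both function names are hypothetical, and the result is only a signal for review rather than a hard block, for exactly the reasons above (rides booked for someone else, parcels, missing location permission).

```python
import math
from typing import Optional, Tuple

LatLon = Tuple[float, float]


def haversine_m(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    earth_radius_m = 6_371_000.0
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(h))


def pickup_looks_suspicious(rider: Optional[LatLon], driver: LatLon,
                            booked_for_self: bool,
                            threshold_m: float = 150.0) -> bool:
    """Flag a 'picked up' event when rider and driver phones are far apart.

    Returns False when the check does not apply: the ride was booked for
    someone else, or the rider's location is unavailable.
    """
    if not booked_for_self or rider is None:
        return False
    return haversine_m(rider, driver) > threshold_m
```

Spotty GPS between tall buildings would make a tight threshold noisy, so a flag like this seems better suited to routing a refund request to a human than to rejecting the pickup outright.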

blindriver · 2025-06-15T10:31:45 1749983505

Too many people order Ubers for other people so it won’t work.

vachina · 2025-06-15T11:25:40 1749986740

GPS does not work everywhere, and not every device supports BLE beacons.

andrepd · 2025-06-15T12:03:41 1749989021

I never give location permissions to any app if I can avoid it (indeed I don't even have the spyware app if I can avoid it; e.g. I use the web to order an Uber)

alistairSH · 2025-06-15T13:43:05 1749994985

I don’t know why they don’t require it. Every Uber in Puerto Rico uses the PIN but I’ve only had one in the mainland USA ask for it.

jonny_eh · 2025-06-15T15:22:01 1750000921

You need to enable it on your account.

alistairSH · 2025-06-16T17:49:22 1750096162

Exactly - I believe it should be required for safety, to limit shenanigans, etc. Apparently, it is required in Puerto Rico, but I don't know if drivers have to enable it themselves or if the app knows where the driver is operating. Are you saying the rider can also turn it on all the time? If so, that's good - I've only ever seen drivers request it (all in PR, and one in mainland US; everywhere else, no PIN).

jonny_eh · 2025-06-17T22:01:04 1750197664

Right, YOU need to enable it. I assumed it was always up to the rider, since they're the beneficiary of it.

stahtops · 2025-06-15T04:51:47 1749963107

I waited 40 minutes for a Lyft at an airport because the driver made up a story about an accident and traffic, in the airport. No one else seemed to be affected by this traffic- so eventually I tried booking an Uber. It arrived 3 minutes later.

20 minutes after that the Lyft driver keeps texting me “where are you?!”. Their turn to wait!

Saw later they just started the ride without me and drove to my hotel.

Lyft said “this trip was completed, no refund”. Welp, app deleted.

dgoldstein0 · 2025-06-15T07:14:27 1749971667

I've had several cases of drivers just not picking me up. Taking their time to move anywhere at all, driving away and getting further and further away, or driving towards me only to turn in some other direction. I always just cancel on them and have never had to pay a cancellation fee. I think once or twice they "picked me up" a block away. I'm pretty sure I was able to cancel or end the ride on that too; I definitely was never charged, though I don't recall if I had to use the support. But I never let it actually complete the trip when I wasn't riding. And I was always very miffed when anything like that happened, as I did not appreciate them wasting my time.

johnmaguire · 2025-06-15T15:32:42 1750001562

On Uber I paid for priority pickup and watched as a driver drove within two blocks of my home and then sat in a neighborhood for 10 minutes. I finally message "Everything OK?" and get no reply but they finish their journey to my place.

The car reeked of weed.

ctxc · 2025-06-15T05:39:32 1749965972

That must be annoying, to say the least. In India drivers require an OTP to start a ride.

The OTP is the same for a user across rides, so I have mine memorised which is nifty. No fiddling with the phone during boarding.

On security: exploiting this would require the driver to stay in my vicinity the next time I book a ride, and also get the ride assigned to them. In a high population density area, it's rare - I've never had the same driver twice.

dheerajvs · 2025-06-15T11:33:50 1749987230

Uber in India gives me a different OTP for each ride. A different ride-hailing app I use occasionally uses a PIN tied to a user.

OTPs are such a simple solution to fraudulent rides that it's surprising they're not implemented universally, given all the complaints in this thread.

cwalv · 2025-06-15T05:54:20 1749966860

An OTP that's reused?

labster · 2025-06-15T08:56:33 1749977793

Omni-time password

csomar · 2025-06-15T06:37:04 1749969424

It solves the problem for 99.99% of the time. Drivers are not going to memorize your OTP; and it is unlikely that an OTP list will be leaked/used anytime soon.

broken-kebab · 2025-06-15T07:26:49 1749972409

Maybe, but there's OT in OTP. So if it's not changing then it's not OTP, just P.

travisjungroth · 2025-06-15T09:12:47 1749978767

It changes every time. You can also just have it at night, which I have. Prevents drunk wrong riders.

wildzzz · 2025-06-15T12:21:31 1749990091

For that driver, it's effectively an OTP although probably not very pseudorandom.

mrloop · 2025-06-15T08:38:35 1749976715

An MTP

ctxc · 2025-06-15T05:57:46 1749967066

I mean it _technically_ isn't an OTP, but you know what I mean - just a code only the user knows that they need to share with the rider.

The threat model is sufficiently low to justify the much better UX of not having to look the code up every time.

Propelloni · 2025-06-15T10:42:14 1749984134

The acronym you are looking for is "PIN", a Personal Identification Number.

dheera · 2025-06-15T06:03:54 1749967434

Charge back with your credit card if Lyft isn't willing to help you. Keep businesses in check.

bqmjjx0kac · 2025-06-15T12:13:01 1749989581

In my experience, you should prepare for retaliation when you do a charge back.

bigstrat2003 · 2025-06-15T15:47:02 1750002422

Whether one cares depends very strongly on what "retaliation" means. If they ban your account, not a big deal - you were getting bad service and didn't want to do business with them anyway. If they send an armed hit squad to kill you, that would be worth being concerned about though.

bqmjjx0kac · 2025-06-15T16:20:26 1750004426

I "purchased" a digital game once on the PlayStation Store. It wasn't clear from the description that it was completely useless without an active subscription to PSN, so I tried to return it. They said no way, sales are final and you've already launched the game. I did a chargeback, and they basically locked down my account until I filed a support ticket and had to lie, saying someone else made a purchase on my account.

Zambyte · 2025-06-15T21:51:55 1750024315

If they don't want you as a customer, it's not wise to fight them on that.

mortenjorck · 2025-06-15T14:01:16 1749996076

If you really want to delete the app, a chargeback is the surest way to permanently remove yourself from the platform.

dheera · 2025-06-16T04:14:24 1750047264

The business was in the wrong, so unless they rectify and refund my money I wouldn't use their platform again anyway.

teekert · 2025-06-15T10:52:01 1749984721

I’ve heard the story from the other side as well: App reports ride is arriving, people get in, they go the wrong way and see their original ride stating that you are not there and leave again.

So it may not be intentional. Just coincidence and poor verification.

immibis · 2025-06-15T12:20:51 1749990051

Companies that cheap out by not performing the basic obligations of business end up paying more for small claims court - provided their ripped-off customers actually take them to small claims court. Did you?

rfurmani · 2025-04-04T19:51:18 1743796278

Completely agree on both counts! I loved those two games and felt Conquests of the Longbow didn't get the recognition it deserves.

On the second point, when I read his book (https://kensbook.com/) I was disappointed to not hear about the magic of the games themselves and the creative process behind them. It became clear that his primary goal was to grow a business, he thought being a game distributor was more exciting, but then was disrupted by Steam, shareware, and online distribution.

rfurmani · 2025-04-02T21:17:11 1743628631

I'm building such tools at https://sugaku.net, right now there's chatting with a paper and browsing similar papers. Generally arXiv and other repositories want you to link to them and not embed their papers, which makes it hard to build inline reading tools, but it's on my roadmap to support that for uploaded papers. Would love to hear if you have some feature requests there

amelius · 2025-04-03T09:43:11 1743673391

One feature could be that it automatically fetches the papers that it refers to and also feeds them through the llm. And maybe apply that recursively. This could give the AI a better overview of the related literature.

rfurmani · 2025-03-26T03:57:39 1742961459

After I opened up https://sugaku.net to be usable without login, it was astounding how quickly the crawlers started. I'd like the site to be accessible to all, but I've had to restrict most of the dynamic features to logged in users, restrict robots.txt, use cloudflare to block AI crawlers and bad bots, and I'm still getting ~1M automated requests per day (compared to ~1K organic), so I think I'll need to restrict the site to logged in users soon.

keyle · 2025-03-26T04:42:59 1742964179

Has someone made honeypot for AI yet?

Take all regular papers and change their words or keywords to something outrageous and watch it feed it to users.

MIC132 · 2025-03-26T07:25:03 1742973903

This kinda fits, though it's on a personal blog level:

https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scr...

puchatek · 2025-03-26T08:23:25 1742977405

If there was a non-profit dedicated to this, I would donate

karlgkk · 2025-03-26T04:05:29 1742961929

One thing that worked well for me was layering obstacles

It really sucks that this is the way things are, but what I did was

10 requests for pages in a minute, you get captchad (with a little apology and the option to bypass it by logging in). asset loads don’t count

After a captcha pass, 100 requests in an hour gets you auth walled

It’s really shitty but my industry is used to content scraping.

This allows legit users to get what they need. Although my users maybe don’t need prolonged access ahem.
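The two layers above can be sketched in Python roughly as follows. This is an illustration, not the actual setup: the `LayeredLimiter` class and its method names are hypothetical, and it assumes that the first 10 page requests in a minute are served and the next one gets the captcha, that a captcha pass never expires, and that after a pass only the hourly limit applies.

```python
import time
from collections import defaultdict, deque

PAGE_LIMIT, PAGE_WINDOW_S = 10, 60       # 10 page requests per minute
PASS_LIMIT, PASS_WINDOW_S = 100, 3600    # 100 requests per hour after a captcha pass


class LayeredLimiter:
    """Per-client sliding-window limiter with a captcha layer and an auth wall."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.page_hits = defaultdict(deque)  # client -> recent page timestamps
        self.pass_hits = defaultdict(deque)  # client -> timestamps since captcha pass
        self.passed = set()                  # clients that solved the captcha

    def captcha_passed(self, client: str) -> None:
        self.passed.add(client)

    @staticmethod
    def _admit(hits: deque, now: float, limit: int, window: float) -> bool:
        """Drop timestamps older than the window, then admit if under the limit."""
        while hits and now - hits[0] >= window:
            hits.popleft()
        if len(hits) >= limit:
            return False
        hits.append(now)
        return True

    def check(self, client: str, is_asset: bool = False,
              logged_in: bool = False) -> str:
        """Return 'allow', 'captcha', or 'login_required' for one request."""
        if logged_in or is_asset:  # logging in bypasses limits; assets don't count
            return "allow"
        now = self.clock()
        if client in self.passed:
            ok = self._admit(self.pass_hits[client], now, PASS_LIMIT, PASS_WINDOW_S)
            return "allow" if ok else "login_required"
        ok = self._admit(self.page_hits[client], now, PAGE_LIMIT, PAGE_WINDOW_S)
        return "allow" if ok else "captcha"
```

The `client` key is left abstract here on purpose; what to key on (IP address, cookie, something else) is a separate question.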

nomel · 2025-03-26T04:20:55 1742962855

What happens if you use the proper rate limiting status of 429? It includes a next retry time [1]. I'm curious what (probably small) fraction would respect it.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
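For reference, a 429 response may carry a `Retry-After` header whose value is either a number of seconds or an HTTP date. A small Python sketch of both sides, with hypothetical helper names (`rate_limited_response`, `seconds_to_wait`) and no particular web framework assumed:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Dict, Optional, Tuple


def rate_limited_response(retry_after_s: int) -> Tuple[int, Dict[str, str], str]:
    """Server side: build (status, headers, body) for a throttled request."""
    return 429, {"Retry-After": str(retry_after_s)}, "Too Many Requests"


def seconds_to_wait(retry_after: str, now: Optional[datetime] = None) -> float:
    """Client side: turn a Retry-After value into a non-negative delay in seconds."""
    value = retry_after.strip()
    if value.isdigit():  # delay-seconds form, e.g. "120"
        return float(value)
    # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    now = now or datetime.now(timezone.utc)
    return max(0.0, (parsedate_to_datetime(value) - now).total_seconds())
```

A well-behaved crawler would sleep for `seconds_to_wait(...)` before retrying; whether many crawlers actually do is the open question here.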

karlgkk · 2025-03-28T06:41:10 1743144070

Probably makes sense for a b2b app where you publish status codes as part of the api

Bad actors don’t care and annoying actors would make fun of you for it on twitter

rfurmani · 2025-03-26T04:40:12 1742964012

I've wanted to but wasn't sure how to keep track of individuals. What works for you? IP Addresses, cookies, something else?

karlgkk · 2025-03-28T06:43:13 1743144193

I use IP addy. Users behind cgnat are already used to getting captcha the first time around

There’s some stuff you can do, like creating risk scores (if a user changes ip and uses the same captcha token, increase score). Many vendors do that, as does my captcha provider.
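As a toy illustration of that one rule (same captcha token showing up from a new IP raises the score), here is a Python sketch. The `CaptchaRiskTracker` class, the +1 increment, and the threshold of 2 are all made up for the example; real vendors presumably combine many more signals.

```python
from collections import defaultdict


class CaptchaRiskTracker:
    """Scores captcha tokens by how many distinct extra IPs have presented them."""

    def __init__(self, threshold: int = 2) -> None:
        self.threshold = threshold
        self.ips_by_token = defaultdict(set)  # token -> IPs that presented it
        self.score = defaultdict(int)         # token -> risk score

    def observe(self, token: str, ip: str) -> int:
        """Record one request and return the token's updated risk score."""
        seen = self.ips_by_token[token]
        if seen and ip not in seen:  # known token arriving from a new IP
            self.score[token] += 1
        seen.add(ip)
        return self.score[token]

    def is_risky(self, token: str) -> bool:
        return self.score[token] >= self.threshold
```

A single IP change stays below the threshold in this sketch, so a phone that hops from wifi to mobile data once is not flagged, while a token shared across a pool of scraper IPs crosses it quickly.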

nukem222 · 2025-03-26T05:59:15 1742968755

> This allows legit users to get what they need.

Of course they could have just used the site directly.

karlgkk · 2025-03-28T06:41:45 1743144105

If bots and scrapers respected the robots and tos, we wouldn’t be here

It sucks!

GoblinSlayer · 2025-03-26T22:02:56 1743026576

Or just buy cloudflare :)

LPisGood · 2025-03-26T04:32:02 1742963522

What is your website?

rfurmani · 2025-02-25T18:36:52 1740508612

Very cool! This is also one of my beliefs in building tools for research, that if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned to understand a lot about problem solving and decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug, here it is on my research tools site: https://sugaku.net/oa/W4401043313/)

rfurmani · 2025-02-16T19:46:03 1739735163

I'm serving AI models on Lambda Labs, and after some trial and error I found that a single vllm server along with caddy, behind cloudflare dns, works really well and is really easy to set up

vllm serve ${MODEL_REPO} --dtype auto --api-key $HF_TOKEN --guided-decoding-backend outlines --disable-fastapi-docs &

sudo caddy reverse-proxy --from ${SUBDOMAIN}.sugaku.net --to localhost:8000 &

homebrewer · 2025-02-17T00:53:41 1739753621

It's really best to avoid running web servers as root. It's easy to forward the port 80 with iptables, change the kernel knob to let unprivileged users use port 80 and above, or set the network capability on the binary.

https://stackoverflow.com/questions/413807/

delduca · 2025-02-16T19:53:44 1739735624

You can use Cloudflare Tunnel, which is even better and simpler than having an extra service.

rfurmani · 2025-02-15T00:15:09 1739578509

As a former mathematician, I found research to be a very winding path. While that can be fun, I felt there's a lot of opportunity to train LLMs and ML models on the corpus of math papers, to try to make research more deliberate and less reliant on talking to the right person at the right time.

This is very much a work in progress but so far you can:

* Browse through similar papers

* Get recommendations for new papers and collaborators

* Chat with papers and ask questions to all the major reasoning models

* Have it come up with future paper ideas (along with references) giving a potential title or collaborators.

My focus very much is on the exploratory stages since that's where a lot of the time is spent, but I intend to integrate more tools for problem solving, writing, and computation.

ccppurcell · 2025-02-17T17:11:51 1739812311

I think you should have some "about us" section on your webpage if you want people to give their email addresses. I already get loads of spam that knows my email address belongs to someone with a PhD (though they are often shaky on the details). I looked at your site and there's no information about who is doing it and why.

rfurmani · 2025-02-17T19:02:11 1739818931

That's fair, though there's not much to say since I'm building it out myself as a benefit corporation. I also have strict opt-out for any communications and a proper privacy policy.

I've also tried to keep as much as I can accessible without login, but I want to protect some of the more expensive features from being spammed.

Without signing up you can:

* explore works (but not chat with them) https://sugaku.net/oa/W4206400500/

* explore authors https://sugaku.net/oa/A5059543195/

* see and share AI answers (eg https://sugaku.net/qna/4e59662a-a938-404e-8c0b-b9dc79e37c29/ and https://sugaku.net/qna/517930ff-42ad-47c5-9d9c-e807d06a8453/)

* prompt for new paper ideas https://sugaku.net/current/papergen/

* see and share these ideas https://sugaku.net/current/papergen/idea/719aed36-8dcd-4fd1-...

paulpauper · 2025-02-17T17:40:50 1739814050

No matter how original you think you are, it's almost always already been done. You think you found a new theorem and then you check some old pdf from 20+ years ago and it's already been done.

If you can pull it off, and the result is actually novel and not trivial, you can get a PhD. That is how hard it is.

rfurmani · 2025-02-17T18:55:02 1739818502

The flipside of that is seeing hints of a result that would be really helpful. I still remember how excited I was to stumble on a book from 1931 (The Taylor Series by Dienes) since it had the only English-language proofs of some results by Szego and Polya that I felt could unblock my research. My hope is that this discovery problem can be largely solved.

This is also why I'm not as excited by the focus on pure reasoning and olympiad problem solving in the math and AI space. It's like the early career phase of trying to solve Collatz and Riemann but just repeating work from decades ago.