Hacker News: rfurmani's comments

Dollars of compute at market rate is what I'd like to see, to check whether calling this tool would cost $100 or $100,000


Sounds like it did not:

> This year, our advanced Gemini model operated end-to-end in natural language, producing rigorous mathematical proofs directly from the official problem descriptions – all within the 4.5-hour competition time limit


I interpreted that bit as meaning they did not manually alter the problem statement before feeding it to the model - they gave it the exact problem text issued by IMO.

It is not clear to me from that paragraph if the model was allowed to call tools on its own or not.


As a side question, do you think using tools like Lean will become a staple of these "deep reasoning" LLM flavors?

It seems that LLMs excel (relative to other paradigms) in the kind of "loose" creative thinking humans do, but are also prone to the same kinds of mistakes humans make (hallucinations, etc). Just as Lean and other formal systems can help humans find subtle errors in their own thinking, they could do the same for LLMs.


I was surprised to see them not using tools for it; that feels like a more reliable way to get useful results for this kind of thing.

I get the impression that not using tools is part of the point, though: to help demonstrate how much mathematical "reasoning" you can get out of just a model on its own.


Yes, I'm similarly surprised. Intuitively I'd think that it's much better to train on using Lean, since it's much easier to do RL on it (Lean gives you an objective metric on whether you achieved your objective). It also seems more useful in some ways.
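To make the "objective metric" point concrete, here is a minimal sketch of a binary RL reward that asks a proof checker whether a candidate file is accepted. It assumes a `lean` executable on PATH (a real pipeline would run inside a Lake project so Mathlib imports resolve); the function name, the 60-second timeout, and the string check for `sorry` are illustrative choices, not any lab's actual setup.

```python
import subprocess
import tempfile
from pathlib import Path


def lean_reward(proof_source: str, lean_cmd: str = "lean", timeout_s: float = 60.0) -> float:
    """Binary reward: 1.0 if the checker accepts the file, 0.0 otherwise (hypothetical helper)."""
    # A proof containing `sorry` still compiles with only a warning, so reject it up front.
    if "sorry" in proof_source:
        return 0.0
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "Candidate.lean"
        path.write_text(proof_source, encoding="utf-8")
        try:
            result = subprocess.run(
                [lean_cmd, str(path)], capture_output=True, text=True, timeout=timeout_s
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            return 0.0  # missing checker or runaway elaboration earns no reward
    # Lean exits non-zero on errors; also catch a "declaration uses 'sorry'" warning (e.g. from `admit`).
    if result.returncode != 0 or "sorry" in result.stdout + result.stderr:
        return 0.0
    return 1.0
```

The reward is deliberately all-or-nothing: the checker either accepts the proof or it does not, which is exactly what makes it attractive compared with grading natural-language proofs.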

But all the model providers are emphasizing the "this is only using natural language" angle, which I think is interesting both from a "this is easier for humans to actually use" perspective and from a "look how general the model is" perspective.


Yes, that quote is contained in my comment. But I don't think it unambiguously excludes tool use in the internal chain of thought.

I don't think tool use would detract from the achievement, necessarily. I'm just interested to know.


End-to-end in natural language would imply no tool use, I'd imagine, unless it called another tool that converted it, but that would be a real stretch (smoke and mirrors).


Only for that particular board; in general it will be very complex and will depend on the shape of the colored regions.


I've had a couple of bad experiences with Lyft recently, including one time when the driver must have clicked that they'd picked me up while still a block away, because I could see the Lyft driving to the destination without me. I tried to get a refund, since I was obviously waiting at my start location the whole time, but the system claimed the ride went from start to finish (even though I wasn't in the car), so no refund.


Same thing happened to me, and the support system automatically decided nothing was wrong whatsoever despite my phone certainly sending a very different location from the driver. And the madness was I couldn't even book another ride as I was technically in one.

So I ended up getting it resolved via the security panic button which did put me through to a real person who was empathetic to the issue.


Is this some sort of a scam? The driver cannot even mark the ride as completed without being in the area right? So they have to drive it anyway. I can’t imagine they would be on the platform for long if this happened on a regular basis. I would say it’s probably an accident but how could this behavior be accidental? Someone might accidentally say that they picked you up, but they couldn’t accidentally then drive an empty car to the destination.


Maybe they picked up the wrong person and neither of them realized?


Entirely possible, people do get into wrong rideshare vehicles. Especially late night after people have been drinking. A decent driver will confirm the name when you’re in a place with a lot of pickups happening but if the language barrier is strong that might not happen.


My experience in DC is that GPS can be spotty due to the buildings, and the app glitches, saying you are in one spot when you are not there.

Also, DC has rules for certain streets on which side of the road you are allowed to be picked up on.


Has anybody tried "driving" for one of these companies using GPS spoofing? You could fake the location of your phone. I suppose it'd only work a few times before the number of reports got you banned, but I wonder whether, on a large enough (and automated) scale, it would be profitable for scammers.


I had a driver commit GPS spoofing on me: I was standing outside and there was no car to be seen anywhere, even though the app showed the driver was there and had been "driving" to me.

I tried to report a security incident to Uber, but I'm not sure what happened. It would likely be easier to complain today, as all taxis (which Uber technically is in Norway) now need to be part of a taxi dispatch center.


Given that they track you every inch of your route, it'd be a pain in the butt to attempt to fake it.

I've gotten a refund on food before because my driver picked up my food and then went to spend half an hour in a gas station before returning to their route, even though my home was 2 minutes away.


>Given that they track you every inch of your route, it'd be a pain in the butt to attempt to fake it.

It's a pain for a lone developer while no such app exists, but once one does, a spoofing app will dutifully draw any number of trips of any length for anyone.


I had to go in person to verify my documents to drive for Uber


> ended up getting it resolved via the security panic button which did put me through to a real person who was empathetic to the issue

For both Uber and Lyft this is what I do. Which is wild since the only other company I auto-escalate-to-cancellation with is Comcast.

Waymo isn’t winning because it’s automated. It’s winning because the major players left the premium segment of the market up for grabs.


Can it be both? Maybe it's semantics, but a lot of folks are taking Waymo because there's no human driver. Now "no human driver" may be considered "premium," but saying that automation is not a significant factor doesn't quite ring true. As a single point of reference, the automation is a big part of what makes it attractive to me as a rider, both because there's no human driver (not super critical to my experience, but I prefer being in the car solo) and, more importantly, because of the driving behavior: it just feels like a better driver than most drivers on the road, and that's due to the automation.


Comcast gives you the illusion of being able to talk to a human being if you are persistent enough.

What ends up happening is at some point they send you a link to talk to their support bot and tell you they are hanging up on you.

Threatening cancellation is the only way. The only reason they can get away with not caring is their captive markets. This is what you get with no competition.


Uber lets you enable a PIN for each ride. The driver can't say they picked you up until they punch in the random 4 digit PIN the app gave you for the ride.
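As a rough illustration of how a per-ride PIN gates the "picked up" transition, here is a minimal sketch. The `Ride` class, the three-attempt lockout, and the constant-time comparison are assumptions for illustration; nothing here describes Uber's actual implementation.

```python
import hmac
import secrets

MAX_ATTEMPTS = 3  # hypothetical lockout so a 4-digit PIN can't simply be brute-forced


def new_ride_pin(digits: int = 4) -> str:
    """Fresh zero-padded PIN from a CSPRNG, generated once per ride."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


class Ride:
    def __init__(self, digits: int = 4) -> None:
        self.pin = new_ride_pin(digits)  # shown only in the rider's app
        self.picked_up = False
        self.failed_attempts = 0

    def confirm_pickup(self, entered_pin: str) -> bool:
        """The driver's app submits the PIN; pickup is recorded only on a match."""
        if self.failed_attempts >= MAX_ATTEMPTS:
            return False  # locked: escalate to support rather than allow more guesses
        # Constant-time comparison so response timing doesn't leak matching digits.
        ok = hmac.compare_digest(entered_pin.encode(), self.pin.encode())
        if ok:
            self.picked_up = True
        else:
            self.failed_attempts += 1
        return ok
```

With only 10,000 possible 4-digit codes, some attempt limit like the one above matters; without it, a driver could in principle keep guessing.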


This is good, but why can't these firms detect when your phone and the driver's phone are far apart?


It's not unusual to call a taxi for another person. Or to make a multi-stop journey where some people get out before others. You can even send a parcel across town in a taxi.

Checking phone proximity might be helpful in some cases, but it's not a silver bullet.
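A proximity check of this kind can be sketched as a great-circle distance between the two phones' last location fixes, used only as a soft signal for review. The 300 m threshold and the function names are hypothetical; the `None` branch reflects the objections in this thread (rides booked for someone else, location permissions off, GPS dropouts).

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters

Point = Optional[Tuple[float, float]]  # (latitude, longitude) in degrees, or None


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))  # clamp guards float error


def pickup_looks_suspicious(rider: Point, driver: Point, threshold_m: float = 300.0) -> bool:
    """Flag for review (not auto-deny) when both fixes exist and are far apart."""
    if rider is None or driver is None:
        return False  # no signal: can't distinguish fraud from a ride booked for someone else
    return haversine_m(*rider, *driver) > threshold_m
```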


Too many people order Ubers for other people so it won’t work.


GPS does not work everywhere, and not every device supports BLE beacons.


I never give location permissions to any app if I can avoid it (indeed I don't even have the spyware app if I can avoid it; e.g. I use the web to order an Uber)


I don’t know why they don’t require it. Every Uber in Puerto Rico uses the PIN, but I’ve only had one in the mainland USA ask for it.


You need to enable it on your account.


Exactly. I believe it should be required for safety; it limits shenanigans, etc. Apparently it is required in Puerto Rico, but I don't know if drivers have to enable it themselves or if the app knows where the driver is operating. Are you saying the rider can also turn it on all the time? If so, that's good. I've only ever seen drivers request it (all in PR, plus one in the mainland US; everywhere else, no PIN).


Right, YOU need to enable it. I assumed it was always up to the rider, since they're the beneficiary of it.


I waited 40 minutes for a Lyft at an airport because the driver made up a story about an accident and traffic, in the airport. No one else seemed to be affected by this traffic, so eventually I tried booking an Uber. It arrived 3 minutes later.

20 minutes after that, the Lyft driver kept texting me “where are you?!”. Their turn to wait!

Saw later they just started the ride without me and drove to my hotel.

Lyft said “this trip was completed, no refund”. Welp, app deleted.


I've had several cases of drivers just not picking me up: taking their time to move anywhere at all, driving away and getting further and further away, or driving towards me only to turn in some other direction. I always just cancel on them and have never had to pay a cancellation fee. I think once or twice they "picked me up" a block away. I'm pretty sure I was able to cancel or end the ride then too; I was definitely never charged, though I don't recall if I had to contact support. But I never let the trip actually complete when I wasn't riding. Still, I was always very miffed when anything like that happened, as I did not appreciate them wasting my time.


On Uber I paid for priority pickup and watched as a driver drove within two blocks of my home and then sat in a neighborhood for 10 minutes. I finally messaged "Everything OK?" and got no reply, but they finished their journey to my place.

The car reeked of weed.


That must be annoying, to say the least. In India, drivers require an OTP to start a ride.

The OTP is the same for a user across rides, so I have mine memorised which is nifty. No fiddling with the phone during boarding.

On security: exploiting this would require the driver to stay in my vicinity the next time I book a ride, and also get the ride assigned to them. In a high population density area, it's rare - I've never had the same driver twice.


Uber in India gives me a different OTP for each ride. A different ride-hailing app I use occasionally uses a PIN tied to a user.

OTPs are such a simple solution to fraudulent rides that it's surprising they're not implemented universally, given all the complaints in this thread.


An OTP that's reused?


Omni-time password


It solves the problem for 99.99% of the time. Drivers are not going to memorize your OTP; and it is unlikely that an OTP list will be leaked/used anytime soon.


Maybe, but there's OT in OTP. So if it's not changing then it's not OTP, just P.


It changes every time. You can also just have it at night, which I have. Prevents drunk wrong riders.


For that driver, it's effectively an OTP although probably not very pseudorandom.


An MTP


I mean it _technically_ isn't an OTP, but you know what I mean: just a code only the user knows, which they need to share with the driver.

The threat model is sufficiently low to justify the much better UX of not having to look the code up every time.


The acronym you are looking for is "PIN", a Personal Identification Number.


Charge back with your credit card if Lyft isn't willing to help you. Keep businesses in check.


In my experience, you should prepare for retaliation when you do a charge back.


Whether one cares depends very strongly on what "retaliation" means. If they ban your account, not a big deal - you were getting bad service and didn't want to do business with them anyway. If they send an armed hit squad to kill you, that would be worth being concerned about though.


I "purchased" a digital game once on the PlayStation Store. It wasn't clear from the description that it was completely useless without an active subscription to PSN, so I tried to return it. They said no way, sales are final and you've already launched the game. I did a chargeback, and they basically locked down my account until I filed a support ticket and had to lie, saying someone else made a purchase on my account.


If they don't want you as a customer, it's not wise to fight them on that.


If you really want to delete the app, a chargeback is the surest way to permanently remove yourself from the platform.


The business was in the wrong, so unless they rectify and refund my money I wouldn't use their platform again anyway.


I’ve heard the story from the other side as well: the app reports the ride is arriving, people get into the wrong car, it goes the wrong way, and meanwhile their original ride reports that they aren't there and leaves.

So it may not be intentional. Just coincidence and poor verification.


Companies that cheap out by not performing the basic obligations of business end up paying more for small claims court - provided their ripped-off customers actually take them to small claims court. Did you?


Completely agree on both counts! I loved those two games and felt Conquests of the Longbow didn't get the recognition it deserves.

On the second point, when I read his book (https://kensbook.com/) I was disappointed not to hear about the magic of the games themselves and the creative process behind them. It became clear that his primary goal was to grow a business: he thought being a game distributor was more exciting, but that business was then disrupted by Steam, shareware, and online distribution.


I'm building such tools at https://sugaku.net; right now you can chat with a paper and browse similar papers. Generally arXiv and other repositories want you to link to them and not embed their papers, which makes it hard to build inline reading tools, but it's on my roadmap to support that for uploaded papers. Would love to hear if you have any feature requests there.


One feature could be that it automatically fetches the papers it refers to and also feeds them through the LLM, maybe applying that recursively. This could give the AI a better overview of the related literature.
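The recursive version of that feature is essentially a bounded breadth-first walk over the citation graph. Below is a minimal sketch; `get_refs` is a hypothetical callable (for example, a lookup against a bibliographic database) that returns the identifiers a paper cites, and the depth and size caps are illustrative limits to keep the LLM context manageable.

```python
from collections import deque
from typing import Callable, Iterable, List


def collect_references(
    root: str,
    get_refs: Callable[[str], Iterable[str]],
    max_depth: int = 2,
    max_papers: int = 50,
) -> List[str]:
    """Breadth-first walk over citations, returning papers in discovery order."""
    seen = {root}
    order = [root]
    queue = deque([(root, 0)])
    while queue and len(order) < max_papers:
        paper, depth = queue.popleft()
        if depth == max_depth:
            continue  # don't expand beyond the depth limit
        for ref in get_refs(paper):
            if ref in seen:
                continue  # citation graphs have cycles and shared references
            seen.add(ref)
            order.append(ref)
            if len(order) >= max_papers:
                break
            queue.append((ref, depth + 1))
    return order
```

Breadth-first order means the most directly relevant papers (the ones the root cites) are collected first, so truncating at `max_papers` drops the most distant references.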


After I opened up https://sugaku.net to be usable without login, it was astounding how quickly the crawlers started. I'd like the site to be accessible to all, but I've had to restrict most of the dynamic features to logged-in users, restrict robots.txt, and use Cloudflare to block AI crawlers and bad bots, and I'm still getting ~1M automated requests per day (compared to ~1K organic), so I think I'll need to restrict the site to logged-in users soon.


Has someone made a honeypot for AI yet?

Take all the regular papers, change their words or keywords to something outrageous, and watch the AI feed them to users.


This kinda fits, though it's on a personal blog level:

https://www.brainonfire.net/blog/2024/09/19/poisoning-ai-scr...


If there were a non-profit dedicated to this, I would donate.


One thing that worked well for me was layering obstacles.

It really sucks that this is the way things are, but here's what I did:

After 10 page requests in a minute, you get captcha'd (with a little apology and the option to bypass it by logging in). Asset loads don't count.

After a captcha pass, 100 requests in an hour gets you auth-walled.

It’s really shitty but my industry is used to content scraping.

This allows legit users to get what they need. Although my users maybe don’t need prolonged access ahem.
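The layered thresholds described above can be sketched as a per-IP sliding window that escalates from "allow" to "captcha" to "auth_wall". This is a single-process, in-memory illustration only; the class name, the return strings, and keeping a solved captcha valid indefinitely are assumptions, and a real deployment would keep the counters in a shared store.

```python
import time
from collections import defaultdict, deque

PAGE_LIMIT_PER_MINUTE = 10  # before a captcha pass
PAGE_LIMIT_PER_HOUR = 100   # after a captcha pass


class LayeredLimiter:
    """Per-IP escalation: 'allow' -> 'captcha' -> 'auth_wall' (hypothetical sketch)."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> page-request timestamps from the last hour
        self.captcha_passed_at = {}     # ip -> time the captcha was solved

    def mark_captcha_passed(self, ip: str) -> None:
        self.captcha_passed_at[ip] = self.clock()

    def check(self, ip: str, is_asset: bool = False) -> str:
        if is_asset:
            return "allow"  # asset loads don't count toward either limit
        now = self.clock()
        hits = self.hits[ip]
        hits.append(now)
        while hits and hits[0] <= now - 3600:  # keep only the last hour
            hits.popleft()
        passed_at = self.captcha_passed_at.get(ip)
        if passed_at is None:
            last_minute = sum(1 for t in hits if t > now - 60)
            return "captcha" if last_minute > PAGE_LIMIT_PER_MINUTE else "allow"
        since_pass = sum(1 for t in hits if t >= passed_at)
        return "auth_wall" if since_pass > PAGE_LIMIT_PER_HOUR else "allow"
```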


What happens if you use the proper rate-limiting status code, 429? It can include a Retry-After header telling the client when to try again [1]. I'm curious what (probably small) fraction would respect it.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...
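For reference, "respecting" a 429 means the client reads Retry-After, which may be either a number of seconds or an HTTP date, and waits at least that long. Here is a minimal sketch of that parsing using only the standard library; the function name and the choice to return `None` for a missing or unparseable header are illustrative.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Seconds to wait per a Retry-After header value, or None if absent/unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():  # delta-seconds form, e.g. "120"
        return float(value)
    try:  # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # HTTP dates are always in GMT
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```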


Probably makes sense for a B2B app where you publish status codes as part of the API.

Bad actors don’t care and annoying actors would make fun of you for it on twitter


I've wanted to but wasn't sure how to keep track of individuals. What works for you? IP Addresses, cookies, something else?


I use the IP address. Users behind CGNAT are already used to getting a captcha the first time around.

There’s some stuff you can do, like creating risk scores (if a user changes IP and uses the same captcha token, increase the score). Many vendors do that, as does my captcha provider.
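The captcha-token heuristic mentioned above could look roughly like this: remember which IPs each solved token has been presented from, and raise the score for every additional IP. The class name, the penalty of 10 per extra IP, and the blocking threshold of 30 are made-up values for illustration, not any vendor's scoring model.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class CaptchaTokenRisk:
    """Hypothetical risk score: one captcha token seen from many IPs looks like token sharing."""

    def __init__(self, penalty_per_extra_ip: int = 10, block_at: int = 30) -> None:
        self.ips_for_token: DefaultDict[str, Set[str]] = defaultdict(set)
        self.penalty = penalty_per_extra_ip
        self.block_at = block_at

    def score(self, token: str, ip: str) -> int:
        """Record this (token, ip) pair and return the token's current risk score."""
        ips = self.ips_for_token[token]
        ips.add(ip)
        return self.penalty * (len(ips) - 1)  # the first IP is free

    def should_block(self, token: str, ip: str) -> bool:
        return self.score(token, ip) >= self.block_at
```

In practice such a score would be one input among several (ASN, request rate, headers) rather than a hard rule, since mobile users legitimately change IPs.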


> This allows legit users to get what they need.

Of course they could have just used the site directly.


If bots and scrapers respected robots.txt and the ToS, we wouldn’t be here.

It sucks!


Or just buy Cloudflare :)


What is your website?


Very cool! This is also one of my beliefs in building tools for research, that if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned to understand a lot about problem solving and decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug, here it is on my research tools site: https://sugaku.net/oa/W4401043313/)
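One simple way to use an LLM as a relevance ranker is pointwise scoring: ask for a 0 to 10 score per candidate, parse the number, and sort. The sketch below assumes a hypothetical `ask_llm` callable (any chat-completion wrapper that takes a prompt and returns text); the prompt wording and the parsing rule are illustrative, not the method used on the site.

```python
import re
from typing import Callable, List, Sequence

PROMPT = (
    "On a scale of 0 to 10, how relevant is the candidate paper to the idea?\n"
    "Idea: {idea}\nCandidate: {title}\nAnswer with a single number."
)


def parse_score(reply: str) -> float:
    """Take the first number in the reply, capped at 10; unparseable replies score 0."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    return min(10.0, float(match.group())) if match else 0.0


def rank_references(
    idea: str, candidates: Sequence[str], ask_llm: Callable[[str], str], top_k: int = 5
) -> List[str]:
    scored = [(parse_score(ask_llm(PROMPT.format(idea=idea, title=c))), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable sort: ties keep input order
    return [c for _, c in scored[:top_k]]
```

Pointwise scoring is the easiest variant to batch and cache; listwise prompts (ranking all candidates at once) are an alternative when candidate lists are short.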


I'm serving AI models on Lambda Labs, and after some trial and error I found that a single vLLM server along with Caddy, behind Cloudflare DNS, works really well and is really easy to set up:

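```shell
# Start vLLM's OpenAI-compatible server (port 8000 by default); because of
# --api-key, clients must send $HF_TOKEN as a Bearer token.
vllm serve "$MODEL_REPO" --dtype auto --api-key "$HF_TOKEN" --guided-decoding-backend outlines --disable-fastapi-docs &

# Caddy serves HTTPS for the subdomain (obtaining a certificate automatically)
# and forwards requests to vLLM; sudo lets it bind ports 80/443.
sudo caddy reverse-proxy --from "${SUBDOMAIN}.sugaku.net" --to localhost:8000 &
```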


It's really best to avoid running web servers as root. It's easy to forward port 80 with iptables, change the kernel knob (net.ipv4.ip_unprivileged_port_start) to let unprivileged users use port 80 and above, or set the network capability (CAP_NET_BIND_SERVICE) on the binary.

https://stackoverflow.com/questions/413807/


You can use Cloudflare Tunnel, which is even better and simpler than running an extra service.


As a former mathematician, I found research to be a very winding path. While that can be fun, I felt there's a lot of opportunity to train LLMs and ML models on the corpus of math papers, to try to make research more deliberate and less reliant on talking to the right person at the right time.

This is very much a work in progress but so far you can:

* Browse through similar papers

* Get recommendations for new papers and collaborators

* Chat with papers and ask questions to all the major reasoning models

* Have it come up with future paper ideas (along with references), suggesting a potential title or collaborators.

My focus very much is on the exploratory stages since that's where a lot of the time is spent, but I intend to integrate more tools for problem solving, writing, and computation.


I think you should have some "about us" section on your webpage if you want people to give their email addresses. I already get loads of spam that knows my email address belongs to someone with a PhD (though they are often shaky on the details). I looked at your site and there's no information about who is doing it and why.


That's fair, though there's not much to say since I'm building it out myself as a benefit corporation. I also have strict opt-out for any communications and a proper privacy policy.

I've also tried to keep as much as I can accessible without login, but I want to protect some of the more expensive features from being spammed.

Without signing up, you can:

* explore works (but not chat with them) https://sugaku.net/oa/W4206400500/

* explore authors https://sugaku.net/oa/A5059543195/

* see and share AI answers (eg https://sugaku.net/qna/4e59662a-a938-404e-8c0b-b9dc79e37c29/ and https://sugaku.net/qna/517930ff-42ad-47c5-9d9c-e807d06a8453/)

* prompt for new paper ideas https://sugaku.net/current/papergen/

* see and share these ideas https://sugaku.net/current/papergen/idea/719aed36-8dcd-4fd1-...


No matter how original you think you are, it's almost always already been done. You think you found a new theorem and then you check some old pdf from 20+ years ago and it's already been done.

If you can pull it off, and the result is actually novel and not trivial, you can get a PhD. That is how hard it is.


The flip side of that is seeing hints of a result that would be really helpful. I still remember how excited I was to stumble on a book from 1931 (The Taylor Series by Dienes), since it had the only English-language proofs of some results by Szego and Polya that I felt could unblock my research. My hope is that this discovery problem can be largely solved.

This is also why I'm not as excited by the focus on pure reasoning and olympiad problem solving in the math and AI space. It's like the early career phase of trying to solve Collatz and Riemann but just repeating work from decades ago.

