I know there were at least a few kernel devs who "validated" this bug, but did anyone actually build a PoC and test it? It's such a critical piece of the process yet a proof of concept is completely omitted? If you don't have a PoC, you don't know what sort of hiccups would come along the way and therefore can't determine exploitability or impact. At least the author avoided calling it an RCE without validation.
But what if there's a missing piece of the puzzle that the author and devs missed or assumed o3 covered, but in fact was out of o3's context, that would invalidate this vulnerability?
I'm not saying there is, nor am I going to take the time to do the author's work for them; rather, I'm saying this report is not fully validated, which feels like a dangerous precedent to set with what will likely be an influential blog post in the LLM VR space going forward.
IMO the idea of PoC || GTFO should be applied more strictly than ever before to any vulnerability report generated by a model.
The underlying perspective that o3 is much better than previous or other current models still remains, and the methodology is still interesting. I understand the desire and need to get people to focus on something by wording it a specific way, it's the clickbait problem. But dammit, do better. Build a PoC and validate your claims, don't be lazy. If you're going to write a blog post that might influence how vulnerability researchers conduct their research, you should promote validation and not theoretical assumption. The alternative is the proliferation of ignorance through false-but-seemingly-true reporting, versus deepening the community's understanding of a system through vetted and provable reports.
Thank you! I'm really happy to hear you did that. But why not mention that in your blog post? I understand not wanting to include a PoC for responsible disclosure reasons, but including it would have added a lot of credibility to your work for assholes like me lol
I honestly hadn’t anticipated someone would think I hadn’t bothered to verify the vulnerability is real ;)
Since you’re interested: the bug is real but it is, I think, hard to exploit in real world scenarios. I haven’t tried. The timing you need to achieve is quite precise and tight. There are better bugs in ksmbd from an exploitation point of view. All of that is a bit of a “luxury problem” from the PoV of assessing progress in LLM capabilities at finding vulnerabilities though. We can worry about ranking bugs based on convenience for RCE once we can reliably find them at all.
I'm too much of a skeptic to not do so lol. Great post though overall, don't let my assholery dissuade you! I was pleasantly surprised that it was actually a researcher behind the news story and there was some real evidence / scientific procedure. I thought you had a lot of good insights into how to use LLMs in the VR space specifically, and I'm glad you did benchmarking. It's interesting to see how they're improving.
Yeah race conditions like that are always tricky to make reliable. And yeah I do realize that the purpose of the writeup was more about the efficacy of using LLMs vs the bug itself, and I did get a lot out of that part, I just hyper-focused on the bug because it's what I tend to care the most about. In the end I agree with your conclusion, I believe LLMs are going to become a key part of the VR workflow as they improve and I'm grateful for folks like yourself documenting a way forward for their integration.
Anyways, solid writeup and really appreciate the follow-up!
PoCs should at least trigger a crash, overwrite a register, or have some other provable effect, the point being to determine:
1) If it is actually a UAF or if there is some other mechanism missing from the context that prevents UAF.
2) The category and severity of the vulnerability. Is it a DoS, an RCE, or is the only impact a single thread segfaulting?
This is all part of the standard vulnerability research process. I'm honestly surprised the fix got merged without a PoC, although with high-profile projects even the suggestion of a vulnerability is often enough to get a patch merged when the code can clearly be improved anyway.
Even a rudimentary exploit can be a significant time investment; it is absolutely not common practice to develop, publish, or demand such exploits from researchers to demonstrate memory corruption vulnerabilities. Everyone thinks they're an expert in infosec, it's so funny.
Well, in another subthread the author said he did in fact make a crashing PoC. I guess it depends on the customer's standards, but I would say in the vast majority of cases (especially for nuanced memory corruptions, where exploitability depends on your ability to demonstrate control of the heap) a crashing PoC is the bare minimum. In most VDPs, BBPs, or red team engagements you are required to provide some sort of proof to support a claim, otherwise you'll be laughed out of the room.
I'm curious which sector of infosec you're referring to in which vulnerability researchers are not required to provide proofs of concept? Maybe internal product VR where there is already an established trust?
The part of the prompt that suggests it's the 15th of December is a GET param, which just means the date is coming from wherever this link was retrieved.
The PDF could have been authored at any time.
Looks like the created date embedded in the metadata is as follows:
2023-12-18T21:21:19.000Z
Created with MS Word. But even that isn't definitive.
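For anyone who wants to check this themselves: PDF creation dates are stored as strings like `D:20231218212119Z` in the document's `/CreationDate` entry. A minimal sketch of normalizing one into ISO 8601 (timezone offset forms like `+05'00'` are omitted for brevity, and of course embedded metadata is trivially forgeable, which is why it isn't definitive):

```python
from datetime import datetime, timezone

def parse_pdf_date(raw: str) -> datetime:
    """Parse a PDF /CreationDate value like 'D:20231218212119Z'
    into a timezone-aware datetime. Only handles the UTC ('Z')
    and offset-less forms; a full parser would also handle
    offsets like +05'00'."""
    s = raw.removeprefix("D:")
    # Core fields: YYYYMMDDHHMMSS
    dt = datetime.strptime(s[:14], "%Y%m%d%H%M%S")
    if s[14:15] == "Z" or len(s) == 14:
        return dt.replace(tzinfo=timezone.utc)
    return dt

print(parse_pdf_date("D:20231218212119Z").isoformat())
# 2023-12-18T21:21:19+00:00
```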
What's to stop them from having hooks in their app that can bundle up all the decrypted messages, re-encrypt, and phone home? Certainly it wouldn't be default behavior, but it's possible and would allow them to answer warrants.
Agreed. I think their bottom line is probably driven by how it would affect their user base. My hunch is that, given the immensity of the user base, it wouldn't cause a significant enough exodus for Meta to care either way. But that's speculation; I'm not sure it can be backed up with evidence from past events.
An "advert" for a BSD-licensed open-source codebase? Pointers to a comparable OSS networking project, implemented in memory-safe golang or rust, would be appreciated. There is https://router7.org, but for a narrow use case.
Hi -- this is the SPR team, we actually did not push this on ycombinator and are happy to see it being discussed. We've previously made one post about SPR here, under Show HN:
The post in the link doesn't pertain to the user PSK; rather, it's about the difficult trade-offs users face when they need to chain routers together.
Imagine someone has a router on which they want to put all the IoT stuff that doesn't get security updates and has poor code quality compared to the rest of the network.
Should that router be the first one with access to the internet, or should it be connected behind the router that does? The answer is not so simple, and that's what the blog post discusses.
In SPR we provide users a mechanism to block upstream RFC1918 addresses by default and selectively enable them.
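Concretely, the default-deny decision here boils down to a membership test against the three RFC 1918 private ranges; sketched in Python purely for illustration (the actual implementation differs, and the per-user enable list is omitted):

```python
import ipaddress

# The three RFC 1918 private ranges blocked upstream by default
RFC1918 = [ipaddress.ip_network(n) for n in
           ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]

def blocked_upstream(dst: str) -> bool:
    """True if a destination falls in a private range (and the user
    hasn't selectively enabled it -- enable list omitted here)."""
    addr = ipaddress.ip_address(dst)
    return any(addr in net for net in RFC1918)

print(blocked_upstream("192.168.2.1"))   # True: upstream LAN, blocked
print(blocked_upstream("93.184.216.34")) # False: routable, allowed
```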
We have also found numerous flaws in Guest WiFi systems that totally break isolation between the Guest Network and the main network. This affects many routers on the market today, in particular when a medium is bridged between wired and wireless, but also in general.
As seibol commented -- VLAN tagging per SSID is a valid approach as well if a router supports it. That's a lot stronger than how many routers implement their guest isolation.
As for Multi-PSK -- the use case is creating micro-segmentation in a network with zero-trust, where the identity on the network is rooted in that password.
Without Multi-PSK, if it's not clear, every device that has the WiFi password can sniff encrypted traffic with WPA2, stand up a rogue AP to attack WPA3 if it's in use, and perform ARP spoofing to interfere with other devices on the network.
My approach is just setting proper firewall rules on a dedicated ESSID with a dedicated VLAN. A device on a restricted VLAN shouldn't be able to talk to anything. The downside is it's more work, but the plus side is it can be done on trusted firmware (OpenWRT) rather than something that would require an entire code audit to rule out logic flaws.
This doesn’t isolate the devices from each other, though. (Well, maybe if you have isolation set up on the AP and the devices are all connected to the same radio or isolation happens to work across radios and no one exploits any of the myriad ways in which Ethernet, on the same broadcast domain, is not a secure protocol.)
Lack of usable support from a lot of access points and management systems. Do any of the major multi-AP systems support it? UniFi has no support. I don’t think any of the Ruckus products support it.
(Also, “push the button” is a bit of an awkward concept with multiple APs.)
edit: it’s also a disaster due to a proliferation of crappy client devices that more or less require it.
I see. I'm using a normal router in bridge mode as an extender, and that's been working well enough. It comes with WPS built in, so, for instance, I can turn it on there if the printer is closer. Of course, it would be nice to turn it on in one place and have all the extenders pick it up as well.
It's important to note that their firmware, and especially their cloud infrastructure, absolutely should not be trusted. Their hardware is probably fine, so just flash OpenWRT.
The response "<name> tries to... remember they are a god. They are a god. They <do some godlike action to survive>" seems to work very well. But also results in some hilarious deaths.
Threat intel and analysis is just like any other analysis: it takes a heuristic approach to finding answers.
Can it be bypassed? Yes.
Are the researchers whose entire company hinges on the correctness of their analysis doing their absolute best to attribute the attack to a threat actor? Yes.
So to your point, somebody could indeed reuse malware or attempt to replicate it. However, the researchers are likely analyzing the disassembly and bytecode, and replicating complex malware to perfectly imitate a known family of malware is exceptionally difficult and statistically very unlikely. This is how threat intel is able to make any sort of claim of attribution.
Up front, I believe Mullvad is the best commercial VPN solution and is doing a great job at making good privacy more accessible.
However, a lot of the comments here seem to be hailing VPNs in general as the solution to privacy on the internet.
I would like to remind people that VPNs only really protect you against two things: your ISP and the endpoint. And that's assuming that your ISP isn't doing some shady analytics.
That being said, knocking those two things off the board is a huge benefit to privacy and absolutely should be done.
It is my understanding that many ISPs and backbone providers sell or otherwise disclose full detailed packet metadata, including precision timestamps, and that there are companies that aggregate this data across the entire Internet.
At which point your VPN becomes just another hop in the trace.
VPNs, no matter how secure they themselves are, are effective for accessing lightly geo-locked content and defeating unsophisticated analytics and tracking. They are really not a serious privacy solution in any sense, unfortunately.
I don't understand this area well enough, I think. Doesn't a VPN encrypt the routing information that tells the packet where to ultimately end up? I.e. my ISP can see the traffic going to the VPN, but can't look inside it, and can't see where it goes from there?
Correct, but the destination ISP chain (and of course the destination service itself) can equally see the traffic coming from the VPN, and if you have packet metadata (precise timing and packet sizes) from two sources on either side of the VPN, it is trivial to correlate those two streams.
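To make that concrete, here's a toy sketch of the correlation step. Real attacks use far more robust statistical tests over large windows and account for the VPN's constant encapsulation overhead on packet sizes, but the principle is the same: an observer on both sides just matches up sizes and inter-arrival timing.

```python
def correlation_score(stream_a, stream_b, max_skew=0.05):
    """Crude illustration: fraction of packets in stream_a that have a
    size-matching packet in stream_b arriving within max_skew seconds
    (the one-way latency through the VPN hop). Each stream is a list
    of (timestamp, size) tuples."""
    matched = 0
    for ts_a, size_a in stream_a:
        if any(size_b == size_a and 0 <= ts_b - ts_a <= max_skew
               for ts_b, size_b in stream_b):
            matched += 1
    return matched / len(stream_a)

# Two observations of the "same" flow on either side of a VPN hop:
inside  = [(0.000, 1500), (0.020, 612), (0.051, 1500)]
outside = [(0.011, 1500), (0.031, 612), (0.062, 1500)]
print(correlation_score(inside, outside))  # 1.0
```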
Note that Mullvad's WireGuard settings offer a "multihop" feature, meaning the VPN destination your ISP sees and the VPN endpoint the end service sees differ.
I'm not sure how that protects you though. ISP sees your traffic going into WG1. They know all of Mullvad's IPs, so isn't it just as easy to correlate that traffic when you exit through WG2?
Assuming the ISP monitors the entire network graph (your computer, the VPN servers' activity, and the end service's server), you wouldn't be protected. At that point, it's game over unless you're using mixnets or something.
If they merely monitor your computer and the end service, the correlation weakens a little with plausible deniability.
The real win is when the ISP adversary is monitoring your computer and the WG servers and NOT the end service. In that case, say they see you go to WG1, and then they see WG1 going to an end service. This is also correlation, and pretty undeniable. But say they see you go to WG1, then they see WG1 go to WG2, and they have no visibility of WG2's traffic. Then the tracking's broken; the footprints run off into the surf.
So multiple hops buy you defense in depth assuming it eventually gets you outside your adversary's monitoring range.
> VPNs, no matter how secure they themselves are, are effective for accessing lightly geo-locked content and defeating unsophisticated analytics and tracking
Circling back to this statement: aren't they also useful on public Wi-Fi?
The reason the UK wants an encryption backdoor is that it's expensive to do statistical analysis of encrypted traffic. There are ways to make it more difficult, but if you own the certificate that a TLS endpoint uses, you can just open the traffic and re-encrypt it for the destination; this is called break and inspect. If a VPN uses different certificates and is built well, there would have to be a flaw (spyware, a vulnerability, etc.) on one of the endpoints for anyone other than you and the VPN to read the encrypted data.
Why would they even do so?
Large ISPs are public, so this activity would appear as extra revenue (if they sell traffic data) in their financial reports and annual reports.
The most likely explanation is that ISPs are just respecting local laws and doing the minimum retention required (because more data storage = more costs), and that their actual fear is someone leaking this data and causing reputational damage, so they'd avoid storing anything if they could.
ISPs are also in the business of analytics [1, 2], and a significant percentage of customers hiding their traffic reduces the value of their analytic products.
This view is extremely Western; not all ISPs are obligated to publish financial reports, and "shady analytics" does not imply retaining a user's complete network traffic record in perpetuity. And even if your arguments were valid, this is not just about the ISP's financial gain, but about surveillance, which occurs in every country.
At some point of paranoia people should really look into self-hosting a VPN service. Sure, your VPS provider can see one side of the traffic, so it's not bulletproof, but that can be mitigated.
Mullvad is a nice middle ground for those who don't see that as worth their time or don't know how. It's good to see they're at the very least trying to keep up appearances.
I doubt that's the better way. How is self-hosting helping with the paranoia vs. using Mullvad?
I don't really see how it's more secure to run some software that you haven't audited on a VPS somewhere at a provider you haven't audited. I'd trust a company with resources to run their own hardware, investing into a more secure setup [1] and contributing to more open infrastructure [2] much more than I trust myself to run something securely which isn't my sole occupation.
Self-hosting also makes you vulnerable to the network hosting you (not only the hosting server itself, but also the internet transit provider) and of course the website you are visiting, as you are the only user from that source IP (rendering a VPN practically useless).
You can mix and match in various ways if you have more than one VPS. Again, it rearranges rather than removes the vulnerabilities, and it's pure window dressing against an organised, financed actor.
I've done this as an intellectual challenge more than anything else.
I do this, mostly for the static IP that isn't linked directly to me and my approximate location, with mullvad exit only for 'sensitive' stuff. The degree of separation is nice even if the breadcrumbs are there. Best if the VPS allows crypto or cash payments.