> Going fast enough sideways so you stay up there is the tricky bit.
nah, that's the simple part. getting up there efficiently is the hard part. once we're up, it's just a matter of force over time to create a nice orbit.
The faster you go, the more friction you face, and the more heat and vibration your equipment must endure.
Going slower reduces friction and stress but uses more energy just negating gravity. Slow rocket is inefficient rocket.
So we wanna leave the atmosphere as soon as possible, but not so fast that the rocket melts or the engines collapse. Preferably just below the sound barrier.
once we're up, it's pretty chill... until you wanna go down again. Slow rocket is alive rocket.
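A hedged aside, since this thread is really arguing about a delta-v budget: the textbook way to write the tradeoff is roughly

    \Delta v_{\text{total}} \approx \Delta v_{\text{orbit}}
      + \underbrace{\int g \sin\gamma \, dt}_{\text{gravity loss}}
      + \underbrace{\int \frac{D}{m} \, dt}_{\text{drag loss}},
    \qquad D = \tfrac{1}{2}\,\rho\, v^{2} C_d A

where \gamma is the flight-path angle and \rho the air density. Fly slower and the burn lasts longer, so the gravity-loss integral grows; fly faster while still deep in the atmosphere and the drag term blows up with v^2 (plus the heat and vibration mentioned above). That's the whole "as soon as possible, but not so fast the rocket melts" tension.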
We had this question come up frequently during our fundraise.
Our customers' risk profile is such that having the model provider also be the source of truth for model performance is objectionable. There's value to having an independent third party that ensures their AI is doing what they intend it to, especially if that software is on-prem.
On the credit point, that's not necessarily what we're after in these deployments. This is a happy alignment of relatively esoteric research that personally excited me and a real business problem around the non-deterministic nature of GenAI. Our customers typically come to us with a need to solve that for one reason or another.
There is a third option besides replacing your window manager with EXWM or a simpler tiling window manager: to manage desktop windows from within Emacs using your existing X11 window manager or Wayland compositor. This means
- you can position and resize all desktop windows,
- you can switch between Emacs and desktop windows by moving to the window on the left, right, above, or below, and
- you can switch back and forth between a named desktop app like Firefox, Okular, etc. and Emacs.
You only need to install the Emacs package Emacs Desktop Window Manager (dwin), https://github.com/lsth/dwin, for example from MELPA. It currently works with X11 window managers as well as with KDE/KWin on Wayland or X11 (using xdotool and kdotool, respectively). I use it all day myself on KDE/KWin Wayland in my standard setup, and there it works fine.
I think people are still fooling themselves about the relevance of 3GL languages in an AI dominated future.
It is similar to how Assembly developers thought about their relevance until optimising compiler backends turned that into a niche activity.
It is a matter of time, maybe a decade, who knows, until we can produce executables directly from AI systems.
Most likely we will still need some kind of formalisation tools to tame natural-language uncertainties, but they almost certainly won't be Python- or Rust-like.
We are moving into another abstraction layer, closer to the 4GL, CASE tooling dreams.
It is the first model to get partial credit on an LLM image test I have: counting the legs of a dog. Specifically, a dog with 5 legs. This is a wild test, because LLMs get really pushy and insistent that the dog only has 4 legs.
In fact GPT-5 wrote an edge detection script to see where "golden dog feet" met "bright green grass" to prove to me that there were only 4 legs. The script found 5, and GPT-5 then said it was a bug, and adjusted the script sensitivity so it only located 4, lol.
Anyway, Gemini 3, while still being unable to count the legs first try, did identify "male anatomy" (its own words) also visible in the picture. The 5th leg was approximately where you could expect a well-endowed dog to have a "5th leg".
That aside though, I still wouldn't call it particularly impressive.
As a note, Meta's image slicer correctly highlighted all 5 legs without a hitch. Maybe not quite a transformer, but interesting that it could properly interpret "dog leg" and ID them. Also, the dogs with many legs (I have a few of them) all had their extra legs added by nano-banana.
This vulnerability is basically the worst-case version of what people have been warning about since RSC/server actions were introduced.
The server was deserializing untrusted input from the client directly into module+export name lookups, and then invoking whatever the client asked for (without verifying that metadata.name was an own property).
return moduleExports[metadata.name]
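A minimal sketch of the kind of own-property guard that closes this particular hole (illustrative only, not the actual patch; it just reuses the names from the snippet above):

    // Reject anything that isn't an own, exported property of the module.
    // Without this check, a client-supplied name like "__proto__" or
    // "constructor" resolves through the prototype chain instead of the
    // module's actual exports.
    if (!Object.prototype.hasOwnProperty.call(moduleExports, metadata.name)) {
      throw new Error('Invalid export name requested by client');
    }
    return moduleExports[metadata.name];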
We can patch hasOwnProperty and tighten the deserializer, but there is a deeper issue. React never really acknowledged that it was building an RPC layer. If you look at actual RPC frameworks like gRPC or even old-school SOAP, they all start with schemas, explicit service definitions and a bunch of tooling to prevent boundary confusion. React went the opposite way: the API surface is whatever your bundler can see, and the endpoint is whatever the client asks for.
My guess is this won't be the last time we see security fallout from that design choice. Not because React is sloppy, but because it’s trying to solve a problem category that traditionally requires explicitness, not magic.
> Anthropic occupies a peculiar position in the AI landscape: a company that genuinely believes it might be building one of the most transformative and potentially dangerous technologies in human history, yet presses forward anyway. This isn't cognitive dissonance but rather a calculated bet—if powerful AI is coming regardless, Anthropic believes it's better to have safety-focused labs at the frontier than to cede that ground to developers less focused on safety (see our core views).
Ah, yes, safety, because what is more safe than to help DoD/Palantir kill people[1]?
No, the real risk here is that this technology is going to be kept behind closed doors, and monopolized by the rich and powerful, while us scrubs will only get limited access to a lobotomized and heavily censored version of it, if at all.
Apple has a video understanding model too. I can't wait to find out what accessibility stuff they'll do with the models. As a blind person, AI has changed my life.
This is the real cause. At the enterprise level, trust outweighs cost. My company hires agencies and consultants who provide the same advice as our internal team; this is not to imply that our internal team is incorrect; rather, there is credibility that if something goes wrong, the decision consequences can be shifted, and there is a reason why companies continue to hire the same four consulting firms. It's trust, whether it's real or perceived.
The best advice I think I've heard so far against procrastination is: First, find a small sub-task of the thing you want to do, and lay it out clearly. Then commit a small amount of time to starting on that sub-task. Not on finishing it, just on starting it. Usually once you've started the thing, you will have no trouble seeing it through in time. Rinse, repeat.
Chain-of-code is better than chain-of-thought because it's more grounded, more specific, and achieves a lot of useful compression. But my bet is that the proposed program-of-thought is too specific. Moving all the way from "very fuzzy specification" to "very concrete code" skips all of the space in the middle, and now there's no room to iterate without a) burning lots of tokens and b) getting bogged down in finding and fixing whatever new errors are introduced in the translated representations. IOW, when there's an error, will it be in the code itself or in the scenario that code was supposed to be representing?
I think the intuition that lots of people jumped to early about how "specs are the new code" was always correct, but at the same time it was absolutely nuts to think that specs can be represented in good ways with natural language and bullet-lists in markdown. We need chain-of-spec that's leveraging something semi-formal and then iterating on that representation, probably with feedback from other layers. Natural language provides constraints, guess-and-check code generation is sort of at the implementation level, but neither is actually the specification, which is the heart of the issue. A perfect intermediate language will probably end up being something pretty familiar that leverages and/or combines existing formal methods from model-checkers, logic, games, discrete simulations, graphs, UML, etc. Why? It's just very hard to beat this stuff for compression, and this is what all the "context compaction" things are really groping towards anyway. See also the wisdom about "programming is theory building" and so on.
I think if/when something like that starts getting really useful you probably won't hear much about it, and there won't be a lot of talk about the success of hybrid-systems and LLMs+symbolics. Industry giants would have a huge vested interest in keeping the useful intermediate representation/languages a secret-sauce. Why? Well, they can pretend they are still doing something semi-magical with scale and sufficiently deep chain-of-thought and bill for extra tokens. That would tend to preserve the appearance of a big-data and big-computing moat for training and inference even if it is gradually drying up.
There's an entire semi-formal theory of promises, called Promise Theory. This includes promises autonomous agents (humans, back when this was conceived) make to other autonomous agents. Promise Theory was the basis for CFEngine, which spawned Puppet and Chef, but its applicability is much broader. The kind of promises examined within this article can be described and analyzed by Promise Theory.
The central insight is understanding that promises are not obligations, and why and how that matters. From there, interesting things can be analyzed -- using types and contracts in a development team, unit tests, integration tests, specs, user interface and user experience, compliance, signaling, APIs, etc.
I think it is particularly useful now in the age of LLMs, agentic AIs, and autonomous robots that have to navigate spaces shared with humans.
There is already a pretty major effort in the Prolog community to build everything as much as possible around pure, monotonic Prolog, and to provide a means to support multiple search strategies depending on the best fit for the problem. CLP libraries are also pretty common and are the go-to for representing algebraic expressions relationally and declaratively.
I wouldn't say that the logic or relational way of describing effects is a bad thing either. By design it allows for multiple return values (foo/1, foo/2, ...), so you can build higher-level predicates that return multiple resources, which is pretty common for many programs. It makes concatenative (compositional) style programming really straightforward, especially for more complex interweaving, which also ends up being quite common. Many Prolog implementations also support shift/reset, so that you can easily build things like conditions and restarts, algebraic effects, and/or debugging facilities on top. Prolog is also homoiconic in a unique way compared to Lisp, and it's quite nice because the pattern matching is so powerful. Prolog really is one of the best languages I ever learned, and I wish it were more popular. I think Prolog implementations need a better C FFI interop and a nicer library ecosystem. Trealla has a good C FFI.
I think logic programming is the future, and a lot of these problems with Prolog are fixable. If it's not going to be Prolog, it'll probably be something like kanren and Datalog within a Lisp like Scheme or Clojure(Script).
Who would have thought that having access to the whole system could be used to bypass some artificial check.
There are tools for that, sandboxing, chroots, etc... but that requires engineering and it slows GTM, so it's a no-go.
No, local models won't help you here, unless you block them from the internet or set up a firewall for outbound traffic. EDIT: they did, but left a site that enables arbitrary redirects in the default config.
Fundamentally, with LLMs you can't separate instructions from data, which is the root cause for 99% of vulnerabilities.
Security is hard, man. Excellent article, thoroughly enjoyed it.
That's also why I don't use these tools that much. You have big AI companies, known for illegally harvesting humongous amounts of data and not disclosing datasets. And then you give them control of your computer, without any way to cleanly audit what's going in and out. It's seriously insane to me that most developers seem not to care about that. Like, we've all been educated not to push any critical info to a server (private keys and other secrets), but these tools do just that, and you can't even trust what it's gonna be used for. On top of that, it's also giving your only value (writing good code) to a third-party company that will steal it to replace you.
which I think stands up just fine against pretty much any other classical piece, baroque or not.
Personally I have a very big soft spot for his organ works, as I play some organ (badly) myself, and among those I don't see the trio sonatas recommended nearly often enough (here is a live recital of all of them, which is super impressive).
Among those, I probably enjoy the vivace of BWV 530 the most. Other favorite pieces are the passacaglia and fugue https://www.youtube.com/watch?v=nVoFLM_BDgs and the toccata, adagio and fugue in C major https://www.youtube.com/watch?v=Klh9GiWMc9U (the adagio especially is super nice), but there are so many. Among organists I often come back to Helmut Walcha, and am always amazed at how he was able to learn everything just by listening, him being blind.
I'm not the GP but I can recommend Bach's Partita in D minor, said to have been composed after returning from travel to find that his wife had died and been buried in his absence.
Brahms said of it: "On one stave, for a small instrument, the man writes a whole world of the deepest thoughts and most powerful feelings. If I imagined that I could have created, even conceived the piece, I am quite certain that the excess of excitement and earth-shattering experience would have driven me out of my mind."
I think that's true about Bach's instrumental music, but his big sacred works like his Passions and the Mass in B minor are as "romantic" as the Baroque period gets. Like OP, I think of these works as basically the pinnacle of human artistic achievement. They somehow have all the nuance and complexity you're referring to -- while also telling a deeply emotional story, and just being heart-wrenchingly beautiful even if you don't know the story.
Bach is the greatest composer and perhaps the greatest artist in human history. Full stop. He is able to condense so much complexity into his works, and he speaks to the heart as equally as he speaks to the intellect. He is proof that the mind and the heart do not have to be at cross purposes, but can be wholly engaged together when stimulated by sublime works of art.
I led a team on a large data project at an enormous bank, hundreds of devs on the project across 3 continents. My team took care of the integration and automation of the SDLC process. We moved from several generations of ETL applications (9 applications) on Netezza/Teradata/mainframes/Hive MapReduce, all to Spark. The project was a huge cost savings and great success. Massive risk reduction by getting these systems all under 1 roof. We found a lot of issues with the original data.

We automated the lineage generation, data quality, data integrity, etc. We developed a framework that made everything batteries included. Transformations were done as a linear set of SQL steps, or a DAG of SQL steps if required. You could do more complicated things in reusable plugins if needed. We had a rock-solid old-school scheduler application also. We had thousands of these jobs. We had an automated data comparison tool that cataloged the old data and ran the original code vs the new code on the same set of data.

I don't think it's impossible to pull off, but it was a hard project for sure. Grew my career a ton.
To me, enterprise low code feels like the latest iteration of the impetus that birthed COBOL, the idea that we need to build tools for these business people because the high octane stuff is too confusing for them. But they are going the wrong way about it; we shouldn't kiddie proof our dev tools to make them understandable to mere mortals, but instead we should make our dev tools understandable enough so that devs don't have to be geniuses to use them. Given the right tools I've seen middle schoolers code sophisticated distributed algorithms that grad students struggle with, so I'm very skeptical that this dilemma isn't self-imposed.
The thing about LLMs being only good with text is it's a self-fulfilling prophecy. We started writing text in a buffer because it was all we could do. Then we built tools to make that easier so all the tooling was text based. Then we produced a mountain of text-based code. Then we trained the AI on the text because that's what we had enough of to make it work, so of course that's what it's good at. Generative AI also seems to be good at art, because we have enough of that lying around to train on as well.
This is a repeat of what Seymour Papert realized when computers were introduced to classrooms around the 80s: instead of using the full interactive and multimodal capabilities of computers to teach in dynamic ways, teachers were using them just as "digital chalkboards" to teach the same topics in the same ways they had before. Why? Because that's what all the lessons were optimized for, because chalkboards were the tool that was there, because a desk, a ruler, paper, and pencil were all students had. So the lessons focused around what students could express on paper and what teachers could express on a chalk board (mostly times tables and 2d geometry).
And that's what I mean by "investment", because it's going to take a lot more than a VC writing a check to explore that design space. You've really gotta uproot the entire tree and plant a new one if you want to see what would have grown if we weren't just limited to text buffers from the start. The best we can get is "enterprise low code" because every effort has to come with an expected ROI in 18 months, so the best story anyone can sell to convince people to open their wallets is "these corpos will probably buy our thing".
I was greatly inspired by his work. After building up enough skills, I even created my own IDE with live coding and time traveling.
Its practical use is questionable, and it seems like nobody is really interested in such tools.
> The problem with Postgres' NOTIFY is that all notifications go through a single queue!
> Even if you have 20 database connections making 20 transactions in parallel, all of them need to wait for their turn to lock the notification queue, add their notification, and unlock the queue again. This creates a bottleneck especially in high-throughput databases.
If you have any actual workloads where you are currently experiencing performance/scalability problems, I would be interested in hearing from you, to better understand the workload. In some workloads, you might only listen to a single channel. For such single-channel workloads, the current implementation seems hard to tweak further, given the semantics and in-commit-order guarantees. However, for multi-channel workloads, we could do a lot better, which is what the linked patch is about.

The main problem with the current implementation for multi-channel workloads is that we currently signal and wake all listening backends (a backend is the PostgreSQL process your client is connected to), even if they are not interested in the specific channels being notified in the current commit. This means that if you have 100 connections open, in which each connected client has done a LISTEN on a different channel, then when someone does a NOTIFY on one of those channels, instead of just signaling the backend that listens on that channel, all 100 backends will be signaled. For multi-channel workloads, this can mean an enormous extra cost coming from the context switching due to the signaling.
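For concreteness, here is a minimal sketch of the multi-channel pattern described above, using node-postgres (the channel name and payload are made up):

    const { Client } = require('pg');
    (async () => {
      const client = new Client();               // connection settings from PG* env vars
      await client.connect();
      await client.query('LISTEN jobs_tenant_42');
      client.on('notification', (msg) => {
        // notifications are delivered only for channels this connection LISTENs on
        console.log(msg.channel, msg.payload);
      });
      // ...but with the current implementation, a NOTIFY on *any* channel still
      // signals and wakes this connection's backend so it can check the shared queue.
      // elsewhere: await client.query("NOTIFY jobs_tenant_42, 'job 1 done'");
    })();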
I would greatly appreciate it if you could reply to this comment and share the workloads where you've had problems with LISTEN/NOTIFY, so we can better understand approximately how many listening backends you had, how many channels you had, and the mix of volume across those channels. Anything that could help us run more realistic simulations of such workloads would improve the benchmark tests we're working on. Thank you.
> If they were actually well trained on what was really bad, it would probably be a lot harder to unlearn.
That's not really how training works.
Here's the general problem. Stipulate that Ukraine is good and Russia is bad. Now suppose that you want the model to help you do something. It doesn't even matter what it is. If you're Ukrainian it should help you, and if you're Russian it shouldn't. But the answer that helps you do it doesn't depend on which one you are, and the model has no way of knowing which one you are.
This is why alignment is nonsense. Technical questions only have accurate answers, not moral ones, and we don't even have a consistent set of morals to imbue it with to begin with.
Full disclosure, we have a contract with AMD to get Llama 405B training on MI350X on MLPerf.
Things are turning around for AMD. If you have an AMD card, go to pytorch.org, click Linux+ROCm and install PyTorch. 3 years ago, this was hopeless. Today, most mainline things work. I ran nanochat on MI300X and it just worked. I think that's true about MI350X now too. The MI350X machine is stable.
They are clearly behind NVIDIA, nobody doubts that. And a lot of investment in software will be required to catch up: ecosystem, compiler, and driver. But 2 years ago they seemed hopeless; now they don't. Things take time. HipKittens is a great codebase to study to see where AMD's LLVM backend is still lacking; compare it to the CUDA Kittens.
For training, it's NVIDIA and Google in first. AMD in second. And nobody in third. Intel and Tenstorrent are not remotely close. Huawei examples segfaulted. Groq gave up selling chips. Cerebras isn't available anywhere. Trainium had a 5 day wait time to get one instance and I lost interest.
The countries that comprise the EU had been among the biggest warmongers for centuries. The EU is the most successful peace project the continent has ever seen. And the reason for that is that every country refrained from trying to be a superpower on the continent.
The European mentality has real, tangible upsides for its continent. Unfortunately, it doesn’t work well in a larger world where other actors don’t share the same experience and values.
Yep I think the value of the experiment is not clear.
You want to use Spark for a large dataset with multiple stages. In this case, their I/O bandwidth from S3 is 1 GB/s, while CPU memory bandwidth is 100-200 GB/s, which is what matters for a multi-stage job. Spark is a way to pool memory for a large dataset across the cluster, and to use cluster-internal network bandwidth for shuffling instead of going back to storage.
Maybe when you have S3 as your backend, the storage bandwidth bottleneck doesn't show up in perf, but it sure does show up in the bill. A crude rule of thumb: network bandwidth is 20X storage, main memory bandwidth is 20X network bandwidth, accelerator/GPU memory is 10X CPU. It's great that single-node DuckDB/Polars are that good, but this is like racing a taxiing aircraft against motorbikes.
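To make that rule of thumb concrete, a back-of-envelope sketch using the numbers above (the 1 TB dataset size is purely illustrative):

    // rough per-pass read times for 1 TB of intermediate data
    const datasetGB = 1024;  // illustrative dataset size
    const s3GBps = 1;        // ~1 GB/s from S3 (storage)
    const netGBps = 20;      // ~20x storage: cluster-internal network
    const memGBps = 150;     // ~100-200 GB/s: CPU main memory
    for (const [name, bw] of [['S3', s3GBps], ['network shuffle', netGBps], ['pooled memory', memGBps]]) {
      console.log(`${name}: ~${Math.round(datasetGB / bw)} s per pass`);
    }
    // → S3 ~1024 s, network ~51 s, memory ~7 s per pass: re-reading from storage at
    //   every stage is what pooling memory across the cluster (and your S3 bill) avoids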