I imagine that one of the points of a solid protocol buffers library would be to align the types even across programming languages. E.g. explicitly forcing a 64-bit integer rather than an "int" whose width depends on the platform. And having some custom "string" type which is always UTF-8 encoded in memory rather than depending on the platform-specific encoding.
(I have no idea if that is the case with protobuf, I don't have enough experience with it.)
Again, the problem has more to do with the programming languages themselves, rather than with protobufs or parsing.
Protobuf has both signed and unsigned integers - the initial use case was C++ <-> C++ communication
Java doesn't have unsigned integers
Python has arbitrary precision integers
JavaScript traditionally only had doubles, which can represent integers exactly only up to 53 bits. It has since added arbitrary-size integers (BigInt) -- but that doesn't mean that the protobuf libraries actually use them
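The 53-bit limit is easy to demonstrate. A quick sketch in Python, using Python's float as a stand-in for a JavaScript Number (both are IEEE-754 binary64 doubles):

```python
# Python ints are arbitrary precision, but an IEEE-754 double
# (which is what a JavaScript Number is) has a 53-bit significand.
n = 2**53
assert float(n) == n                 # 2^53 is exactly representable
assert float(n + 1) == float(n)      # 2^53 + 1 rounds back down to 2^53

# So a full 64-bit protobuf int64 can't round-trip through a double:
big = 2**63 - 1                      # max int64
assert int(float(big)) != big        # precision is lost
```

This is exactly why protobuf libraries targeting JavaScript have historically had to represent 64-bit fields as strings or as two-element pairs.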
---
These aren't the only possibilities -- every language is fundamentally different
As long as a language has bytes and arrays, you can implement anything on top of them, like unsigned integers, 8-bit strings, UTF-8 strings, UCS-2, whatever you want. Sure it won't be native types, so it will probably be slower and could have an awkward memory layout, but it's possible
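As an illustration of that layering, here's a toy wrap-around unsigned 32-bit integer built on Python's signed, arbitrary-precision ints with nothing but masking (the `UInt32` class is hypothetical, not any real protobuf API); slower than a native type, but functionally complete:

```python
class UInt32:
    """Toy unsigned 32-bit integer emulated on top of Python's native ints.

    Not a real protobuf type -- just a demonstration that modular unsigned
    arithmetic can be layered onto a language without unsigned integers.
    """
    MASK = 0xFFFFFFFF

    def __init__(self, value: int):
        self.value = value & self.MASK  # wrap into [0, 2^32)

    def __add__(self, other: "UInt32") -> "UInt32":
        return UInt32(self.value + other.value)  # overflow wraps around

    def __repr__(self) -> str:
        return f"UInt32({self.value})"

# 0xFFFFFFFF + 1 wraps to 0, exactly as C's uint32_t would
assert (UInt32(0xFFFFFFFF) + UInt32(1)).value == 0
```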
Granted, if a language is so gimped that it doesn't even have integers (as you mentioned for JavaScript), then that language indeed won't be able to fully support the format.
Unfortunately that doesn't solve the problem -- it only pushes it around
I recommend writing a protobuf generator for your favorite language. The less it looks like C++, the more hard decisions you'll have to make
If you try your approach, you'll feel the "tax" when interacting with idiomatic code, and then likely make the opposite decision
---
Re: "so gimped" --> this tends to be what protobuf API design discussions are like. Users of certain languages can't imagine the viewpoints of users of other languages
e.g. is unsigned vs. signed the way the world is, or an implementation detail?
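Protobuf itself treats signedness largely as an encoding concern: the sint32/sint64 types ZigZag-map signed values onto unsigned ones before varint encoding, so small negative numbers stay small on the wire. A Python sketch of that documented mapping (for the 64-bit case):

```python
def zigzag_encode(n: int) -> int:
    """Map a signed 64-bit int to an unsigned one (protobuf sint64 style)."""
    return (n << 1) ^ (n >> 63)   # Python's >> is arithmetic, as required

def zigzag_decode(z: int) -> int:
    """Inverse mapping: the unsigned varint value back to a signed int."""
    return (z >> 1) ^ -(z & 1)

# Small magnitudes stay small regardless of sign:
# 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4
assert [zigzag_encode(n) for n in (0, -1, 1, -2, 2)] == [0, 1, 2, 3, 4]
assert all(zigzag_decode(zigzag_encode(n)) == n for n in range(-1000, 1000))
```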
And it's a problem to be MORE expressive than C/C++ -- i.e. from idiomatic Python code, the protobuf data model also causes a problem
Even within C/C++, there is more than one dialect -- C++ 03 versus C++ 11 with smart pointers (and probably more in the future). These styles correspond to the protobuf v1 and protobuf v2 APIs
(I used both protobuf v1 and protobuf v2 for many years, and did a design review for the protobuf v3 Python API)
In other words, protobufs aren't magic; they're another form of parsing, combined with code generation, which solves some technical problems and not others. They also don't resolve arguments about parsing and serialization!
> you're guaranteed a consistent ser/de experience
Are there that many implementations of protobuf? How many just wrap the C lib and proto compiler? Consistency can be caused by an underlying monoculture, although that's turtles all the way down because protobuf is not YAML is not JSON, etc.
Off in the weeds already, and all because I implemented a pure Python deserializer / dissector simply because there wasn't one.
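For the curious, the heart of such a dissector is small. Here's a sketch (not the commenter's actual code) of the standard protobuf varint decoder -- little-endian base-128, with the high bit of each byte as a continuation flag:

```python
def read_varint(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one protobuf varint starting at pos; return (value, new_pos).

    Varints store 7 bits per byte, least-significant group first;
    a set high bit means "more bytes follow".
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:           # high bit clear: this was the last byte
            return result, pos
        shift += 7

# 300 is encoded as b"\xac\x02" in the protobuf wire format
assert read_varint(b"\xac\x02") == (300, 2)
```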
I think you can get similar benefits here from writing an RPC style JSON API into an OpenAPI spec and generating structs and route handlers from that. That's what I do for most of my Go projects anyway.
But since the article isn't really about parser bugs, I don't think using a different data format will save you from most of the problems described there.
I peered down the ComfyUI rabbit hole [1] and it is shockingly powerful. Did Adobe drop the ball on image generation? What are they doing over there? There has to be a better, more secure way to bundle up all this imagegen logic.
Adobe makes practical pipelines for creatives, not prototyping tools. ComfyUI is mostly for prototyping and ML nerds (I don't mean this in a bad way). There are more practical interfaces to get things done built on top of it, such as Krita Diffusion [1] and many others.
I would say that the "more secure way" is to just use ComfyUI without installing any obscure nodes from unknown developers. You can do pretty much anything using just the default nodes and the big node packs.
Back in December, I ran a DCF on https://www.gurufocus.com/dcf-calculator?ticker=NVDA and got a fair value estimate of around $700 based on 25%/yr EPS growth for 10 years, followed by 15 years of 5% growth and a discount rate of 8-10% (that's based on the expected rate of return you can get from the market long term).
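For concreteness, a two-stage DCF like that is just a discounted sum. A minimal sketch under the stated assumptions (only the growth and discount figures come from the comment; the base EPS here is hypothetical, and treating EPS as the cash flow is a simplification, so this won't reproduce the $700 figure exactly):

```python
def two_stage_dcf(eps: float, g1: float, n1: int,
                  g2: float, n2: int, r: float) -> float:
    """Two-stage DCF: n1 years of growth at g1, then n2 years at g2,
    all discounted back at rate r. Uses EPS as a proxy for cash flow
    and adds no terminal value beyond year n1 + n2."""
    value, e = 0.0, eps
    for year in range(1, n1 + n2 + 1):
        e *= (1 + g1) if year <= n1 else (1 + g2)
        value += e / (1 + r) ** year
    return value

# Hypothetical base EPS of $12; 25%/yr for 10y, then 5%/yr for 15y, 10% discount
fair_value = two_stage_dcf(12.0, 0.25, 10, 0.05, 15, 0.10)
```

Lowering the discount rate toward 8% raises the result, which is why the comment quotes a range rather than a single number.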
AI stocks are like crypto in 2021. It's getting manic. The media makes it seem like companies and governments are begging to buy GPUs like there's no tomorrow. Hardware is ultimately a commodity. It's only a matter of time before supply meets demand. There's no way giant tech companies with their own ASICs will continue buying so many Nvidia GPUs.
giant tech companies are building their own ASICs... but mostly for inference. And that's the easy part.
What the market is quickly coming to realize is that there is actually no "80-20 solution" here; raw matrix math is not enough, you pretty much need the flexibility of a general-purpose GPU to do training. And there are really only a few companies on the planet in any sort of position to move into that market - NVIDIA, AMD, Intel, Tesla (Jim Keller designed Dojo for them a few years ago), and maybe a few others.
In short - GPGPU is the 20% solution already. And that's the same mistake AMD has made over and over in the GPU market as a whole too - thinking that NVIDIA obviously must be doing it wastefully and there is some cheap solution they can slide in and cover most of the benefit at less cost. And NVIDIA has usually done the math (every mm2 of silicon is tens of millions of dollars of profit lost) and is in fact not being wasteful, so the competitor comes in with a crappy solution instead of a "cheap and cheerful" one. That is what will happen with training too, for everyone besides maybe AMD (MI300X looks good) and Intel.
Also, because of the long-term ecosystem-building that NVIDIA has done, when the shit hits the fan, everyone falls back to their products. When the next major innovation comes out, it will come out on NVIDIA GPUs first, and then the market will spend another year scrambling to catch up all over again. So yeah, during the market-equilibrium periods they will make less... and then all hell breaks loose and they will be showered in money again.
GPU is just a much tougher nut to crack than it appears at first glance. It's ecosystem, ecosystem, ecosystem, but you also have to have very competitive performance (otherwise the cost doesn't work out) and power consumption etc. And NVIDIA was first to market by 15 years (AMD still hasn't gotten ROCm to the starting line, as geohotz recently pointed out again) and continues to be the best and easiest. All you have to do is buy a 4060 Ti or 4090 and start playing. The competition being 10% cheaper doesn't matter if it doesn't work.
Just to have the assurance that, regardless of programming language, you're guaranteed a consistent ser/de experience.