Wasi co-chair and Wasmtime maintainer here: we agree! Wasi Preview 1, which this article is about, was a first attempt at porting some of these Unix ideas to Wasm. We found pretty quickly that Unix isn't the right abstraction for Wasm. Not only is it not really portable to platforms like Windows without reinventing a compatibility layer like Cygwin, it also doesn't really make sense in a Web embedding, where users end up implementing something like a Unix kernel in JavaScript.
Wasi Preview 2, which we are aiming to launch by the end of the year, rebases Wasi on the Component Model proposal, which enables composition of Wasm programs, including those which are written in different languages, and which do not trust each other. Wasi is now specified in the Wit IDL, which has a strong type system for representing records, variants, lists, strings, and best of all, external resources, including sugar for constructors, methods, and destructors.
Instead of basing everything on the filesystem abstraction, the core Wasi primitives are the `input-stream`, `output-stream`, and `pollable` resource types, for readable and writable bytestreams, and a pseudo-future: you can `poll-oneoff` on a `list<pollable>` and it will block until one is ready, and return a `list<bool>` indicating the set which are ready. `wasi:filesystem/types.{descriptor}` is the resource for files, but if you need to read, write, or append to a file, you can do so by calling a method on `descriptor` that returns an `input-stream` or `output-stream`.
Preview 2 is also adding networking: wasi-sockets for platforms which support sockets, and wasi-http for those which don't, like the Web.
I've found algorithmic (I prefer this to "automatic") differentiation strangely slippery to explain, given the concept is essentially rather simple. I think the main reason for this is the way people often think of what the "derivative" of a function is.
In high school you're taught that the derivative of the function f(x) = x^2 is the function f'(x) = 2x. That is, the derivative of the function is another function that computes its gradient. Algorithmic differentiation is very confusing if you think in these terms.
When learning multivariable calculus, and when getting your head around AD, it's better to think of the derivative as a linear approximation of the original function around a particular point. In the single-variable case that means we interpret the derivative not as (a function that gives you) the gradient of the tangent line at a point, but as the tangent line itself at a particular point.
In the case of f(x) = x^2, then, the derivative at x=3 is the tangent line to the curve at the point x=3, y=9. Its slope is f'(3) = 2*3 = 6. It's best to define this in terms of offsets from the point where we're evaluating the derivative, so (y - 9) = 6 * (x - 3).
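To make this concrete, here's a small numeric check (my own illustration, not from the comment above): near x = 3, the tangent line 9 + 6*(x - 3) approximates f(x) = x^2, and the error shrinks quadratically as the offset shrinks.

```python
def f(x):
    return x ** 2

def tangent_at_3(dx):
    """The derivative of f at x = 3, viewed as a linear map: offset in -> offset out."""
    return 6 * dx  # slope f'(3) = 2*3 = 6

for dx in (0.1, 0.01, 0.001):
    approx = f(3) + tangent_at_3(dx)  # 9 + 6*dx, the tangent-line estimate
    exact = f(3 + dx)                 # (3 + dx)**2 = 9 + 6*dx + dx**2
    print(dx, exact - approx)         # error is exactly dx**2
```

The error being dx^2 is exactly what "linear approximation" means: the linear part captures everything except the quadratic remainder.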
Key points to note:
1. This derivative is a linear function. Specifically, in defining the tangent line we get y-9 as a linear function of x-3. (Incidentally, an alternative and perhaps more obvious representation of this tangent line would give y in terms of x, i.e. y = 6x - 9. This is a less useful representation though, because y is not a linear function of x, it's merely an affine function - i.e. the line doesn't pass through the origin.)
2. The linear function we get depends on which value of x we evaluate it for. We have a different derivative at each point.
3. When we evaluate this linear function we must give it an offset, a value of (x-3) for which to compute the corresponding value of (y-9).[1]
This concept of derivative generalises naturally to multiple input and output variables. The derivative is still a linear function that approximates the original function around a particular (multidimensional) point. Being a linear function it can be represented as a matrix: the Jacobian matrix. It's worth noting, though, that the Jacobian matrix is merely a representation of the linear function we are calling the derivative, and it is far from the only one possible. In a computer we can (and in AD usually do) represent the derivative as a (computer) function that takes an offset and returns an offset, and call it to evaluate it rather than ever forming a matrix.
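As a hedged sketch of that "derivative as a function" idea (the function f and the point are my own choices for illustration): instead of forming the Jacobian matrix of f(x, y) = (x*y, x + y), we can represent the derivative at a point as a closure mapping input offsets to output offsets.

```python
def f(x, y):
    return (x * y, x + y)

def derivative_of_f_at(x0, y0):
    """Return the derivative of f at (x0, y0) as a linear map on offsets."""
    def d(dx, dy):
        # Same information as the Jacobian [[y0, x0], [1, 1]] applied to (dx, dy),
        # but no matrix is ever materialised.
        return (y0 * dx + x0 * dy, dx + dy)
    return d

d = derivative_of_f_at(2.0, 5.0)
print(d(1.0, 0.0))  # (5.0, 1.0): first column of the Jacobian
print(d(0.0, 1.0))  # (2.0, 1.0): second column
```

Feeding in unit offsets recovers the Jacobian's columns one at a time, which is precisely what forward-mode AD does under the hood.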
With this way of looking at things, it's relatively easy to understand AD. The key insight is that derivatives of functions compose in the same way as the original functions. If we compose two functions and we know their individual derivatives (also functions), we can evaluate the derivative of the composition by composing the two derivatives in the same way. More generally, if you have a program consisting of a call graph of numeric functions and you can make derivative functions that correspond to those individual functions they will form a call graph with the same structure, allowing you to evaluate the derivative function of the entire program. This is forward mode AD.[2]
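The composition insight can be sketched with dual numbers (a standard forward-mode technique; the class and example functions here are my own illustration). Each value carries along its offset, so composing ordinary functions automatically composes their linear approximations:

```python
class Dual:
    """A value paired with the derivative's output for some chosen input offset."""
    def __init__(self, value, offset):
        self.value = value    # the function evaluated at the point
        self.offset = offset  # the linear approximation applied to the input offset

    def __add__(self, other):
        return Dual(self.value + other.value, self.offset + other.offset)

    def __mul__(self, other):
        # Product rule: the linear part of (a + da)(b + db) is a*db + b*da.
        return Dual(self.value * other.value,
                    self.value * other.offset + other.value * self.offset)

def square(x):
    return x * x

def cube(x):
    return square(x) * x  # x^3 built by composing square with a multiply

# Evaluate the derivative of the composition at x = 3 with input offset 1:
result = cube(Dual(3.0, 1.0))
print(result.value, result.offset)  # 27.0 and 27.0, since d/dx x^3 = 3x^2 = 27 at x = 3
```

Note that nothing in `cube` knows about derivatives; the derivative of the composition falls out of composing the derivative-propagating operations in the same call structure.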
This is related to the chain rule but in AD calling it that somewhat obscures the point and hides its intuitive obviousness. We shouldn't be surprised that derivatives compose in the same way as the original functions given they are linear approximations of those original functions!
Anyway, personally I found this to be the insight that let me get my head round AD. I hope it helps someone!
[1] This input exists in AD code when we evaluate the derivative, and its meaning is often very unclear to beginners. In forward mode AD we actually evaluate this linear approximation function. That is, we choose a point around which to linearly approximate and an offset from that point for which to evaluate the derivative function. See [2] for the purpose we use that offset for.
[2] Adjoint/reverse mode uses a different linear function which composes the opposite way round to the original function, making it less convenient to compute but bringing big computational complexity benefits in typical real-world cases where you have a large number of inputs but a small number of outputs. In forward mode, a single evaluation of the graph of derivatives gives you the sensitivities of every output to a single input direction. If we want the sensitivities with respect to all the inputs (i.e. to compute the entire Jacobian) we need one evaluation for each input. The offset discussed in [1] is what selects which input direction we're currently computing sensitivities for. In adjoint mode this is reversed, and we need to perform one evaluation of the graph for each output, of which there are usually many fewer than inputs.
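The forward/reverse asymmetry can be sketched in a few lines (the Jacobian here is an arbitrary example of mine): one forward pass yields a Jacobian-vector product (a column per chosen input direction), while one reverse pass yields a vector-Jacobian product (a row per chosen output).

```python
# 2 outputs, 3 inputs - an arbitrary example Jacobian at some point.
J = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def jvp(v):
    """What one forward-mode pass gives us: J applied to an input offset v."""
    return [sum(J[i][j] * v[j] for j in range(3)) for i in range(2)]

def vjp(u):
    """What one reverse-mode pass gives us: u applied to J from the left."""
    return [sum(u[i] * J[i][j] for i in range(2)) for j in range(3)]

print(jvp([1.0, 0.0, 0.0]))  # [1.0, 4.0]: both outputs' sensitivity to input 0
print(vjp([1.0, 0.0]))       # [1.0, 2.0, 3.0]: output 0's sensitivity to all inputs
# Full Jacobian: 3 forward passes (one per input) or 2 reverse passes (one per output).
```

With one scalar output and a million inputs (the machine-learning gradient case), that's one reverse pass versus a million forward passes.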
I moved from being self-employed for the odd side gig, to a UK Ltd co, to a VAT-registered UK Ltd co.
The "business services mafia" would like to frighten you into believing you have to drop hundreds of £ on their services - plus £120/yr for an accounts package, more for payroll, business banking, etc.
Why? Because, supposedly, your mind is too tiny to comprehend the ways of the priests. For example: don't ever ask a question in a business forum, or you'll get swamped with clergy telling you the question implies you're definitely going to prison because you're going to get it all wrong.
Do not believe them.
It is perfectly possible - perhaps even 'easy' - and if you understand the basics of double-entry accounting, it's all entirely learnable without too much effort. And best of all, you can do it almost for free!
Don't drop hundreds per year on Xero, use https://www.quickfile.co.uk. Cost: £0 (upsell on receipt scanning, but I don't use that). Does VAT, with electronic submission. I literally don't understand why you'd spend money on Xero - the extent to which it's advertised over here is insane; you're just paying for TV advertising.
Business banking: Tide. 20p per transaction (no monthly fees!)
The hardest parts are
- discipline of entering the transactions and not letting them build up
- in the UK, understanding capital allowances vs depreciation
- when doing payroll, knowing how to split the payments into the correct accounts (Shape does the sums, but you have to account for it correctly)
- making sure you pay all your HMRC bills
On the other hand, doing the year end as a micro-entity (the CT600 corporation tax return) is actually surprisingly easy, and amounts to putting numbers in about six boxes.
By all means pay for all of this stuff if it isn't of interest to you. For me, as I intend my co to run long after I finish working, I didn't want to have annual fees for services that I really wouldn't be getting the use out of.
We are closing in on shipping Wasi Preview 2, but it's not quite fully baked yet - changes related to resources are slated to land in the next few weeks. The spec definitions are on GitHub: https://github.com/WebAssembly/wasi-io/blob/main/wit/streams... https://github.com/WebAssembly/wasi-filesystem/blob/main/wit... . Stay tuned for much more approachable documentation, tutorials, and so on, once we are confident it is a stable target ready for users.