Hacker News | fmap's comments

You're right. Classical measure theory lives in a model divorced from physical reality. You can show that all of the fancy counterexamples which necessitate the complicated constructions of measure theory are artificial (e.g., the characteristic function of a non-measurable set is uncomputable).

There are better approaches to measure theory which live in different "foundations". For example, you can build measure and probability theory based on the locale of valuations on a locale instead of a sigma-algebra on a topological space. You can do even better by starting in a constructive metatheory and adding some anti-classical assumptions which are modeled by all computable functions.

The reason we teach classical measure theory as the foundation of probability theory is partly historical and partly that there are no good expositions available for most alternative approaches. It is really not the most straightforward approach.

---

Before you accuse me of being overly negative: classical measure theory offers a consistent approach to probability theory which is well understood and for which carefully written textbooks are available. If you really need to go back to the definitions to derive something then you need to know at least one consistent set of definitions. So it is useful to teach measure theory, even if it is more complicated than it has to be...


>Classical measure theory lives in a model divorced from physical reality.

Everything in mathematics is divorced from reality. Unbounded integers are divorced from reality. (Once you move beyond naive realism, bounded integers are divorced from reality, but that's a deeper philosophical debate.) The only question is whether these models are more or less effective for their various theoretical and applied purposes.

> There are better approaches to measure theory which live in different "foundations". For example, you can build measure and probability theory based on the locale of valuations on a locale instead of a sigma-algebra on a topological space.

Better by what definition? According to the practical needs of students, pure and applied mathematicians, etc? I've studied some topos theory and know a little bit about locales from the Topology Via Logic book, but it's hard for me to see that as anything more than a fun curiosity when considering the practical needs of mathematics as a whole. In my mind that kind of thing is much closer to navel-gazing than something like measure theory.

> It is really not the most straightforward approach.

The onus is on critics to do better. Dieudonne/Bourbaki made a valiant and elegant attempt even if they intentionally snubbed the needs of probability theory. And "better" will obviously be judged by the broader community.


> Everything in mathematics is divorced from reality.

Mathematics is an abstraction, but it is still useful for talking about concrete problems. Your mathematical assumptions can be either close or far away from your problem domain. Sometimes we introduce idealized objects, such as unbounded integers, in order to abstract further and simplify our reasoning.

These ideal objects can then either be "compiled away" in specific instances, or they genuinely ignore corner cases which might invalidate your results.

For an example of the former, you can assume that there is an algebraically closed field containing a specific field, give an argument in terms of this closure, and then translate this argument into one which does not construct the closure explicitly. The translation is mechanical and does not introduce additional assumptions.

The second kind of ideal object is something like the real numbers applied to physics. We can think of a real number as an arbitrarily good approximate result. In practice we can only ever work with finite approximations. At the scales we are operating on the difference is usually not relevant, but there might, for example, be unstable equilibria in your solutions which are not physically realizable.
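As a toy numerical illustration of that last point (the ODE and step size here are mine, chosen purely for simplicity): x = 0 is an equilibrium of x' = x, but an unstable one, so the idealized real-number solution and any finite-precision perturbation of it eventually behave completely differently.

```python
# x = 0 is an equilibrium of x' = x, but it is unstable: the exact
# solution starting at 0 stays at 0 forever, while any tiny
# perturbation (e.g. a single rounding error) grows without bound.

def euler_step(x, dt=0.01):
    # forward Euler step for the ODE x' = x
    return x + dt * x

exact = 0.0        # the idealized equilibrium
perturbed = 1e-15  # a perturbation on the order of float rounding error
for _ in range(5000):
    exact = euler_step(exact)
    perturbed = euler_step(perturbed)

print(exact)      # still exactly 0.0
print(perturbed)  # has grown far away from the equilibrium
```

The unstable equilibrium is a perfectly good mathematical solution, but no physical system (or floating-point simulation) will ever sit on it.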

> Better by what definition?

Informally, better because it is "simpler": there are fewer corner cases to consider, theorems cover more cases, and constructions are more direct.

Formally, the theory has more models and is therefore more widely applicable. Theorems have fewer assumptions (but talk about a different and incompatible type of objects).

> The onus is on critics to do better. Dieudonne/Bourbaki made a valiant and elegant attempt even if they intentionally snubbed the needs of probability theory. And "better" will obviously be judged by the broader community.

Oh, sure, but that's not what I want to argue about.

I can tell you with certainty that classical measure theory is complicated by the interplay of excluded middle and the axiom of choice. This is a technical result. You can see this yourself in textbooks every time the author presents an intuitive "proof idea" which then has to be refined because of problems with the definitions. In alternative models, or in a metatheory with alternative assumptions, the simple proof idea usually works out fine.


> Everything in mathematics is divorced from reality. Unbounded integers are divorced from reality. (Once you move beyond naive realism, bounded integers are divorced from reality, but that's a deeper philosophical debate.) The only question is whether these models are more or less effective for their various theoretical and applied purposes.

It's a matter of degree. You can't point to a particular large integer as being "too large" to matter in real life, but there are plenty of objects in measure theory (like non-measurable sets) that blatantly violate physical intuitions and don't seem to exist in any sense in real life.


You may be right that it is common to dismiss arguments that invoke Gödel's theorem. I've been guilty of this myself. However, just as with quantum mechanics, the reason is that there are so many people who invoke Gödel's theorem without understanding it as a technical result.

I'm not talking about "pseudointellectuals" in this case, unlike with quantum mechanics. I'm talking about actual working mathematicians who use oblique references to Gödel to, e.g., dismiss formal methods or constructive logic as worthless.

This is extremely annoying, since Gödel's theorems, as the article rightly points out, are precise statements about formal systems. There's nothing mysterious about any of it. When someone invokes Gödel you can usually take it to mean "I don't want to engage with this topic further" rather than as a serious argument.

> This is the man who "proved" God's existence with modal logic,

I'm pretty sure that was a joke. The final assumption he makes in this proof is logically equivalent to "god exists".


What final "assumption" are you referring to, a definition or an axiom?


I'm referring to the axiom that being godlike is a positive property. You can show that being godlike is a positive property iff god exists.
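For readers who haven't seen the argument, its shape can be sketched in the usual notation (a summary of the standard presentation, not the full proof — the point is that the final axiom P(G) does essentially all the work):

```latex
P(\varphi) \quad \text{``$\varphi$ is a positive property'' (primitive notion)}
G(x) :\equiv \forall \varphi\,\bigl(P(\varphi) \to \varphi(x)\bigr) \quad \text{(Def.: $x$ is godlike)}
\text{Axiom: } P(G) \qquad \text{Theorem: } \Box\, \exists x\, G(x)
```

Given the remaining axioms, asserting that being godlike is itself a positive property is equivalent to asserting the conclusion.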


There is actually a neat way around this by using typed assembly language/proof-carrying code. If you had a type-preserving compiler down to machine code you could use a separate proof checker (which you presumably wrote by hand in machine code) to convince yourself that the resulting binary implements its spec. :)

The spec of a compiler is also relatively simple, so there isn't a lot of room for backdoors there.


It's not misleading in this case. It's used in practice.

You have to understand that CompCert's main competition in this space was an ancient version of GCC without any optimizations. This is mostly an issue of certification. CompCert got the same certification and is already a great improvement just by virtue of having a register allocator...


Good to know, thanks for explaining.

A register allocator! :)


There's a separate project that solves this problem (CertiCoq https://www.cs.princeton.edu/~appel/certicoq/). It's making progress, but these things take time. :)


> Before we prove our code is correct, we need to know what is “correct”. This means having some form of specification

I suspect that most bugs are introduced because programmers do not have a clear idea of what their code is supposed to do. There are simply too many levels to keep track of at the same time. E.g., in C you are manually keeping track of resource ownership and data representation at the low-level, while at the same time keeping the overarching architecture present in your head. The latter typically has several moving pieces (different threads, processes, machines) and keeping a clear picture of all possible interactions is a nightmare.

Formal specifications can help with this, by first specifying the full system and then deriving precise local (i.e. modular) specifications. In my experience, programming is typically easy once I have a problem broken down into isolated pieces with clear requirements. It's just that most systems are so large that getting to this point takes serious effort.
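As a lightweight sketch of what a "precise local specification" can look like once a piece is isolated (the function and the contract style are invented for illustration, not any particular tool):

```python
def lower_bound(xs, key):
    """Local spec: xs is sorted ascending (precondition); returns the
    smallest index i with xs[i] >= key, or len(xs) if none (postcondition)."""
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), "precondition"
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    # postcondition: everything left of lo is < key, everything from lo on is >= key
    assert all(x < key for x in xs[:lo]) and all(x >= key for x in xs[lo:])
    return lo
```

With the requirement pinned down this precisely, both implementing and reviewing the piece become routine; the hard part was carving it out of the larger system.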

Here you run into a cultural problem. You need to convince people to put effort into something that they do not implicitly regard as valuable. It makes sense to me that people aren't using formal methods, unless there is very little friction when getting started.


> So what exactly is the standard we are working to? Is proof reasonably superior to more effort testing?

Empirically, yes it is (see e.g. "Finding and understanding bugs in C compilers"). Testing can show the presence of bugs, proofs show the absence of (a class of) bugs. This is a worthwhile exercise, even if your model is not perfect.

This is not an all-or-nothing proposition. There are lightweight forms of formal specification and proofs which you are probably already using. One example is types. If your program is well-typed it might still crash with a division by zero or some other runtime error, but it will not crash because you tried to execute code at address 42 after you mixed up your integers and code pointers.
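A toy sketch of that last point (the wrapper type here is invented for illustration): give code pointers their own type, and the mix-up becomes a type error rather than a jump to address 42.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodePtr:
    """A distinct type for code addresses, so they can't be confused
    with plain integers."""
    addr: int

def execute(p: CodePtr) -> str:
    # A static checker such as mypy rejects execute(42) before the
    # program ever runs; here we also enforce the type dynamically.
    if not isinstance(p, CodePtr):
        raise TypeError("refusing to jump to a plain integer")
    return f"jumping to {p.addr:#x}"

print(execute(CodePtr(0x1000)))  # fine
# execute(42) would be a type error, not an execution of address 42
```

The proof obligation ("this value really is a code pointer") is discharged mechanically by the type checker, which is exactly the lightweight-formal-methods trade-off described above.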


> All formalisations of non-classical logic operate in the framework of first-order logic, in the sense that the informal meta-language in which the non-classical logics are explained in is traditional first-order logic

What gave you this idea? It is frequently useful to work over different base logics to obtain models of non-classical logics. E.g., when you work in categorical logic you usually work over an intuitionistic logic which can then be interpreted in an arbitrary topos. This gives you the ability to internalize any constructions you do "on the outside".

There is a large community of mathematicians fixated on classical first-order logic, but this is just because of tradition. For models of classical first-order set theory it just doesn't make much of a difference. This is not true of "all formalizations of non-classical logic".


> work over an intuitionistic logic

Whoops, yes, you are right.

That said, a lot of the development of category theory that I'm aware of takes place in set theory. Certainly the two books [1, 2] that I learnt category theory from do. But that was a while back, and it might be the case that this is different now; I have not kept up with this research direction. For example, is there a full, unrestricted formalisation of category theory in HoTT?

[1] J. Adamek, H. Herrlich, G. E. Strecker, Abstract and Concrete Categories: The Joy of Cats.

[2] S. Mac Lane, Categories for the Working Mathematician.


Textbooks are usually supposed to be accessible to a wide audience so it makes sense when discussing foundations to start from a (hopefully) familiar set theory. It's usually a trade-off, since you end up repeating yourself when it comes to "internalized" constructions. "Sketches of an Elephant" is a good example of a textbook that pretty much presents everything twice. Once in an ambient set theory and once internally.

What I meant specifically is work such as the following:

  Internal Universes in Models of Homotopy Type Theory - https://arxiv.org/abs/1801.07664
which explicitly works in an extensional type theory with some axioms to simplify and generalize a complicated model construction.

> For example is there a full, unrestricted formalisation of category theory in HoTT?

You can formalize category theory in HoTT as presented in the book. This has some advantages over a presentation in extensional type theory or set theory (being able to work up-to equivalence) and some disadvantages (universes in categories have to be constructed explicitly, since the ambient universe is not truncated). In my opinion, it's not the case that one is strictly superior - in the end it always depends on what you want to do.


Elm is different to ML in that it is explicitly designed to be a pure language. Purity is important for the Elm architecture and allowing arbitrary side effects in your apps would break everything.

That said, you could add effects and a JavaScript FFI using monads or algebraic effects. You could also get rid of a lot of boilerplate in the Elm architecture using existential types. It's a safe bet that the Elm developers know this. But all of these extensions would make the language less approachable. I think one of the reasons why Elm is doing so well is because it's not Haskell. :)


Ok, my first reaction is that this is really wonderful work - straightforward and with a big payoff at the end.

But this really raises the question: why hasn't this been done before? People have been throwing resources at machine learning for a decade now, and somehow nobody has thought to perform instruction selection before executing a model to optimize the kernels used?

What other low-hanging fruit is out there? Automatic partitioning of networks over several GPUs and CPUs? Such dynamic load balancing algorithms have been available in the HPC literature since there was HPC literature. Fusing multiple primitives into simpler kernels? That's what linear algebra libraries have been doing for decades. Optimizing internal data layout (although that seems to be part of this paper)? Optimizing scheduling decisions to minimize data movement?
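To illustrate the fusion point with a deliberately simple sketch (plain Python, not any particular library): running three element-wise "kernels" as separate passes allocates two intermediate buffers and sweeps the data three times, while the fused version does a single pass with no temporaries.

```python
def unfused(xs, c):
    # three separate "kernels", each a full pass over the data
    t1 = [2.0 * x for x in xs]        # scale
    t2 = [x + c for x in t1]          # bias add (intermediate buffer)
    return [max(x, 0.0) for x in t2]  # ReLU (another intermediate buffer)

def fused(xs, c):
    # the same three element-wise ops fused into one loop:
    # no intermediate lists, one sweep over memory
    return [max(2.0 * x + c, 0.0) for x in xs]

print(fused([-1.0, 0.5, 2.0], 1.0))
```

For large tensors the arithmetic is identical either way; the win comes entirely from the avoided memory traffic.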

---

Also since the author seems to be reading this thread: Have you tried measuring the tree-width of the instruction selection DAGs you generate for the PBQP problem? The heuristics for solving these problems in llvm are applicable to tree-width <= 2, but could be extended to, e.g., tree-width <= 4 without too much slowdown. I wonder if there is still an iota of performance to be gained here. :)


Hi, author here. There is a staggering amount of low hanging fruit. I have been half-seriously blaming GEMM in correspondence. When you have a problem that looks like GEMM, it's such an attractive hammer to pick up that people just don't look beyond it to other techniques!

To answer your other questions: we already have auto load balancing and primitive fusion, albeit rudimentary, but optimizing scheduling is the obvious next step. We've extended this stuff to use ILP, and we're on our way to press at the moment!

Re: tree width: the tree widths are huge, but the solver library we're using handles them :)


how tied are the compiler aspects to the specific kernels (eg for convolutional layers)? Could load balancing and fusion logic be broken out into a library which could work for user defined kernels?


They aren't tied at all -- in fact the optimizer is a totally separate project (triNNity-optimizer) that just does graph optimization. You can add user defined kernels as long as you have some way of microbenchmarking them!


I just realised the implication in your last question is that a heuristic solution to the PBQP problem is obtained. In fact, for a lot of networks, we get the optimal solution :)

DNN DAGs are big, but they are very simple structurally compared to the kind of stuff you see in a real ISA!


That's amazing! Do you have any insight as to which structural properties make this problem simple?

The reason I was asking after tree-width is that control and dataflow graphs in structured programming languages usually have small tree-width (which gets added to the maximum number of overlapping patterns during instruction selection, more or less) and this leads to the fast heuristics used by the PBQP solvers in llvm and libfirm. It's been a few years since I looked into this though, so I'm probably a bit behind the state of the art. :)


Primarily it's down to the fact that most programs have control flow!

As you most likely know, DAG selectors have to tile the CFG (which can have back-edges due to control flow) into a forest of DAGs, which are then fed to the instruction selector. So from a big program, you get a bunch of DAGs. The DAGs are selected separately, which means that when they execute one-after-the-other or in parallel, there may be inter-DAG conflicts on things like issue ports, or memory ports in ALUs. I'm sure there are a billion mitigation strategies for this, but for DNNs the landscape is a little different.

With a DNN you may have parallel paths, but they are typically few, and there are no back-edges. So the entire program is a DAG, and you don't have to tile and select separately!

