Hacker News | dawnofdusk's comments

>Mathematicians are prone to taking words from elsewhere, either twisting their meaning or inventing wholly new meaning out of thin air, all according to their whimsy for their own particular needs.

True, but one benefit of those guys is that they actually define what they mean in a formal way. "Programmers" generally don't. There is in fact some benefit in having consistent names for things, or, if not that, at least a culture in which concepts have unambiguous definitions which are mandated.


> they actually define what they mean in a formal way.

Sometimes yes. The big stuff is usually exhaustively formally defined down to the axioms. The further you get away from the absolute largest, most well-trodden ground... the woods grow dark quite fast. Math, especially on the cutting edge, is intuitive like anything else and filled with hand-waving. Even among the exhaustively defined, there are plenty which only achieved exhaustiveness thanks to later work.

> "Programmers" generally don't.

On the contrary, all programming languages are formal grammars. I think the best way I can underline the difference is that mathematicians are primarily utilizing formal grammars for communication, to share meanings, and almost exclusively deal with meanings that are very well-defined. Programmers, on the other hand, are often more concerned with some other pressing matter, usually involving architecting something unfathomably massive with a minute fraction of the man-hours used to construct ZFC, often dealing with far fuzzier things and with outright contradictory axioms which they have no control over.

They are as different as trophy truck rally and Formula 1. As someone who lives in both worlds, I'm endlessly disappointed by the shitflinging and irrational superiority contests between the two, as though they even live in the same dimension.

> There is in fact some benefit in having consistent names for things

There is, in some contexts to some ends. Those are important and influential contexts and ends, and so the relevant math should be studied and well understood on an intuitive level. But they form a minority in both fields. I've known many mathematicians outside of programming contexts, and none of them have any grasp of category theory, type theory, the lambda calculus, etc. They might have heard of category theory, but they look at it with the same suspicion as you might expect from some fringe theoretical physics framework.

There is also the problem that these "consistent names" are built on a graveyard of previous conceptions. Mathematics is a history of mental frameworks, as are all ancient fields. As I underlined in my previous post, the formalization of mathematics gutted the idea of the function and filled it with straw. There's nothing wrong with how functions are defined within ZFC, mind you. I just vehemently disagree with projecting it as a universal context, or showing it any degree of favoritism.


>what this routing mechanism is (heating a substrate, maybe?)

You can engineer a waveguide if you understand the nonlinear theory they propose. There's no heat exchange involved, which is easy to get confused about because the article's author does not really understand "optical thermodynamics".

>if the routing is dynamically changeable

At this point, probably not: it requires a finely engineered waveguide which has a well-defined "ground state".

>it works in reverse, eg light coming in can be routed to one of several output ports

In theory it works in reverse, as everything in this system is time-reversible (i.e., the "optical thermodynamics" is just an analogy and not real thermodynamics, which would break time reversibility). This is demonstrated via a simulation in the SI, but experimentally they did not achieve it (it may be difficult; I am not an experimentalist, so I cannot comment).


I really like the second part of the blogpost but starting with Gaussian elimination is a little "mysterious" for lack of a better word. It seems more logical to start with a problem ("how to solve linear equations?" "how to find intersections of lines?"), show its solution graphically, and then present the computational method or algorithm that provides this solution. Doing it backwards is a little like teaching the chain rule in calculus before drawing the geometric pictures of how derivatives are like slopes.


Author here – I think you're probably right. I wrote the Gaussian elimination section more as a recap, because I figured most readers have seen Gaussian elimination before, and I was keen to get to the rest of it. I'd love to hear if other folks had trouble with this section. Maybe I need to slow it down and explain it better.


I actually really liked the Gaussian elimination part. It's a term you hear often, and 'demystifying' it is good imho.

My only nitpick is that it's a pity you use only 1s and 2s in the carbs example. Because of the symmetry, it's harder to see which column/row matches which part of the vector/matrix: with only 1s and 2s, everything fits both horizontally and vertically...


Loved the article, and also the shoutout to Strang's lectures.

I agree with the order; the Gaussian elimination should come later. I almost closed the article - glad I kept scrolling out of curiosity.

Also, I felt like I had been primed to think about nickels and pennies as variables rather than coefficients due to the color scheme, so when I got to the food section I naturally expected to see the column picture first.

When I encountered the carb/protein matrix instead, I perceived it in the form:

[A][x], where the x is [milk bread].T

so I naturally perceived the matrix as a transformation and saw the food items as variables about to be "passed through" the matrix.

But another part of my brain immediately recognized the matrix as a dataset of feature vectors, [[milk].T [bread].T], yearning for y = f(W @ x).

I was never able to resolve this tension in my mind...
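
For concreteness, the "transformation" reading and the column picture the article builds up to are two views of the same product. A sketch with hypothetical carb/protein numbers (not necessarily the article's):

    import numpy as np

    A = np.array([[1., 2.],   # carbs   per unit of [milk, bread]  (made-up numbers)
                  [2., 1.]])  # protein per unit of [milk, bread]
    x = np.array([1., 2.])    # servings of [milk, bread]

    # row picture: each row of A dotted with x (one equation per macro)
    row_view = np.array([A[0] @ x, A[1] @ x])

    # column picture: a linear combination of A's columns, weighted by x
    col_view = x[0] * A[:, 0] + x[1] * A[:, 1]

    assert np.allclose(row_view, A @ x)
    assert np.allclose(col_view, A @ x)   # same result, two interpretations

The feature-vector reading ([[milk].T [bread].T] as a dataset) is yet another lens on the same array.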


To some, "Now we can add the two equations together to eliminate y" might need a little explanation.

The (an) answer is that, since the LHS and RHS of an equation are equal, you can choose to add them to (or subtract them from) another equation and preserve equality.
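
For instance, with made-up numbers (not the article's):

     x + y = 5
    2x - y = 1
    -----------   add the two equations: the +y and -y cancel
    3x     = 6    so x = 2, and then y = 5 - x = 3

The second equation's left and right sides are the same quantity, so adding them to the two sides of the first equation keeps it balanced.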

If I remember correctly, substitution (isolating x or y) was introduced before this technique.


Positive proportion - negative proportion = 0.


I hadn’t, and your article lost me there to be honest. You didn’t explain the what, why, or when behind it, and it didn’t make sense to me at all. That said, I’m abnormally horrible at math.


> You didn’t explain the what, why, or when behind it

>> The trouble starts when you have two variables, and you need to combine them in different ways to hit two different numbers. That’s when Gaussian elimination comes in.

>> In the last one we were trying to make 23 cents with nickels and pennies. Here we have two foods. One is milk, the other is bread. They both have some macros in terms of carbs and protein:

>> and now we want to figure out how many of each we need to eat to hit this target of 5 carbs and 7 protein.


Noted! I may make a totally separate post on Gaussian elimination. Could you talk me through which parts were confusing, and would you be willing to review that post to see if it works for you?


Your assumption worked for me... I've seen Gaussian elimination before (but not the linear algebra), which gave me an idea of what we were doing.


Do you have any plans to turn it into a full book, maybe called Grokking Linear Algebra?


Lol. Maybe! I did enjoy writing Grokking Algorithms, but writing a full book is a real commitment. That one took me 3 years.


Or something to the tune of "what does it mean that we can eliminate", which is still unclear to me. But a lovely article; the way you (OP) introduce the column perspective is really helpful for a novice such as myself.

+ there are many textbooks on LA, and not a lot of them introduce stuff in the same order or in the same manner. I think that's part of why LA is difficult to teach and difficult to comprehend. Maybe there is no unique way to do it, so we kinda need all the perspectives we can get.


>My goal is to develop a practical, working understanding I can apply directly.

Apply directly... to what? IMO it is weird to learn theory (like linear algebra) expressly for practical reasons: surely one could just pick up a book on those practical applications and learn the theory along the way? And if, in this process, you end up really needing the theory, then certainly there is no substitute for learning it, no matter how dense it is.

For example, linear algebra is very important to learning quantum mechanics. But if someone wanted to learn linear algebra for this reason they should read quantum mechanics textbooks, not linear algebra textbooks.


You're totally right. I left out the important context: I'm learning linear algebra mainly for applied use in ML/AI. I don't want to skip the theory entirely, but I've found that approaching it from the perspective of how it's actually used in models (embeddings, transformations, optimization, etc.) helps me with motivation and retention.

So I'm looking for resources that bridge the gap, not purely computational "cookbook" type resources but also not proof-heavy textbooks. Ideally something that builds intuition for the structures and operations that show up all over ML.


Strang's Linear Algebra and Learning from Data is extremely practical and focused on ML:

https://math.mit.edu/~gs/learningfromdata/

Although if your goal is to learn ML, you should probably focus on that first and foremost; after a while you will see which concepts from linear algebra keep appearing (for example, singular value decomposition, positive definite matrices, etc.) and work your way back from there.
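
As a tiny illustration of the kind of thing that keeps appearing (a sketch, with a random matrix standing in for a trained weight matrix):

    import numpy as np

    W = np.random.randn(100, 50)                      # stand-in for a trained weight matrix

    # singular value decomposition: W = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(W, full_matrices=False)

    # keeping only the k largest singular values gives the best rank-k approximation
    k = 10
    W_k = U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]

    print(np.linalg.norm(W - W_k))                    # approximation error (Frobenius norm)

Truncating the SVD like this is the standard "best low-rank approximation" trick, which is one reason it shows up all over ML.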


Thanks. I have a copy of Strang and have been going through it intermittently. I am primarily focused on ML itself and that's been where I'm spending most of my time. I'm hoping to simultaneously improve my mathematical maturity.

I hadn't known about Learning from Data. Thank you for the link!


Since you're associating ML with singular value decomposition, do you know if it is possible to factor the matrices of neural networks for fast inverse Jacobian products? If this were possible, then optimizing through a neural network would become roughly as cheap as doing half a dozen forward passes.


Not sure I am following; typical neural network training via stochastic gradient descent does not require Jacobian inversion.

Less popular techniques like normalizing flows do need that but instead of SVD they directly design transformations that are easier to invert.


The idea is that you already have a trained model of the dynamics of a physical process and want to include it inside your quadratic programming based optimizer. The standard method is to linearize the problem by materializing the Jacobian. Then the Jacobian is inserted into the QP.

QPs are solved by finding the roots (aka zeroes) of the KKT conditions, basically finding points where the derivative is zero. This is done by solving a linear system of equations Ax=b. Warm-starting QP solvers try to factorize the matrices in the QP formulation through LU decomposition or some other method. This works well if you have a linear model, but it doesn't if the model changes, because your factorization becomes obsolete.
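
A minimal sketch of that factorize-once, solve-many pattern (hypothetical A and right-hand sides, using SciPy's LU routines):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 200))   # stand-in for the KKT matrix of the linearized problem

    lu, piv = lu_factor(A)                # expensive factorization, done once

    for _ in range(10):                   # many solves against the same A (warm starts, new targets)
        b = rng.standard_normal(200)
        x = lu_solve((lu, piv), b)        # cheap forward/back substitution per solve

    # if the model changes, A changes, and lu/piv must be recomputed from scratch

That recomputation is exactly the cost the nonlinear model keeps imposing.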


Although I am not an expert in quantum information, I think the problem you pose is resolved by the fact that the no-signalling theorem is about measurements of a quantum state, which is a microscopic state, and heat transfer is a measurement of a thermodynamic quantity, which is macroscopic. In much the same way that measuring the temperature of a classical gas doesn't give information on the location or momenta of the constituent particles, a thermodynamic probe of entanglement doesn't necessarily furnish precise information on how a state is entangled (e.g., Eq. 2 in https://arxiv.org/pdf/quant-ph/0406040).


I have some minor complaints but overall I think this is great! My background is in physics, and I remember finally understanding every equation on the formula sheet given to us for exams... that really felt like understanding a lot of physics. There's great value in being comprehensive, so that a learner can choose to dive deeper themselves, and so that those with more experience can check their own knowledge.

Having said that, let me raise some objections:

1. Omitting the multi-layer perceptron is a major oversight. We have backpropagation here, but not forward propagation, so to speak.

2. Omitting kernel machines is a moderate oversight. I know they're not "hot" anymore but they are very mathematically important to the field.

3. The equation for forward diffusion is really boring... it's not that important that you can take structured data and add noise incrementally until it's all noise. What's important is that, in some sense, you can (conditionally) reverse it. In other words, you should include the reverse diffusion equation, which of course is considerably more sophisticated (both forms are sketched below).
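
For reference, the standard DDPM-style forms (written from memory, so check them against your preferred source): the forward step just adds Gaussian noise,

    q(x_t \mid x_{t-1}) = \mathcal{N}\left( x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I \right),

while the learned reverse step is

    p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left( x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t) \right),

and all the interesting work is in parameterizing and training \mu_\theta.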


As any practicing scientist knows, even good research papers may be littered with blatant but unimportant errors. There is unfortunately no good reason or system to "correct the record", and it is not clear to me whether such a thing would be a good use of human resources. Nonetheless, I think correcting the record is always appreciated!


Getting a compound wrong is not an "unimportant" error (for example, the difference between sodium nitrate and sodium nitrite is small but critical), and seeing "small but blatant" errors actively propagated is the entire reason why the record should be corrected. The only upside of these little artifacts like "vegetative electron microscopy" [0] is that they are a leading indicator that the entire paper and team deserve more scrutiny--as well as anyone who cites them.

[0] https://www.sciencealert.com/a-strange-phrase-keeps-turning-...


I believe they meant that it's "unimportant" because (to use your example) sodium nitrate and sodium nitrite actually exist, whereas there's no element with the chemical symbol "Gr".


The error in the OP is a typo that could never seriously confuse anyone, as the element Gr does not exist.

An interesting perspective is Terry Tao's on local vs. global errors (https://terrytao.wordpress.com/advice-on-writing-papers/on-l...). A typo like this, even if propagated, is a local error which at worst makes it very annoying to Ctrl-F papers or do literature review. Local errors deserve to be corrected, but in practice their importance to science as a field is small.


That is a possible, but charitable, explanation. I would like to hold your opinion, but I don't know if I can. It must compete with less charitable ones.


Like the legal principle of "De minimis" ('the law does not concern itself with trifles').

[0] https://en.wikipedia.org/wiki/De_minimis


Have you heard of this thing called Peer Review? It's what academia holds up as its gold standard, and it is supposed to pick up on these things.


Peer review isn't spellcheck or proofreading.

It's about logic, methodology, significance, and citations.

It's not some gold standard of perfection or truth.


>Peer review isn't spellcheck or proofreading.

>It's about logic, methodology, significance, and citations

To quote kazinator in this thread: "The typo is not the problem; it's that the typo is evidence of academic dishonesty.

When you make a citation, it means you cracked open the original work, understood what it says and located a relevant passage to reference in your work.

The authors are propagating the same typo because they are not copying the original correct text; they are just copying ready-made citations of that text which they plant into their papers to manufacture the impression that they are surveying other work in their area and taking it into account when doing their work."

>It's not some gold standard of perfection or truth.

"Gold standard" is a term used within the scientific community to describe the high rigor expected within the scientific community when doing research. One of the processes they hold up in this standard is Peer Review. I wasn't making some general public statement about perfection. Google "Gold Peer Review Gold Standard".


That's not only quite factually wrong, but has nothing to do with the point, which is about mindless copying.


If it is factually wrong please tell me how.


Pithy replies are seldom backed up by much.


From what I read in the comments of the first post, the Pyon guy seems very toxic and pedantic, but the rebuttal by Ned isn't great. For example, nowhere in the rebuttal is the pedantic technical detail ever actually described. In fact, the prose reads very awkwardly in order to talk around it, repeatedly naming it only a "particular detail". In my view, the author overreaches: he dismisses Pyon not only for the delivery of his criticism (which was toxic) but also for its content (why?).

Ultimately Ned is in the right about empathy and communication online. But as an educator, it would have been nice for him to explain, even briefly, why he thought Pyon's point was unnecessarily technical and pedantic. Instead he just says "I've worked for decades and didn't know it". No one is too experienced to learn.

EDIT: I just skimmed the original comment section between Pyon and Ned, and it seems that Ned is actually rather diplomatic and intellectually engages with Pyon's critique. Why is this level of analysis completely missing from the follow-up blogpost? I admit to not grasping the technical details or their importance personally; it would be nice to hear a summarized analysis...


Such is the curse of blogging: when writing a series of posts, authors naturally assume that the readers are as engaged and as familiar with the previous discussion as they are.

In reality, even regular subscribers probably aren't, and if you're a random visitor years later, the whole thing may be impossible to parse.


> Why is this level of analysis completely missing from the follow-up blogpost

Because that's not what the article is about. It's not about whether Pyon was right or wrong; it's that they were a dick about it. Their opening words were "you should be ashamed". They doubled and tripled down on their dickery later on.

And no matter how good your point is or how right you are: if you're a dick about it people will dislike you.


Whenever I read content like this about Big O notation, I can't help but think the real solution is that computer science education should take calculus more seriously, and that students/learners should not dismiss calculus as "useless" in favor of discrete math or other things that are more obviously CS-related. For example, the word "asymptotic" is not used at all in this blog post. I have always thought that education, as opposed to mere communication, is not about avoiding jargon but about explaining it.


> students/learners should not dismiss calculus as "useless"

This seems to be quite a bit of a strawman to me.

ML is such a major part of the field today and at a minimum requires a fairly strong foundation in calc, linear algebra, and probability theory that I earnestly don't believe there are that many CS students who view calculus as "useless". I mean, anyone who has written some Pytorch has used calculus in a practical setting.

Now in pure software engineering you will find a lot of people who don't know calculus, but even then I'm not sure any would decry it as useless; they would just admit they're scared to learn it.

If anything, I'm a bit more horrified at how rapidly people's understanding of things like the Theory of Computation seems to be vanishing. I've been shocked by how many CS grads I've talked to who don't really understand the relationship between regular languages and context-free grammars, don't understand what 'non-determinism' means in the context of computability, etc.


>This seems to be quite a bit of a strawman to me.

Not really, if you ever listen to CS undergrads or people in non-traditional schooling (bootcamps, etc.) talk about software engineering this opinion is essentially ubiquitous. People interested in ML are less likely to hold this exact opinion, but they will hold qualitatively identical ones ("do you really need multivariable calculus/linear algebra to do ML?"). It is precisely because people (primarily Americans) are scared to learn mathematics that they rationalize away this fear by saying the necessary mathematics must not be essential, and indeed it is true that many people get away without knowing it.


> non-traditional schooling (bootcamps, etc.)

It's probably unfair to qualify those as "computer science education" though...


Most people from almost every specialty dismiss calculus as useless, don't even remember linear algebra is a thing, and either never learn statistics or forget everything about it. (But the reaction to probability varies a lot.)

I have no idea what's up with those disciplines, but it's an almost universal reaction to them. Unless people are very clearly and directly using them all the time, they just get dismissed.


Can’t imagine what it might be.


If that's sarcastic, do you have any idea why probability is different?


Part of the problem is that a lot of people that come across big O notation have no need, interest, or time to learn calculus. I think it's reasonable for that to be the case, too.


The thing is, this is like saying lots of mechanical engineers have no need, interest, or time to learn derivatives; they just want to get on with "forces" and "momentum" and cool stuff like "resonance". Saying you have no interest in learning limits and asymptotes but you want to know what people are talking about when they mention asymptotic analysis doesn't make sense.

If you want to know what calculus-y words mean, you're going to need to learn calculus. People use calculus-y words to quickly convey things professionally. That's why it's a "topic" for you to learn. The thing under discussion is a limit.
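
For the record, the textbook definition (one standard phrasing; the second form assumes g is eventually positive):

    f(n) = O(g(n)) \iff \exists\, C > 0,\ n_0 \ \text{such that}\ |f(n)| \le C\, g(n) \ \text{for all}\ n \ge n_0

    \text{equivalently:}\quad \limsup_{n \to \infty} \frac{|f(n)|}{g(n)} < \infty

The limsup form is exactly where the limit hides.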


I replied to this effect to someone else in this thread, but I think it's reasonable for people to want to have an idea of what big O is for (in software engineering!) without having to have a grounding in calculus. The notation is useful, and used, without it regularly.


It's reasonable, but essentially every "common misconception about Big O" exists because people didn't have the necessary notions from calculus. For example, the fact that an O(x^2) algorithm can be practically faster than an O(x) one, due to the size of constants/subdominant terms, is confusing only if you never properly learned what asymptotic behavior is.

The practical question is whether you think it's OK to continue propagating a rather crude and misunderstanding-prone idea of Big O. My stance is that we shouldn't: engineers are not business people or clients; they should understand what's happening, not rely on misleading cartoon pictures of it. I do not think you need a full-year collegiate course in calculus to get this understanding, but you certainly cannot get it if you fully obscure the calculus behind the idea (like this and countless other blogpost explainers do).
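
A toy illustration of the constants point (made-up cost functions, not real algorithms):

    # cost models for two hypothetical algorithms
    def cost_linear(n):      # O(n), but with a huge constant factor
        return 1_000_000 * n

    def cost_quadratic(n):   # O(n^2), with a tiny constant factor
        return n * n

    for n in (10, 1_000, 100_000, 10_000_000):
        cheaper = "quadratic" if cost_quadratic(n) < cost_linear(n) else "linear"
        print(f"n = {n:>10,}: the {cheaper} algorithm is cheaper")

    # the O(n^2) cost wins until n exceeds 1,000,000; Big O only says
    # which one wins eventually, as n grows without bound

That "eventually" is precisely the asymptotic statement people skip when they read O(n) as "always faster".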


Given the various ways people in this thread have pointed out you lack fluency with the notation, why do you think it reasonable for people to want to learn it without learning the concepts it's describing?


I’m not sure that’s quite my position. Happy to cede that I lack fluency, and I appreciate your time and the time others have given to help me understand.


I find myself in the odd position of disagreeing with you and the person you responded to.

First, software engineering doesn't just consist of Computer Science majors. We have a lot of people from accounting, physics, or people who have no degree at all. Teaching this concept in CS fixes very little.

Second, and complementary to the first, is that asymptotic behavior is derivative of the lessons you learn in Calculus. You can't really fully understand it beyond a facade unless you have a rudimentary understanding of Calculus. If you want to put this theory to the test, then ask someone with a functional understanding of Big-O to write asymptotic notation for a moderately complex function.

I don't have a degree, and in order to really understand asymptotics (and Big-O as well as the others) I read a primer on Calculus. It doesn't take a ton of knowledge or reading, but a decent background is what will get you there. I do think we need much better continuing education in software, going beyond O'Reilly-style technical books, that could fill this gap.


I'm not the original commenter, but that makes a lot of sense! I had assumed there was a huge overlap, personally.


I think it's pretty common for folks to enter the software field without a CS degree, start building apps, and see big O notation without understanding what it is. These people have jobs, deadlines, they want to ship features that make peoples' lives easier. I'd bet many of those people don't care so much about calculus, but a quick intro to what all this big O nonsense is about could help them.


Apparently Big-O notation was invented by Paul Bachmann in 1894. It's not a johnny-come-lately.


Agree. Science communicators should stick to talking about well-established or at least peer-reviewed results. They do not need to be peddling fringe crackpottery. I don't think Tim's prose is magnificent, but the work speaks for itself: he wrote a serious technical document which stands alone with no response. Serious, credentialed physicists should platform these types and not grifters.

