
I just don’t know what he means by logits. Everything else seems like straightforward language.


He defines it pretty clearly. Logits are the inputs to a softmax layer / calculation, which turns the logits into normalized percentages (the percentages sum to 1.0).

Before going through the softmax layer, the logits will typically be smallish numbers around 0, something like [2.89, -4.53, 0.24, -1.556, 0.57]. Logits like this are natural outputs of a neural network, because they can be any real number and everything will still work.

The logits become percentages as follows:

    julia> x = [2.89, -4.53, 0.24, -1.556, 0.57]
    5-element Vector{Float64}:
      2.89
     -4.53
      0.24
     -1.556
      0.57
    
    julia> x = exp.(x)   # exponentiate each logit
    5-element Vector{Float64}:
     17.993309601550315
      0.010780676072743085
      1.2712491503214047
      0.2109782988178321
      1.768267051433735
    
    julia> x / sum(x)
    5-element Vector{Float64}:
     0.8465613320288766
     0.0005072164987105474
     0.05981058503789324
     0.009926248902037537
     0.08319461753248213

Logits is an overloaded term, though, and means different things in different contexts.


When people mention logits, they're usually referring to the raw output of the model before it gets transformed/normalised into a probability distribution (i.e. the values sum to 1 and each lies in [0,1]). Logits can take any value. The naming might not be mathematically strict, because it assumes you're going to apply softmax (which interprets the output of the model as logits), but that's how the term is used.

For example, in many classification problems you get a 1D vector of logits from the final layer, you apply softmax to normalise, then argmax to extract the predicted class. It extends to other tasks like semantic segmentation (predicting a class per pixel), where the "logit" output is the same size as the image with a channel for each class, and you apply the same process to get a single-channel image with a class per pixel.
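To make that concrete, here's a minimal Julia sketch of that pipeline (the logits vector is made up, and softmax/argmax stand in for whatever your framework provides):

    softmax(z) = exp.(z) ./ sum(exp.(z))        # normalise logits into a probability distribution
    logits = [2.89, -4.53, 0.24, -1.556, 0.57]  # made-up raw output of the final layer
    probs  = softmax(logits)                    # sums to 1, every entry in [0, 1]
    class  = argmax(probs)                      # index of the predicted class (1 here)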

Here's a nice explanation: https://stackoverflow.com/a/66804099/395457


Honestly what cracked logits for me was a conversation with ChatGPT in which I gave it my professional background, areas of strength and weakness, and problem context, and had it explain the concept to me. I then went elsewhere to make sure I hadn't been lied to. I've found ChatGPT such an invaluable learning tool when used in this way.


I'm confused by the comments here, does "logit" here not mean "log odds" like it does in virtually every other context related to machine learning?

Generally I'm a huge fan of not getting too caught up in theory before diving into practice, but I'm seeing multiple responses to this comment without a single mention of "log odds".

The logit function transforms probabilities into the log of the odds, ln(P(X)/(1-P(X))), which is important because it puts probabilities on a linear (unbounded) scale, which they are not in their standard [0,1] form. It's the foundation of logistic regression, which is, despite much misinformation, quite literally linear regression with a transformed target.

The logistic function is the inverse of the logit: it turns log-odds values back into probabilities. Logistic regression actually transforms the model, not the target (most of the time), because the labels 0 and 1 map to negative and positive infinity under the logit, which can't be handled by linear regression (so we transform the model using the inverse instead).
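A quick Julia sketch of the two functions and the round trip (the 0.8 is just an example probability):

    logit(p)    = log(p / (1 - p))     # probability (0, 1) -> log odds (-Inf, Inf)
    logistic(x) = 1 / (1 + exp(-x))    # log odds (-Inf, Inf) -> probability (0, 1)

    logit(0.8)             # log(4) ≈ 1.386
    logistic(logit(0.8))   # ≈ 0.8 -- the round trip recovers the probability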

I don't think I can stress enough how important it is to really understand logistic regression (which is also the basic perceptron) before diving into neural networks (which are really just an extension of logistic regression).
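For what it's worth, here's a minimal sketch of that single "neuron" in Julia (the weights, bias, and input are all made-up illustrative values):

    # a single logistic-regression "neuron": weighted sum, then logistic squashing
    logistic(z) = 1 / (1 + exp(-z))
    w = [0.5, -1.2, 0.3]       # made-up weights
    b = 0.1                    # made-up bias
    x = [1.0, 0.5, 2.0]        # one made-up input example
    p = logistic(w' * x + b)   # predicted probability of the positive class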


Having not watched the series, I can only assume he means logit as in a probability function from 0 to 1

https://deepai.org/machine-learning-glossary-and-terms/logit....


logit is not a "probability function", quite the opposite. You can see this in the image in the link you posted (the x-axis runs from 0 to 1, the y-axis from -inf to inf). It transforms probabilities into log odds, which is a linear space, and makes combining probabilities much nicer.

The inverse logit or logistic function takes log odds and transforms them back into probabilities.

Most machine learning relies heavily on manipulating probabilities, but since probabilities are not linear, the logit/logistic transformations become essential to correctly modeling complex problems involving probabilities.
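One concrete way to see the "nicer" part: adding log odds is the same as multiplying the odds, so combining in logit space is just addition. A small Julia sketch with made-up probabilities:

    logit(p)    = log(p / (1 - p))
    logistic(x) = 1 / (1 + exp(-x))

    p1, p2 = 0.7, 0.9
    # adding log odds multiplies the odds: (0.7/0.3) * (0.9/0.1) = 21
    combined = logistic(logit(p1) + logit(p2))   # 21/22 ≈ 0.9545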


probability-related* function



