Just a small note on the code in the linked script:
    API_KEY = os.environ.get("YOUTUBE_API_KEY")
    CHANNEL_ID = os.environ.get("YOUTUBE_CHANNEL_ID")
    if not API_KEY or not CHANNEL_ID:
        print("Missing YOUTUBE_API_KEY or YOUTUBE_CHANNEL_ID.")
        exit(1)
Presenting the user with "Missing X OR Y" when there's no reason that OR has to be there massively frustrates the user for the near zero benefit of having one fewer if statement.
    if not API_KEY:
        print("Missing YOUTUBE_API_KEY.")
        exit(1)
    if not CHANNEL_ID:
        print("Missing YOUTUBE_CHANNEL_ID.")
        exit(1)
Way better user experience, 0.00001% slower dev time.
I write a decent amount of Python, but find the walrus operator unintuitive. It's a little funky that API_KEY is available outside of the `if`, perhaps because I had first seen the walrus operator in golang, which restricts the scope to the block.
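For readers who haven't seen it, here is a minimal sketch of the behaviour being described. The walrus version of the script isn't shown in this thread, so the `env` dict and the lowercase names below are assumptions standing in for `os.environ` and the real variables.

```python
# Hypothetical stand-in for os.environ so the sketch runs anywhere.
env = {"YOUTUBE_API_KEY": "abc123"}

# The walrus operator binds api_key as part of the condition...
if (api_key := env.get("YOUTUBE_API_KEY")) is None:
    status = "missing"
else:
    status = "present"

# ...and the name is still bound here, after the if statement has ended,
# because Python has no block scope.
visible_after_if = api_key
print(status, visible_after_if)
```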
This isn't really unique to the walrus operator; it's just a general Python quirk (albeit one I find incredibly annoying). `for i in range(5): ...` will leave `i` bound to 4 after the loop.
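A quick sketch of that quirk, plus the less obvious corner where the loop body never runs. The function name `loop_over_nothing` is made up for illustration.

```python
# The loop variable is an ordinary name in the enclosing scope.
for i in range(5):
    pass

last_value = i  # still bound after the loop; holds the final item
print(last_value)


def loop_over_nothing():
    # With an empty iterable the loop variable is never assigned,
    # so reading it afterwards fails instead of yielding a default.
    for j in []:
        pass
    return j


try:
    loop_over_nothing()
    empty_loop_error = None
except UnboundLocalError as exc:
    empty_loop_error = type(exc).__name__
```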
Also true of JavaScript pre-ES6, another language that at first glance seems to have only function scope: it actually does have block scope, but only for variables introduced in `catch` blocks. AFAIU that was the standard way for a dumb transpiler to emulate `let`.
I wonder if that was ever popular, considering the deoptimization effects of try/catch, and given that block scope can also be managed by renaming variables.
Exception blocks don't create a different scope. Instead, the name is explicitly (well, implicitly but deliberately) deleted from the scope at the end of the except clause. This happens because the exception holds a traceback that references the current frame, so keeping the name bound would otherwise produce a reference cycle and delay garbage collection.
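As a rough illustration, the two functions below should behave the same: the second spells out the cleanup that the first gets implicitly. The function names are invented for this sketch.

```python
def implicit_cleanup():
    try:
        raise ValueError("boom")
    except ValueError as e:
        message = str(e)
    # "e" was unbound when the except clause finished.
    return message, "e" in locals()


def explicit_cleanup():
    try:
        raise ValueError("boom")
    except ValueError as e:
        try:
            message = str(e)
        finally:
            del e  # roughly what the interpreter does on your behalf
    return message, "e" in locals()


print(implicit_cleanup())
print(explicit_cleanup())
```

If the exception is needed after the handler, assigning it to a second name inside the except clause keeps it alive.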
This is the type of thing that makes me roll my eyes at all the wtf JavaScript posts[0]. Yes, there are a lot of random things that happen with type conversions and quite a few idiosyncrasies (my favourite is that document.all is a non-empty collection that is != false but converts to false in an if).
But the language makes sense at a lower level, scopes, values, bindings have their mostly reasonable rules that are not hard to follow.
In comparison, Python seems like an infinite tower of ad-hoc exceptions over ad-hoc rules. Sure, it looks simpler, but anywhere you look you discover an infinite depth of complexity [1].
[0] and how half of the complaints are a conjugation of "I don't like that NaNs exist"
[1] My favourite example is how dunder methods are a "synchronized view" of the actual object behaviour. That is, in a + b, a.__add__ is never inspected; instead, at creation time a's add behaviour is defined as its __add__ method, but the association is purely a convention. E.g. any C extension type needs to reimplement all these syncs to expose the correct behaviour, and could for funzies decide that a type will use __add__ for repr and __repr__ for add.
> yes there are a lot of random things that happen with type conversions and quite a few idiosyncrasies... the language makes sense at a lower level, scopes, values, bindings have their mostly reasonable rules
The "random things" make it practically impossible to figure out what will happen without learning a whole bunch of seemingly arbitrary, corner-case-specific rules (consider the jsdate.wtf test currently making the rounds). And no, nobody is IMX actually simply complaining about NaNs existing (although the lack of a separate integer type does complicate things).
Notice that tests showcasing JavaScript WTFery can work just by passing user data to a builtin type constructor. Tests of Python WTFery generally rely on much more advanced functionality (see e.g. https://discuss.python.org/t/quiz-how-well-do-you-know-pytho...). The only builtin type constructor in Python that I'd consider even slightly surprising is the one for `bytes`/`bytearray`.
Python's scoping is simple and makes perfect sense, it just isn't what you're used to. (It also, unlike JavaScript, limits scope by default, so your code isn't littered with `var` for hygiene.) Variables are names for objects with reference semantics, which are passed by value - exactly like `class` types in C# (except you don't have to worry about `ref`/`in`/`out` keywords) or non-primitives in Java (notwithstanding the weird hybrid behaviour of arrays). Bindings are late in most places, except notably default arguments to functions.
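To make the last sentence concrete, here is a small sketch contrasting late-bound closures with default arguments, which are evaluated once when the function is defined. All names are illustrative.

```python
# Closures look up "n" when they are called, not when they are created,
# so every callback sees the final value of n.
callbacks = [lambda: n for n in range(3)]
late = [f() for f in callbacks]

# A default argument is evaluated at definition time, which is the usual
# trick for capturing the current value instead.
bound_callbacks = [lambda n=n: n for n in range(3)]
early = [f() for f in bound_callbacks]


# The same early evaluation means a mutable default is shared across calls.
def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket


first = append_to(1)
second = append_to(2)
print(late, early, second)
```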
I have no idea what point you're trying to make about __add__; in particular I can't guess what you think it should mean to "inspect" the method. Of course things work differently when you use the C API than when you actually write Python code; you're interacting with C data structures that aren't directly visible from Python.
When you work at the Python level, __add__/__iadd__/__radd__ implement addition, following a well-defined protocol. Nothing happens "at creation time"; methods are just attributes that are looked up at runtime. It is true that the implementation of addition will overlook any `__add__` attribute attached directly to the object, and directly check the class (unlike code that explicitly looks for an attribute). But there's no reason to do that anyway. And on the flip side, you can replace the `__add__` attribute of the class and have it used automatically; it was not set in stone when the class was created.
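A small sketch of the two claims above, using a made-up `Money` class: an `__add__` attached to the instance is ignored by `+`, while replacing `__add__` on the class takes effect immediately.

```python
class Money:
    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        return Money(self.amount + other.amount)


a, b = Money(1), Money(2)

# Attach a different __add__ directly to the instance.
a.__add__ = lambda other: Money(-1)

via_operator = (a + b).amount        # the + operator consults the class
via_attribute = a.__add__(b).amount  # explicit lookup finds the instance attribute

# Replace the method on the class after the class was created.
Money.__add__ = lambda self, other: Money(100)
after_patch = (a + b).amount

print(via_operator, via_attribute, after_patch)
```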
I'll grant you that the `match` construct is definitely not my favourite piece of language design.
> methods are just attributes that are looked up at runtime
At runtime when evaluating a + b no dunder method is looked up and there is no guarantee that a + b === a.__anydunder__(b) https://youtu.be/qCGofLIzX6g
What I mean by weird scoping is:
    def foo():
        e = 'defined'
        try:
            raise ValueError
        except Exception as e:
            print(e)
        print(e) # this will error out
    foo()
I also dislike how local/global scopes work in Python, but that is more of a personal preference.
I agree that JavaScript's standard library is horrible; jsdate.wtf is an extreme but apt example. IMO most of these are solved with some "defensive programming", but I respect other opinions here.
> And no, nobody is IMX actually simply complaining about NaNs existing
I watched many JavaScript WTF! videos on YouTube, and NaNs and [2] == "2" were usually 90% of the content.
It's not that odd, since it's the only situation where you cannot keep it bound, unless you enjoy having variables that may or may not be defined (Heisenberg variable?), depending on whether the exception has been raised or not?
Compare with the if statement, where the variable in the expression being tested will necessarily be defined.
> Compare with the if statement, where the variable in the expression being tested will necessarily be defined.
    if False:
        x = 7
    print(x)

      print(x)
            ^
    NameError: name 'x' is not defined
Ruby does this sort of stuff, where a variable is defined more or less lexically (nil by default). Python doesn't do this. You can have local variables that only maybe exist in Python.
Well, after writing my comment, I realized that a Python interpreter could define the variable and set it to None between the guarded block and the except block, and implicitly assign it to the raised exception right before evaluating the except block, when the exception has been raised. So technically, it would be possible to define the variable e in GP's example and have it scoped to "whatever is after the guarded block", just like what is done with for blocks.
Is there any chance this would cause trouble though? Furthermore, what would be the need of having this variable accessible after the except block? In the case of a for block, it could be interesting to know at which point the for block was "passed".
The answer is: it is unbound. Intellisense will most likely tell you it is `Unbound | <type>` when you try to use the value from a for loop. Would it be possible for it to be default-initialized to `None`? Sure, but `None` is a distinctly different value from a still-unbound variable and may result in different handling.
I stand corrected, the exception case is definitely an oddity, both as being an outlier and as a strange behaviour wrt Python's semantics. Or is it a strange behaviour?
In the case of an if like in your example, no provision is made about the existence of x. It could have been defined earlier, and this line would simply update its value.
Your example:
    if True:
        x = 5
    print(x) # 5
Same with x defined prior:
    x = 1
    if False:
        x = 5
    print(x) # 1
What about this one?
    if False:
        x = 5
    print(x) # ???
On the other hand, the notation "<exception value> as <name>" looks like it introduces a new name; what if that name already existed before? Should it just replace the content of the variable? Why the "as" keyword then? Why not something like "except <name> = <exception value>" or the walrus operator?
While investigating this question, I tried the following:
    x = 3
    try:
        raise Exception()
    except Exception as x:
        pass
    print(x) # <- what should that print?
> `for i in range(5): ...` will leave `i` bound to 4 after the loop.
This "feature" was responsible for one of the worst security issues I've seen in my career. I love Python, but the scope leakage is a mess. (And yes, I know it's common in other languages, but that shouldn't excuse it.)
I don't remember the exact details, but it basically involved something along the lines of:
1) Loop through a list of permissions in a for loop
2) After the loop block, check if the user had a certain permission. The line of code performing the check was improperly indented and should have failed, but instead succeeded because the last permission from the previous loop was still in scope.
Fortunately there was no real impact because it only affected users within the same company, but it was still pretty bad.
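The exact code isn't given above, so the following is only a hypothetical reconstruction of that class of bug. The function names, the `normalized` variable, and the permission strings are all invented.

```python
def has_permission_buggy(permissions, wanted):
    allowed = False
    for permission in permissions:
        normalized = permission.strip().lower()
    # This check was meant to sit inside the loop. Dedented, it still runs
    # without error because "normalized" outlives the loop, but it only
    # ever sees the last item.
    if normalized == wanted:
        allowed = True
    return allowed


def has_permission(permissions, wanted):
    # Keeping the loop variable inside a generator expression avoids
    # leaving a stale name behind.
    return any(p.strip().lower() == wanted for p in permissions)


print(has_permission_buggy(["admin", "read"], "admin"))  # order-dependent result
print(has_permission(["admin", "read"], "admin"))
```

In a language with block scope, the dedented check would fail to compile; here the result silently depends on the order of the list.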
Oof, that's a near miss. That's the sort of hard-to-find issue that keeps me up at night. Although maybe these days some AI tool would be able to pick them up.
Python 2 actually did let list comprehension variables leak out into the surrounding scope. They changed it for Python 3, presumably because it was too surprising to overwrite an existing variable with a comprehension variable.
I cannot tell you how many times I've hit issues debugging and it was something like this. "You should know better" -- I know, I know, but I still snag on this occasionally.
It would be utterly nuts otherwise. A for loop runs over all elements in a sequence. If the sequence is a list of str, as an example, what would the «item after the last item» be?
the issue isn't the value of i, the issue is that i is still available after the loop ends. in most other languages, if it was instantiated by the for-each loop, it'd die with the for-each loop
There's no block scope in Python. The smallest scope is function. Comprehension variables don't leak out, though, which causes some weird situations:
    >>> s = "abc"
    >>> [x:=y for y in s]
    ['a', 'b', 'c']
    >>> x
    'c'
    >>> y
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'y' is not defined
Comprehensions have their own local scope for their local variables, but the walrus operator reaches up to the innermost "assignable" scope.
It's only existed for 6 of those years so perhaps you can be forgiven :)
The last time I wrote Python in a job interview, one of the interviewers said "wait, I don't know Python very well but isn't this kinda an old style?" Yes, guilty. My Python dates me.
It seems that way! I briefly experimented with it when it first came out, but never used it in any production code. I've never seen it used by anyone else, either.
I find it interesting that none of the voluminous python code I've had AI tools generate has ever had a walrus operator in it. Reflects the code they're trained on, I guess.
Neophytes take notice. Attention to details like this is what separates truly great programmers from merely good ones. That said, for scripts reusable by others you should use command line arguments. Using environment variables in lieu of command line arguments is a huge code smell.
Using read or an equivalent, presumably. Just because you don't know why a practice is recommended against doesn't mean that there isn't a good alternative.
The threat model is that history is persistent while the environment isn't. That said, whenever possible you should handle secrets using file descriptors as opposed to environment variables.
Typer has a great feature that lets you optionally accept argument and flag values from environment variables by providing the environment variable name (the `envvar` parameter).
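For comparison, a similar fallback can be approximated with the standard library. Note that this sketch uses `argparse`, not Typer's API, and the flag names and the `build_parser` helper are assumptions made for illustration.

```python
import argparse
import os


def build_parser(env=os.environ):
    """Build a parser whose flags default to values from the environment."""
    parser = argparse.ArgumentParser()
    # A flag given on the command line wins; otherwise fall back to the
    # environment variable, and to None if neither is set.
    parser.add_argument("--api-key", default=env.get("YOUTUBE_API_KEY"))
    parser.add_argument("--channel-id", default=env.get("YOUTUBE_CHANNEL_ID"))
    return parser


# A plain dict stands in for os.environ so the example is reproducible.
args = build_parser({"YOUTUBE_API_KEY": "from-env"}).parse_args(
    ["--channel-id", "from-flag"]
)
print(args.api_key, args.channel_id)
```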
No, that's an anti-feature. :) Sibling comments here claim that command line arguments "leak" whereas environment variables do not. It's plain wrong. An attacker with access to arbitrary processes' cmdline surely also has access to their environ. Store secrets in files, not in the environment. Now you can easily change the secret by pointing the --secret-file parameter to a different file. The only reason people use BLABLA_API_KEY variables is because Heroku or something did it back in the day and everyone cargo-culted this terrible pattern.
One could write a huge treatise on everything that is wrong with environment variables. Avoid them like the plague. They are a huge usability PITA.
This is bad advice. Please don't make claims about security if you're making it up as you go.
Environment variables are substantially more secure than plain text files because they are not persistent. There are utilities for entering secrets into them without leaking them into your shell history.
That said, you generally should not use an environment variable either. You should use a secure temporary file created by your shell and pass the associated file descriptor. Most shells make such functionality available but the details differ (ie there is no fully portable approach AFAIK).
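On the receiving side, reading a secret from an inherited descriptor could look roughly like the sketch below. The helper name is made up, and a pipe stands in for whatever descriptor the shell would actually provide.

```python
import os


def read_secret_from_fd(fd):
    """Read a secret from an already-open file descriptor and close it."""
    with os.fdopen(fd, "r", encoding="utf-8") as handle:
        return handle.read().strip()


# Demo only: create a pipe and write a fake secret into it, in place of a
# descriptor opened by the parent shell.
read_end, write_end = os.pipe()
os.write(write_end, b"s3cr3t\n")
os.close(write_end)

secret = read_secret_from_fd(read_end)
print(len(secret))
```

The script would typically take the descriptor number as an argument; how the shell opens that descriptor is the part that differs between shells.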
The other situation that sometimes comes up is that you are okay having the secret on disk in plain text but you don't want it inadvertently committed to a repository. In those cases it makes sense to either do as you suggested and have a dedicated file, or alternatively to set an environment variable from ~/.bashrc or similar.
    if not API_KEY and not CHANNEL_ID:
        print("Missing both YOUTUBE_API_KEY and YOUTUBE_CHANNEL_ID.")
        exit(1)
    if not API_KEY:
        print("Missing YOUTUBE_API_KEY.")
        exit(1)
    if not CHANNEL_ID:
        print("Missing YOUTUBE_CHANNEL_ID.")
        exit(1)
That way you don't end up fixing one only to come back and be told you're also missing another requirement.
Even better would be to only check each once and buffer the decision:
    valid = True
    if not API_KEY:
        print("Missing YOUTUBE_API_KEY.")
        valid = False
    if not CHANNEL_ID:
        print("Missing YOUTUBE_CHANNEL_ID.")
        valid = False
    if not valid:
        exit(1)
This way you only check each value once (because your logic might be more complicated than just checking it's not set; maybe it can be wrongly formatted) and you still get to do whatever logic you want. It also removes the combinatorial problem.
This is a pretty general principle of separating decision from action.
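One possible way to push that separation a step further is to have one function decide what is missing and another act on the result. `REQUIRED`, `missing_settings`, and `main` are names invented for this sketch.

```python
import os
import sys

REQUIRED = ("YOUTUBE_API_KEY", "YOUTUBE_CHANNEL_ID")


def missing_settings(env, required=REQUIRED):
    """Decision: return the names of required settings that are unset or empty."""
    return [name for name in required if not env.get(name)]


def main(env=None):
    """Action: report every missing setting and return an exit status."""
    env = os.environ if env is None else env
    missing = missing_settings(env)
    for name in missing:
        print(f"Missing {name}.", file=sys.stderr)
    return 1 if missing else 0
```

A script would end with `sys.exit(main())`. Adding a third required variable then means editing only `REQUIRED`.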
Thank you. I see something like this all the time on one of the sites I use for work. If you fail the 2-factor, it'll tell you your password was wrong and reset the whole thing instead of telling you the 2-factor code was wrong or expired.
I feel like use case and audience matter when making these decisions. In this case, the user is probably someone running a Python script in a console (I assume so because of the print), so I really don't think it matters: the user will check that both things are set. Should you also give them some documentation about setting env vars? Should you customize that documentation to the OS they're running? Etc.
If the user is a typical consumer using a typical consumer interface, then yes you want to handhold them a bit more.
    $ python3 -c "print('clear messaging'); exit(1)"
    clear messaging
    $ python3 -c "raise ValueError('text that matters')"
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
    ValueError: text that matters
And that story gets _a lot_ worse when some programs raise from within a "helper" module and you end up with 8 lines of Python junk to the one line of actual signal.
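One common way to keep just the line of signal, while still raising exceptions internally, is to catch the expected error at the top level and pass its message to `sys.exit`, which prints a string argument to stderr and exits with status 1. `ConfigError`, `load_config`, and `main` are hypothetical names for this sketch.

```python
import sys


class ConfigError(Exception):
    """Raised for problems the user can fix, such as a missing setting."""


def load_config(env):
    if not env.get("YOUTUBE_API_KEY"):
        raise ConfigError("Missing YOUTUBE_API_KEY.")
    return env["YOUTUBE_API_KEY"]


def main(env):
    try:
        load_config(env)
    except ConfigError as exc:
        # A string argument is printed to stderr without a traceback.
        sys.exit(str(exc))
    return 0
```

Unexpected exceptions still produce a full traceback, which is usually what you want for actual bugs.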
Funny how statements like these get upvoted and not flagged, but my "Lmao" did. I should be more toxic it seems. Which I thought is worse than an "Lmao" but hey, HN knows best.
("Lmao" is useless, but definitely not worse than some other responses.)