Fun joke, but Anaconda has a track record of creating OSS and then turning it over to community governance. This includes the conda tool itself, libraries like Bokeh, Dask, Numba, JupyterLab, and many more. And while PyScript's project governance isn't under NumFOCUS, all of the code is permissively licensed (BSD/MIT).
The commercial licenses for the products and the commercial repository are what support all of this OSS development work.
I'm sorry, but deep learning is only a very small part of why Python is preferred by data scientists. The fact that Python was already the preferred language is why the enormous corporations wrote Python bindings for their frameworks. Both of these frameworks exist in the Julia ecosystem.
As someone who does data science, I roll my eyes every time I have to touch Python. It’s ubiquitous, but it actually sucks once you get used to better languages.
It is somewhat circular: it was preferred because your earlier alternatives were Java or C(++), both of which had their shortcomings. SKLearn is still one of the most feature-complete and powerful libraries, and it was Python-only and thus drew a crowd.
I'd confidently bet that if you taught a lot of the people who write data science code Julia first, they'd prefer it.
There are lots of little things that add up. Expressing a multiline closure in Python, for instance, is clunky compared to almost any other dynamic language, whether Ruby, Lua, or Julia.
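For a toy illustration (the names here are invented): Python's lambda is limited to a single expression, so any multiline callback has to be hoisted out into a named def, away from its call site:

    # a multiline body can't live in a lambda, so it becomes a
    # separate named function
    def handle(item):
        cleaned = item.strip().lower()
        return cleaned or "<empty>"

    results = list(map(handle, ["  Foo", "", "BAR "]))
    print(results)  # ['foo', '<empty>', 'bar']

In Ruby, Lua, or Julia the equivalent block can stay inline in the call.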
AsyncIO is quite complex in how it works. You use the same patterns for concurrency in Julia, but it is so much easier to grasp and work with.
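For reference, the minimal asyncio ceremony looks roughly like this (a made-up example, assuming Python 3.7+):

    import asyncio

    async def fetch(n):
        await asyncio.sleep(0.1)  # stand-in for a real network call
        return n * 2

    async def main():
        # schedule three coroutines concurrently and wait for all of them
        results = await asyncio.gather(fetch(1), fetch(2), fetch(3))
        print(results)  # [2, 4, 6]

    asyncio.run(main())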
Multiple dispatch as used in Julia, for example, makes API design so much cleaner. You can see that almost everywhere. I make some comparison of making REST calls in Python and Julia here; Julia is much cleaner IMHO: https://erik-engheim.medium.com/explore-rest-apis-with-curl-...
The Python problem is that you cannot easily reuse the same function name for different types. Hence, instead of creating one abstraction across many different types, you need to invent all these different names, which can be hard to guess. In Julia there are often far fewer core concepts to learn, and they can be re-applied in far more ways.
Things like calling shell programs are done more elegantly in Julia. Same with calling C functions. String interpolation is more obvious. There are not like 4 different ways of doing it.
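In Python, "4 different ways" is barely an exaggeration; for string formatting alone, the standard library offers at least these (toy strings, obviously):

    name, n = "world", 3
    print("hello %s, %d times" % (name, n))      # 1. printf-style
    print("hello {}, {} times".format(name, n))  # 2. str.format
    print(f"hello {name}, {n} times")            # 3. f-strings (3.6+)

    from string import Template
    print(Template("hello $name, $n times").substitute(name=name, n=n))  # 4.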
Package management and environment management are much simpler and more elegantly done.
I agree some of this may seem unfair, as Python has baggage from being an older language. But that also counts in its favor, with a wider selection of libraries. Both should be taken into account when evaluating your choices.
> The Python problem is that you cannot easily reuse the same function name for different types. Hence, instead of creating one abstraction across many different types, you need to invent all these different names, which can be hard to guess. In Julia there are often far fewer core concepts to learn, and they can be re-applied in far more ways.
At least there's `functools.singledispatch` in the standard library. There are apparently also multiple-dispatch libraries. I've never used either; duck typing with some try/except (and some isinstance) has served me well so far, but I agree it's not as clean.
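For anyone who hasn't seen it, a minimal singledispatch sketch (the describe function is invented for illustration):

    from functools import singledispatch

    @singledispatch
    def describe(x):           # fallback for unregistered types
        return f"something: {x!r}"

    @describe.register
    def _(x: int):
        return f"an int: {x}"

    @describe.register
    def _(x: list):
        return f"a list of {len(x)} items"

    # note: unlike Julia, this dispatches on the first argument's type only
    print(describe(42))        # an int: 42
    print(describe([1, 2]))    # a list of 2 items
    print(describe("hi"))      # something: 'hi'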
> There are not like 4 different ways of doing it.
Yet.
> Package management and environment management are much simpler and more elegantly done.
Yeah, it's a nightmare in Python. After setting up projects dozens of times now I still don't grok it.
Not the OP, but: the community's approach to performance, which rewrites code in C while still calling it Python (?), instead of being more supportive of the ongoing JIT endeavours.
Yes, Python is very dynamic, but no more so than Smalltalk, SELF, or Common Lisp, all of which have quite good JIT engines.
That's my opinion, and coming from you (from what I gather, you're seasoned in many areas of the computing field) it's not surprising. But the current mainstream is not really aware of all this. There's a tiny Python cult due to SciPy et al.
There is another thing: most of the major libraries where Python is used as a DSL, written in a mix of C, C++, and Fortran, can be used from other languages just as well. There's nothing special about Python there other than a lack of awareness of what everyone else is doing.
In the ML/DL world, or in physical simulators, people just compose a task, throw it at a CPU/GPU/TPU or a cluster of these, and let it run for a long time. I don't see how Julia will be different for these kinds of tasks. I understand that in Julia you solve the two-language problem and get all the goodies that multiple dispatch brings, but the Python ecosystem has progressed a lot in the past 2 years: now you have the Numba JIT, the JAX JIT, the PyTorch JIT, the XLA JIT, and many other proprietary JITs that are not open-sourced. Since JAX (as an example) is mostly numpy and Python, you can leverage your existing knowledge instead of having to learn a fundamentally new paradigm. I would say that Python has many "specialised" JIT engines, and it seems to work great for the community.

Don't get me wrong, Julia is interesting, I can't deny that, but I expect a huge adoption period for it. It can find its niche, as C++ did for extremely high-performance computing or Scala for Big Data (though Java is starting to replace many of its use cases). If you ask me now, I would say that when it comes to data the world converges around Java, C++, and Python, the old trio, and it will remain this way for at least another decade.
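To make the JAX point concrete, a toy jax.jit sketch (the predict function is invented); it reads like ordinary numpy:

    import jax
    import jax.numpy as jnp

    @jax.jit
    def predict(w, b, x):
        # plain numpy-style code; XLA traces and compiles it on first call
        return jnp.tanh(x @ w + b)

    x = jnp.ones((4, 3))
    w = jnp.zeros((3, 2))
    b = jnp.zeros(2)
    print(predict(w, b, x))  # compiled once, reused on later calls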
> but the Python ecosystem has progressed a lot in the past 2 years: now you have the Numba JIT, the JAX JIT, the PyTorch JIT, the XLA JIT, and many other proprietary JITs that are not open-sourced.
Python has a bunch of use-case-specific, non-interoperable, limited JITs that you have to learn separately.
Not the case with Julia: you write some arbitrary code and it gets optimised; so much simpler. The Julia community did some cool things with Flux, where complex, field-specific equations were dropped wholesale into neural-network definitions without having to rewrite anything. That sort of power is invaluable.
Not OP, but I absolutely prefer R; the Scheme inspiration is obvious and allows for flexibility that's completely impossible in Python (here's to PEP 638, but there's a ton of hostility to it from what I can tell).
I also really really really (really) like Julia, but don't quite think it's there yet. I'm optimistic though, these things take time.
It's growing in a sprawling, disorganized fashion. It's developing generics (with horrid ambiguous syntax) for semantic typing that's not generally used outside some applications. The walrus operator is obscene.
This is a very recent trend, though. I think it's a global wave; most languages have accelerated their pace in the last decade: ES6, PHP, heck... even Java goes at Chrome speed now. It can be a bit messy, yeah.
Rust is hands down my favourite language and it's what I write my personal projects in these days. Julia is second. Rust's philosophy of "the complexity doesn't go away, so just be aware and pay it up front" and the type system and compiler feel aaaamazing to use once you grasp them: writing some code, knowing exactly how and where the failure points are, and having it compile knowing that it will be correct is hard to go back from.
I used to be a huge Python fan, and I’d dug through docs and guides for pretty much everything I could, so my frustrations aren’t “outsider criticisms” as such.
My biggest issue is the amount of "magic" that goes on and is actively encouraged; it happily lets you get away with anything, and I've read and had to fix too much horrible Python code that technically does what's required, but in the most torturous and difficult-to-untangle manner possible. I've come to resonate with this idea of "what does the language/tooling encourage you to do?", and in my experience Python doesn't encourage a lot of good things: it encourages you to "hack around problems" rather than fixing them at their root, to lean on magic wherever possible (which is never backed up by any kind of correctness guarantee), and to let the programmer do whatever they want, regardless of how bad or non-idiomatic it is.

The "type system" leaves a lot to be desired. There are optional type hints now, but the larger community seems ambivalent at best: uptake is glacial in my observations, and mypy is just sort of OK. The performance is pathetic, there's no getting around that, and the response of "just write it in C if you need speed" is a poor answer. The Python core dev team seems insistent on continually stacking pointless new features in (walrus operator, why?) whilst simultaneously not really doing anything about real issues (like the packaging situation).

I've also come to really dislike exception-based error handling: having no compiler or type-checking, and knowing that anything could explode anywhere, isn't a reassuring feeling once your codebase gets big enough. Yeah, you can put in try/except and code defensively to head off issues, but it doesn't take much before you've spent as much time and energy doing that as it would have taken to write it in a more suitable language, yet you're left with maybe 1/10th of the guarantees and none of the performance.
If you want to write a web API, you'd be better off writing it in Go, .NET, TypeScript on Node.js, possibly even Swift.
General-purpose stuff you could replace with any of those languages plus Rust. Admittedly Python does still hold the prime position for ML frameworks, but I'd be using Julia at work if it were up to me.
Edit: a sibling comment mentioned Async in Python - an experience that was so frustrating I’d excised it from memory.
Yeah, I've used Python in university and before, for ML and DL, for scripting, at work... it's very inconsistent and annoying. The package management situation is horrible. Yet it's seen as this magical beginner friendly and clean or even beautiful and elegant language while the reality is quite different.
How are you finding it? I’d like to think that with some suitable package evolution/development doing production ML stuff in it would actually be pretty reasonable.
What packages are you using? Linfa looks like it's developing strong legs, and SmartCore seems to be ticking away quietly in the background...
yeah, I keep up with the Linfa group; they're making steady progress. I hadn't seen SmartCore yet, but that looks promising.
I mainly use tch-rs, which is just bindings around libtorch; there are a couple of rough edges where it wraps C++ (function overloading), but overall it works great. I've also used ndarray a fair amount, which is nice.
It’s mostly a personal favourite, but once Ballista [1] gets a bit more developed, I expect we’ll tear out our Java/Spark pipelines and replace them with that.
The ML ecosystem in Rust is a bit underdeveloped at the moment, but work is ticking along on packages like Linfa and SmartCore, so maybe it'll get there? In my field I'm mostly interested in its potential for correct, high-performance data pipelines that are straightforward to write in reasonable time, and hopefully a model-serving framework: I hate that so many of the current tools require annotating and shipping Python when model-serving really shouldn't need any Python code.
Anything labeled Watson, get out the checkbook!!! My guess is that this thing will do less than Amazon Lex and cost $$$$.
Reading the buzz on the page, I don't see any difference between this and the chat API Watson released last year. The one they wanted like $60K for my startup to use for 50 conversations a day... A DAY!!!
I would need some sort of contact details to shoot you a message, or perhaps I'm not kewl enough to know how to do that on Hacker News... This thread leads me to believe that it can't be done, but that was nearly 3000 days ago: https://news.ycombinator.com/item?id=1028763
"There has been very little open source that has made its way into broad use within the HPC commercial community where great emphasis is placed on serviceability and security. There is a better track record in data analytics recently with map/reduce as a notable example.
...
It should be noted that the most significant consumption of open source software is China and it is also the case that the Chinese are rare contributors to open source as well. Investments in open source or other policy actions to stimulate creation are likely to produce a disproportionate benefit accruing to the Chinese."
Currently in a project situation with no budget and blue-sky dreams. I can't say enough how much I love this article.
I think one of the unspoken insights here is: give your projects a budget.
I've been with two startups now that didn't do this, and seen them either think there is no budget or that there is an infinite one. Ultimately, both CEOs would say "Just come ask me," which means you now have to pester the busiest person at the company to get a budget. That is effectively giving the project no budget.
As acdha notes above, he was. But the author incredulously blurted out that he himself had hit the button when he was told his supervisor had been fired over the event (maybe ten minutes later).
The reality was that his boss took the fall for him, which is awesome and terrible. Much of the discussion in this thread has been a tempest in a teapot due to missing context.
His supervisor took the fall to protect him: he was fired on paper, but it was his last day anyway, and he did actually get to work at the embassy again, as it really was a simple, innocent mistake.
Though certainly one with serious, long-lasting consequences.
OK, probably I'm just dumb and have poor reading comprehension (and will get downvoted again for asking a simple question), but can you explain why Itoh was responsible?
It seems that the translators could have saved their work more regularly -- perhaps they hold some of the blame. Obviously the poster could have thought a bit before hitting the button -- he holds all the blame for the resetting of all the terminals. How is Itoh "more responsible"?
My reasoning is that Mr. Itoh put an intern in charge of a system that could cause major damage. It's like giving the intern the keys to your AWS console and shitting your pants when he terminates all your EBS root disks that you didn't back up.
Mr. Beck wasn't culpable because he didn't understand the full effects of his actions or the tension of the situation at the time. Itoh should not have let Beck in the door that morning, and he should not have given that much power to an intern.
Misses the fundamental point that Make is broken for so many things. To begin with you have to have a single target for each file produced. Generating all the targets to get around this is a nightmare that results in unreadable debug messages and horribly unpredictable call paths.
Nix tried to solve much of this, but I agree it can't compete with the bazillion other options.
It does not miss that point; it just ignores it. The author states that there are lots of things we could improve, but the point is that we have too many variations on the theme without converging on a solution that has few (or no) dependencies and comes with built-in build knowledge and the ability to discover what you want rather than making you declare it.
Such a tool should:
- Have zero (or few) dependencies. Likely written in plain C (or C++, D, Rust) and compiled for distribution in binary form.
- Be cross-platform.
- Support any mix of project languages and build tasks.
- Recognize standard folder hierarchies for popular projects.
- Be easy enough to learn. Not overly verbose (looking at you, XML). Similar to Make if possible.
Examples of the auto-discovery (see the sketch below): it can find "src", "inc", and "lib" directories, look inside, see .h files, and make some educated guesses to build the dependency tree of header and source files (even with a mix of C and C++). Or it could see a Rails app and figure out to invoke the right Rake commands, perhaps checking for the presence of an asset pipeline, etc. Or a Node.js project. It could check for Git or SVN and make sure any submodules have been checked out.
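A rough Python sketch of that kind of auto-discovery (the marker table and function name are invented; a real tool would go much deeper):

    import os

    # marker file -> guessed project type (illustrative, not exhaustive)
    MARKERS = {
        "Gemfile": "ruby/rails",
        "package.json": "node",
        "Cargo.toml": "rust",
        "Makefile": "make",
    }

    def detect_project(root="."):
        entries = set(os.listdir(root))
        for marker, kind in MARKERS.items():
            if marker in entries:
                return kind
        # fall back to the src/inc/lib layout heuristic described above
        if "src" in entries and ({"inc", "lib"} & entries):
            return "c-like"
        return "unknown"

    print(detect_project())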
The dependencies thing is a killer. I remember a Windows developer co-worker insisting that everyone had the .NET runtime installed, and after shipping it turned out that most of our customers didn't have it installed, to which he finally said, "well, I always have it installed." (To be fair, I should have pressed him harder, and I did ask the question twice, but because I'd never built against the runtime I was unprepared for any challenge.)
Almost every new project I download starts with a sad, manual, and demoralizing installation of a bunch of third-party stuff that you have to google to find out what's missing. And it's not educational at all, because in a few years all these tools will be obsolete.
(The best project I ever encountered was the Stripe CTF, which almost always used just one command to install a complete working copy of everything you needed and didn't have. I'm still impressed with that.)
Some of these requirements should be built into any build tool. However, most can be added easily enough:
For instance, redux [https://github.com/gyepisam/redux] is written in Go (not compiled for binary distribution, but I could add that), is cross-platform, supports any mix of languages and tasks, and is very easy to learn.
It uses shell scripts to create targets, so everything is scriptable.
Stuff like recognizing standard folder hierarchies and auto-discovery can be added with small scripts or tools.
It can be as simple as you want or as complex as you need.
> To begin with you have to have a single target for each file produced.
Try this next time (only the pertinent lines are included):
    SOURCES = $(wildcard $(SRCDIR)/*.erl)
    OBJECTS = $(addprefix $(OBJDIR)/, $(notdir $(SOURCES:.erl=.beam)))
    DEPS    = $(addprefix $(DEPDIR)/, $(notdir $(SOURCES:.erl=.Pbeam))) \
              $(addprefix $(DEPDIR)/, $(notdir $(TEMPLATES:.dtl=.Pbeam)))

    -include $(DEPS)

    # a pattern rule for .erl -> .beam
    $(OBJDIR)/%.beam: $(SRCDIR)/%.erl | $(OBJDIR)
            $(ERLC) $(ERLCFLAGS) -o $(OBJDIR) $<

    # see: http://www.gnu.org/software/make/manual/html_node/Pattern-Match.html
    $(DEPDIR)/%.Pbeam: $(SRCDIR)/%.erl | $(DEPDIR)
            $(ERLC) -MF $@ -MT $(OBJDIR)/$*.beam $(ERLCFLAGS) $<

    # the | pipe operator defines an order-only prerequisite, meaning
    # that $(OBJDIR) need only exist (rather than be more recent than
    # the target) for the current target to build
    $(OBJECTS): | $(OBJDIR)

    $(OBJDIR):
            test -d $(OBJDIR) || mkdir $(OBJDIR)

    $(DEPDIR):
            test -d $(DEPDIR) || mkdir $(DEPDIR)
I've been using a makefile about 40 lines long, and I've never needed to update it as I've added source files. The same makefile (with minor tweaks) works across Erlang, C++, ErlyDTL and other compile-time templates, and what have you. It also does automagic dependencies very nicely.
> Generating all the targets to get around this is a nightmare that results in unreadable debug messages and horribly unpredictable call paths.
If you think of Makefiles as a series of call paths, you're going to have a bad time. A Makefile is a dependency graph: you define rules for going from one node to the next and let Make figure out how to walk the graph.
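To put that mental model in code, here's a toy sketch of the graph walk Make does for you (Python pseudocode; real Make also compares file timestamps, which is omitted here):

    # rules: target -> (prerequisites, action); a stand-in for a Makefile
    rules = {
        "app":   (["foo.o", "bar.o"], "link foo.o bar.o -o app"),
        "foo.o": (["foo.c"], "cc -c foo.c"),
        "bar.o": (["bar.c"], "cc -c bar.c"),
    }

    def build(target, done=None):
        done = set() if done is None else done
        if target in done or target not in rules:
            return                # a source file, or already built
        deps, action = rules[target]
        for dep in deps:
            build(dep, done)      # depth-first walk of the dependency graph
        print(action)             # run the rule once prerequisites exist
        done.add(target)

    build("app")  # prints the two compiles, then the link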
Could you post an example of what you mean by the single target/file limitation? As stated I can't tell how implicit rules or a rule to build an entire directory wouldn't be a solution, but maybe I'm not understanding the problem.
Sure, consider a compiler that produces an object file (foo.o) and an annotation file (foo.a). Now if a target requires both foo.o and foo.a, you have to create two targets for them (even though it's really one command).
You can do implicit rules, which requires a very verbose makefile; that is what automake and other make-generation tools do. God help you figuring out what went wrong.
If you make people go to a directory approach, you've now imposed a new structure on their code. One reason for the multitude of packages is that each one matches its target community better.
The third rule simulates a compiler producing two outputs. Now if foo.o changes, both "copied" and "o" will be updated, and if foo.a changes, both "copied" and "a" will be updated. (And if either foo.o or foo.a is deleted, the compiler will be rerun, as will everything depending on foo.a or foo.o.)
If both the .o and the .a are created from another file, wouldn't it be safe to just rely on one of them? (Obviously, you would need to be consistent in your choice.)
That is, if every time a .o is created, so is the .a, then where is the difficulty? Just rely on one (the .o). I could conceive of a scenario where the .a was updated but the .o wasn't, but I don't know of any tools that really work that way right now. I thought the norm was to at least touch all output files.
Further, if that is happening, it seems you are safest having two rules anyway.
Say you have a long build process and do a quick semi-clean by hand to speed up the next build (not the best idea, but not inconceivable), deleting the .a files but forgetting to delete the matching .o files. Then your next build will produce some novel (to you) error messages that may take a long time to clean up. Worse, the command building on the .o and the .a might just say, "OK, I'm given a .o without a .a; fine, then I'll do a slightly different thing."
Also, having two rules means duplicating a command.
Invoking the command twice can also screw things up if you run a parallel build, which you should always do! Not only does it speed things up, it's also a good way to verify that your makefile is actually correct. If your makefile doesn't work in a parallel build, it is broken, in the same way as C code that breaks at -O2 and above due to reliance on undefined behavior.
The solution to the multiple-target problem is the built-in magic .INTERMEDIATE rule, though how it works isn't entirely obvious.
Ok, that makes sense. I'm tempted to rattle off the knee-jerk "don't go deleting random crap," but I realize that is a hollow response.
I'm curious how .INTERMEDIATE helps in this case. I did find this link [1], which was a rather fun read on how one might go about solving this, along with all of the inherent problems.
The target baz has both a .l and a .o, both of which are produced by one command. The line that begins with "%.o" starts an implicit rule, which loosely states, in English: "to produce a .o file, or a .l file, run the following ...". $(*F) is a GNUism that maps to the filename of the source (the directory part, if any, is stripped). This works; I tested all three targets (foo, bar, baz) with a "make clean" between each one.
(And for the really curious: a09 is a 6809 assembler; disasm.a is a 6809 disassembler, written in 6809; binary is a 2K relocatable library.)
Or if you don't like taeric's suggestion, you can just touch a .ao file after the line that creates the .a and .o files, and have your further rule(s) depend on that .ao file. Have the .ao depend on your source. If you still want to be able to type something like 'make foo.a' instead of 'make foo.ao' and have it work, you can add a rule where the .a depends on the .ao and all the rule does is touch the .a file. Create the same rule for the .o too.