
The big difference between bioinformatics workflow systems and the rest is what the typical payload of a DAG node is and which optimizations that implies. Most other domains don’t have DAG nodes that assume the payload is a crappy command line call that expects its inputs/outputs to magically be in specific places on a POSIX file system.

You can do this on other systems but it’s nice to have the headache abstracted away for you.

The other major difference is the assumed lifecycle. In most biz domains you don’t have researchers iterating on these things the way you do in bioinf. The newer ML/DS systems do solve this problem better than, say, Airflow.



I for one have started to appreciate the fact that the shell/commandline interface means:

- We have an interface that very strongly imposes composability, which is rarely seen in other parts of IT, and that makes people actually "follow the rules" :D

- Data is (mostly) treated as immutable, except perhaps inside tools

- Data is cached

- The CLI boundaries mean that one can at least inspect inputs/outputs as a way to debug.

- Etc...

Personally, the biggest frustration is all the inconsistencies in how people design the commandline interfaces. Primarily that output filenames are so often created based on non-obvious and sometimes arbitrary rules, rather than being specified by the user. If all filenames were specified (or at least possible to specify) via the CLI, pipeline managers would have such an enormously easier time.
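
To make that concrete, here is a minimal sketch of how simple tracking gets when every output path is passed on the command line. `run_task` is a hypothetical helper, not part of any real pipeline manager, and the "tool" is just a Python one-liner standing in for a well-behaved CLI:

```python
import os
import subprocess
import sys
import tempfile


def run_task(argv, outputs):
    """Run a command, then check that every declared output path exists.

    Hypothetical helper: because the caller chose the output paths,
    no globbing or guessing is needed afterwards.
    """
    subprocess.run(argv, check=True)
    missing = [path for path in outputs if not os.path.exists(path)]
    if missing:
        raise FileNotFoundError(f"declared outputs not produced: {missing}")
    return outputs


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        out_path = os.path.join(workdir, "result.txt")
        # Stand-in for a tool that writes exactly where it is told to.
        tool = [sys.executable, "-c",
                "import sys; open(sys.argv[1], 'w').write('ok')", out_path]
        produced = run_task(tool, [out_path])
        with open(produced[0]) as handle:
            return handle.read()
```

The point is only that the manager knows the full set of outputs before the command even starts, so a missing file is an immediate, unambiguous error.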

What happens now is that you basically need a mechanism like the one Nextflow has, where every command is executed in a temp directory and the pipeline tool just globs up all the generated files afterwards. This works, but it opens up a lot of possibilities for mistakes in how files are tracked (a file might be routed to the wrong downstream output if you do something funny with the naming, such that two output path patterns overlap).
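
A rough sketch of that failure mode, using plain `fnmatch` patterns rather than anything Nextflow-specific. The function names and file names are made up for illustration, and real pipeline tools resolve patterns with their own rules:

```python
import fnmatch
import os
import tempfile


def collect_outputs(workdir, patterns):
    """Map each declared output pattern to the files in workdir matching it."""
    produced = sorted(os.listdir(workdir))
    return {p: [f for f in produced if fnmatch.fnmatch(f, p)] for p in patterns}


def find_overlaps(collected):
    """Return the files that are claimed by more than one output pattern."""
    claims = {}
    for pattern, files in collected.items():
        for name in files:
            claims.setdefault(name, []).append(pattern)
    return {name: pats for name, pats in claims.items() if len(pats) > 1}


def demo():
    with tempfile.TemporaryDirectory() as workdir:
        # Stand-in for a tool that picks its own output names.
        for name in ["sample.bam", "sample.sorted.bam", "sample.log"]:
            open(os.path.join(workdir, name), "w").close()
        collected = collect_outputs(workdir, ["*.bam", "*.sorted.bam"])
        return collected, find_overlaps(collected)
```

Here `sample.sorted.bam` matches both `*.bam` and `*.sorted.bam`, so it is ambiguous which downstream output it belongs to. A check like `find_overlaps` can at least flag the collision instead of silently routing the file twice.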


Nextflow can't even get this right: base Nextflow uses some combination of `--paramName` and `--param-name` and treats them as interchangeable, while nf-core encourages `--param_name` (which Nextflow sees as a different parameter). These are all trivial differences, but they just add more layers to the CLI frustration train.
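
A toy illustration of the mismatch, assuming the only normalization is kebab-case to camelCase as described above. This is not Nextflow's actual code, and `canonical` is a made-up name:

```python
import re


def kebab_to_camel(name):
    """Turn 'param-name' into 'paramName'; underscores are left alone."""
    return re.sub(r"-([a-zA-Z0-9])", lambda m: m.group(1).upper(), name)


def canonical(flag):
    """Strip the leading dashes from a CLI flag and normalize kebab-case."""
    return kebab_to_camel(flag.lstrip("-"))


def same_param(flag_a, flag_b):
    """True if both flags would resolve to the same parameter name."""
    return canonical(flag_a) == canonical(flag_b)
```

Under that assumption `same_param("--paramName", "--param-name")` holds, while `--param_name` stays a separate key, which is exactly the kind of thing that bites you when a config and a command line disagree on spelling.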



