
I have been working in bioinformatics for many years now, and I see this question asked by beginners all the time. People who insist on using just a single language are crippling themselves and their work. You can and should learn both, and more. Programming languages are just tools; the more tools you have, the better prepared you will be to solve whatever problems arise.


It’s not a very good article in that respect. There should be a decision tree per application that guides you to the right language choice.

What packages exist in R, and what is available in Python? Are the Python or R packages just calling a C library under the hood? If so, does memory usage or single-threading even matter?

What about Julia? Can you express your ideas and equations better in Julia, and is that what matters most for your project?

Do you want to work in notebooks or build ‘production’ code?

Do you need to put your work on the web?


Agreed. But both are also really terrible languages, so I’d encourage people to broaden their horizons even more fearlessly.


Python is not a terrible language. It's widely used in reputable organizations by highly informed and capable software engineers to great effect. Generic language criticism in 2024 is an anti-pattern that intellectually hinders new engineers. There are certainly preferences for different use cases, and highlighting those differences is a productive discussion.

For example: one of the most common technical concerns to raise with a new engineer is the Global Interpreter Lock (GIL), which restricts the interpreter to executing Python bytecode in only one thread at a time.
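
A minimal sketch of what that means in practice (the worker function and loop size are made up, purely for illustration): a CPU-bound pure-Python loop doesn't speed up with threads under the GIL, but does with separate processes.

    import time
    from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

    def busy_work(n=10_000_000):
        # Pure-Python, CPU-bound loop: only one thread at a time can run this bytecode.
        total = 0
        for i in range(n):
            total += i
        return total

    def run_with(executor_cls, workers=4):
        start = time.perf_counter()
        with executor_cls(max_workers=workers) as ex:
            list(ex.map(busy_work, [10_000_000] * workers))
        return time.perf_counter() - start

    if __name__ == "__main__":
        # Threads: roughly as slow as running the four calls back to back (GIL).
        print("threads:  ", run_with(ThreadPoolExecutor))
        # Processes: each worker has its own interpreter and GIL, so they run in parallel.
        print("processes:", run_with(ProcessPoolExecutor))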


> Generic language criticism in 2024

Research on PL design is alive and well in 2024, and Python is nowhere near the cutting edge. I never said it didn't get the job done, or that people shouldn't learn or use it.

It's just not as elegant or simple a PL design as people sometimes seem to think it is. It's hard to look at something like Clojure or Haskell next to Python and conclude that Python is particularly elegant.

I think my main point got lost on people who took offense at my dislike of Python: you should also learn languages that are very different, languages from an altogether different branch.


Are you getting paid for broadening your horizons or for getting sh*t done?

Stroustrup was spot on about the two types of programming languages: the ones people complain about and the ones nobody uses.


Grand-parent: programming languages are tools. The more tools you can wield, the better you can choose the right one for the job.

Parent: yeah, but R and python are shit tools; learn how to use better tools too.

You: are you paid to learn how to use better tools, or are you paid to solve problems with shit tools?


What you are really doing:

You: If these tools are not ideal, what is a better tool?

Parent: I won't tell you. I'll just criticize online and provide no technical guidance; instead I'll gatekeep my definition of quality tooling. I think being a good senior expert does not include sharing the best practices I have developed through my years of experience.

You: Oh. Then I don't want to listen to you because I'm more interested in making personal progress on my professional journey than listening to uninitiated opinions provided without context.


> You: If these tools are not ideal, what is a better tool?

I am sure there's the word rust in there somewhere ;-) After all, HN has been raving about polars.


Languages aren't just good or bad in the abstract. It's about problem-tool fit. The author mentions a specific field and evaluates the fit of R and Python. Do you believe there is a language that's a better fit for bioinformatics? I'd like to hear your recommendation.


> Languages aren't just good or bad in the abstract.

I don't know about that. There's certainly no "perfect" language, but to claim that Python is a brilliant or elegant language design is to not know a whole lot about PL design.

I think I'm mainly getting downvoted because Python is popular, and people think that means it must be inherently good.


I have been using R almost daily for years now, and it's no hyperbole when I say that if ggplot2/dplyr weren't a thing I would never have bothered with R.


Adding to that, it's what makes me miss R when using other languages. Discovery and exploratory building just isn't very much fun for me in Python, but Python is ultimately where things get operationalized.


I completely agree with you. I think about what I want and R code comes out of my fingers. By comparison, pandas requires more thought to do intermediate-to-advanced things.
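
As a concrete, made-up example of the kind of pipeline that just flows in dplyr, here is roughly the pandas spelling of a group_by/summarise; the data frame and column names are hypothetical:

    import pandas as pd

    # Hypothetical data: one row per sample, a condition label and a measurement.
    df = pd.DataFrame({
        "condition": ["control", "control", "treated", "treated"],
        "expression": [1.2, 1.4, 3.1, 2.9],
    })

    # Roughly the pandas equivalent of:
    #   df %>% group_by(condition) %>% summarise(mean_expr = mean(expression), n = n())
    summary = (
        df.groupby("condition", as_index=False)
          .agg(mean_expr=("expression", "mean"), n=("expression", "size"))
    )
    print(summary)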


The silver lining, I hope, is that this will highlight how important independent verification and reproduction of results are in academia. People "know" that, but funding for it is still scarce, since it's always more exciting to try to find something new than to validate a known result.


Replication is far from trivial. It's still worth pursuing, but it's easy to overlook how challenging it can be to successfully execute scientific procedures.

Labs generally specialize, as there's a long learning curve to climb before you can reliably execute even 'bedrock' molecular biology protocols like immunoassays. Small ambiguities in protocol can lead to failure, and there's always simple human error involved that can tank a result. Generally you'll have positive controls available to tell you whether a protocol was successfully executed, but there are cases where that's simply not practical.

In the end, 'failure to replicate' does not necessarily mean there was anything wrong with the original work. Concluding that with confidence requires a lot of additional work to rule out other explanations for the discrepancy.


But OTOH, what even is the scientific value of the original paper if they cannot provide a clear protocol with a substantial chance of replication?

“Here, I did this magic trick, but I’m unable to tell you sufficient detail for how it works!”


While independent verification and reproduction are hard, I wonder if there is any requirement for researchers to at least publish their data set for statistical analysis and further research.

Also, I found it interesting that even though computer science research is usually easier to reproduce, a lot of journals and conferences do not mandate artifact evaluation; it is just considered nice to have for a submission. If we had mandatory artifact evaluation, even for artifacts that are not reusable and can only repeat the experiments in the paper, it would be much easier to verify the claims in papers and compare different approaches.


> I wonder if there is any requirement for researchers to at least publish their data set for statistical analysis and further research.

Not generally, though the tide is slowly turning in the right direction. Unfortunately many laws/policies pushing for openness and transparency in research are sidestepped with the classic "data available upon request," a.k.a. "I promise I'll share the Excel files if you email me" (they will not).


I don't understand why they can use this as an excuse. If they can share the data upon request, why can't they just publish it as well? Is that related to some legal/privacy issue?


> Is that related to some legal/privacy issue?

Possibly in some medical or social science fields, I don't know. I know there is no such issue in chemistry and materials science. There may also be some complications for collaborations with industry, but that's kinda a different situation. For people whose career development is not strongly tied to the reproducibility of their work (a.k.a. everybody), it's just another step in the overly complex process of publishing in for-profit journals. Funding agencies generally aren't going to punish people for using this excuse, and the watchdogs/groups concerned with reproducibility have no teeth.

Not an excuse, but journals don't make it easy to share files, as hard as that is to believe. Some will only take PDFs for supplemental information and many have garbage UIs, stupidly small file size limits, etc. Just uploading to a repo (or tagged release) on GitHub is common these days because there is much less friction.


No one at any point in the funding cycle benefits from asking questions.

In fact, most of the people in a position to ask the kinds of questions that need to be asked risk their entire career when they do so.

The entire industry has issues.


Independent verification is why scientific fraud is so dangerous.

On the one hand you'll eventually get caught, but only after potentially millions have been spent on said catching.


Isn't that exactly what has NOT happened here? Some combination of:

* People not checking

* People checking and joining the fraud

* People checking, not joining the fraud, and then getting their work suppressed.


Python isn't premature. Python is more than 30 years old now, and Python 3 was released more than 10 years ago.


It's been at least 5 years since I read an angry post about the 2 to 3 version change, so I guess it's finally been accepted by the community.


that heatmap is just asking to be clustered https://imgur.com/a/lxn4ApA
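
(For anyone who wants to try it, here's a minimal seaborn sketch; the random matrix below just stands in for the linked heatmap's data:)

    import numpy as np
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    # Stand-in for the heatmap's underlying matrix (rows x columns of values).
    rng = np.random.default_rng(0)
    data = pd.DataFrame(rng.normal(size=(20, 10)))

    # clustermap hierarchically clusters rows and columns, reorders them,
    # and draws the dendrograms alongside the heatmap.
    sns.clustermap(data, method="average", metric="euclidean", cmap="vlag")
    plt.show()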


I love xsv, but it's been 3 years since the last release and the PRs are accumulating. Miller seems much more active.


Last time I tried, ssh did not delete the socket file when the connection closed and then complained that the file already exists on a new connection.


> but there are many artists I can no longer buy

This is the most frustrating part of the whole ordeal: many indie artists had their digital albums on GPM, and YouTube Music is not a replacement because you can't buy albums on it.

On the other hand, Bandcamp is awesome; I wish more artists would use it.


Unfortunately, having a competitive card is only half the battle; they also need all the deep learning libraries to support the card, otherwise nobody is going to bother. I hope AMD understands that and enlists their own engineers to help the community make support for this card solid.


ROCm (https://rocmdocs.amd.com/en/latest/) is their compute framework/stack. Not as good as CUDA, but it has support for TensorFlow etc.
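
The pitch, as I understand it, is that existing TensorFlow code runs unchanged; assuming a ROCm build of TensorFlow (e.g. the tensorflow-rocm wheel) and a supported GPU, something like this should behave the same as on a CUDA box:

    import tensorflow as tf

    # On a ROCm build this should list the AMD GPU, just like a CUDA build lists an NVIDIA one.
    print(tf.config.list_physical_devices("GPU"))

    # The compute code itself doesn't change between backends.
    x = tf.random.normal((1024, 1024))
    y = tf.matmul(x, x)  # runs on the GPU if one was found above
    print(y.shape)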


The problem is that ROCm is Linux-only, which is still a huge downside, and it doesn't have good support (or any support at all) for consumer-grade GPUs; for pretty much anything from Polaris onwards it's good luck, heck, even the Radeon VII isn't well supported.

CUDA works because any NVIDIA GPU will run CUDA. That means it's easier to learn, easier to prototype, and easier to ship, and the code you ship isn't limited to the datacenter.

What AMD needs to do to "win" an HPC GPU launch is to have an event which is 95% "How we fixed ROCm, and here is our full software roadmap and support guarantee for the next 5 years" and the remaining 5% "oh btw here is our new silicon, it's really fast and shiny".


I would be interested in knowing how much compute happens away from Linux. My impression is that almost nobody uses Windows for these tasks, but anecdotes are not data. There are of course workstation-type acceleration tasks like simulation that are very Windows-heavy (e.g. ANSYS), but I am not privy to the breakdown of compute demand per segment.

> What AMD needs to do to "win" an HPC GPU launch is to have an event which is 95% "How we fixed ROCm, and here is our full software roadmap and support guarantee for the next 5 years" and the remaining 5% "oh btw here is our new silicon, it's really fast and shiny".

This is a great point. NVIDIA has worked on CUDA for years and has a great ecosystem of material and questions on places like Stack Exchange. AMD will have to work very purposefully to close the gap, but it seems like they are aware and headed in the right direction.


The software side of AMD has been (in preceding years) a disaster. I say that as someone interested in their products, who would love to see a realistic competitor to CUDA.

AMD's Linux support has been somewhere between no-assed and half-assed for the preceding decade. I think they believed that the world would return to Windows for everything. That ship sailed long ago. The point about CUDA being usable up and down the HW stack is quite salient. When I develop GPU things, I start on my laptop GTX 1060, test on my deskside RTX 2060, and run them on V100s. The code is in Julia, C, and Fortran, so it should work anywhere with good underlying library support. I've got a Zen laptop with an integrated Radeon. No dice, can't do computing on it (yet).

AMD's function/library support is nascent and will take years to get to a viable point for many.

I am hoping ... hoping ... that AMD sees this as a long-term opportunity, not a short-term expense that must provide immediate ROI. SW ecosystems drive HW purchases, but there is usually a lag of years before this engine really gets started.

AMD needs to be in this for the long haul.


Your impression is wrong; there is a metric ton of enterprise and consumer software that uses CUDA and runs only on Windows.

There are also whole "data science" divisions in blue-chip companies that run Windows.

Case in point: I work for a huge financial company, and we have CUDA-powered Excel add-ins/macros...

And no I'm not joking https://on-demand.gputechconf.com/gtc/2010/presentations/S12...

And engineering, science, and medical consumer applications are also more often than not Windows-only or Windows-first.

Then you have all the less-enterprisey stuff: video and photo editing, filters, chess programs, whatever...

And lastly, the biggest point is that Windows and consumer-grade hardware are where most developers and students live. Good luck running ROCm on your laptop, and no, I really mean it: it's officially not supported, and in reality, even if you manage to get a moderately compatible chip, you'll encounter more bugs than on Klendathu.

Don't underestimate the importance of software that runs everywhere and just works. Node.js didn't become popular because JavaScript on the backend was desperately needed; it became popular because you had a plethora of front-end developers with little to no knowledge of server-side languages and frameworks.

Unlike what HN and recruiters would like you to believe, most developers can't learn 10 languages and frameworks, and definitely not well. Sure, some can, but the vast majority of developers don't spend 9 hours working and then 9 more hours hacking; for every dev with a GitHub account that needs its own storage rack, there are 10,000 who just do 9 to 5 and check out.

If on one hand you have a solution that forces you to pick from a narrow list of Linux kernels and supported distros and an extremely narrow list of GPUs, where you still hit bugs around every corner, so that you can maybe produce something that, if it runs at all, only runs on a system just like yours, and on the other hand a solution that runs on any OS that supports an NVIDIA GPU, you'll pick the latter unless you are really, really bored.

And that is before you consider the marketability and job prospects of learning CUDA vs ROCm. One lets you get a job at any place that ships something that runs on a GPU, whether it's something that occupies 1,000 racks and might become sentient or something that filters Excel spreadsheets faster; the other one doesn't.


> And no I'm not joking https://on-demand.gputechconf.com/gtc/2010/presentations/S12...

Thank you for sharing the link and correcting my information bias. It sounds like the "workstation" compute world is a forest of deep niches.

You make a lot of good points about the staying power of Windows. I am excited about all the moves towards a complete Linux desktop, but am not imagining that it will be mainstream.


I’m not sure it’s a forest of deep niches; at this point I would say that the niche is the 7-figure server racks with A100s outside of the cloud providers...

There are still more use cases for GPU compute on the edge than in the datacenter and that likely won’t change.

And as for Linux on the enterprise desktop: ROCm can’t run in WSL2, CUDA can, so that’s yet another reason to bloody support Windows...

Because WSL2 is ironically probably the way forward for Linux on the desktop for the majority of the computerized workforce.


> There are of course workstation-type acceleration tasks like simulation that are very Windows-heavy (e.g. ANSYS)

Funny you mention ANSYS specifically as they seem to have pretty decent Linux support:

https://www.ansys.com/solutions/solutions-by-role/it-profess...

Although only on nVidia hardware if I'm interpreting it right


> Funny you mention ANSYS specifically as they seem to have pretty decent Linux support:

With ANSYS in particular, the question is who is doing the simulation. If the design engineer is doing it, many engineering tools are Windows-only (although this has been improving), so it makes sense to run ANSYS under Windows as well so you can stay close to your modelling software. If the stress or EM people are separate from the designers, then it shouldn't matter as much.

I would love to see all productivity tools move to Linux, and things have been getting a lot better over the years. Personally, I'm excited about the noise that Microsoft was exploring Office for Linux, as Office is the only reason I ever boot into Windows. What a godsend it would be to be able to program and run all my productivity software at the same time.


Yes, AMD’s fuckup in compute isn’t just ROCm but also their OpenCL support on Linux.

Windows still gets semi-decent support, especially for their workstation cards, but Linux, oh boy...


Do not believe it. They wrote it for Windows and ported it badly.

Source: Struggled for months to get ANSYS to work even crappily on three different distros of Linux, two of which were clean installs of "officially supported" distros.

Eventually gave up, bought Windows, and installed it on that. Worked immediately.


That’s kind of the make-or-break plan for Frontier and El Capitan [1]. They’re having all the science folks try using ROCm and the HIP recompiler thing. We’ll see how that shakes out in practice.

[1] https://www.anandtech.com/show/15581/el-capitan-supercompute...


LLNL usually writes their own stack. I don't see the main API for El Capitan being anything but OpenMP, and LLNL can and has written their own compilers and libraries for other GPU-powered supercomputers.


Sure, but other folks have to make use of it too, and not everyone's code will be abstracted from CUDA. They're trying to get folks on Summit to test out HIP more seriously. Repeating my comment from last summer [1], which linked to the "try to use HIP" guidance [2]:

> The OLCF plans to make HIP available on Summit so that users can begin using it prior to its availability on Frontier. HIP is a C++ runtime API that allows developers to write portable code to run on AMD and NVIDIA GPUs. It is essentially a wrapper that uses the underlying CUDA or ROCm platform that is installed on a system. The API is very similar to CUDA so transitioning existing codes from CUDA to HIP should be fairly straightforward in most cases. In addition, HIP provides porting tools which can be used to help port CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.

[1] https://news.ycombinator.com/item?id=20495637

[2] https://www.olcf.ornl.gov/wp-content/uploads/2019/05/frontie...


