Ask HN: Learn Python, R? Or something else?
27 points by socrates1998 on Feb 24, 2017 | 43 comments
Hi Hacker News, I am doing a statistics project for some American football teams and I think it's going to require me to learn R.

Should I just learn R right away? Or should I learn another programming language first (like Python), then learn R?

I have some limited experience in web development, doing web design and building websites, mainly in WordPress. As such, I know some HTML, CSS, and PHP.

Just looking to see if I could learn R without knowing much else.



You can learn R without knowing anything else. It's a good statistics package but a dated and quirky programming language.

I'd consider doing the Stanford Statistical Learning class which starts off with teaching you R. https://lagunita.stanford.edu/courses/HumanitiesSciences/Sta...

Also recommend Swirl, which is an interactive tutorial - http://swirlstats.com/students.html

Or you can go straight to Julia, which is a modern statistical package and language without R's quirks http://julialang.org/

If you're eventually going to put your project on the Web, or just want to learn programming the right way, you might be better off doing it in Python.


What makes R a dated and quirky language?

I know a reasonable amount of R, but I don't really know any other programming language. I did once go looking at Julia, and had thought to write a few simple things using it, but in the end just ran out of time and did it with R.

Learning R has been one of the most challenging and enjoyable things I've done over the last three years. However, I would be interested to know where Python, Go or Julia would make my life better or easier.

I mostly write R scripts to analyse and graphically display interpretations of data. There are so many contributed libraries, it feels like I am really spoiled for choice.

I also write Shiny web apps. Is there a Python equivalent of Shiny? I have heard of Django, but is that really as easy as just writing a Python script and deploying on a server? I always thought Django was more of a complete framework which also required server side coding.

Anyway, I would genuinely like to know what advantage Python or Julia or Go would provide me with over just continuing with R.


Julia claims performance closer to Fortran or C, with the expressiveness of a high-level scripting language. It seems to have enough traction that it's probably the way of the future.

R quirks ... the <- syntax, the ~ formula syntax, looping through a dataframe is horrifically slow when you can't vectorize, multiple legacy object semantics, poor parallel/multiprocessing support, poor support for datasets that don't fit in memory ... upside is any statistical method probably has a decent implementation in R

python - most comprehensive ecosystem and libraries beyond statistics, e.g. Web, numerical and scientific computing, machine learning (Tensorflow), NLP, generally a good language to learn to program in, pretty easy and forgiving while also being reasonably expressive, performant, offering functional as well as object oriented features/styles.

the good folks at plotly are working on a shiny equivalent (dash) but it's not out yet. Django + matplotlib or bokeh or some client-side graphics like plotly.js is potentially powerful but not really as integrated as shiny.


Julia's biggest hurdle is the lack of well-functioning DataFrames (or the current fork, DataTables). Tons of issues around nullable arrays, etc. have really slowed progress. I do think it's got a ton of upside, but I've found reimplementing my R or Python scripts in Julia to be too much of a hassle. The costs of reimplementation greatly outweigh the not insignificant gains in speed.

Also check out this article on updates to R 3.4. R tends to be fast enough for most work (I use it regularly on one-off analysis or things that won't ever make it farther than ad-hoc reporting/findings but can't imagine using it in production systems). The listed changes should go a long way towards making R just fast(er) enough for dealing with larger datasets (doesn't help with datasets larger than memory though). For large datasets all the momentum seems to be moving towards Spark (sparklyr is RStudio's SparkR integration. Very much a beta but getting better by the day). On the Python front Dask is awesome for out of memory computation that has no equivalent in R.


> For large datasets all the momentum seems to be moving towards Spark (sparklyr is RStudio's SparkR integration.

Worst case, you can always use MPI with R and run on a Beowulf cluster. Of course that might not help if you want to use a function from a library, and the library itself expects everything to be in memory on one node, but at least it gives you another option for parallelization.


Absolutely, though as you mention, losing the ability to use packages and having to write statistical code that properly accounts for data being spread across multiple nodes would likely put it out of reach of the typical R user. An open-source alternative to Revolution R/Microsoft R Server's out-of-core processing backend and distributed analytics packages would be a huge addition to the R language.


Realized I never posted the link about R 3.4 that I referenced. https://cdn.ampproject.org/c/s/www.r-bloggers.com/performanc...


How have I not heard of Dask? Would have saved me a lot of pain when trying to deal with oom datasets in my Luigi pipeline.


I guess I never found the <- syntax odd, because that's all I know. Not using = for assignment kind of makes sense to me, as equals has mathematical connotations.

I always thought there were plenty of options for parallel and high(er) performance computing with R.[1]

Data sets which don't entirely fit in memory can be stored pretty much anywhere and queried. Simplest would be an SQLite database. Formats like Feather[2] and fst[3] are also really useful. There are multiple R interfaces for Redis.[4]

R has a library to interact with the Tensorflow API[5] and many other machine learning services.

"Multiple legacy object semantics" doesn't really mean a lot to me. Knowing nothing else, that's just the way R is, plus I don't think there are that many really strange ways of doing things. Mostly there's a discernable pattern across functions.

Sounding like a fanboy sure, but as I said I just don't know anything else and while Julia sounds nice (speed, way of the future etc) it doesn't have the breadth of library support, stability or community that R does yet and I don't see a compelling reason for me personally to spend time learning Python.

Not saying R is best to the exclusion of all else, but for data analysis, generalised scripting and some web based reports/reactive apps, I can't find a compelling reason to switch, and I guess don't really understand the criticism of the language.

[1] https://cran.r-project.org/web/views/HighPerformanceComputin...

[2] https://github.com/wesm/feather

[3] https://github.com/fstpackage/fst

[4] https://github.com/antirez/redis-doc/pull/798/files

[5] https://github.com/rstudio/tensorflow/blob/master/README.md
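
On the SQLite suggestion above: here is a minimal sketch of the idea, written in Python rather than R (from R you would typically go through DBI/RSQLite, which is not shown here). The table name, columns, and rows are invented for illustration, and an in-memory database stands in for a file on disk. The point is only that the grouping happens inside the database, so just the small aggregated result comes back into memory.

```python
import sqlite3

# ":memory:" keeps the sketch self-contained; a real dataset would live in a file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plays (team TEXT, direction TEXT, yards INTEGER)")
conn.executemany(
    "INSERT INTO plays VALUES (?, ?, ?)",
    [
        ("A", "Right", 4),
        ("A", "Left", 2),
        ("A", "Right", 6),
        ("B", "Left", 3),
    ],
)

# Let SQLite do the grouping; only the aggregated rows are fetched.
rows = conn.execute(
    "SELECT team, direction, COUNT(*), AVG(yards) "
    "FROM plays GROUP BY team, direction ORDER BY team, direction"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

With a file-backed database the same query works unchanged; only the `connect` argument differs.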


>> that's all I know >> I can't find a compelling reason to switch

See The Blub paradox: http://www.paulgraham.com/avg.html

You should learn a few languages even if you don't need them, just for the mind expansion and for learning what is necessary and what is merely your culture's convention. It's like travelling to a different country: beware the patriot who has never left his home country!


If you like it and it does what you need, that's great! If/when you learn Python, you'll get it. Python is expressive, readable, and structured in a way R isn't. Shiny is all right, but no one in their right mind would build a large general purpose web app in R.


I mean, there are about 10 slightly different versions of `map` (whither `zip`, though?). There are several mutually incompatible object systems. It doesn't come with a proper dictionary or set. Multidimensional arrays (well, lists) are treated specially for some reason instead of just being a property that arises from the concept of an indexed container. It's definitely a weird language.
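
For contrast, a small sketch of what the Python side of this looks like: one built-in `map`, a `zip`, and dictionary and set types that ship with the language. The sample data and variable names are made up.

```python
directions = ["Right", "Right", "Left"]
yards = [4, 6, 2]

# One built-in map, applied lazily; list() materialises the result.
upper = list(map(str.upper, directions))

# zip pairs up elements from several sequences.
pairs = list(zip(directions, yards))

# Dictionaries and sets are core types, no package needed.
total_by_direction = {}
for direction, gained in pairs:
    total_by_direction[direction] = total_by_direction.get(direction, 0) + gained

distinct = set(directions)
print(upper, pairs, total_by_direction, distinct)
```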


If you definitely need to use R, I'd say just learn R. R is different enough from most other languages that I don't think you'd get a lot of value from, say, learning Python first.

Why do I say that? First of all, R's syntax is quirky and different enough from (Python|Java|C|Ruby|etc.) that you might almost find it harder to learn R if you're already used to something else. Second, aside from the syntax the biggest thing to get used to with R is that it's very much vector oriented. Basically you're always working with vectors, even when you only have what you would otherwise think of as a single scalar value. You just put in a vector of length 1. Anyway, that whole paradigm is different enough from other programming languages, that you might as well just learn it that way from the beginning.
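
A small sketch of how that differs from plain Python, where a single number and a collection are different kinds of thing and arithmetic on a list is not elementwise. In R, by contrast, `5` is already a vector of length 1 and `c(1, 2, 3) * 2` multiplies every element. The variable names are just for illustration.

```python
single = 5
values = [1, 2, 3]

# A bare number is not a sequence: it has no length.
has_length = hasattr(single, "__len__")

# "*" on a list repeats the list instead of multiplying each element.
repeated = values * 2

# Elementwise arithmetic has to be spelled out (or delegated to a library).
doubled = [v * 2 for v in values]

print(has_length, repeated, doubled)
```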

Now to be fair, there are libraries and things that let other languages act and feel a bit more like R, but I'm intentionally not considering those right now, as that would just be one more complication to deal with. And if you are locked in on using R for whatever reason, there's no need to complicate life.

The only other question I would have, is whether or not you absolutely must use R at all. If you have the option to choose your language, you can do pretty much anything that you can do in R, using Python, or Octave, or probably many other languages. If that's an option, then you just need to decide which would be easier / more useful for you. And while I won't take sides in general, I will say that Python may be a little bit easier to learn in general, but then you're back to using external libraries for more of the statistical / numerical stuff.

> Just looking to see if I could learn R without knowing much else.

My guess is that you can. R has some quirks, but there's nothing especially scary about it. Depending on how much you already know about statistics, you may find that learning and understanding the math is more difficult than learning to use R.


What dataset are you using? I would be interested in checking it out.

If you don't know R or Python, I would say that learning Python might work out better for you. Python is a general purpose programming language, whereas R is really good at stats and visualization. Python is also pretty good at this: you can use pandas, matplotlib, and scikit-learn.


It's a string of qualitative data that I want to interpret, like the team runs the ball Right, then Right, then Left.

My goal is to be able to analyze the next play and possibly give a probability attached to where the coach will call the play.

But that's just the first step. I would probably need to do a lot of different stuff with the data.
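
For the next-play idea described above, one simple starting point (an assumption on my part, not necessarily what the project will end up using) is to count how often each call follows the previous one and turn those counts into probabilities, i.e. a first-order Markov chain. The play sequence and the function name below are invented for the sketch.

```python
from collections import Counter, defaultdict


def transition_probabilities(plays):
    """Estimate P(next play | current play) from a sequence of play labels."""
    counts = defaultdict(Counter)
    # Pair each play with the one that follows it and tally the pairs.
    for current, following in zip(plays, plays[1:]):
        counts[current][following] += 1
    # Normalise each row of counts into probabilities.
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical sequence of run directions, invented for illustration.
plays = ["Right", "Right", "Left", "Right", "Left", "Left", "Right", "Right"]
probs = transition_probabilities(plays)
print(probs["Right"])
print(probs["Left"])
```

This only looks one play back; real play calling presumably depends on down, distance, score, and so on, so treat it as a baseline rather than a model.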


Is it possible for me to get the data? I can trade you 11 years of NFL point spreads.



Got it. Can you send me an email? I would like to work with you on this. My email is in my profile.


Sorry, I can't see your email in your profile, it might be because I don't have enough karma on HN.

My email is mtgprivatelearning@gmail.com


Just sent you a link. It's pretty raw, it needs some filtering before I do anything to it.


Learn R first, it's ideal for your project and not difficult.


R should not be ignored if you are doing statistics. It has first-class support for stats built in, plus there are so many packages available to do more of the heavy lifting.


If you're just analyzing the data R seems to be the right choice. If you're going to need to gather and parse the data then Python is a much better general purpose language.


I've been asking myself the same question. I often hear of Go as a better Python, so I was hoping to find a good numerical / statistical story for Go ...

Bruce Eckel had a blog post about this; I think you'll find the discussion around R / Python / Mathematica interesting

http://bruceeckel.github.io/2015/02/15/why-not-go-there/


I started out and found Python easier than R. Python is a lot more 'English readable' while R is more like the code you see on Hollywood screens, somewhat indecipherable with magic incantations.

As a starter, you probably need something like Dataquest[0] or Udacity's[1] data courses.

[0] https://www.dataquest.io

[1] www.udacity.com


And if Python: 2 or 3?

(I'm kinda decided and want to do a bit of statistics, but also have one general purpose language I know a bit better)


R is good for statistics and nothing else. It is, however, really good at statistics.

Python is more general purpose. If you want to do things that are not statistics in the future, Python makes more sense.

If you're going to get started with python, get started with python 3.


One thing to consider is that most data projects require a fair bit of manipulation or munging, and Python is particularly good at that kind of stuff.


Definitely 3, at this point. Most libraries are Python 3 ready, and it doesn't make sense to start with 2: http://py3readiness.org/.


As others said, R is the way to go if you decide to work solely with statistics. My opinion of R is that it needs some improvements; sometimes it feels like a language made for prototyping only. But maybe that is its goal, so in the end it could be a good thing for matching academic needs.


For me the best part of R is documentation.

Using Python for data will help you learn something that can be used for more... but be warned that Dataframes on stuff like Spark don't work with list comprehensions (my favorite Python feature).


If this is a one-off and you don't expect to need to do more of this kind of thing, I'd say learn Python, otherwise R.

Fortran is always fun though. :)


Yes. Learn R. Download RStudio. If you do statistics nothing beats it.


Rodeo is an equivalent IDE for Python - https://www.yhat.com/products/rodeo ... also Jupyter Lab (notebooks) ... https://github.com/jupyterlab/jupyterlab


Worth mentioning that Rodeo is still unstable and definitely not a feature for feature equivalent for RStudio (I also worry about speed as the size of the project grows as it's based on the same backend as the Atom IDE which has been dogged by speed complaints almost from day one). As for Jupyter Lab, the readme itself says that it's not yet ready for general use. Currently there is no true Python equivalent to RStudio (unfortunately).


Spyder is reasonably close, unless I'm missing some major feature present in RStudio.


Can't speak for Spyder, but one killer feature of RStudio is the ability to easily make C++ modules that your R project uses (Rcpp rocks - you can ignore most of the annoying things about C++). They compile and link with zero effort. Also the ability to easily install modules. And browse documentation. And create projects in R or RMarkdown. Create modules, basic scripts, web apps. And publish things. And see your data in a spreadsheet format. And inspect all your objects/data/functions. All in one place.

Aside from RStudio, I love R's lispy/apl-ish semantics, and easy integration with Fortran/C++ (and a host of other languages, including Java, Haskell, Ruby, Prolog, Lisp, etc...).

I'm sure Python is great, but R is made for stats, is amazing, and RStudio beats every IDE I've ever used, regardless of language.


Thanks for mentioning it. I almost always forget about Spyder. I'm not sure if that says more about me or Spyder.


You can't preemptively criticise Rodeo for using electron whilst praising RStudio.


I don't believe I praised RStudio at any point in that comment, and concern about a very realistic potential hurdle that Rodeo may face as its codebase grows and matures =/= criticism. I have no issues with the speed of the current implementation (but do have a problem with its lack of features and numerous bugs). It's not a bad IDE, just nowhere near as mature as RStudio.


I wanted to get into Python for this reason. The RStudio IDE is absolutely brilliant. When I tried to run Rodeo, it just kept crashing. Not very fun.


So true. I thought I was the only one ;-(


R is a bad programming language. Learn R.



