Ask HN: What are the best resources to learn Python for Data Analysis
28 points by jbmorgado on Oct 23, 2016 | 15 comments
There are a great many posts on HN about resources for learning R for data analysis, but very few about Python, even though it is also considered an excellent language for the task.

What are the best online courses, books, blogs for learning Data Analysis in Python?




There have been some good recommendations here for machine learning and data science, but I'm not sure that's what the OP is looking for here (or maybe it is, certainly no harm in posting them).

But yeah, if what you want to know how to do is query, organize, filter, trim, format, reformat, munge, and finagle data, you're probably looking for something more like the O'Reilly book hatmatrix mentioned above.

If not the book, I'd recommend just going through pandas as much as possible. Nothing wrong with just going through the online docs.
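
To give a flavor of the query/filter/munge work described above, here is a minimal pandas sketch. The sales table and its column names are invented for illustration, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame(
    {
        "region": ["north", "south", "north", "east", "south"],
        "product": ["a", "a", "b", "b", "b"],
        "units": [10, 3, 7, 12, 5],
        "price": [2.0, 2.0, 5.0, 5.0, 5.0],
    }
)

# Derive a column, filter rows, then aggregate and sort.
sales["revenue"] = sales["units"] * sales["price"]
big_orders = sales[sales["units"] >= 5]
revenue_by_region = (
    big_orders.groupby("region")["revenue"].sum().sort_values(ascending=False)
)
print(revenue_by_region)
```

Most day-to-day munging is some combination of these steps: add a derived column, filter with a boolean mask, group, aggregate, sort.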

Oh, one more thing, I'm personally a huge fan of pandasql as well. It's a nice library that allows you to query a pandas data frame as if it were a SQL table (joins work with other data frames). Pretty much whatever is available in SQLite will be available through pandasql.

There have been a few spats on the interwebs about whether it's better to do things in SQL vs data frame operations. Personally, I use both - I do find some things are far easier to do with a query, and then transition over to pandas and numpy when I get to programming/mathy things.
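
Here is a rough sketch of that "query first, then pandas/numpy" workflow. Note that it does not use pandasql itself: it copies the frames into an in-memory SQLite database with the standard-library sqlite3 module, which is the same basic idea done by hand. The tables and column names are made up for illustration.

```python
import sqlite3

import numpy as np
import pandas as pd

# Hypothetical tables, invented for illustration.
orders = pd.DataFrame(
    {"customer_id": [1, 2, 1, 3], "amount": [20.0, 30.0, 15.0, 50.0]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["ann", "bob", "cy"]}
)

# Copy both frames into an in-memory SQLite database so they can be joined in SQL.
conn = sqlite3.connect(":memory:")
orders.to_sql("orders", conn, index=False)
customers.to_sql("customers", conn, index=False)

query = """
    SELECT c.name, SUM(o.amount) AS total
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    GROUP BY c.name
    ORDER BY total DESC
"""
totals = pd.read_sql_query(query, conn)
conn.close()

# Back in pandas/numpy for the mathy part: log-scale the totals.
totals["log_total"] = np.log(totals["total"])
print(totals)
```

If you have pandasql installed, I believe `sqldf(query, locals())` gets you the same result frame in one call, without managing the connection yourself.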

Lastly - if you do want to do data science/ML stuff, I'd recommend going over to scikit-learn and just going through all the examples, trying things out on your own datasets.
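
Most of the scikit-learn examples follow the same split/fit/score pattern, so a minimal sketch of it may help before diving in. This one uses the iris toy dataset bundled with scikit-learn; the choice of logistic regression and the split settings are just assumptions for illustration, and you'd swap in your own data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Bundled toy dataset; replace with your own feature matrix X and labels y.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows to check the model on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```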


OP here. You are right, my interest is actually in the data analysis part: which tools are the most up to date, and what the best practices are for using them. This is because I already have a good understanding of the scientific part (i.e. statistical analysis) from academia and from my job as a researcher.

Still, I find that a good part of the data science resources start by giving a good introduction to the analysis part, so they are also important answers to this question.

Thank you all.



I had actually found that course and it seemed quite interesting. I know that you are its creator, but it would be nice to have some third party give some input about what they liked and didn't like about it.


Data Science from Scratch http://amzn.to/2dD9Iba

Python for Data Analysis http://amzn.to/2dDw6fL

Web Scraping with Python: Collecting Data from the Modern Web http://amzn.to/2eov4dZ

Python Machine Learning http://amzn.to/2eobdt3

http://sebastianraschka.com/books.html



http://sebastianraschka.com/ has some great articles for machine learning in Python.


Not necessarily a learning resource, but I'd like to plug the Anaconda distribution of Python. https://www.continuum.io/ It includes most of the commonly used libraries/packages in data analytics, so at ASU I made all my students download it just to start from. Three things it gives you right out of the box are IPython (a better Python shell), Spyder (the Python version of RStudio), and Jupyter Notebooks.

For learning, I'd recommend taking something like Janert's "Data Analysis with Open Source Tools" and going through it chapter by chapter, trying to figure out how to implement the various analyses in Python. That book in particular uses a different technology in every chapter for its tutorial exercises, so ignore those. But the exposition of the concepts is fantastic.


Download Anaconda:

https://www.continuum.io

Take an icy plunge right into the "Titanic: Machine Learning from Disaster" dataset ;)

https://www.kaggle.com/c/titanic


I tried reading books and even took a night course, but nothing did the trick except a real side project that required me to learn the proper tools and techniques. It turned into a full-time job that I am really enjoying.

Now I am able to read the books because I desire the knowledge to further my passion.


I also follow that approach (more out of eagerness to do something than from a didactic point of view), but the problem with it is that you end up bypassing the best practices in the area.


I've found General Assembly's data science course to be pretty good at getting you up and running: https://github.com/justmarkham/DAT8


The SciPy videos from the SciPy conference (on using Python for mathematical computation and data mining) are on YouTube. An excellent resource.


the internet is a pretty good resource



