I really enjoy using Python for data analysis. In this post, I want to cover a few of the reasons I prefer Python over other great tools and languages when I’m trying to tell a story with data.
Cleaning Data With Python
Visualization tools such as Tableau are really good at connecting to data, but not especially good at cleaning it. Python lets you load data from multiple formats, clean it, and get it ready for use. Within one environment, you can create a number of data sets that plug into almost any downstream tool. This matters because leaving your environment can mean needing additional tools and resources to complete your objectives.
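To make that concrete, here is a minimal sketch of loading data from two formats and cleaning it in one place with Pandas. The column names, the sample values, and the `load_and_clean` helper are all hypothetical; in-memory buffers stand in for real files so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical raw inputs standing in for real files; in practice you would pass
# file paths to pd.read_csv / pd.read_json instead of in-memory buffers.
RAW_CSV = """order_id, region ,amount
1, East ,10.5
2,West,
2,West,
3, east ,7.25
"""
RAW_JSON = '[{"order_id": 4, "region": "North", "amount": 3.0}]'


def load_and_clean() -> pd.DataFrame:
    """Load two sources, combine them, and apply a few common cleaning steps."""
    csv_df = pd.read_csv(io.StringIO(RAW_CSV))
    json_df = pd.read_json(io.StringIO(RAW_JSON))

    # Normalize column names (" region " -> "region") so the two sources line up
    csv_df.columns = csv_df.columns.str.strip()

    df = pd.concat([csv_df, json_df], ignore_index=True)

    # Tidy text values: trim whitespace and use consistent capitalization
    df["region"] = df["region"].str.strip().str.title()

    # Drop exact duplicate rows, then fill missing amounts with 0
    df = df.drop_duplicates().reset_index(drop=True)
    df["amount"] = df["amount"].fillna(0.0)
    return df


if __name__ == "__main__":
    print(load_and_clean())
```

Filling missing amounts with 0 is just one choice; depending on the data, dropping those rows or imputing a value may make more sense.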
Data Modeling With Python
Similar to cleaning data, you can also easily model data with Python. Packages such as Pandas let you clean your data and transform it into DataFrames and Series.
Pandas is extremely powerful for slicing and dicing data on the fly. You can take one data set and turn it into dozens of different data sets that each feed an individual visualization, or merge them all back into one.
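Here’s a rough sketch of that idea, assuming a small hypothetical sales table: `groupby` slices one DataFrame into a separate data set per region, `pd.concat` stacks the slices back together, and `pivot_table` produces one summary table that could feed a single chart. The helper names `split_by_region` and `summary_table` are mine, for illustration only.

```python
import pandas as pd

# Hypothetical sales data; the column names are illustrative only.
sales = pd.DataFrame(
    {
        "region": ["East", "West", "East", "West", "North"],
        "product": ["A", "A", "B", "B", "A"],
        "units": [10, 4, 6, 8, 3],
    }
)


def split_by_region(df: pd.DataFrame) -> dict:
    """Slice one DataFrame into a separate DataFrame per region."""
    return {region: group.reset_index(drop=True) for region, group in df.groupby("region")}


def summary_table(df: pd.DataFrame) -> pd.DataFrame:
    """Pivot units into a region-by-product table, e.g. to feed a single chart."""
    return df.pivot_table(
        index="region", columns="product", values="units", aggfunc="sum", fill_value=0
    )


if __name__ == "__main__":
    pieces = split_by_region(sales)
    for region, piece in pieces.items():
        print(region, piece["units"].sum())  # each slice could drive its own chart

    # The per-region slices can be stacked back into one DataFrame
    recombined = pd.concat(pieces.values(), ignore_index=True)
    print(summary_table(recombined))
```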
One really cool benefit is that you can do SQL-style processing with Pandas. The Pandas documentation includes a great comparison with SQL that shows common SQL query tasks and how to accomplish each one in Python with Pandas.
If you think about it, this lets you be both the SQL Developer and the Data Analyst. You no longer have to depend on a SQL Developer the way you used to, because you can slice and dice the data yourself directly in Python with Pandas.
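As an illustration of that SQL-style processing, the sketch below expresses a join, a filter, an aggregate, and a sort with Pandas, with the equivalent SQL in the comments. The `orders` and `customers` tables and the `total_by_customer` helper are hypothetical.

```python
import pandas as pd

# Two hypothetical tables; names and columns are illustrative only.
orders = pd.DataFrame(
    {"order_id": [1, 2, 3, 4], "customer_id": [10, 20, 10, 30], "amount": [25.0, 40.0, 20.0, 5.0]}
)
customers = pd.DataFrame({"customer_id": [10, 20, 30], "name": ["Ada", "Ben", "Cy"]})


# SELECT name, SUM(amount) AS total
# FROM orders JOIN customers ON orders.customer_id = customers.customer_id
# WHERE amount > 10
# GROUP BY name
# ORDER BY total DESC;
def total_by_customer(orders: pd.DataFrame, customers: pd.DataFrame) -> pd.DataFrame:
    joined = orders.merge(customers, on="customer_id", how="inner")      # JOIN
    filtered = joined[joined["amount"] > 10]                             # WHERE
    grouped = filtered.groupby("name", as_index=False)["amount"].sum()   # GROUP BY + SUM
    grouped = grouped.rename(columns={"amount": "total"})                # AS total
    return grouped.sort_values("total", ascending=False).reset_index(drop=True)  # ORDER BY


if __name__ == "__main__":
    print(total_by_customer(orders, customers))
```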
Visualization With Python
Once your data is cleaned and modeled, you can also use Python to visualize your final results. There are a number of wonderful packages you can tie into, such as Matplotlib, ggplot, Seaborn, Bokeh, Plotly, and more.
Looking at the image (GIF) above, you can see just how powerful and even interactive Python can be with Plotly. Python lets you tap into these powerful visualization packages to get data on screen or into your web browser quickly. It also gives you a tremendous amount of customization that is sometimes hard to find in other visualization tools. You can create some really powerful visualizations that help tell the story while using Python for data analysis.
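Plotly shines for interactive charts; as a simpler, static illustration of the customization point, here is a sketch using Matplotlib instead. It assumes a hypothetical table of monthly totals and a hypothetical `plot_monthly_sales` helper, and `Axes.bar_label` requires Matplotlib 3.4 or newer.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display

import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.figure import Figure

# Hypothetical monthly totals; replace with your own cleaned and modeled data.
monthly = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"], "sales": [120, 135, 90, 160]})


def plot_monthly_sales(df: pd.DataFrame) -> Figure:
    """Draw a customized bar chart of sales per month and return the figure."""
    fig, ax = plt.subplots(figsize=(6, 4))
    bars = ax.bar(df["month"], df["sales"], color="steelblue")

    # Highlight the best month so the chart tells a clear story
    best = int(df["sales"].idxmax())
    bars[best].set_color("darkorange")

    ax.set_title("Monthly sales")
    ax.set_xlabel("Month")
    ax.set_ylabel("Units sold")
    ax.bar_label(bars)  # write each bar's value above it
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # remove chart clutter
    fig.tight_layout()
    return fig


if __name__ == "__main__":
    plot_monthly_sales(monthly).savefig("monthly_sales.png")
```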
Explore Data With Python
I’ve read comments in the past claiming that Python isn’t really suited to exploring data and that tools like Tableau are better for the task. Well, I have to disagree. Exploring data with Python is very easy to do.
For example, I can load up The Jupyter Notebook, import some packages like Pandas, Matplotlib, and Seaborn, and start exploring data in real time in my web browser. I know that may sound like a lot, but it’s really only a few lines of code.
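import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical column names for a CSV without a header row; replace with your own
header = ['SomeColumn', 'OtherColumn']
df = pd.read_csv('../Some_Dataset_Raw.csv', names=header)

# Count plot of the values in one column (catplot replaced the deprecated factorplot)
sns.catplot(x='SomeColumn', data=df, kind='count')
plt.show()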
What I love about exploring data with Python is that I can quickly run summary functions on the data. This is truly what I mean by using Python for data analysis. For example, I can call df.info() (pandas.DataFrame.info) on the DataFrame loaded in the code above to see every field, its data type, and how many non-null values it holds. I can then zoom in and examine each field to understand exactly what I’m analyzing.
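Here is a minimal sketch of that kind of quick exploration, using a tiny hypothetical DataFrame in place of the CSV loaded earlier. `DataFrame.info()` prints its report by default; passing `buf=` captures it as text, which the hypothetical `field_overview` helper returns.

```python
import io

import pandas as pd

# A tiny hypothetical dataset; with real data you would use the df loaded above.
df = pd.DataFrame(
    {
        "SomeColumn": ["a", "b", "a", "c", "a"],
        "score": [1.5, 2.0, None, 3.5, 4.0],
    }
)


def field_overview(frame: pd.DataFrame) -> str:
    """Return the report DataFrame.info() would print: fields, dtypes, non-null counts."""
    buffer = io.StringIO()
    frame.info(buf=buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    print(field_overview(df))               # every field and its data type
    print(df.describe())                    # summary statistics for numeric fields
    print(df["SomeColumn"].value_counts())  # zoom in on a single field
    print(df["score"].isna().sum())         # how many values are missing
```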
The Jupyter Notebook
I can’t begin to tell you how awesome this tool is for on-the-fly analysis of data with Python. It also works well for presenting your final analysis, and notebooks can be shared in various formats, including my favorite, HTML. Best of all, the notebook records every step you took from start to finish.
You can walk everyone in the room through how you loaded your data, cleaned it, transformed it, ran tests against it, and eventually visualized it. You can present the entire methodology behind your story, leaving very little open to question. That’s because you’re essentially showing your hand, whereas in other tools all your audience sees is the end result.
I’m really just scratching the surface of using Python for data analysis. There are so many benefits to using Python for your next data project. The language, the packages available, and the tools that support it are growing every day. Best of all, Python and most of its data packages are free, and the language is not hard to learn. It’s great for beginners, whether they’re looking to become developers or data scientists.
So, what are you waiting for? Go, explore, and tell meaningful and immersive stories with your data!