@lauracodecreations
Last active May 1, 2019 04:36

Creating Histograms with Normal Distribution Line using Python

Problem

You need to know if the data is normally distributed before doing any analysis on it.

Solution

Create a histogram for each column and compare it to a normal distribution (bell curve). My code draws this bell curve on top of the histogram.

Method: The code assumes that you have already uploaded the data to Azure ML and imported it into Python. More information about how to do this can be found in my tutorial about Descriptive Statistics per Column in Azure ML.

It will output one box plot and one histogram with a bell curve.
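As a rough illustration of where the bell curve comes from, the sketch below fits a normal distribution to a set of values and evaluates its density on an evenly spaced grid. The helper name `bell_curve` and its `num_points` parameter are hypothetical and not part of the original code; missing values are dropped here as an assumption, because `norm.fit` does not ignore them.

```python
import numpy as np
from scipy.stats import norm

def bell_curve(values, num_points=100):
    """Fit a normal distribution and return grid points with their density."""
    data = np.asarray(values, dtype=float)
    data = data[~np.isnan(data)]          # assumption: ignore missing values
    mu, std = norm.fit(data)              # maximum-likelihood mean and std
    x = np.linspace(data.min(), data.max(), num_points)
    return x, norm.pdf(x, mu, std), mu, std
```

The fitted `mu` is the sample mean and `std` is the standard deviation computed with `ddof=0`, so the curve is simply the normal density that matches those two numbers.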

To run:

get_ipython().run_line_magic('matplotlib', 'inline')
plotstats(<datasetname>, '<columnname>')

Example

get_ipython().run_line_magic('matplotlib', 'inline')
plotstats(frame, 'PTrustb')

Note: 'frame' is the name of the dataset, and 'PTrustb' is the name of the column.
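Since the goal is a histogram for every column, one possible approach is to loop over the numeric columns and call the plotting function once per column. This is only a sketch: `plot_all_columns` is a hypothetical helper, and it takes the plotting function as an argument so that it does not depend on how `plotstats` is written.

```python
import pandas as pd

def plot_all_columns(df, plot_fn):
    """Call plot_fn(df, col) for each numeric column and collect the results."""
    results = []
    # Non-numeric columns are skipped because a histogram of them is not meaningful here
    for col in df.select_dtypes(include="number").columns:
        results.append(plot_fn(df, col))
    return results
```

With the dataset from the example above, this would be called as `plot_all_columns(frame, plotstats)`.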

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plotstats(df, col):
    ## Set up two charts, one above the other, sharing the x-axis
    fig, ax = plt.subplots(2, 1, sharex=True, figsize=(12, 8))
    ## Draw the first chart: a horizontal box plot on ax[0]
    df[[col]].dropna().boxplot(col, ax=ax[0], vert=False,
                               return_type='dict')
    ## Prepare the data for the second chart: a histogram on ax[1]
    ## (to_numpy() replaces as_matrix(), which was removed from pandas;
    ## missing values are dropped because norm.fit does not ignore them)
    temp = df[col].dropna().to_numpy()
    mu, std = norm.fit(temp)
    xmin, xmax = ax[1].get_xlim()
    x = np.linspace(xmin, xmax + 1, 100)
    p = norm.pdf(x, mu, std)
    ## Draw the normal distribution line
    ax[1].plot(x, p, 'k', linewidth=2)
    ## Draw the histogram (density=True replaces the removed normed=True)
    ax[1].hist(temp, bins=[0, 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5], density=True)
    ## The histogram is normalized, so the y-axis shows density, not counts
    ax[1].set_ylabel('Density')
    ax[1].set_xlabel(col)
    return [col]
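
The bin edges in the histogram call are hard-coded from 0 to 5 in steps of 0.5, which suits data on a 0 to 5 scale. For columns with a different range, the edges could be derived from the data instead. The sketch below is one way to do that; `make_bins` and its `width` parameter are hypothetical names, not part of the original code.

```python
import numpy as np

def make_bins(values, width=0.5):
    """Return evenly spaced bin edges of the given width that cover the data."""
    data = np.asarray(values, dtype=float)
    data = data[~np.isnan(data)]                 # assumption: ignore missing values
    lo = np.floor(data.min() / width) * width    # round the minimum down to a bin edge
    hi = np.ceil(data.max() / width) * width     # round the maximum up to a bin edge
    if hi == lo:
        hi = lo + width                          # guarantee at least one bin
    n_bins = int(round((hi - lo) / width))
    return np.linspace(lo, hi, n_bins + 1)
```

The result can be passed as the `bins` argument of the histogram call in place of the fixed list.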