Skip to content

Instantly share code, notes, and snippets.

@che0
Last active February 8, 2019 14:42
Show Gist options
  • Save che0/d47495f7b817c5543e4d97385d944ad6 to your computer and use it in GitHub Desktop.
Save che0/d47495f7b817c5543e4d97385d944ad6 to your computer and use it in GitHub Desktop.
cheat sheet for pandas, so I don't have to read the docs all the time

Pandas cheat sheet

  • dataframe from records: pd.DataFrame.from_records([['A', 4], ['B', 2]], columns=['g', 'legs'], index=['g'])
  • (un)pickle: orders.to_pickle('saved.pkl'); pd.read_pickle('saved.pkl')
  • filter data frame per (part of) index: frame.loc[580]
  • timestamp: pd.Timestamp('2017-05-01')
  • random sample: frame.sample(n=20, random_state=42)
  • columns list: list(frame.columns)
  • rename columns: frame.rename({'from': 'to'})
  • drop columns: frame.drop(columns=['colname'])
  • frame of bools: frame['col'] > 5
  • filter: frame[frame_of_bools & more_bools]
  • remove rows with null/NaN values: frame.dropna() (one null is enough to trigger the filter)
  • aggregation: frame.groupby('colname').agg(['count', 'min', 'max'])
  • merge on index: frame.merge(another_frame, left_index=True, right_index=True)
  • merge on index via join: frame.join(another_frame, how='inner')
  • index changes: frame.set_index(['col1', 'col2']), frame.reset_index()
  • only rows where index is in given set: frame.filter(items=[458, 580], axis='index')
  • regex replace in string series series.str.replace(r'foo(.*)', r'bar\1', case=False)
  • per-row mapping:
    • frame.apply(lambda row: pd.Series([row[0] + 1], index=[row.index[0]]), axis='columns')
    • frame.apply(lambda row: [row[0] + row[1]], axis='columns', result_type='expand')
  • mapping in a series: series.map(lambda x: x + 5)
  • don't truncate table display: pd.set_option('display.max_colwidth', -1)
  • display table with HTML: IPython.core.display.HTML(frame.to_html(escape=False))
  • sort by column: frame.sort_values(['col1', 'col2'], ascending=True)

MultiIndex

  • item labels: mi.labels -> [[1,1,2], ['blue', 'red', 'blue']]
  • unique labels for each level: mi.levels -> [[1,2,3], ['blue', 'red']]
  • column names: mi.names -> ['id', 'color']
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment