Pandas cheat sheet

dataframe from records: pd.DataFrame.from_records([['A', 4], ['B', 2]], columns=['g', 'legs'], index=['g'])
(un)pickle: orders.to_pickle('saved.pkl'); pd.read_pickle('saved.pkl')
filter data frame per (part of) index: frame.loc[580]
timestamp: pd.Timestamp('2017-05-01')
random sample: frame.sample(n=20, random_state=42)
columns list: list(frame.columns)
rename columns: frame.rename({'from': 'to'})
drop columns: frame.drop(columns=['colname'])
frame of bools: frame['col'] > 5
filter: frame[frame_of_bools & more_bools]
remove rows with null/NaN values: frame.dropna() (one null is enough to trigger the filter)
aggregation: frame.groupby('colname').agg(['count', 'min', 'max'])
merge on index: frame.merge(another_frame, left_index=True, right_index=True)
merge on index via join: frame.join(another_frame, how='inner')
index changes: frame.set_index(['col1', 'col2']), frame.reset_index()
only rows where index is in given set: frame.filter(items=[458, 580], axis='index')
regex replace in string series series.str.replace(r'foo(.*)', r'bar\1', case=False)
per-row mapping:
- frame.apply(lambda row: pd.Series([row[0] + 1], index=[row.index[0]]), axis='columns')
- frame.apply(lambda row: [row[0] + row[1]], axis='columns', result_type='expand')
mapping in a series: series.map(lambda x: x + 5)
don't truncate table display: pd.set_option('display.max_colwidth', -1)
display table with HTML: IPython.core.display.HTML(frame.to_html(escape=False))
sort by column: frame.sort_values(['col1', 'col2'], ascending=True)

MultiIndex

item labels: mi.labels -> [[1,1,2], ['blue', 'red', 'blue']]
unique labels for each level: mi.levels -> [[1,2,3], ['blue', 'red']]
column names: mi.names -> ['id', 'color']

che0/pandas-cheat-sheet.md

Pandas cheat sheet

MultiIndex