Cookbook for Pandas
23 Jan 2018I end up looking for how to do things in pandas way too often. So I’ll summarize some basic features that I need.
- datetime
- converting to datetime format
df['time'] = pd.to_datetime(df['time'])
It can also be used on the index
df.index = pd.to_datetime(df.index)
- Shift a timeindex by constant time
df.index = df.index - datetime.timedelta(hours=6)
- Change sampling frequency; for instance, going from 1min data to 20min data
df.set_index('time', inplace=True) df20 = df['open'].resample('20min').first() df20['close'] = df['close'].resample('20min').last() df20['mean'] = 0.5*(df['open'].resample('20min').mean() + df['close'].resample('20min').mean() ) df20['low'] = df['low'].resample('20min').min() df20['high'] = df['high'].resample('20min').max()
Many different frequencies can be used with
resample
, the main ones beingM monthly W weekly D daily H hourly
A nice reference is available here.
- converting to datetime format
- Normalize data (e.g, ‘volume’) by their daily maximum; assuming the index is every minute,
df['date'] = df.index.date df.volume / df.groupby('date').volume.transform(np.max)
- Plot results (‘res’) of a groupby with timeindex, all on top of each other. The trick
here is to turn off the use of the index when plotting,
df.groupby('date').res.plot(use_index=False) plt.show()
- take negation of a boolean index selection:
~
. For instance, the following example sets all entries in the ‘movie’ column to NaN where they do not end with a colon ‘:’,df.loc[~df['movie'].str.endswith(':'),'movie'] = np.NaN
pandas
python
cookbook
ML
]