Monday, March 25, 2019

python - How to store a dataframe using Pandas

NumPy file formats are pretty fast for numerical data


I prefer to use NumPy files since they're fast and easy to work with.
Here's a simple benchmark for saving and loading a dataframe with 1 column of 1 million points.


import numpy as np
import pandas as pd
num_dict = {'voltage': np.random.rand(1000000)}
num_df = pd.DataFrame(num_dict)

Using IPython's %%timeit magic function:


%%timeit
with open('num.npy', 'wb') as np_file:
    np.save(np_file, num_df)

The output is:


100 loops, best of 3: 5.97 ms per loop
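If you're not in IPython, roughly the same measurement can be sketched with the standard library's timeit module. This is only an approximation of what %%timeit does; the temporary path, the `save_df` helper and the run counts (10 runs, 3 repeats) are my own choices for illustration, and the numbers you get will depend on your machine.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000000)})

# Hypothetical scratch location so the benchmark does not clutter the working directory.
path = os.path.join(tempfile.mkdtemp(), 'num.npy')


def save_df():
    # Same save step as in the %%timeit cell above.
    with open(path, 'wb') as np_file:
        np.save(np_file, num_df)


# Best of 3 repeats of 10 runs each, converted to time per single call.
best = min(timeit.repeat(save_df, number=10, repeat=3)) / 10
print('%.2f ms per loop' % (best * 1000))
```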

To load the data back into a dataframe:


%%timeit
with open('num.npy', 'rb') as np_file:
    data = np.load(np_file)
data_df = pd.DataFrame(data)

The output is:


100 loops, best of 3: 5.12 ms per loop

NOT BAD!
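One thing to keep in mind: np.save only writes the underlying array, so the column names don't survive the round trip and the reloaded dataframe gets default integer column labels. Here's a minimal sketch of one way around that, assuming you're fine with keeping the column names yourself (the `columns` variable and the temporary path are just for illustration, and I use a smaller dataframe to keep it quick).

```python
import os
import tempfile

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000)})

# Hypothetical scratch location for the example file.
path = os.path.join(tempfile.mkdtemp(), 'num.npy')

# Keep the column names separately, since the .npy file stores only the values.
columns = list(num_df.columns)
np.save(path, num_df.to_numpy())

# Rebuild the dataframe from the raw array and reattach the column names.
data = np.load(path)
restored_df = pd.DataFrame(data, columns=columns)
```

The index isn't stored either, so this only fits dataframes where a default integer index is good enough.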


CONS


There's a problem if you save the NumPy file using Python 2 and then try opening it using Python 3 (or vice versa).
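I haven't pinned down every case, but as far as I can tell from the NumPy docs, one source of this trouble is object arrays, which are stored with pickle inside the .npy file. A hedged sketch, assuming your data is purely numeric: pass allow_pickle=False to np.save and np.load so no pickled objects can sneak in. The file names below are made up for the example.

```python
import os
import tempfile

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000)})

# Hypothetical scratch directory and file names.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'num.npy')
object_path = os.path.join(tmp_dir, 'objects.npy')

# Numeric data is written as a plain binary array, no pickling involved.
np.save(path, num_df.to_numpy(), allow_pickle=False)
data = np.load(path, allow_pickle=False)

# An object array would need pickle, so NumPy refuses to save it here.
object_array = np.array([{'a': 1}, 'text'], dtype=object)
try:
    np.save(object_path, object_array, allow_pickle=False)
    object_save_failed = False
except ValueError:
    object_save_failed = True
```

If the save raises ValueError, the array holds Python objects, and that's the kind of file I'd be careful about moving between Python versions.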
