NumPy file formats are pretty fast for numerical data
I prefer to use NumPy files since they're fast and easy to work with.
Here's a simple benchmark for saving and loading a DataFrame with one column of 1 million points.
import numpy as np
import pandas as pd
num_dict = {'voltage': np.random.rand(1000000)}
num_df = pd.DataFrame(num_dict)
Using IPython's %%timeit magic function:
%%timeit
with open('num.npy', 'wb') as np_file:
    np.save(np_file, num_df)
The output is:
100 loops, best of 3: 5.97 ms per loop
To load the data back into a DataFrame:
%%timeit
with open('num.npy', 'rb') as np_file:
    data = np.load(np_file)
data_df = pd.DataFrame(data)
The output is:
100 loops, best of 3: 5.12 ms per loop
NOT BAD!
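If you're not in IPython, the same measurement can be sketched with the standard-library timeit module. This is only a rough equivalent of the cells above, not the exact benchmark: the temporary path, the helper names save and load, and the loop counts are my own choices. It also passes the column name back in on load, because np.save writes only the values and the 'voltage' label is otherwise lost.

```python
import os
import tempfile
import timeit

import numpy as np
import pandas as pd

num_df = pd.DataFrame({'voltage': np.random.rand(1000000)})

# Hypothetical location: a throwaway temp directory instead of the working dir.
path = os.path.join(tempfile.mkdtemp(), 'num.npy')


def save():
    # Only the values are written; column names are not part of the .npy file.
    with open(path, 'wb') as np_file:
        np.save(np_file, num_df.to_numpy())


def load():
    with open(path, 'rb') as np_file:
        data = np.load(np_file)
    # Restore the column name by hand.
    return pd.DataFrame(data, columns=num_df.columns)


# Best of 3 repeats, 10 calls each (arbitrary counts, chosen to keep it quick).
save_s = min(timeit.repeat(save, number=10, repeat=3)) / 10
load_s = min(timeit.repeat(load, number=10, repeat=3)) / 10
print('save: %.2f ms per loop' % (save_s * 1000))
print('load: %.2f ms per loop' % (load_s * 1000))

data_df = load()
```

Timings will differ from the numbers above depending on the machine and disk.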
CONS
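There can be a problem if you save the NumPy file using Python 2 and then try to open it using Python 3 (or vice versa). As far as I can tell this mainly bites when the array holds Python objects, which get pickled inside the file; plain numeric arrays like the one above should be less of a concern.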
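One way to guard against this, assuming the trouble comes from pickled object arrays (that's my reading, the details may differ in your case), is to pass allow_pickle=False to np.save and np.load. Numeric arrays still round-trip fine, and NumPy refuses to write an object array instead of silently pickling it. The variable names below are just for illustration, and the example uses an in-memory buffer rather than a real file.

```python
import io

import numpy as np

# A plain float array saves and loads without any pickling.
values = np.random.rand(5)
buf = io.BytesIO()
np.save(buf, values, allow_pickle=False)
buf.seek(0)
restored = np.load(buf, allow_pickle=False)

# An object array would need pickle, so NumPy raises instead of writing it.
objs = np.array([{'a': 1}, None], dtype=object)
try:
    np.save(io.BytesIO(), objs, allow_pickle=False)
    refused = False
except ValueError:
    refused = True

print('numeric round trip ok:', np.array_equal(restored, values))
print('object array refused:', refused)
```

If you do have to read an old file pickled under Python 2, np.load also takes an encoding argument meant for that case.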