Monday, April 22, 2019

How to do an R style aggregate in Python Pandas?



I need to do an aggregate (at least that's what you would call it in R) over the mtcars data set that I have loaded into Python. The end goal is to get the average mpg for each value of cyl in the data set (there are three values for cyl: 4, 6, and 8). Here is the R code for what I want to do:



mean_each_gear <- aggregate(mtcars$mpg ~ mtcars$cyl, FUN = mean)



output:

  cyl      mpg
1   4 26.66364
2   6 19.74286
3   8 15.10000



The closest I've come in pandas is this:



mtcars.agg(['mean'])



That only gives the mean of every column over the whole data set, and I'm not sure how to get the per-group means in pandas. Any help would be appreciated!



Answer



You want pandas groupby()!



import pandas as pd

my_dataframe = pd.read_csv('my_input_data.csv')  # insert your data here
my_dataframe.groupby(['col1'])['col2'].mean()


where 'col1' is the column you want to group by and 'col2' is the column whose mean you want to obtain. Also see here:




https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html
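
Applied to the question's own columns, the same pattern might look like the sketch below. It assumes the data frame is called mtcars and, so that it runs on its own, builds it from a six-row excerpt of the data set typed in by hand. The means it prints are therefore for those six rows only, not the full-data values shown in the R output. Passing as_index=False is an optional choice that keeps cyl as a regular column, which is closer to the shape of what R's aggregate returns.

```python
import pandas as pd

# Six rows of mtcars entered by hand for illustration (an assumption:
# in practice you would load the full data set, e.g. with pd.read_csv).
mtcars = pd.DataFrame({
    "mpg": [21.0, 22.8, 18.7, 18.1, 14.3, 24.4],
    "cyl": [6, 4, 8, 6, 8, 4],
})

# Group the rows by cyl, then take the mean of mpg within each group.
# as_index=False returns cyl as a column instead of as the index.
mean_by_cyl = mtcars.groupby("cyl", as_index=False)["mpg"].mean()

print(mean_by_cyl)
```

Without as_index=False the result is a Series indexed by cyl, which is what the shorter form in the answer returns.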

