Monday, August 27, 2018

dataframe - Random Sample of rows from an R dataset




Suppose I have a dataset of dimension 90,000 x 17 (n x p), where n is the number of observations and p is the number of variables. I would like to take a random sample of 20% of the rows from the whole dataset. How can this be done in R?




After taking a random sample I will be performing cluster analysis accordingly.



I looked at other questions for an answer, but they were inconclusive: none of them gave me what I needed.


Answer



You can do it with sample_frac() from dplyr. Here is an example using the built-in iris dataset:



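library(dplyr)

# iris ships with base R (datasets package), so data(iris) is optional
set.seed(123)  # optional: makes the sample reproducible
sample20 <- iris %>% sample_frac(0.2)  # keeps 20% of the rows (30 of 150)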
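
If you would rather not depend on dplyr, the same idea can be sketched in base R by sampling row indices. This is only a minimal sketch: the names idx and sample20_base and the seed value are illustrative assumptions, not part of the original answer.

```r
# Base R sketch: draw 20% of the row indices without replacement.
set.seed(42)                              # arbitrary seed, only for reproducibility
n   <- nrow(iris)                         # 150 rows in the built-in iris data
idx <- sample(n, size = round(0.2 * n))   # 30 distinct row numbers
sample20_base <- iris[idx, ]              # keep only the sampled rows

dim(sample20_base)                        # 30 rows, 5 columns
```

For the 90,000 x 17 dataset in the question, replacing iris with your own data frame should give 18,000 rows. In dplyr 1.0.0 and later, sample_frac() is superseded by slice_sample(prop = 0.2), which should behave the same way for this use.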

