Suppose I have a dataset of dimension 90,000 x 17, i.e. n x p, where n is the number of observations and p is the number of variables. I would like to take a random sample of 20% of the rows from the whole dataset. How can this be done in R?
After taking the random sample, I will perform a cluster analysis on it.
I tried the answers to other questions, but none of them gave me what I needed.
Answer
You can do it with sample_frac from dplyr. Here is an example using the iris dataset:
library(dplyr)
# iris ships with base R, so data(iris) is optional
sample20 <- iris %>% sample_frac(0.2)  # keep a random 20% of the rows
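If you would rather not depend on dplyr, the same idea can be sketched in base R by sampling row indices. This is only an illustration on iris; the seed value and the names n, idx and sample20_base are my own choices, not anything required by R.

```r
# Base R sketch: draw 20% of the row indices without replacement.
set.seed(42)                              # arbitrary seed, only to make the draw reproducible
n <- nrow(iris)                           # number of observations
idx <- sample(n, size = round(0.2 * n))   # random row indices, no repeats
sample20_base <- iris[idx, ]              # keep the sampled rows, all columns
nrow(sample20_base)
```

For a 90,000-row data frame, replace iris with your own data frame. In dplyr 1.0.0 and later, sample_frac is marked as superseded, and slice_sample(prop = 0.2) is the suggested equivalent.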