Sunday, July 29, 2018

r - Extracting a random sample of rows in a data.frame with a nested conditional



This question builds on the SO post found here and uses code modified from a post on the R-help mailing list, which can be seen here.



I am trying to extract a random sample of rows from a data frame, but subject to a condition. I am using R's built-in iris data, which looks like this:




> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa



Taking a simple random sample is straightforward; the code below draws a sample of 2 rows.



iris[sample(nrow(iris), 2), ]


However, I am unsure how to condition on the Species field. For example, how do I take the random sample shown above, but only from rows where Species != "setosa"?



There are three categories of iris$Species:



> summary(iris$Species)
    setosa versicolor  virginica
        50         50         50


I am unsure how to nest the conditionals correctly. One of my earlier attempts is below, with the obviously incorrect results included:



> iris[sample(nrow(iris)[iris$Species != "setosa"], 2), ]
     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
NA             NA          NA           NA          NA    <NA>
NA.1           NA          NA           NA          NA    <NA>



Thanks


Answer



I'd use which() to get the vector of row numbers that satisfy your condition, and then sample from that vector:



iris[sample(which(iris$Species != "setosa"), 2), ]
#     Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
#59            6.6         2.9          4.6         1.3 versicolor
#133           6.4         2.8          5.6         2.2  virginica
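
To see why the attempt in the question returns rows of NA, it may help to break it into steps. The sketch below is only illustrative: the variable names (n, keep, idx_bad, rows, picked) and the seed value are my own choices, not part of the original code. The last step uses sample.int() on the positions of rows rather than sample() on rows itself, which is one way to guard against the case where the condition matches a single row.

```r
# Step 1: what the original attempt actually computes
n <- nrow(iris)                     # a single number: 150
keep <- iris$Species != "setosa"    # logical vector of length 150
idx_bad <- n[keep]                  # indexes a length-1 vector with 150 logicals
length(idx_bad)                     # 100
all(is.na(idx_bad))                 # TRUE: sampling from this can only give NA

# Step 2: which() turns the condition into the matching row numbers
rows <- which(keep)                 # the row numbers 51 to 150

# Step 3: sample two of those row numbers and subset the data frame
set.seed(42)                        # arbitrary seed, used only for reproducibility
picked <- iris[rows[sample.int(length(rows), 2)], ]
picked$Species                      # never "setosa"
```

The guard matters because sample(x, size) treats a single number x >= 1 as shorthand for 1:x. If a condition matched only row 75, sample(75, 1) would draw from rows 1 to 75 rather than returning 75. With 100 matching rows, as here, sample(rows, 2) and the sample.int() form behave the same way.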

