Sunday, October 28, 2018

r - How to convert a factor to integernumeric without loss of information?

It is possible only in the case when the factor labels match the original values. I will explain it with an example.


Assume the data is vector x:


x <- c(20, 10, 30, 20, 10, 40, 10, 40)

Now I will create a factor with four labels:


f <- factor(x, levels = c(10, 20, 30, 40), labels = c("A", "B", "C", "D"))

1) x is with type double, f is with type integer. This is the first unavoidable loss of information. Factors are always stored as integers.


> typeof(x)
[1] "double"
> typeof(f)
[1] "integer"

2) It is not possible to revert back to the original values (10, 20, 30, 40) having only f available. We can see that f holds only integer values 1, 2, 3, 4 and two attributes - the list of labels ("A", "B", "C", "D") and the class attribute "factor". Nothing more.


> str(f)
Factor w/ 4 levels "A","B","C","D": 2 1 3 2 1 4 1 4
> attributes(f)
$levels
[1] "A" "B" "C" "D"
$class
[1] "factor"

To revert back to the original values we have to know the values of levels used in creating the factor. In this case c(10, 20, 30, 40). If we know the original levels (in correct order), we can revert back to the original values.


> orig_levels <- c(10, 20, 30, 40)
> x1 <- orig_levels[f]
> all.equal(x, x1)
[1] TRUE

And this will work only in case when labels have been defined for all possible values in the original data.


So if you will need the original values, you have to keep them. Otherwise there is a high chance it will not be possible to get back to them only from a factor.

No comments:

Post a Comment

plot explanation - Why did Peaches&#39; mom hang on the tree? - Movies &amp; TV

In the middle of the movie Ice Age: Continental Drift Peaches' mom asked Peaches to go to sleep. Then, she hung on the tree. This parti...