Monday, October 22, 2018

transformation - R: as.numeric function not returning correct # from data.frame











I am importing an Excel file using read.xls. I know this function uses read.table internally and returns the columns as factors. I can't simply tell read.xls which columns are numeric, because every column also contains some categorical data before the numeric values. So I have been extracting the numeric columns I need and then trying to convert them from data frames to numeric vectors. However, when I use as.numeric I get numbers that do not match the original data.



For example:



These are the first 6 rows of my data frame dfA1, which has 96 rows and 1 column:



     [,1]
[1,] "103316"
[2,] "130720"
[3,] "141808"
[4,] "131864"
[5,] "148144"
[6,] "145760"


When I run as.numeric(dfA1), I get:




[1]  2 18 29 19 43 40


I have no idea where these numbers come from. I checked the original .xls file, and these cells are formatted as numbers with no decimal places.


Answer



You can try:



as.numeric(as.character(dfA1))
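If dfA1 is a one-column data frame rather than a bare factor, the conversion has to be applied to the column itself (for example dfA1[[1]]), because as.character() on a whole data frame does not return the column's values. A minimal sketch, assuming a small made-up data frame in place of the real read.xls import; the column name A1 is hypothetical:

```r
# Hypothetical stand-in for the imported data: one factor column, as
# read.table()-based readers produce when stringsAsFactors = TRUE
dfA1 <- data.frame(
  A1 = factor(c("103316", "130720", "141808", "131864", "148144", "145760"))
)

# Wrong: returns the factor's internal integer codes, not the values
codes <- as.numeric(dfA1[[1]])
codes

# Right: turn the labels into character strings first, then into numbers
values <- as.numeric(as.character(dfA1[[1]]))
values

# Same idea for several columns known to hold numbers (names are hypothetical)
num_cols <- c("A1")
dfA1[num_cols] <- lapply(dfA1[num_cols], function(col) as.numeric(as.character(col)))
str(dfA1)
```

Restricting the lapply() call to the listed columns avoids turning genuinely categorical columns into NA.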



You can also stop strings from being converted to factors in the first place by setting stringsAsFactors = FALSE, either as an argument when reading the data or globally with options() (see ?options).
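A minimal sketch of the argument form, using read.table() on inline text instead of an Excel file. It assumes a column whose first entry is text (the "label" row is made up), which is what keeps R from reading the whole column as numeric:

```r
# Hypothetical input: one column with a text entry above the numbers
txt <- "A1
label
103316
130720
141808"

# stringsAsFactors = FALSE keeps the column as plain character strings
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
class(df$A1)  # "character" rather than "factor"

# Character strings convert directly; drop the non-numeric first entry
nums <- as.numeric(df$A1[-1])
nums
```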



This happens because a factor is stored internally as integer codes that index into its levels, and the levels (labels such as "103316" in your case) are what you see when you print it. By default the levels are the sorted unique values, so each code is the position of that value's label in the sorted list. as.numeric() returns these underlying integer codes, not the numbers the labels represent.
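The sketch below shows the codes and levels side by side. It uses only the six values from the question, so the codes differ from the asker's 2, 18, 29, ..., which come from positions among all 96 values:

```r
f <- factor(c("103316", "130720", "141808", "131864", "148144", "145760"))

levels(f)        # the labels, sorted: "103316" "130720" "131864" ...
as.integer(f)    # the stored codes: 1 2 4 3 6 5

# Each code is the position of the value's label within levels(f)
stopifnot(identical(levels(f)[as.integer(f)], as.character(f)))

# Two equivalent ways to recover the numbers the labels represent
as.numeric(as.character(f))
as.numeric(levels(f))[f]  # converts each level once, then indexes by code
```

The second form converts each distinct level only once, which can be cheaper when a long factor has few levels.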

