Friday, August 24, 2018

r - Data Masking in Dataframe


  • I have a dataframe with 8 unique values



    data<-data.frame(id=c("ab","cc","cc","dd","ee","ff","ee","ff","ab","dd","gg",1,"air"))
    > data
        id
    1   ab
    2   cc
    3   cc
    4   dd
    5   ee
    6   ff
    7   ee
    8   ff
    9   ab
    10  dd
    11  gg
    12   1
    13 air

  • I create another dataframe holding 8 unique values that are to be used as replacements



    library(random)
    replacements<-data.frame(value=randomStrings(n=8, len=2, digits=FALSE,loweralpha=TRUE, unique=TRUE, check=TRUE))
    > replacements
      V1
    1 SJ
    2 fH
    3 TZ
    4 Mr
    5 oZ
    6 kZ
    7 fe
    8 ql

  • I want to replace every unique value in the data dataframe with the corresponding value in the replacements dataframe, as follows:

All ab values replaced by SJ
All cc values replaced by fH
All dd values replaced by TZ
All ee values replaced by Mr
All ff values replaced by oZ
All gg values replaced by kZ
All 1 values replaced by fe
All air values replaced by ql
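
The mapping listed above can be sketched as a named lookup vector. This is only an illustration under stated assumptions: the codes are hard-coded from the sample replacements shown earlier instead of being drawn from the random package, and the names `codes`, `lookup` and `masked` are introduced here and do not appear in the question.

```r
# Original values, as in the question (the bare 1 is coerced to "1" by c()).
id <- c("ab","cc","cc","dd","ee","ff","ee","ff","ab","dd","gg","1","air")

# Assumption: replacement codes copied from the sample output, in order.
codes <- c("SJ","fH","TZ","Mr","oZ","kZ","fe","ql")

# Name each code after the original value it replaces.
# unique() keeps values in order of first appearance.
lookup <- setNames(codes, unique(id))

# Indexing by name translates every row in one step.
masked <- unname(lookup[id])
print(masked)
```

Because the lookup is built once from the untouched original values, a replacement code can never be picked up and translated a second time.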




  • Currently, I am achieving this by:



    library(random)
    data<-data.frame(id=c("ab","cc","cc","dd","ee","ff","ee","ff","ab","dd","gg",1,"air"))
    data$id<-as.character(data$id)
    replacements<-data.frame(value=randomStrings(n=8, len=2, digits=FALSE,loweralpha=TRUE, unique=TRUE, check=TRUE))
    replacements$V1<-as.character(replacements$V1)

    # Keep an untouched copy so rows that are already masked are never matched again
    original<-data$id
    values<-unique(original)
    for(i in seq_along(values)){
      data$id[original %in% values[i]] <- replacements$V1[i]
    }


    > data
       id
    1  SJ
    2  fH
    3  fH
    4  TZ
    5  Mr
    6  oZ
    7  Mr
    8  oZ
    9  SJ
    10 TZ
    11 kZ
    12 fe
    13 ql

  • Is there a base R function that achieves this? Is there a better approach to masking data than this loop?
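
One possible base R approach, offered as a minimal sketch and not as the definitive answer, uses `match()` to turn each value into its position among the unique values and then indexes the replacement vector by that position. The helper name `mask_ids` and the hard-coded `codes` vector are assumptions made for illustration; in practice `codes` could come from `replacements$V1`.

```r
# mask_ids() is a hypothetical helper, not part of base R or the random package.
mask_ids <- function(x, replacements) {
  x <- as.character(x)
  originals <- unique(x)  # unique values in order of first appearance
  # Need at least one distinct replacement per unique original value.
  stopifnot(length(replacements) >= length(originals),
            !anyDuplicated(replacements))
  # match() returns, for each element of x, its position in originals;
  # indexing replacements by that position substitutes all rows at once.
  replacements[match(x, originals)]
}

data <- data.frame(id = c("ab","cc","cc","dd","ee","ff","ee","ff","ab","dd","gg",1,"air"),
                   stringsAsFactors = FALSE)

# Assumption: codes copied from the sample output in the question.
codes <- c("SJ","fH","TZ","Mr","oZ","kZ","fe","ql")

data$id <- mask_ids(data$id, codes)
print(data)
```

This avoids the explicit loop, and because every row is translated from the original values in a single vectorised step, a replacement code that happens to equal an original value cannot be masked twice.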

