R Tip: Use match_order() to Align Data

R tip. Use wrapr::match_order() to align data.

Suppose we have data in two data frames, and both data frames have a common row-identifying column called “idx”.

library("wrapr")

d1 <- build_frame(
   "idx", "x" |
   3    , "a" |
   1    , "b" |
   2    , "c" )

d2 <- build_frame(
   "idx", "y" |
   2    , "D" |
   1    , "E" |
   3    , "F" )

print(d1)
#>   idx x
#> 1   3 a
#> 2   1 b
#> 3   2 c

print(d2)
#>   idx y
#> 1   2 D
#> 2   1 E
#> 3   3 F

(Please see R Tip: Think in Terms of Values for build_frame() and other value capturing tools.)

Often we wish to work with such data aligned so that each row of d2 has the same idx value as the corresponding row (by row order) of d1. This is an important data wrangling task, so there are many ways to achieve it in R, such as base::merge(), dplyr::left_join(), or sorting both tables into the same order and then using base::cbind().
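As a small illustration of the join approach, here is a sketch using only base R. It rebuilds the example tables with data.frame() so it runs without wrapr (an assumption made for self-containment), and the name joined is just illustrative. With its default sort = TRUE, base::merge() returns rows sorted by the join column, so the result does not follow the row order of d1.

```r
# Rebuild the example data with base R (same values as d1 and d2 above).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"),
                 stringsAsFactors = FALSE)

# Join on the shared key; merge() sorts the result by idx by default.
joined <- merge(d1, d2, by = "idx")

print(joined)
#>   idx x y
#> 1   1 b E
#> 2   2 c D
#> 3   3 a F
```

The rows are correctly paired, but they arrive in idx order (1, 2, 3) rather than in d1's original order (3, 1, 2).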

However, if you wish to preserve the order of the first table (which may not be sorted), you need one more trick.

You can add a row-id column, sort by the joining id, combine and then re-sort by the row-id column.
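The row-id approach can be sketched in base R roughly as follows. This is one possible spelling of the steps, not the only one; the column name row_id and the intermediate names d1s, d2s, and combined are illustrative choices, and the sketch assumes both tables contain exactly the same set of idx values with no duplicates.

```r
# Rebuild the example data with base R (same values as d1 and d2 above).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"),
                 stringsAsFactors = FALSE)

# 1. Record the original row order of d1.
d1$row_id <- seq_len(nrow(d1))

# 2. Sort both tables by the joining id.
d1s <- d1[order(d1$idx), , drop = FALSE]
d2s <- d2[order(d2$idx), , drop = FALSE]
stopifnot(identical(d1s$idx, d2s$idx))  # keys must now line up row by row

# 3. Combine the aligned tables.
combined <- d1s
combined$y <- d2s$y

# 4. Re-sort by the row-id column to restore d1's order, then drop it.
combined <- combined[order(combined$row_id), , drop = FALSE]
combined$row_id <- NULL
rownames(combined) <- NULL

print(combined)
#>   idx x y
#> 1   3 a F
#> 2   1 b E
#> 3   2 c D
```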

Or you can match the orders in one step using wrapr::match_order().

p <- match_order(d2$idx, d1$idx)

print(d2[p, , drop=FALSE])
#>   idx y
#> 3   3 F
#> 2   1 E
#> 1   2 D

match_order() merely wraps the sort and re-sort tricks mentioned above; the underlying theory, however, is the absolute magic of associative array indexing.
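To make the associative-indexing idea concrete, here is a base R sketch. It is not wrapr's implementation, only an illustration of the principle under the assumption that the idx values are unique and present in both tables. It computes the permutation two ways: with base::match(), and with an explicit named-vector lookup (the names pos, p, and p2 are illustrative).

```r
# Rebuild the example data with base R (same values as d1 and d2 above).
d1 <- data.frame(idx = c(3, 1, 2), x = c("a", "b", "c"),
                 stringsAsFactors = FALSE)
d2 <- data.frame(idx = c(2, 1, 3), y = c("D", "E", "F"),
                 stringsAsFactors = FALSE)

# For each idx in d1, find the row position of that idx in d2.
p <- match(d1$idx, d2$idx)
print(p)
#> [1] 3 2 1

# The same lookup written as associative (named-vector) indexing:
# names are the d2 keys, values are the d2 row positions.
pos <- setNames(seq_len(nrow(d2)), as.character(d2$idx))
p2 <- unname(pos[as.character(d1$idx)])
stopifnot(identical(p, p2))

# Reordering d2 by p aligns it with d1.
d2_aligned <- d2[p, , drop = FALSE]
stopifnot(identical(d2_aligned$idx, d1$idx))
```

For this data the permutation selects rows 3, 2, 1 of d2, the same rows shown in the match_order() output above.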

Please see R Tip: Use drop = FALSE with data.frames, for why one should get in the habit of writing drop = FALSE.

 


 

Original Source

John Mount

My specialty is analysis and design of algorithms, with an emphasis on efficient implementation. I work to find applications of state of the art methods in optimization, statistics and machine learning in various application areas. Currently co-authoring "Practical Data Science with R"
