Occasionally one ends up using a dataframe with a list column. The `purrr` library has all kinds of useful functions for working with such a dataframe, but there is one particular idiom I have a hard time remembering: mapping a function element-wise over a list column, so that it is applied to each row's entry in turn. An example would be computing the length of each row's list-column entry.
R has a number of ways to do this, each with its own syntax, and for whatever reason the documentation always sends me down the wrong track: I seem to always end up mapping the function over each row in its entirety, not over the individual data elements I want. As a favor to Future Me™, I'm writing down how to do this here in the hope that I'll remember to look here next time.
First, let’s make a simple tibble with a list column:
```r
library(tibble)
library(dplyr)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(
    c('a', 'b'),
    c('a', 'b', 'c', 'd'),
    character()
  )
)
some.data
```

```
## # A tibble: 3 x 3
##    col1 col2       col3
##   <dbl> <chr>      <list>
## 1     1 purple     <chr [2]>
## 2     2 monkey     <chr [4]>
## 3     3 dishwasher <chr [0]>
```
Now, we can use `purrr::map_int()` to map the `length` function over `col3`. The trick (and this is what I always seem to forget) is that it should happen from within a call to `mutate()`, like so:
```r
some.data %>%
  mutate(col3.length = map_int(col3, length))
```

```
## # A tibble: 3 x 4
##    col1 col2       col3      col3.length
##   <dbl> <chr>      <list>          <int>
## 1     1 purple     <chr [2]>           2
## 2     2 monkey     <chr [4]>           4
## 3     3 dishwasher <chr [0]>           0
```
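One way to land on the wrong track mentioned earlier: a data frame is itself a list of columns, so handing the whole tibble to `map_int()` iterates over its columns rather than over the elements of `col3`. The sketch below assumes that this is the kind of mix-up meant; it is an illustration, not a reconstruction of any particular attempt, and it rebuilds `some.data` so it runs on its own.

```r
library(tibble)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

# A data frame is a list of columns, so mapping over it directly
# visits each column once; every column holds 3 entries.
map_int(some.data, length)
# returns a named integer vector: col1 = 3, col2 = 3, col3 = 3

# Mapping over the list column itself visits each row's entry.
map_int(some.data$col3, length)
# returns 2 4 0
```

Inside `mutate()`, the bare name `col3` refers to the column, which is why the `mutate()` version above behaves like the second call here.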
`purrr` has other related functions (`map_dbl()`, `map_chr()`, etc.) that work the same way, as well as `map2_*()` siblings for mapping over two columns at once (and `pmap_*()` for more than two). There are, I am quite sure, many other ways to achieve this same result (there's probably some clever way to do it using `lapply()`, for instance), but life is short and `purrr` does the trick.
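A few of those variants side by side, as a sketch on the same `some.data` tibble. The new column names (`col3.joined`, `col2.col3`, `col3.length.base`, `col3.length.lengths`) are made up for illustration. `map2_chr()` walks `col2` and `col3` in parallel, and for the base R route, `vapply()` (a type-checked relative of `lapply()`) and `lengths()` give the same counts as `map_int(col3, length)`.

```r
library(tibble)
library(dplyr)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

# Column names below are hypothetical, chosen only for this example.
result <- some.data %>%
  mutate(
    # map_chr: one string per row, here each list entry joined with commas
    col3.joined = map_chr(col3, ~ paste(.x, collapse = ",")),
    # map2_chr: pair each row's col2 value with the length of its col3 entry
    col2.col3 = map2_chr(col2, col3, ~ paste(.x, length(.y))),
    # Base R: vapply() is lapply() with a declared result type
    col3.length.base = vapply(col3, length, integer(1)),
    # Base R shortcut for exactly this case
    col3.length.lengths = lengths(col3)
  )
result
```

For plain lengths, `lengths()` is the shortest option; the `map_*()` family earns its keep once the per-element function gets more interesting.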