Steven Bedrick

Getting the length of list columns with Purrr


Occasionally one ends up working with a dataframe that has a list column. The purrr library has all kinds of useful functions for working with such a dataframe, but there is one idiom whose syntax I have a hard time remembering: mapping a function element-wise over a list column, one row at a time. An example of this would be computing the length of each row's list-column entry.

R has a number of different ways to do this, each with its own syntax, and for whatever reason the documentation always sends me down the wrong track: I seem to always end up mapping the function over each row in its entirety, not over the individual data elements I want. As a favor to Future Me™, I'm writing down how to do this here in the hopes that I'll remember to look here next time.

First, let’s make a simple tibble with a list column:

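library(dplyr)   # provides tibble(), mutate(), rowwise(), and %>%
library(purrr)   # provides map_int() and friends

some.data <- tibble(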
  col1=c(1,2,3),
  col2=c('purple','monkey','dishwasher'),
  col3=list(
    c('a','b'),
    c('a','b','c','d'),
    character()
  )
)

some.data
## # A tibble: 3 × 3
##    col1 col2       col3     
##   <dbl> <chr>      <list>   
## 1     1 purple     <chr [2]>
## 2     2 monkey     <chr [4]>
## 3     3 dishwasher <chr [0]>

Now, we can use purrr::map_int() to map the length function over col3. The trick (and this is what I always seem to forget) is that the call should happen inside mutate(), like so:

some.data %>% mutate(col3.length = map_int(col3, length))
## # A tibble: 3 × 4
##    col1 col2       col3      col3.length
##   <dbl> <chr>      <list>          <int>
## 1     1 purple     <chr [2]>           2
## 2     2 monkey     <chr [4]>           4
## 3     3 dishwasher <chr [0]>           0
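The trick works because map_int() on its own just takes a list and returns an integer vector with one entry per element; mutate() then attaches that vector as a new column. Here is a minimal sketch of the same call outside of mutate(), rebuilding the some.data tibble from above so the snippet stands alone. Note that map_int() is strict: if the function returns something that can't be coerced to a single integer, it throws an error rather than silently returning a list.

```r
library(dplyr)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

# map_int() visits each element of the list column in turn, applies
# length() to it, and collects the results into an integer vector.
lens <- map_int(some.data$col3, length)
lens
## [1] 2 4 0

# The formula shorthand ~ length(.x) is equivalent to passing length directly.
lens2 <- map_int(some.data$col3, ~ length(.x))
identical(lens, lens2)
## [1] TRUE
```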

Purrr has other related functions (map_lgl(), map_dbl(), map_chr(), etc.) that work the same way for other return types, as well as map2_*() siblings for mapping over two columns in parallel (and pmap_*() for more than two). There are, I am quite sure, many other ways to achieve this same result (there's probably some clever way to do it using lapply, for instance), but life is short and purrr does the trick.
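A small sketch of both ideas, assuming the same some.data tibble. The col3.plus.nchar column is a made-up example (the length of each row's col3 plus the number of characters in col2), chosen only to show map2_int() taking one element from each of two columns. The base-R lines show two non-purrr routes to the plain lengths: vapply(), a type-checked cousin of lapply(), and lengths(), which does exactly this job for lists.

```r
library(dplyr)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

# map2_int() walks over col3 and col2 in parallel: x is one row's
# character vector from col3, y is that row's single string from col2.
# (col3.plus.nchar is a hypothetical column, for illustration only.)
with.both <- some.data %>%
  mutate(col3.plus.nchar = map2_int(col3, col2, function(x, y) length(x) + nchar(y)))

# Base-R alternatives for the plain lengths, no purrr required:
base.vapply  <- vapply(some.data$col3, length, integer(1))  # lapply-style, typed result
base.lengths <- lengths(some.data$col3)                     # purpose-built for lists

# lengths() also works directly inside mutate().
with.lengths <- some.data %>% mutate(col3.length = lengths(col3))
```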


Update, 8/2021: Newer versions of dplyr include a rowwise() function, which makes this sort of analysis even easier. Essentially, rowwise() is like group_by(), except that it creates a single “group” for each row. From that point on, your “regular” dplyr verbs will all operate on each row:

some.data %>% rowwise() %>% mutate(col3.length=length(col3))
## # A tibble: 3 × 4
## # Rowwise: 
##    col1 col2       col3      col3.length
##   <dbl> <chr>      <list>          <int>
## 1     1 purple     <chr [2]>           2
## 2     2 monkey     <chr [4]>           4
## 3     3 dishwasher <chr [0]>           0

So much less need for lapply!
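Two details about rowwise() seem worth writing down, sketched below under the assumption of a reasonably recent dplyr (1.0 or later). First, the rowwise grouping sticks around after mutate() (that's the "# Rowwise:" line in the printout), so later verbs keep working one row at a time until you call ungroup(). Second, the difference from a plain group_by() with one group per row: under rowwise(), col3 inside mutate() refers to the row's element itself, whereas under group_by() it is a length-1 slice of the list column, so the element has to be pulled out with [[1]]. The row.id column is a hypothetical helper added only for this comparison.

```r
library(dplyr)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

# rowwise(): col3 is that row's character vector, so length(col3) works as-is.
via.rowwise <- some.data %>%
  rowwise() %>%
  mutate(col3.length = length(col3)) %>%
  ungroup()   # otherwise the rowwise grouping carries into later verbs

# group_by() on a hypothetical per-row id: col3 is a list of length 1 here,
# so length(col3) would be 1; [[1]] extracts the actual vector first.
via.group <- some.data %>%
  mutate(row.id = row_number()) %>%
  group_by(row.id) %>%
  mutate(col3.length = length(col3[[1]])) %>%
  ungroup()

# The purrr version from earlier, for comparison.
via.map <- some.data %>% mutate(col3.length = map_int(col3, length))
```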

