Getting the length of list columns with Purrr
Occasionally one ends up using a dataframe with a list column. The purrr library has all kinds of useful functions for working with such a dataframe, but there is a particular idiom that I have a hard time remembering how to use: mapping a function element-wise over a list column, one row at a time. An example of this would be computing the length of each row’s list column.
R has a number of different ways to do this, but the syntaxes vary, and for whatever reason the documentation always sends me down the wrong track: I always seem to end up mapping the function over each row in its entirety, not over the individual data elements I want. As a favor to Future Me™, I’m writing down how to do this here in the hopes that I’ll remember to look here next time.
First, let’s make a simple tibble with a list column:
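# Packages used throughout: tibble() from tibble, mutate() and %>% from dplyr, map_int() from purrr
library(tibble)
library(dplyr)
library(purrr)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  # col3 is a list column: each row holds its own character vector
  col3 = list(
    c('a', 'b'),
    c('a', 'b', 'c', 'd'),
    character()
  )
)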
some.data
## # A tibble: 3 × 3
##    col1 col2       col3
##   <dbl> <chr>      <list>
## 1     1 purple     <chr [2]>
## 2     2 monkey     <chr [4]>
## 3     3 dishwasher <chr [0]>
Now, we can use purrr::map_int() to map the length function over col3. The trick (and this is what I always seem to forget) is that it should happen from within a call to mutate(), like so:
some.data %>% mutate(col3.length = map_int(col3, length))
## # A tibble: 3 × 4
##    col1 col2       col3      col3.length
##   <dbl> <chr>      <list>          <int>
## 1     1 purple     <chr [2]>           2
## 2     2 monkey     <chr [4]>           4
## 3     3 dishwasher <chr [0]>           0
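To see why the map_int() wrapper matters, it may help to compare it with calling length directly inside mutate(). This is a minimal sketch that rebuilds the same tibble so it runs on its own; the names naive and mapped are only illustrative.

```r
library(dplyr)
library(purrr)
library(tibble)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

# Inside mutate(), col3 refers to the whole list column, so length()
# sees a list of 3 elements and the single result is recycled to every row.
naive <- some.data %>% mutate(col3.length = length(col3))
naive$col3.length   # 3 3 3

# map_int() calls length() once per element and returns an integer vector.
mapped <- some.data %>% mutate(col3.length = map_int(col3, length))
mapped$col3.length  # 2 4 0
```

The first version is the "wrong track" result: every row reports the number of rows rather than the size of its own element.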
Purrr has other related functions (map_lgl, map_dbl, etc.) that work the same way, as well as map2_* sibling functions for mapping over multiple columns. There are, I am quite sure, many other ways to achieve this same result (there’s probably some clever way to do it using lapply, for instance) but life is short and purrr does the trick.
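As a rough illustration of those siblings, the sketch below uses map_lgl() to flag empty elements and map2_chr() to combine two columns. It also shows base R's lengths(), which is one of the "other ways" to get the same counts without purrr. The column names col3.empty and label, and the name result, are made up for this example.

```r
library(dplyr)
library(purrr)
library(tibble)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

result <- some.data %>%
  mutate(
    # map_lgl() returns a logical vector: TRUE where the element is empty
    col3.empty = map_lgl(col3, ~ length(.x) == 0),
    # map2_chr() walks two columns in parallel and returns a character vector
    label = map2_chr(col2, col3, ~ paste0(.x, ": ", length(.y))),
    # base R's lengths() gives the per-element length of a list directly
    col3.length = lengths(col3)
  )

result$col3.empty   # FALSE FALSE TRUE
result$label        # "purple: 2" "monkey: 4" "dishwasher: 0"
result$col3.length  # 2 4 0
```

The typed variants fail loudly if the function returns something of the wrong type, which tends to be safer than a plain map() followed by unlist().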
Update, 8/2021: Newer versions of dplyr include a rowwise function, which makes this sort of analysis even easier. Essentially, rowwise() is like group_by() except that it creates a single “group” for each row. After that point, your “regular” dplyr verbs will all operate on each row:
some.data %>% rowwise() %>% mutate(col3.length=length(col3))
## # A tibble: 3 × 4
## # Rowwise:
##    col1 col2       col3      col3.length
##   <dbl> <chr>      <list>          <int>
## 1     1 purple     <chr [2]>           2
## 2     2 monkey     <chr [4]>           4
## 3     3 dishwasher <chr [0]>           0
So much less need for lapply!
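One thing worth keeping in mind is that the per-row grouping sticks around after the mutate(), as the "# Rowwise:" line in the output above suggests. A minimal sketch, assuming a dplyr version with the rowwise() behavior described here; by.row and flat are illustrative names.

```r
library(dplyr)
library(tibble)

some.data <- tibble(
  col1 = c(1, 2, 3),
  col2 = c('purple', 'monkey', 'dishwasher'),
  col3 = list(c('a', 'b'), c('a', 'b', 'c', 'd'), character())
)

by.row <- some.data %>%
  rowwise() %>%
  mutate(col3.length = length(col3))

# The result is still a rowwise data frame ...
inherits(by.row, "rowwise_df")

# ... so drop the per-row grouping before computing whole-column summaries.
flat <- by.row %>% ungroup()
inherits(flat, "rowwise_df")
sum(flat$col3.length)
```

Calling ungroup() afterwards returns an ordinary tibble, so later verbs operate on whole columns again instead of one row at a time.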