Steven Bedrick

"Joining" two datasets in Mathematica

« Back to notes

Often, I want to perform a quasi-“inner join” of two matricies in Mathematica on some column that I know they have in common (e.g., a subject identifier, etc.). There are approximately a million ways to do this; O’Reilly’s Mathematica Cookbook has a particularly idiomatic and clever (but extremely verbose) method involving patterns and rewrite-rules, which I used for a while… I’m lazy, though, and found myself needing to do this all the time, so I wrote my own solution, which handles far fewer possible situations than the Cookbook’s, but still works for the most common case.

This function lets you take two matrices that have a common first column and joins them together on that column (see usage example below). It’s intended for situations where the two matrices have an equal number of rows, and order is not important. The algorithm is is very basic and brute-force, so running it on millions of rows might be slow. For day-to-day use, though, I’ve found it to be very handy. Right now, it assumes that the first column of the matrix is the “key” column; in principle, one could easily modify it to allow the caller to specify which column to treat as the key.

joinByFirstCol[d1_, d2_] :=(*figure out the elements of d2 that correspond with d1 on the designated column*)
 Return[Flatten[{Part[#, 1], Part[#, 2 ;;], 
      If[Count[d2, {Part[#, 1], __}] > 0, 
       Part[(First@Cases[d2, {Part[#, 1], __}]), 2 ;;], Null]}, 1] & /@
    d1]

Usage is as follows:

In[1]:= a = {{1, 2}, {2, 5}, {3, 3}};
In[2]:= b = {{1, "a"}, {2, "b"}, {3, "c"}};
In[3]:= joinByFirstCol[a, b]
Out[3]= {{1, 2, "a"}, {2, 5, "b"}, {3, 3, "c"}}

« Back to notes