Steven Bedrick

Interesting Code Snippets

Every so often, I come up with an interesting snippet of code I want to hang on to or that I think might be helpful to somebody else.

Probability Distribution Visualization

I put this together to help illustrate some concepts behind probability for my sister. Copy and paste it into Mathematica and you’ll get a handy interactive pair of probability distributions.

Manipulate[
 Histogram[
  {
   RandomReal[NormalDistribution[\[Mu]1, \[Sigma]1], n1],
   RandomReal[NormalDistribution[\[Mu]2, \[Sigma]2], n2]
   }, {-10, 10, .5}
  ],
 {\[Mu]1, -5, 5, Appearance -> "Labeled"},
 {\[Sigma]1, 1, 5, Appearance -> "Labeled"},
 {n1, 25, 1000, Appearance -> "Labeled"},
 {\[Mu]2, -5, 5, Appearance -> "Labeled"},
 {\[Sigma]2, 1, 5, Appearance -> "Labeled"},
 {n2, 25, 1000, Appearance -> "Labeled"}
 ]


Using the “meta” key in Emacs over SSH using Terminal.app

I’d always had trouble figuring out how to use the “meta” key in Emacs while in an SSH session. I’ve got my local machine set up to use the “option” key as “meta”, but that never seemed to work while in Emacs via SSH. It turns out that “escape” does the job: press and release it, then press the key you would otherwise have combined with meta. So, for example, to launch a shell session from within Emacs, press esc, then x, and then type shell. This falls squarely under the category of “obvious stuff that everybody else probably already knew”, but just in case, I’ve recorded it here.

Inspecting pl/pgSQL functions from PSQL

Creating pl/pgSQL functions is easy: CREATE FUNCTION.... Inspecting those functions six months later, after you’ve forgotten what’s in them? Turns out that that’s easy too. To inspect the SQL behind a function named, say, foo, use \df+, like so:

dbname=# \df+ foo

This will print out, among other things, foo’s source code.

Getting an OpenCV IplImage from an NSImage

There are plenty of places online to find out how to get an NSImage from an OpenCV IplImage, but, for whatever reason, I had a devil of a time figuring out how to go in the other direction. This code seems to work, and as far as I can tell doesn’t leak memory, but I could be completely wrong. If any Objective-C or OpenCV wizards happen across this code, please let me know if it’s got any glaring problems.

- (IplImage*) nsImageToIplImage:(NSImage*)img {
	
	// assumes that the image's first representation is a bitmap
	NSBitmapImageRep *orig = [[img representations] objectAtIndex: 0];
	
	// Round-trip through TIFF data to get a private copy of the pixels,
	// so that nothing we do here can affect the original NSImage.
	NSBitmapImageRep *rep = [NSBitmapImageRep imageRepWithData:[orig representationUsingType:NSTIFFFileType properties:nil]];
	
	int depth = (int)[rep bitsPerSample];
	int channels = (int)[rep samplesPerPixel];
	// use pixel dimensions; [rep size] is in points and can differ from the pixel count
	int height = (int)[rep pixelsHigh];
	int width = (int)[rep pixelsWide];
	
	// the copy loop below only handles interleaved 8-bit data with 3 or 4 samples per pixel
	if (depth != 8 || channels < 3 || channels > 4 || [rep isPlanar]
	    || [rep bitsPerPixel] != depth * channels) return NULL;
	
	// cvCreateImage allocates its own pixel buffer, so the result stays valid
	// after rep goes away; the caller frees it with cvReleaseImage
	IplImage* to_return = cvCreateImage(cvSize(width, height), IPL_DEPTH_8U, channels);
	
	// Copy row by row (the two buffers may pad their rows differently),
	// reordering Cocoa's RGB samples into the BGR order that OpenCV expects
	unsigned char *src = [rep bitmapData];
	for (int y = 0; y < height; y++) {
		unsigned char *s = src + y * [rep bytesPerRow];
		unsigned char *d = (unsigned char *)to_return->imageData + y * to_return->widthStep;
		for (int x = 0; x < width; x++, s += channels, d += channels) {
			d[0] = s[2];
			d[1] = s[1];
			d[2] = s[0];
			if (channels == 4) d[3] = s[3];
		}
	}

	return to_return;
}
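
The channel reordering can be checked on its own, away from Cocoa and OpenCV. The following Python sketch is not part of the original snippet, and its function and parameter names are made up for illustration. It swaps the first and third sample of every pixel in a packed byte buffer, walking each row by its stride so that any padding bytes at the end of a row are left alone.

```python
def swap_red_blue(buf, width, height, channels, stride):
    """Return a copy of buf with samples 0 and 2 of every pixel exchanged.

    buf holds `height` rows of `stride` bytes each. Only the first
    width * channels bytes of a row are pixel data; the rest is padding.
    """
    if channels < 3:
        raise ValueError("need at least three channels to swap red and blue")
    if stride < width * channels or len(buf) < stride * height:
        raise ValueError("buffer is too small for the stated geometry")
    out = bytearray(buf)
    for y in range(height):
        row = y * stride
        for x in range(width):
            p = row + x * channels
            out[p], out[p + 2] = out[p + 2], out[p]
    return bytes(out)


# Two rows of one pixel each, 3 channels, 4-byte stride (one padding byte per row).
rgb = bytes([10, 20, 30, 0xFF, 40, 50, 60, 0xFF])
bgr = swap_red_blue(rgb, width=1, height=2, channels=3, stride=4)
print(list(bgr))  # [30, 20, 10, 255, 60, 50, 40, 255]
```

Stepping through the whole buffer three bytes at a time only gives the same result when the rows have no padding and the image has exactly three channels, which is why the sketch works from the stride.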

Errors while removing DRM from legally-purchased “Digital Editions” files

In the course of using ineptepub or a similar tool to remove the DRM from a legally-purchased eBook (for personal backup use, as described in this court decision), you may occasionally run into ePub files that can’t be decrypted due to an error that looks something like this: "File name in directory “OEBPS/blahblahbah” and header “OEBPS/xxx.jpg” differ". About one in ten files seems to produce this error, and it has something to do with the details of how the zip file was produced.

Luckily, there’s an easy fix, at least on OSX: use the application described here. It’s essentially an AppleScript droplet that re-zips the file using a few special incantations, and it works almost all of the time.

UPDATE: There now exists a program called “DeDRM” that takes care of this automatically (it is essentially a thin wrapper around the ineptepub scripts as well as the aforementioned AppleScript droplet), and is a much better solution to one’s legitimate DRM-removal needs. Google and ye shall find.

“Joining” two datasets in Mathematica

Often, I want to perform a quasi-“inner join” of two matrices in Mathematica on some column that I know they have in common (e.g., a subject identifier). There are approximately a million ways to do this; O’Reilly’s Mathematica Cookbook has a particularly idiomatic and clever (but extremely verbose) method involving patterns and rewrite-rules, which I used for a while. I’m lazy, though, and found myself needing to do this all the time, so I wrote my own solution, which handles far fewer possible situations than the Cookbook’s, but still works for the most common case.

This function takes two matrices that have a common first column and joins them together on that column (see usage example below). It’s intended for situations where the two matrices have an equal number of rows, and order is not important. The algorithm is very basic and brute-force (it scans the second matrix once for every row of the first), so running it on millions of rows might be slow. For day-to-day use, though, I’ve found it to be very handy. Right now, it assumes that the first column of the matrix is the “key” column; in principle, one could easily modify it to allow the caller to specify which column to treat as the key.

(* for each row of d1, append the non-key columns of the first row of d2
   whose first element matches; rows with no match get a trailing Null *)
joinByFirstCol[d1_, d2_] :=
 Flatten[{First[#], Rest[#],
     If[MemberQ[d2, {First[#], __}],
      Rest[First@Cases[d2, {First[#], __}]], Null]}, 1] & /@ d1

Usage is as follows:

In[1]:= a = {{1, 2}, {2, 5}, {3, 3}};
In[2]:= b = {{1, "a"}, {2, "b"}, {3, "c"}};
In[3]:= joinByFirstCol[a, b]
Out[3]= {{1, 2, "a"}, {2, 5, "b"}, {3, 3, "c"}}
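
For comparison, here is a rough sketch of the same join in Python. It is not a translation of the Mathematica code: it swaps the brute-force scan for a dictionary lookup, and it adds the caller-specified key column mentioned above. The name join_by_column and its key parameter are invented for this example.

```python
def join_by_column(d1, d2, key=0):
    """Join each row of d1 with the first row of d2 that shares its key column.

    The key value comes first in each output row, followed by the remaining
    columns of the d1 row and then those of the matching d2 row. A row with
    no match gets a single trailing None.
    """
    lookup = {}
    for row in d2:
        lookup.setdefault(row[key], row)  # keep only the first row per key

    joined = []
    for row in d1:
        rest1 = [v for i, v in enumerate(row) if i != key]
        match = lookup.get(row[key])
        if match is None:
            rest2 = [None]
        else:
            rest2 = [v for i, v in enumerate(match) if i != key]
        joined.append([row[key]] + rest1 + rest2)
    return joined


a = [[1, 2], [2, 5], [3, 3]]
b = [[1, "a"], [2, "b"], [3, "c"]]
print(join_by_column(a, b))  # [[1, 2, 'a'], [2, 5, 'b'], [3, 3, 'c']]
```

Building the lookup table takes one pass over the second dataset, so the cost grows with the total number of rows. The brute-force version's cost grows with the product of the two row counts.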

Using DatabaseLink to connect to PostgreSQL

This is another entry in the “really simple but still seems to take forever to get working” family of Mathematica tricks. Mathematica can, in theory, connect to any JDBC data source, but it is not always obvious what magic set of options and commands will be required. For PostgreSQL, the trick is all in the arguments to JDBC[]. The first one needs to be “PostgreSQL”, and the second needs to be the hostname and database name, separated by a slash. Basically, this second argument is going to get stuck on to the end of a JDBC connection string, so you can (presumably) also include stuff like port numbers and so forth.

In[1]:= Needs["DatabaseLink`"];
In[2]:= 
dbconn = OpenSQLConnection[JDBC["PostgreSQL", "YOUR_HOST_NAME/YOUR_DB_NAME"], 
  Username -> "USERNAME", Password -> "PASSWORD"]
Out[2]= SQLConnection[3, "Open", "Catalog" -> "topic_gen", 
 "TransactionIsolationLevel" -> "ReadCommitted"]

Partitioning into unevenly-sized lists

I can never find this one when I need it, so I’m posting it here:

In[139]:= Partition[
 Range[5],
 2, 2, {1, 1}, {}
 ]

Out[139]= {{1, 2}, {3, 4}, {5}}

In[140]:= Partition[
 Range[5],
 2, 2, {1, 1}, {"a"} (* note that we're not just passing in an empty list this time... *)
 ]

Out[140]= {{1, 2}, {3, 4}, {5, "a"}}
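
For anyone who needs the same behaviour outside Mathematica, here is a minimal Python sketch of the two cases above. The function name chunk and its fill parameter are made up for illustration. It only covers non-overlapping chunks with a single fill value, not the full range of Partition’s offset and padding options.

```python
_RAGGED = object()  # sentinel meaning "leave the last chunk short"


def chunk(items, size, fill=_RAGGED):
    """Split items into consecutive lists of length `size`.

    With no fill value the last chunk may be shorter than `size`;
    with a fill value it is padded out to full length.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    items = list(items)
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    if chunks and fill is not _RAGGED:
        chunks[-1] += [fill] * (size - len(chunks[-1]))
    return chunks


print(chunk(range(1, 6), 2))       # [[1, 2], [3, 4], [5]]
print(chunk(range(1, 6), 2, "a"))  # [[1, 2], [3, 4], [5, 'a']]
```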

Making a language model using OpenGRM

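This script assumes that the corpus has been processed appropriately (tokenized, whitespace replaced with an entity token, etc.) and saved as corpus_filename.split. The first argument is the n-gram order, which defaults to 3.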

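#!/bin/bash
# usage: buildlm.sh ngram_order corpus_filename
# Reads corpus_filename.split; writes the .syms, .far, .cnts and .mod files alongside it.

if [ -z "$2" ]; then
  echo "usage: buildlm.sh ngram_order corpus_filename" >&2
  exit 1
fi

NGRAM_SIZE=${1:-3}  # fall back to trigrams if the first argument is empty
echo "ng: $NGRAM_SIZE"

FNAME=$2
echo "Fname: $FNAME"

set -x

# build the symbol table, compile the corpus into a FAR archive, then count and smooth
ngramsymbols < "$FNAME.split" > "$FNAME.syms"

farcompilestrings -symbols="$FNAME.syms" -keep_symbols=1 "$FNAME.split" > "$FNAME.far"

ngramcount --order="$NGRAM_SIZE" < "$FNAME.far" > "$FNAME.$NGRAM_SIZE.cnts"
ngrammake "$FNAME.$NGRAM_SIZE.cnts" > "$FNAME.$NGRAM_SIZE.smoothed.mod"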
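
The preprocessing step itself isn’t shown here. As one possible reading of “tokenized, whitespace replaced with entity token”, the Python sketch below assumes a character-level model: every character becomes its own token, and each whitespace character is replaced by a placeholder. The token name <space> and the function split_line are hypothetical, not something OpenGRM defines. The output is one sentence per line with tokens separated by spaces.

```python
SPACE_TOKEN = "<space>"  # hypothetical placeholder for whitespace characters


def split_line(line, space_token=SPACE_TOKEN):
    """Turn a line of text into space-separated character tokens."""
    return " ".join(space_token if ch.isspace() else ch for ch in line.strip())


def split_corpus(lines):
    """Preprocess every non-empty line of a corpus."""
    return [split_line(line) for line in lines if line.strip()]


print(split_line("the cat"))  # t h e <space> c a t
```

Writing the result of split_corpus to corpus_filename.split, one entry per line, would give a file in the shape that the script reads.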

Copyright © 2017 Steven Bedrick. All rights reserved.