This is cute -- RPy is "the simplest thing that can possibly work" (and it does!) -- and most of all, it's awesome to see an R story high on the front page of YC News.
But I would encourage readers (of this article and of YC News) to try playing with R interactively, just like you should try using IPython+SciPy interactively, to realize the truly awesome power of the libraries+language in each case. Then you will know what can and cannot be done easily in each, and can use your code in the other (or C, or -- shudder -- maybe even Java if necessary) to chink the gap. Both languages play nice with C and Java, incidentally. But that's not the point -- let's look at an everyday situation.
For example -- I might like to do some dimensionality reduction on a high-dimensional dataset that is information-starved along some of the features that interest me (until I collapse it a little). Obviously, I'd like to read in the data (in as rich a form as possible), look at the combinations of dimensions that are most informative (probably via principal components analysis, aka PCA, but maybe I'd try some other techniques as well), plot them as predictors, and then perhaps use a bunch of resources on the Web to do some further annotation or testing and see how useful my pile of predictors is for various tasks. Off the top of my head, here is what I'd think of doing:
1) Depending on the nature of the data, parse it in either R or Python, and bind it into a data frame (sort of a fancy matrix) or a list of data frames so I can manipulate it in R.
2) Almost certainly I'd do the PCA and other exploratory data analysis (EDA) in R, due to the sheer power and variety of packages available for this sort of task in the R environment (a rough sketch follows this list). There are lots of libraries available for this sort of thing in Python, too, but if you have a wild hair up your ass to try some revolutionary technique you saw in Bioinformatics or Genetic Epi (or whatever) yesterday, odds are that it was released as an R package.
3) Most likely I'd plot the correspondences between each predictor and the responses I care about in R, maybe with GGobi if I had to deal with time series or high-dimensional plots. Not that Python can't do an awesome job, but hey, we're already in R, so let's get this over with, shall we?
4) I'd probably want to use the results in Python for web- or database-backed inquiries, because R's DBI packages sort of suck. Maybe I'd save entire data frames to MySQL or SQLite or what have you (see the handoff sketch below), and then retrieve them in Python to monkey around with the results (or use a stripped-down algorithm based on my R results to implement the 'app' version in Python, because eventually I'll bet we put this behind Django anyways...). But Python for sure if it's ever going to talk to Windows or the Web. Like, duh, rite?
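To make this concrete, here's roughly what steps 1-3 might look like in an R session. Take it as a sketch, not gospel: the file name, the "response" column, and the choice of plain old prcomp() over some fancier package are all stand-ins for whatever your data actually calls for.

    # Step 1: read the data into a data frame (assuming a hypothetical
    # mydata.csv with numeric feature columns plus a "response" column).
    dat <- read.csv("mydata.csv")

    # Step 2: PCA on the feature columns; prcomp() lives in the base
    # stats package, and scale. = TRUE puts the features on equal footing.
    features <- dat[, setdiff(names(dat), "response")]
    pca <- prcomp(features, scale. = TRUE)
    summary(pca)    # variance explained by each component

    # Step 3: eyeball the first couple of components against the response.
    scores <- as.data.frame(pca$x[, 1:2])
    scores$response <- dat$response
    pairs(scores)   # quick-and-dirty correspondence plot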
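And here's the sort of thing I mean for the step 4 handoff, at least on the R side. Again, just a sketch: the database file and table name are made up, and it assumes the DBI and RSQLite packages are installed.

    # Step 4 (R side): dump the PCA scores into SQLite so Python can
    # pick them up later with its sqlite3 module (or whatever ORM).
    library(DBI)
    con <- dbConnect(RSQLite::SQLite(), "results.db")
    dbWriteTable(con, "pca_scores", scores, overwrite = TRUE)
    dbDisconnect(con)

From the Python end it's one sqlite3.connect() and a SELECT away, which is exactly the kind of glue work Python is good at.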
So the point here is that it pays to have a couple of sharp tools lying around when your interesting problem shows up and starts flopping around on your desk. Don't be like the RDBMS guys who try to drive every damned screw with a filigreed hammer!
Hope this gets one or a few people to try R (on its own) and then stick it in their utility belt for later use.