Can anyone knowledgeable here speak about statistics and offer some advice? I'm about to get into a project where, for the first time, I'll need to do some statistics processing and visualization. I haven't started on that component of it yet, and I'm free to choose whatever tool I want. Most of the rest of my project is in Haskell, but for the processing/visualization of statistics part, I was thinking of choosing R. Does anyone know how well Mathematica 8, or other commercial packages, stack up?
I've been using Mathematica (MMA) for my dissertation's data analysis, and for the most part it's been great. As an environment to manipulate data in, it's by far the best that I've used- once you get the hang of it, the pattern/transformation-rule language is incredibly useful for reformatting, recoding, mixing, slicing, dicing, etc. one's data. If you're coming from Haskell, you'll probably pick this part up way faster than I did at first.
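To give a flavor of the kind of recode/reshape work I mean, here's a rough analogue in plain Python (the records and field names here are made up for illustration):

```python
# Hypothetical survey records: recode, then slice.
records = [
    {"subject": 1, "group": "ctrl", "score": "7"},
    {"subject": 2, "group": "treat", "score": "9"},
    {"subject": 3, "group": "treat", "score": "5"},
]

# Recode: coerce score strings to ints, map group labels to codes.
group_codes = {"ctrl": 0, "treat": 1}
recoded = [
    {**r, "score": int(r["score"]), "group": group_codes[r["group"]]}
    for r in records
]

# Slice: pull out just the treatment-group scores.
treat_scores = [r["score"] for r in recoded if r["group"] == 1]
print(treat_scores)  # [9, 5]
```

In MMA you'd express the same transformations as pattern-matching rewrite rules applied to nested lists, which scales nicely once the reshaping gets hairy.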
If your stats needs are relatively simple- linear models, GLMs, logit models, ANOVAs, simple tests of hypotheses, etc.- MMA is more than adequate. The new version looks like it adds some non-parametric stats functions, as well as paired t-tests, both of which would be quite useful to me.
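For concreteness, here's roughly what a paired t-test computes- not MMA code, just a pure-Python sketch on made-up before/after numbers:

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Paired t statistic: t = mean(d) / (sd(d) / sqrt(n)), df = n - 1."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Made-up measurements for five subjects, before and after treatment.
before = [12.0, 15.0, 11.0, 14.0, 13.0]
after  = [10.0, 14.0, 11.0, 12.0, 11.0]
t, df = paired_t(before, after)
print(round(t, 3), df)  # t = 3.5 with 4 degrees of freedom
```

Any of the packages in this thread will do the p-value lookup for you on top of this; the point is just that the "simple tests" tier really is simple.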
Also, the visualization tools in MMA are fabulous, and don't make me want to tear my beard out every time I have to go off the beaten path (as opposed to those found in certain other one-letter-long stats environments I could name). 'Nuff said. Another thing I really appreciate about MMA is how consistent the syntax and functions are- once you've figured out one function, the odds are good that your knowledge will be useful on the next function you try to figure out. This, again, stands in stark contrast to other packages (R, SAS, I'm looking at you guys).
I have found myself turning to R for certain specific things, though. Mixed-effects models, repeated-measure ANOVA, Fisher's Exact Test, etc. Really, the two work together well- it's easy to use MMA to get your data in exactly the right form for R, export it, and then do whatever you need from there.
I think Mathematica is more useful for symbolic processing. For crunching large matrices of numbers and making some plots, R or Matlab (or the free Octave) is probably best. It depends mostly on which programming paradigm you are more comfortable with.
I cannot speak for Mathematica; I have barely used it. For stats, R is hard to beat: it has a lot of cutting-edge packages through CRAN (R's equivalent of CPAN), and it's what most leading academics in statistics use.
Now, its heritage shows quite a bit, and the language is not always nice to play with. But it has great plotting facilities, e.g. ggplot (http://had.co.nz/ggplot), which takes a principled and very interesting approach to data visualization.
There are also quite a few things available in scipy if Python is your thing. If you just want to do stats and are not familiar with Python, nor want to deal with a general programming language, R is better I think. If you want to make full-fledged applications with a web frontend, R will not be pleasant :) (usual disclaimer: I am a numpy/scipy contributor).
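For example, scipy.stats ships Fisher's exact test; here's a rough pure-Python sketch of what the two-sided version computes, on a made-up 2x2 table:

```python
from math import comb

def fisher_exact_2sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    sum the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def p(x):  # P(top-left cell == x) with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = p(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(p(x) for x in range(lo, hi + 1) if p(x) <= p_obs + 1e-12)

# Made-up contingency table [[8, 2], [1, 5]].
print(round(fisher_exact_2sided(8, 2, 1, 5), 4))  # 0.035
```

scipy.stats.fisher_exact does the same thing (and also reports the odds ratio) in one call.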
Alternatives like Matlab, Maple, Stata all have basically the same 'look' to their default graphing packages.
Even though Mathematica would not be the right choice for statistical processing, the graphs it produces are a step above the rest.
So it depends what your use case is... any of the above would look good enough for an academic paper. But if you're going to be publishing these in a magazine, they probably won't cut it.
Especially when you have such pretty defaults in things like Matlab and Mathematica.
Matplotlib (http://matplotlib.sourceforge.net/) is actually the best open-source library for creating great-looking graphs that I have come across, and is comparable to Matlab and Mathematica.
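A minimal matplotlib sketch (the data and labels here are made up)- the defaults already look reasonable, and every element is tweakable:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display required
import matplotlib.pyplot as plt

xs = list(range(10))
ys = [x ** 2 for x in xs]

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(xs, ys, marker="o", label="x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.tight_layout()
fig.savefig("quadratic.png", dpi=150)
```

The figure/axes object model also makes subplot layout far less painful than the Matlab experience described above.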
I guess taste comes into play - I find producing good (as in academic-publication good) figures in Matlab an exercise in pain. The subplot mechanism is awful (at least it was 5 years ago), and it is hard to control the layout. I heard Mathematica is much better in that regard, but I've never used it myself.
Just use R -- the stats support is IMO the best in the world and it has very high adoption amongst stats grad departments and practitioners of statistics. The visualization tools work well -- I've written about basic plotting tools, but if you're just starting in R, skip the built-in plotting tools and just use ggplot2. It allows you to build astounding graphics.
Of course, any of this is a time investment, but I'd say the only alternative is Matlab - S-Plus is stupidly expensive and no better than R, Stata is a pain for any sort of automated processing, SAS is overpriced by an order of magnitude with a hideous learning curve for functionality that lags 10 years behind R, and Mathematica's stats support is brand new to the market. Let someone else work out the kinks.
SPSS is quite good, for certain things. A lot of researchers use it because little-to-no programming is required, and you can interact with it in an entirely GUI-way- if you can use Excel, you can use SPSS. It makes it easy to set up certain analyses, and gives lots of output... and that's where my concerns about it come up. It's easy to fall into a false sense of security with it, and to end up with statistics that you don't know how to interpret properly (I call this the "Huh. Now what do I do?" problem). The documentation is often pretty useless on this front as well- lots of pages follow this general pattern: "Jones Test of Gronkularity: If checked, SPSS will calculate the Jones Test of Gronkularity statistic, which tests the null hypothesis that the data are gronkular", as opposed to useful information about why you might care whether the data are gronkular or not, why the Jones test was included in another test's output, etc. For a product aimed at people with relatively limited technical capabilities, I feel like SPSS should have better docs.
One important thing to know about SPSS- that a lot of people don't- is that it is really a programming language, for which the GUI is simply a code generator. I find that it's almost always easier for me to interact directly with the under-the-hood guts of SPSS than with the GUI, although sometimes when setting up a new analysis for the first time I'll use the GUI to do most of the work and then tweak its results.