This document covers some of the reasons we use R in this course. It’s not “required reading”, but take a look if you’re interested.
R is a piece of software for handling data. It’s the one used on this course, but it’s not the only option available. Others include Excel, Jamovi, JASP, MATLAB, Stata and, perhaps the most talked-about alternative, SPSS.
Students prefer R. In a recent study, undergraduate psychology students at Glasgow University were given a choice between R and SPSS, having experienced both. Two-thirds of the students chose R. Those who chose R did better in the final assessments and showed lower stats anxiety. R is being used to teach Plymouth University undergraduates (and visiting Year 10 students) across a range of different courses. Read more.
Data science is a graduate skill in high demand, and using R is a key skill in that market. In contrast, demand for SPSS skills has been declining dramatically for a decade. At SPSS’s current rate of decline, it’ll be gone by the time you graduate. Read more at r4stats and at loveR.
R is free. You don’t need to pay anything to download or use it, and never will. In contrast, once you leave university, SPSS would cost you or your employer around £3000 per person per year.
Every analysis you can think of is already available in R, thanks to over 15,000 free packages. As new analyses are developed, they become available in R first. In 2013, SPSS realised it couldn’t keep up with R, and admitted defeat.
Real data analysis is mainly preprocessing – scientists spend around 80% of their analysis time getting the data into a format where they can apply statistical tests. R is fantastically good at preprocessing. Our course focusses on realistic data analysis, making R the perfect tool for the job.
The alternatives to R for real data analysis are either kludgy, error-prone and poorly reproducible (e.g. preprocessing in Excel, followed by statistics in SPSS), or more niche in the graduate jobs market (e.g. MATLAB). Excel in particular is famously error-prone: for example, 1 in 5 experiments in genetics has been screwed up by Excel, and the case for the UK government’s policy of financial austerity was based on an Excel screwup.
R’s use of scripts means that, if you have done the analysis completely in R, you already have a full, reproducible record of your analysis path. Anyone with an internet connection can download R and reproduce your analysis using your script. Making your analyses reproducible is an essential skill in many areas of research.
R is “free as in freedom” because all the source code is available to everyone (it’s “open source”). Some reasons this is important:
All software has bugs; making the source code available means it’s more likely that these bugs are found and fixed. In contrast, no one outside of IBM can look at the source code for SPSS, and it’s entirely up to IBM whether they fix, or tell you about, the bugs it has.
All software is eventually abandoned by the people who wrote it (if for no other reason than their death). Open source software only dies if no one in the world cares enough about it to maintain it. In contrast, closed-source software (e.g. SPSS) dies as soon as the current owners decide to kill it.