
Statistical Software: R Koujue

This guide contains resources to help faculty, researchers, staff, and students learn statistical software such as Stata, SAS, R, and SPSS.

Upcoming R Event

Jianjun Hua, the Statistical Consultant in Educational Technologies, will teach the “R Machine Learning III” workshop on Feb 1, 2-4, at Rocky 1930.

Arthur Samuel's original 1959 definition describes machine learning as a subfield of computer science that gives "computers the ability to learn without being explicitly programmed." In practice, this means developing computer programs that can make predictions based on data. R is a free, open-source statistical software package.

This is the third workshop in the "R Machine Learning" series. You'll learn about supervised learning tools, such as bagging and random forests, and unsupervised learning tools, such as hierarchical clustering. Time permitting, additional supervised learning methods, such as boosting, will be covered. Basic knowledge of R and linear regression will help you follow the workshop.

If you are interested, please sign up here:

We look forward to seeing you at the workshop!
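As a small preview of the hierarchical clustering mentioned in the workshop description, here is a minimal sketch in base R. It is not taken from the workshop materials: the built-in USArrests data, the complete-linkage method, and the choice of four clusters are all assumptions made for illustration.

```r
# Hierarchical clustering sketch on the built-in USArrests data
# (dataset, linkage method, and k = 4 are illustrative assumptions).
data(USArrests)

scaled <- scale(USArrests)                # standardize so no variable dominates the distances
d <- dist(scaled, method = "euclidean")   # pairwise distances between states
hc <- hclust(d, method = "complete")      # agglomerative clustering, complete linkage

clusters <- cutree(hc, k = 4)             # cut the tree into 4 groups
table(clusters)                           # how many states fall in each group

# plot(hc)  # uncomment to draw the dendrogram
```

Changing `method` in `hclust()` (for example to "average") or `k` in `cutree()` is an easy way to see how sensitive the grouping is to those choices.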

Quick Tips

R commands are case-sensitive.

# starts a comment; R ignores the rest of the line.

<- or = is the assignment operator.

c() concatenates values into a vector.

demo() is used to see what R can do.

ls() lists existing objects in R.

help() displays the help manual for a command.

read.table() reads text files.

read.csv() reads comma-separated files.

read.dta() reads Stata (.dta) data files (requires the foreign package).

read.fwf() reads fixed-width format text files.

str() displays the structure of an object, such as a dataset.

save() writes objects to an R data file.

load() reads objects from an R data file.

library() loads an installed package.

rm() removes objects.

class() returns the class of an object.

mean() calculates the mean.

median() calculates the median.

sd() calculates the standard deviation.

cor() calculates correlations.

summary() is a generic function that summarizes an object.

by() is used to apply a function to a data frame split by factors.

tapply() applies a function to each group of values in a vector, with groups defined by one or more factors.

hist() is used to draw a histogram plot.

boxplot() is used to draw a box plot.

table() is used to generate a frequency table.

rbind() combines rows of data.

cbind() combines columns of data.

merge() is used to match-merge two data frames.

t.test() conducts one-sample, two-sample, and paired t-tests.

lm() fits a linear model (regression).

anova() extracts the ANOVA table from an lm object.

glm() fits generalized linear models.

wilcox.test() is a non-parametric analog to the independent two-sample t-test.

kruskal.test() is a non-parametric analog to the one-way ANOVA.
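The tips above can be tried together in one short session. The following is a minimal sketch using only base R. The built-in mtcars data, the temporary files, and every object name (x, cars, stacked, and so on) are assumptions chosen for illustration, not part of this guide's own materials.

```r
# --- Basics -------------------------------------------------------------
x <- c(2, 4, 6, 8)          # c() concatenates values into a vector; <- assigns it
X <- "a different object"   # names are case-sensitive, so x and X are distinct
class(x)                    # "numeric"
class(X)                    # "character"
ls()                        # lists the objects in the workspace
rm(X)                       # removes X

# --- Reading and saving data (temporary files, so nothing is left behind)
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)                  # built-in mtcars data as a stand-in
cars <- read.csv(csv_path, row.names = 1)    # read the comma-separated file back
str(cars)                                    # structure of the data frame

rda_path <- tempfile(fileext = ".RData")
save(cars, file = rda_path)                  # write the object to an R data file
rm(cars)
load(rda_path)                               # restores the object named cars

# --- Descriptive statistics ------------------------------------------------
mean(cars$mpg); median(cars$mpg); sd(cars$mpg)
cor(cars$mpg, cars$wt)                       # correlation between mpg and weight
summary(cars$mpg)
table(cars$cyl)                              # frequency table of cylinder counts
tapply(cars$mpg, cars$cyl, mean)             # mean mpg within each cylinder group
by(cars[, c("mpg", "wt")], cars$am, summary) # summary of a data frame split by am
# hist(cars$mpg); boxplot(mpg ~ cyl, data = cars)  # uncomment to draw the plots

# --- Combining data ----------------------------------------------------------
first   <- cars[1:3, c("mpg", "wt")]
second  <- cars[4:6, c("mpg", "wt")]
stacked <- rbind(first, second)              # stack rows
wide    <- cbind(first, hp = cars$hp[1:3])   # add a column

left   <- data.frame(id = 1:3, score = c(90, 85, 70))
right  <- data.frame(id = c(2, 3, 4), group = c("a", "b", "a"))
merged <- merge(left, right, by = "id")      # keeps only ids present in both

# --- Tests and models ----------------------------------------------------------
t.test(mpg ~ am, data = cars)                       # two-sample t-test
wilcox.test(mpg ~ am, data = cars, exact = FALSE)   # non-parametric analog
kruskal.test(mpg ~ factor(cyl), data = cars)        # analog to one-way ANOVA
fit <- lm(mpg ~ wt, data = cars)                    # linear regression
anova(fit)                                          # ANOVA table for the fit
logit <- glm(am ~ wt, data = cars, family = binomial)  # logistic regression
summary(logit)
```

Running help(t.test), or help() on any other function used here, shows the full argument list for that function.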




Introduction to plotting in R

Summary Statistics In R

R Session


Jianjun Hua
Dartmouth College

Hanover NH 03755



Post-Workshop Survey