25 Nov How to Import Data in R
R is one of the most popular programming languages for statistical analysis and is used extensively by researchers and data scientists. In this tutorial, you will learn some basic and highly effective functions for importing data into R when working in RStudio.
R is a versatile programming language that can import several types of data files for statistical analysis. One of the most common is the .csv file, a delimited text file that uses commas to separate values. An example of such a file is given in the following image. Every row is a data record with multiple values, and each value is separated from the next by a comma.
The following two functions are used to import .csv files:
- read.csv(): for comma-separated value (CSV) files
- read.csv2(): for files that use a comma (“,”) as the decimal point and a semicolon (“;”) as the field separator
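The difference between the two functions can be sketched with a small inline example. The sample values and the variable names below are made up for illustration; the text argument is used here only so that the example runs without a file on disk.

```r
# Hypothetical sample data; the values are made up for illustration.
csv_text  <- "name,score\nAnna,9.5\nBen,7.25"
csv2_text <- "name;score\nAnna;9,5\nBen;7,25"

# read.csv() expects "," as the field separator and "." as the decimal point.
data_csv <- read.csv(text = csv_text)

# read.csv2() expects ";" as the field separator and "," as the decimal point.
data_csv2 <- read.csv2(text = csv2_text)

# Both data frames should hold the same numeric scores.
print(data_csv)
print(data_csv2)
print(isTRUE(all.equal(data_csv$score, data_csv2$score)))
```

If a file in the second format were read with read.csv() instead, each row would most likely end up in a single text column, so picking the function that matches the file's conventions matters.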
In R, the following script imports a .csv file stored at a local path:
file_dir <- "path/to/csv/file/directory"
data <- read.csv(file_dir, header=TRUE, sep=",", dec=".")
The header argument of the read.csv function determines whether the first row of the .csv file is used as the header, whereas the sep and dec arguments define the field separator and the decimal point character, respectively.
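To see what the header argument changes, here is a minimal sketch that writes a small CSV file to a temporary location and reads it back twice. The file contents and variable names are hypothetical and serve only as an illustration.

```r
# Write a small, made-up CSV file to a temporary location.
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,2.5", "2,3.75"), tmp_csv)

# header = TRUE: the first row supplies the column names.
with_header <- read.csv(tmp_csv, header = TRUE, sep = ",", dec = ".")

# header = FALSE: the first row is treated as data,
# and the columns receive the default names V1 and V2.
without_header <- read.csv(tmp_csv, header = FALSE, sep = ",", dec = ".")

print(names(with_header))
print(nrow(with_header))
print(names(without_header))
print(nrow(without_header))

# Remove the temporary file.
unlink(tmp_csv)
```

With header = FALSE the header text lands in the first data row, which also forces both columns to be read as text rather than numbers.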
An example of a .txt file is shown in the following image:
To read tab-separated value files (“.txt”), we can use the following two functions:
- read.delim(): for tab-separated value files
- read.delim2(): for tab-separated value files where a comma (“,”) is used as the decimal point
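As a rough illustration of how the two functions differ, the sketch below parses the same made-up table twice, once with a decimal point and once with a decimal comma. The sample values and variable names are assumptions for the example.

```r
# Hypothetical tab-separated sample data ("\t" is the tab character).
tsv_text  <- "city\ttemp\nOslo\t4.5\nRome\t18.25"
tsv2_text <- "city\ttemp\nOslo\t4,5\nRome\t18,25"

# read.delim() expects "." as the decimal point.
data_tsv <- read.delim(text = tsv_text)

# read.delim2() expects "," as the decimal point.
data_tsv2 <- read.delim2(text = tsv2_text)

# Both data frames should hold the same numeric temperatures.
print(data_tsv2)
print(isTRUE(all.equal(data_tsv$temp, data_tsv2$temp)))
```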
In R, the following script imports a .txt file stored at a local path:
file_dir <- "path/to/txt/file/directory"
data <- read.delim(file_dir, header=TRUE, sep="\t", dec=".")
Since read.delim is meant for tab-separated value files, its separator defaults to the tab character (“\t”), so the sep argument could also be left out here.
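The default can be checked with a short sketch that reads the same temporary file with and without the explicit arguments. The file contents and variable names are hypothetical.

```r
# Write a small, made-up tab-separated file to a temporary location.
tmp_txt <- tempfile(fileext = ".txt")
writeLines(c("city\ttemp", "Oslo\t4.5", "Rome\t18.25"), tmp_txt)

# Explicit arguments, as in the script above.
explicit <- read.delim(tmp_txt, header = TRUE, sep = "\t", dec = ".")

# Relying on the defaults of read.delim().
implicit <- read.delim(tmp_txt)

# The two results should be the same data frame.
print(implicit)
print(identical(explicit, implicit))

# Remove the temporary file.
unlink(tmp_txt)
```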
Importing from URL
In R you can import files that are stored locally on a computer, but you can also read files directly from a URL without having to store them locally first. The following script retrieves tabular data from a URL and imports it directly into R:
library(RCurl)  # load the RCurl package
url <- getURL("http://link_to_dataset.com")
data <- read.csv(textConnection(url), header=TRUE)
Breaking down the above code, getURL returns the content found at the URL as a string, textConnection lets that string be read as if it were a file, and read.csv parses it as comma-separated data. If the data were tab-delimited, we could have used the read.delim function instead.
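The textConnection step can be tried without network access. In the minimal sketch below, a made-up string stands in for the text that getURL would return; the string contents and variable names are assumptions for illustration only.

```r
# Stand-in for the string that getURL() would return; the content is made up.
csv_string <- "species,count\nfinch,12\nwren,7"

# Wrap the string in a connection so it can be read like a file.
con <- textConnection(csv_string)
data_from_text <- read.csv(con, header = TRUE)

# The connection was opened by textConnection(), so close it explicitly.
close(con)

print(data_from_text)
```

For tab-delimited content, the same pattern should work with read.delim in place of read.csv.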