Chapter 3 If/For/Function/IO
3.1 Conditional statements
Syntax:
if (expr_1)
{
expr_2
……
……
}else{
expr_3
……
……
}
## [1] 283
if(sum(x)>500)
{
print("Sum of x is greater than 500");
}else{
print("Sum of x is less than 500");
}
## [1] "Sum of x is less than 500"
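The sum printed above (283) comes from a vector x whose definition is not shown here. A minimal sketch, assuming a hypothetical x with the same sum, illustrates how the if/else block picks its branch:

```r
# Hypothetical vector (not the original x); its sum is 283
x <- c(120, 83, 80)
total <- sum(x)

# Only one of the two branches is executed
if (total > 500) {
  msg <- "Sum of x is greater than 500"
} else {
  msg <- "Sum of x is less than 500"
}
print(msg)
```

Because 283 is not greater than 500, the else branch runs.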
3.2 Repetitive execution using for loop
## [1] 5
## [1] 6
## [1] 7
## [1] 8
## [1] 9
## [1] 10
## [1] 10
## [1] 15
## [1] 17
## [1] 16
## [1] 25
## [1] 10
## [1] 15
## [1] 17
## [1] 16
## [1] 25
## [1] 1 10
## [1] 2 15
## [1] 3 17
## [1] 4 16
## [1] 5 25
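The outputs above are consistent with loops of the following kind. This is only a sketch, assuming the vector x holds the values 10 15 17 16 25 that appear in the output; the loop variable names are illustrative.

```r
# Loop over a sequence of integers
for (i in 5:10) {
  print(i)
}

x <- c(10, 15, 17, 16, 25)

# Loop directly over the elements of x
for (v in x) {
  print(v)
}

# Loop over the indices of x and look up each element
for (i in seq_along(x)) {
  print(x[i])
}

# Print each index together with its value
for (i in seq_along(x)) {
  print(c(i, x[i]))
}
```

seq_along(x) is used instead of 1:length(x) so that the loop body is skipped when x is empty.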
Store squares of values of above vector in separate vector s using for loop
## [1] 10 15 17 16 25
## [1] 100 225 289 256 625
Note: Usually we don't need a for loop to perform operations on a single vector. The vector s created with the for loop above can be created simply with the command s <- x^2. A for loop is, however, very useful for performing operations on multiple columns and rows of a matrix or data frame.
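One possible way to fill s with a for loop, next to the vectorized form mentioned in the note, is sketched below. The name s_vec is introduced here only for the comparison.

```r
x <- c(10, 15, 17, 16, 25)

# Pre-allocate the result vector, then fill it element by element
s <- numeric(length(x))
for (i in seq_along(x)) {
  s[i] <- x[i]^2
}
print(s)

# The vectorized form gives the same values without a loop
s_vec <- x^2
print(identical(s, s_vec))
```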
3.2.1 Task
Create a matrix of 10 rows and 5 columns filled with values between 1 and 50. Calculate the means of all rows of the matrix and store them in a vector m.
## [,1] [,2] [,3] [,4] [,5]
## [1,] 44 47 47 15 9
## [2,] 34 10 45 26 13
## [3,] 21 5 3 45 40
## [4,] 11 13 39 5 38
## [5,] 4 20 46 32 44
## [6,] 9 38 49 40 5
## [7,] 31 6 30 5 5
## [8,] 4 40 13 14 14
## [9,] 50 23 22 38 17
## [10,] 45 1 15 29 34
## [1] 32.4 25.6 22.8 21.2 29.2 28.2 15.4 17.0 30.0 24.8
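A possible solution is sketched below. It assumes the values are drawn at random with replacement (the matrix above contains repeated values) and uses an arbitrary seed, so the numbers will differ from those printed above.

```r
set.seed(1)  # assumed seed, only to make the sketch reproducible

# 10 x 5 matrix of values drawn (with replacement) from 1 to 50
mat <- matrix(sample(1:50, 50, replace = TRUE), nrow = 10, ncol = 5)
print(mat)

# Mean of each row, stored in m
m <- numeric(nrow(mat))
for (i in seq_len(nrow(mat))) {
  m[i] <- mean(mat[i, ])
}
print(m)
```

The built-in rowMeans(mat) returns the same vector and can be used to check the loop.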
3.3 Functions
3.3.1 Define function
Define a function to find maximum values of every column of a matrix.
3.3.2 Call function
## [,1] [,2] [,3] [,4] [,5]
## [1,] 37 26 49 34 41
## [2,] 49 40 8 37 42
## [3,] 42 47 13 19 5
## [4,] 36 13 17 49 21
## [5,] 16 7 17 22 30
## [6,] 42 10 5 18 11
## [7,] 31 22 22 43 27
## [8,] 35 39 8 33 19
## [9,] 19 12 31 6 8
## [10,] 29 11 12 16 46
## [1] 49 47 49 49 46
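The definition and the call could look like the sketch below. The function name col_max is hypothetical, and the matrix is the one printed above, typed in row by row.

```r
# Hypothetical function name; returns the maximum of every column
col_max <- function(mat) {
  result <- numeric(ncol(mat))
  for (j in seq_len(ncol(mat))) {
    result[j] <- max(mat[, j])
  }
  result
}

# The matrix shown in the output above
mat <- matrix(c(37, 26, 49, 34, 41,
                49, 40,  8, 37, 42,
                42, 47, 13, 19,  5,
                36, 13, 17, 49, 21,
                16,  7, 17, 22, 30,
                42, 10,  5, 18, 11,
                31, 22, 22, 43, 27,
                35, 39,  8, 33, 19,
                19, 12, 31,  6,  8,
                29, 11, 12, 16, 46),
              nrow = 10, byrow = TRUE)

# Call the function
col_max(mat)
```

The same result can be obtained with apply(mat, 2, max); writing the loop by hand simply shows how a function wraps the for loop from the previous section.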
3.4 File IO
Input_1.txt
Input_2.txt
Input_3.txt
Input_4.txt
Input_3.xlsx
1BUW.pdb
3.4.1 The read.table() function
If data is well structured in tabular form, we can use read.table() to read the data.
3.4.2 Read data
In the file Input_1.txt all rows have an equal number of columns, and the cells are separated by tabs. Try ?read.table to check the default values of the arguments.
Defaults: header = FALSE, sep = "" (any whitespace), stringsAsFactors = TRUE in R versions before 4.0.0 (FALSE since R 4.0.0)
## V1 V2 V3 V4 V5
## 1 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 2 5.1 3.5 1.4 0.2 setosa
## 3 4.9 3 1.4 0.2 setosa
## 4 4.7 3.2 1.3 0.2 setosa
## 5 4.6 3.1 1.5 0.2 setosa
## 6 5 3.6 1.4 0.2 setosa
## 7 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 7 obs. of 5 variables:
## $ V1: Factor w/ 7 levels "4.6","4.7","4.9",..: 7 5 3 2 1 4 6
## $ V2: Factor w/ 7 levels "3","3.1","3.2",..: 7 4 1 3 2 5 6
## $ V3: Factor w/ 5 levels "1.3","1.4","1.5",..: 5 2 2 1 3 2 4
## $ V4: Factor w/ 3 levels "0.2","0.4","Petal.Width": 3 1 1 1 1 1 2
## $ V5: Factor w/ 2 levels "setosa","Species": 2 1 1 1 1 1 1
We can see that the header has been read as the 1st row, which is not what we want.
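The effect can be reproduced without the original file. The sketch below writes a small tab-separated stand-in for Input_1.txt to a temporary file (the path and the name in1 are assumptions) and reads it with the default arguments.

```r
# Small tab-separated stand-in for Input_1.txt, written to a temporary file
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Sepal.Length\tSepal.Width\tPetal.Length\tPetal.Width\tSpecies",
  "5.1\t3.5\t1.4\t0.2\tsetosa",
  "4.9\t3\t1.4\t0.2\tsetosa"
), path)

# With the default header = FALSE the header line becomes data row 1
# and the columns are named V1 ... V5
in1 <- read.table(path)
print(in1)
str(in1)
```

Because the header text is mixed with the numbers, none of the columns can be read as numeric.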
3.4.3 About header=TRUE parameter
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : Factor w/ 1 level "setosa": 1 1 1 1 1 1
header = TRUE reads the first row of the file as the vector of column names.
Note the structure of in2. The first four columns are numeric, as expected, but the Species column has been read as a factor (categorical variable). If we don't want character data to be read as factors, we can use the stringsAsFactors = FALSE parameter, as shown in the next section.
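A self-contained sketch of the same call on a temporary stand-in file follows. stringsAsFactors = TRUE is passed explicitly here so that the factor behaviour also appears on R 4.0.0 and later; the temporary path is an assumption.

```r
# Small tab-separated stand-in for Input_1.txt
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Sepal.Length\tSepal.Width\tPetal.Length\tPetal.Width\tSpecies",
  "5.1\t3.5\t1.4\t0.2\tsetosa",
  "4.9\t3\t1.4\t0.2\tsetosa"
), path)

# header = TRUE uses the first line as column names
in2 <- read.table(path, header = TRUE, stringsAsFactors = TRUE)
print(in2)
str(in2)
```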
3.4.4 About stringsAsFactors = FALSE parameter
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
stringsAsFactors = FALSE disables the conversion of character columns to factors. The first four columns are still numeric, and the Species column is now read as a character vector (chr).
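The difference can be checked on a temporary stand-in file, as in the sketch below (the path and the name in3 are assumptions):

```r
# Small tab-separated stand-in for Input_1.txt
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Sepal.Length\tSepal.Width\tPetal.Length\tPetal.Width\tSpecies",
  "5.1\t3.5\t1.4\t0.2\tsetosa",
  "4.9\t3\t1.4\t0.2\tsetosa"
), path)

# Character columns stay character instead of becoming factors
in3 <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
str(in3)
class(in3$Species)
```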
3.4.5 Other useful parameters:
- sep = ""
- comment.char = "#"
- na.strings = "NA"
- quote = "\"'"
- row.names
- col.names
- blank.lines.skip = TRUE
3.4.6 Read file with unequal columns in first row using read.table.
Note that in the file Input_2.txt the first row has 5 fields while the remaining rows have 6 fields, i.e. the first row has one column fewer than the other rows. For this format read.table() automatically sets header to TRUE and uses the first column as row names, so header = TRUE does not need to be specified (it is optional here).
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
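The automatic header detection can be tried on a temporary stand-in for Input_2.txt. In this sketch the path, the name in4 and the assumption that the extra first field is a row number are illustrative.

```r
# The header line has 5 fields, the data lines have 6 (row label + 5 values)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Sepal.Length Sepal.Width Petal.Length Petal.Width Species",
  "1 5.1 3.5 1.4 0.2 setosa",
  "2 4.9 3.0 1.4 0.2 setosa",
  "3 4.7 3.2 1.3 0.2 setosa"
), path)

# header is not given: read.table() sets it to TRUE because the first
# line has exactly one field fewer than the following lines
in4 <- read.table(path, stringsAsFactors = FALSE)
print(in4)
rownames(in4)
```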
3.4.7 Read file consisting of comments, blank lines, null values
Default arguments: skip = 0, comment.char = "#", na.strings = "NA"
Note that the file Input_3.txt contains:
- Two leading lines inserted by the author that are not required for processing the data.
- 2 comment lines starting with the character "!"
- 1 blank line
- Row 3 has one missing value, shown as NULL
- The first row has 5 columns while the remaining rows have 6 columns. With this input format header is set to TRUE by read.table().
in5 = read.table("./data/Day1/Input_3.txt", stringsAsFactors = FALSE, comment.char = "!", na.strings = "NULL", skip = 2);
in5;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 NA 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 NA 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
- skip = 2 skips the first two lines, which are not required for processing the data in R.
- comment.char = "!" instructs R to exclude lines starting with "!". The default is comment.char = "#".
- na.strings = "NULL" makes R interpret the string NULL as a missing value (NA), which helps to remove such rows with the na.omit() function. The value must be quoted; the default is na.strings = "NA".
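The three options can be tried together on a temporary stand-in for Input_3.txt. The sketch below is built from the description of the file; the exact text of the leading lines and comments, and their position in the file, are assumptions.

```r
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Hypothetical descriptive line 1",
  "Hypothetical descriptive line 2",
  "! first comment line",
  "! second comment line",
  "",
  "Sepal.Length Sepal.Width Petal.Length Petal.Width Species",
  "1 5.1 3.5 1.4 0.2 setosa",
  "2 4.9 3.0 1.4 0.2 setosa",
  "3 NULL 3.2 1.3 0.2 setosa",
  "4 4.6 3.1 1.5 0.2 setosa"
), path)

# skip the two leading lines, ignore "!" comments, read "NULL" as NA
in5 <- read.table(path, stringsAsFactors = FALSE,
                  comment.char = "!", na.strings = "NULL", skip = 2)
str(in5)

# Rows containing NA can now be dropped
in5_clean <- na.omit(in5)
nrow(in5_clean)
```

Passing the quoted string "NULL" is what lets Sepal.Length be read as a numeric column with one NA.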
3.4.8 Read csv file read.csv()
Note that the values in the file Input_4.txt are separated by commas (",") and all rows have an equal number of columns. See the help for read.csv to check the default values of the arguments. Defaults: header = TRUE, sep = ","
in7 = read.csv("./data/Day1/Input_4.txt", stringsAsFactors = FALSE, comment.char = "!", na.strings = "NULL", skip = 2);
in7;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 NA setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 NA 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
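A comparable sketch for read.csv() on a temporary comma-separated file is shown below. The two leading lines mirror the skip = 2 argument used above; their content and the temporary path are assumptions.

```r
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Hypothetical descriptive line 1",
  "Hypothetical descriptive line 2",
  "Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species",
  "5.1,3.5,1.4,0.2,setosa",
  "4.9,3.0,1.4,NULL,setosa",
  "4.7,3.2,1.3,0.2,setosa"
), path)

# header = TRUE and sep = "," are the defaults of read.csv()
in7 <- read.csv(path, stringsAsFactors = FALSE, na.strings = "NULL", skip = 2)
str(in7)
```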
3.4.9 Reading data from excel file
We can use the gdata package to read an Excel file. To use it, Perl must be installed on the system.
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
##
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
##
## Attaching package: 'gdata'
## The following object is masked from 'package:stats':
##
## nobs
## The following object is masked from 'package:utils':
##
## object.size
## The following object is masked from 'package:base':
##
## startsWith
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : Factor w/ 1 level "setosa": 1 1 1 1 1 1
library(gdata) loads the gdata package, which provides the read.xls() function. The sheet option selects which sheet of the input Excel file to read.
3.4.10 Read data using readLines()
By default, readLines() reads all the lines of a file and returns a character vector with one element per line. Check str(ln). Setting n = 10 reads only the first 10 lines.
## chr [1:5484] "HEADER OXYGEN STORAGE/TRANSPORT 06-SEP-98 1BUW " ...
## [1] "HEADER OXYGEN STORAGE/TRANSPORT 06-SEP-98 1BUW "
## [2] "TITLE CRYSTAL STRUCTURE OF S-NITROSO-NITROSYL HUMAN HEMOGLOBIN A "
## [3] "COMPND MOL_ID: 1; "
## [4] "COMPND 2 MOLECULE: PROTEIN (HEMOGLOBIN); "
## [5] "COMPND 3 CHAIN: A, C; "
## [6] "COMPND 4 SYNONYM: S-NITROSO-NITROSYLHB; "
## [7] "COMPND 5 OTHER_DETAILS: THE SULFHYDRYL GROUPS OF CYSTEINE 93 OF "
## [8] "COMPND 6 BETA SUBUNITS ARE S-NITROSYLATED. THE HEME GROUPS ARE "
## [9] "COMPND 7 NITROSYLATED.; "
## [10] "COMPND 8 MOL_ID: 2; "
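Both calls can be tried without the PDB file. The sketch below uses a temporary file with a few made-up lines in place of 1BUW.pdb, and n = 2 in place of n = 10 because the stand-in file is short.

```r
# Temporary stand-in file with four hypothetical lines
path <- tempfile(fileext = ".txt")
writeLines(c("HEADER line", "TITLE line", "COMPND line 1", "COMPND line 2"), path)

# Read every line: one character string per line
ln <- readLines(path)
str(ln)

# Read only the first two lines
first_two <- readLines(path, n = 2)
print(first_two)
```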
3.4.11 Read data using clipboard feature
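Text copied to the clipboard can be read directly, for example with read.table("clipboard") on Windows or read.table(pipe("pbpaste")) on macOS.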