Author: Team BioSakshat
Last update: June 2017
Copyright © 2017 BioSakshat, Inc. All rights reserved.
Input_1.txt
Input_2.txt
Input_3.txt
Input_4.txt
Input_3.xlsx
1BUW.pdb
If data is well structured in tabular form, we can use read.table() to read the data.
In file Input_1.txt all rows have the same number of columns, and cells are separated by tabs. Try ?read.table to check the default values of the arguments.
Defaults: header=FALSE, sep="" (any whitespace), stringsAsFactors=TRUE (in R versions before 4.0.0; the default is FALSE from R 4.0.0 onward)
in1 = read.table("_site/data/Day1/Input_1.txt");
in1;
## V1 V2 V3 V4 V5
## 1 Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 2 5.1 3.5 1.4 0.2 setosa
## 3 4.9 3 1.4 0.2 setosa
## 4 4.7 3.2 1.3 0.2 setosa
## 5 4.6 3.1 1.5 0.2 setosa
## 6 5 3.6 1.4 0.2 setosa
## 7 5.4 3.9 1.7 0.4 setosa
str(in1);
## 'data.frame': 7 obs. of 5 variables:
## $ V1: Factor w/ 7 levels "4.6","4.7","4.9",..: 7 5 3 2 1 4 6
## $ V2: Factor w/ 7 levels "3","3.1","3.2",..: 7 4 1 3 2 5 6
## $ V3: Factor w/ 5 levels "1.3","1.4","1.5",..: 5 2 2 1 3 2 4
## $ V4: Factor w/ 3 levels "0.2","0.4","Petal.Width": 3 1 1 1 1 1 2
## $ V5: Factor w/ 2 levels "setosa","Species": 2 1 1 1 1 1 1
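We can see that the header line has been read as the first data row, which is not what we want. As a result, every column, including the numeric ones, has been read as a factor.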
in2 = read.table("_site/data/Day1/Input_1.txt", header = TRUE);
in2;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(in2);
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : Factor w/ 1 level "setosa": 1 1 1 1 1 1
header=TRUE tells read.table() to use the first row of the file as the column names.
Note the structure of in2. The first four columns are numeric, as expected, but the Species column has been read as a factor (categorical variable). If we don't want character data to be read as factors, we can set stringsAsFactors = FALSE, as shown below.
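The effect of the header argument can be reproduced on a small file. The sketch below is only an illustration: the file contents and the names tmp, no_header and with_header are made up and are not part of the workshop data.

```r
# Write a small tab-separated file to a temporary location (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Length\tWidth\tSpecies",
             "5.1\t3.5\tsetosa",
             "4.9\t3.0\tsetosa"), tmp)

# Without header = TRUE the column names are read as an ordinary data row.
no_header <- read.table(tmp, sep = "\t")

# With header = TRUE the first row supplies the column names.
with_header <- read.table(tmp, sep = "\t", header = TRUE)

dim(no_header)      # 3 rows, 3 columns
dim(with_header)    # 2 rows, 3 columns
names(with_header)  # "Length" "Width" "Species"
```

Comparing dim() of the two results is a quick way to spot a header that was read as data.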
in3 = read.table("_site/data/Day1/Input_1.txt", header = TRUE, stringsAsFactors = FALSE);
in3;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(in3);
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
stringsAsFactors=FALSE disables the conversion of character columns to factors. The first four columns are still numeric, and the Species column is now a character vector (chr).
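Because the default value of stringsAsFactors depends on the R version, it can help to set it explicitly and check the resulting column classes. The following is a minimal sketch on made-up data; the names tmp, as_factor and as_char are hypothetical.

```r
# Small tab-separated file with one numeric and one text column (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Length\tSpecies",
             "5.1\tsetosa",
             "7.0\tversicolor"), tmp)

# Read the same file twice, setting stringsAsFactors explicitly each time.
as_factor <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = TRUE)
as_char   <- read.table(tmp, header = TRUE, sep = "\t", stringsAsFactors = FALSE)

sapply(as_factor, class)   # Length is "numeric", Species is "factor"
sapply(as_char, class)     # Length is "numeric", Species is "character"
levels(as_factor$Species)  # "setosa" "versicolor"
```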
Note that in file Input_2.txt the first row has 5 fields while the remaining rows have 6, i.e. the first row has one field fewer than the other rows. For files in this format, read.table() automatically sets header to TRUE and uses the first column as row names. So in the code below we did not specify header=TRUE (it is optional here).
in4 = read.table("_site/data/Day1/Input_2.txt", stringsAsFactors = FALSE);
in4;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(in4);
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
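The automatic header detection described for Input_2.txt can be checked on a tiny file whose first row has one field fewer than the rest. This is a sketch on made-up data; the row labels a and b and the names tmp and auto are hypothetical.

```r
# The header row has 2 fields; each data row has 3 (hypothetical data).
tmp <- tempfile(fileext = ".txt")
writeLines(c("Length Width",
             "a 5.1 3.5",
             "b 4.9 3.0"), tmp)

# header is not given: read.table() detects the shorter first row,
# treats it as the header and uses the first column as row names.
auto <- read.table(tmp)

names(auto)     # "Length" "Width"
rownames(auto)  # "a" "b"
ncol(auto)      # 2
```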
Default arguments: skip=0, comment.char="#", na.strings="NA"
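Please note that file Input_3.txt needs non-default values for these arguments: the call below skips the first two lines (skip=2) and treats "!" as the comment character (comment.char="!"). Because na.strings is set to NULL rather than "NULL", the NULL entry in the file is not converted to NA, so Sepal.Length is read as a character column.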
in5 = read.table("_site/data/Day1/Input_3.txt", stringsAsFactors = FALSE, comment.char = "!", na.strings = NULL, skip=2);
in5;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 NULL 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(in5);
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: chr "5.1" "4.9" "NULL" "4.6" ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
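In the output above, Sepal.Length is a character column because the text NULL was kept as an ordinary string. If the intent is to treat such entries as missing values, one option is na.strings = "NULL" (a quoted string). The sketch below uses a made-up file, not Input_3.txt itself; the names tmp, raw and fixed are hypothetical.

```r
# Two lines to skip, one "!" comment line, a header and three data rows
# (hypothetical data, not the workshop file).
tmp <- tempfile(fileext = ".txt")
writeLines(c("This line is skipped",
             "This line is skipped too",
             "! a comment line",
             "Length Width",
             "5.1 3.5",
             "NULL 3.2",
             "4.6 3.1"), tmp)

# Default na.strings ("NA"): the text NULL stays in the column as a string.
raw <- read.table(tmp, header = TRUE, skip = 2, comment.char = "!",
                  stringsAsFactors = FALSE)

# na.strings = "NULL": the text NULL is converted to NA.
fixed <- read.table(tmp, header = TRUE, skip = 2, comment.char = "!",
                    na.strings = "NULL", stringsAsFactors = FALSE)

class(raw$Length)    # "character"
class(fixed$Length)  # "numeric"
is.na(fixed$Length)  # FALSE TRUE FALSE
```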
Please note that values in file Input_4.txt are separated by commas (",") and all rows have the same number of columns. See ?read.csv to check the default values of the arguments. Defaults: header=TRUE, sep=","
in7= read.csv("_site/data/Day1/Input_4.txt", stringsAsFactors = FALSE, comment.char = "!", na.strings = NULL, skip=2);
in7;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 NULL setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(in7);
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : chr "0.2" "0.2" "0.2" "NULL" ...
## $ Species : chr "setosa" "setosa" "setosa" "setosa" ...
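read.csv() is a wrapper around read.table() with different defaults. The sketch below, on made-up data, spells those defaults out and also shows na.strings = "NULL" as one way to get a numeric column with NA instead of a character column. The names tmp, csv1 and csv2 are hypothetical.

```r
# Small comma-separated file with a NULL entry (hypothetical data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Length,Width,Species",
             "5.1,3.5,setosa",
             "4.6,NULL,setosa"), tmp)

# read.csv() with its own defaults (header = TRUE, sep = ",").
csv1 <- read.csv(tmp, na.strings = "NULL", stringsAsFactors = FALSE)

# The same call written with read.table() and the read.csv() defaults spelled out.
csv2 <- read.table(tmp, header = TRUE, sep = ",", quote = "\"", dec = ".",
                   fill = TRUE, comment.char = "",
                   na.strings = "NULL", stringsAsFactors = FALSE)

identical(csv1, csv2)  # TRUE
csv1$Width             # 3.5 NA
```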
We can use the gdata package to read an Excel file. Its read.xls() function requires Perl to be installed on the system.
library("gdata");
## Warning: package 'gdata' was built under R version 3.3.3
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
##
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
##
## Attaching package: 'gdata'
## The following objects are masked from 'package:dplyr':
##
## combine, first, last
## The following object is masked from 'package:stats':
##
## nobs
## The following object is masked from 'package:utils':
##
## object.size
## The following object is masked from 'package:base':
##
## startsWith
xl=read.xls("_site/data/Day1/Input_3.xlsx", sheet=1);
xl;
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
str(xl);
## 'data.frame': 6 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4
## $ Species : Factor w/ 1 level "setosa": 1 1 1 1 1 1
library("gdata") loads the gdata package, which provides the read.xls() function. The sheet argument selects which sheet of the input Excel file to read.
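Since read.xls() depends on Perl, it can be useful to check whether a perl executable is visible to R before calling it. This is a rough sketch using only base R; the name perl_path is hypothetical, and a Perl installed outside the PATH would not be detected this way.

```r
# Look for a perl executable on the PATH; an empty string means none was found.
perl_path <- Sys.which("perl")

if (nzchar(perl_path)) {
  message("Perl found at: ", perl_path)
} else {
  message("Perl not found on the PATH; read.xls() is unlikely to work.")
}
```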
By default, readLines() reads all the lines of a file and returns a character vector with one element per line. Check str(pdb). Setting n=10 reads only the first 10 lines.
pdb=readLines("_site/data/Day1/1BUW.pdb");
str(pdb);
## chr [1:5484] "HEADER OXYGEN STORAGE/TRANSPORT 06-SEP-98 1BUW " ...
pdb=readLines("_site/data/Day1/1BUW.pdb", n=10);
pdb;
## [1] "HEADER OXYGEN STORAGE/TRANSPORT 06-SEP-98 1BUW "
## [2] "TITLE CRYSTAL STRUCTURE OF S-NITROSO-NITROSYL HUMAN HEMOGLOBIN A "
## [3] "COMPND MOL_ID: 1; "
## [4] "COMPND 2 MOLECULE: PROTEIN (HEMOGLOBIN); "
## [5] "COMPND 3 CHAIN: A, C; "
## [6] "COMPND 4 SYNONYM: S-NITROSO-NITROSYLHB; "
## [7] "COMPND 5 OTHER_DETAILS: THE SULFHYDRYL GROUPS OF CYSTEINE 93 OF "
## [8] "COMPND 6 BETA SUBUNITS ARE S-NITROSYLATED. THE HEME GROUPS ARE "
## [9] "COMPND 7 NITROSYLATED.; "
## [10] "COMPND 8 MOL_ID: 2; "
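Once the lines are in a character vector, ordinary vector operations apply. The sketch below uses a made-up five-line file rather than 1BUW.pdb; the filtering step with grepl() is an added illustration, and the names tmp, all_lines, first_three and compnd are hypothetical.

```r
# A tiny text file standing in for a PDB file (hypothetical content).
tmp <- tempfile(fileext = ".txt")
writeLines(c("HEADER line",
             "TITLE line",
             "COMPND one",
             "COMPND two",
             "ATOM one"), tmp)

all_lines   <- readLines(tmp)         # every line of the file
first_three <- readLines(tmp, n = 3)  # only the first 3 lines

length(all_lines)    # 5
length(first_three)  # 3

# Keep only the lines that start with a given record name.
compnd <- all_lines[grepl("^COMPND", all_lines)]
compnd               # "COMPND one" "COMPND two"
```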
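Read copied text from the clipboard. The "clipboard" file name is supported on Windows; on macOS, pipe("pbpaste") can be used instead.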
data=read.table("clipboard");
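# View() opens in1 in a read-only, spreadsheet-style viewer.
View(in1);
# edit() opens in1 in an editor and returns the edited copy; assign the result to keep the changes.
edit(in1);
# Write in1 to a tab-separated file without quotes or row names.
write.table(in1, file="result.txt", sep="\t", eol="\n", quote=FALSE, row.names=FALSE, append = FALSE);
# cat() writes text to a file; without append=TRUE this overwrites result.txt.
cat("Hello", file="result.txt");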
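A quick way to check a written file is to read it back and compare it with the original data frame. The sketch below writes to a temporary file instead of result.txt and uses made-up data; the names df, out and back are hypothetical. It also shows append = TRUE, which makes cat() add to the file instead of overwriting it.

```r
# Small data frame to write out (hypothetical data).
df <- data.frame(Length = c(5.1, 4.9),
                 Species = c("setosa", "setosa"),
                 stringsAsFactors = FALSE)

out <- tempfile(fileext = ".txt")
write.table(df, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

# Read the file back and compare it with the original.
back <- read.table(out, header = TRUE, sep = "\t", stringsAsFactors = FALSE)
all.equal(df, back)      # TRUE

# append = TRUE adds the text after the existing content.
cat("Hello\n", file = out, append = TRUE)
tail(readLines(out), 1)  # "Hello"
```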