Author: Team BioSakshat

Last update: June 2017

Copyright © 2017 BioSakshat, Inc. All rights reserved.

Let's Begin

Figure 1: Vector

A vector is a one-dimensional data structure that stores an ordered sequence of numbers, characters, or logical values. The example in Figure 1 shows the temperature of a city measured on 5 days. We therefore have 5 temperature values, all of integer type (25 °C, 30 °C, etc.). Stored in sequence, these 5 integers form an integer vector (a vector storing integers). A vector can thus be understood as a single entity (variable or object) that stores an ordered collection of elements. Similarly, the names of the students in a class can be stored as a character vector. Because all elements of a vector share one data type (integer, numeric, character, or logical), a vector is a homogeneous data structure; the four cases are called integer vector, numeric vector, character vector, and logical vector, respectively.
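As a minimal sketch of this idea, the snippet below stores five daily temperatures in a single vector. The values and the names temperature and students are made up for illustration; c() and the L suffix for integers are ordinary R and are covered later in this tutorial.

```r
# Hypothetical temperatures (in degrees Celsius) for 5 days
temperature = c(25L, 30L, 28L, 27L, 31L);
length(temperature);   # 5: number of elements stored
class(temperature);    # "integer": every element has the same type
# Hypothetical student names stored as a character vector
students = c("Asha", "Ravi", "Meera");
class(students);       # "character"
```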

Creating a vector

In R, vectors can be created using:

  • Assignment operator (=)
  • Range operator (:)
  • Concatenate function c()
  • Sequence function seq()
  • Repeat function rep()
  • Sample function sample()

We will explore these one by one.

Create Vector of one element

x1=2; # Numeric (double) data type. Write 2L to get an integer.
x2=2.3; # Numeric data type
x3="A"; # Single character. Double quotes were used.
x4="ABC"; # Multiple characters. Double quotes were used.
x5='A'; # Single character. Single quotes were used.
x6='ABC'; # Multiple characters. Single quotes were used.
x7=TRUE; # Logical. Note the use of TRUE (all capitals)
x8=FALSE; # Logical. Note the use of FALSE (all capitals)
#x9=true;  # Error, since true is written in lowercase. R is case sensitive.
  • In the above examples, x1 is a numeric vector of one element with value 2. R stores a plain number such as 2 as numeric (double); to get an integer, write 2L. Similarly, x2 is a numeric vector with one element, 2.3. The vectors x3, x4, x5 and x6 are character vectors. Character data can be a single character ("A") or several characters ("ABC"), and either single or double quotes can be used. The vectors x7 and x8 are logical vectors storing Boolean values (TRUE or FALSE). TRUE/FALSE can also be abbreviated as T/F, but they must be written in capitals. R is case sensitive, i.e. X and x are different. Thus x9=true; throws an error, because R looks for an object named true and does not find one.

  • The = sign is the assignment operator: the result of the expression on the right-hand side is stored in the variable on the left-hand side. In x1=2, the value 2 (right side) is assigned to x1 (left side).

  • Also note the semicolon (;) at the end of most statements. The ; is optional, because R also treats the end of a line as the end of a statement. However, we recommend ending every statement with ; so that the end of the statement is explicit.
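A quick way to see which type R actually assigned is class(). The sketch below is illustrative only; the name x1i is hypothetical and not part of the examples above.

```r
x1 = 2;         # a plain number is stored as numeric (double)
x1i = 2L;       # the L suffix asks for an integer
class(x1);      # "numeric"
class(x1i);     # "integer"
is.vector(x1);  # TRUE: a single value is a vector of length 1
length(x1);     # 1
x7 = TRUE;
class(x7);      # "logical"
```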

So far we saw vectors with one element. Now we will explore how to create vectors with more than one element.

Create Vector using Range operator (:)

x1=1:5; 
x1;
## [1] 1 2 3 4 5
x2=5:1;
x2;
## [1] 5 4 3 2 1

In the above example, x1 stores the 5 elements 1, 2, 3, 4 and 5, while x2 stores 5, 4, 3, 2, 1. The range operator thus creates a vector of several elements that increase or decrease in steps of 1.

Create Vector using c()

The range operator (:) is handy when you need a sequence of integers that increases or decreases by 1. What if your data has no such pattern? In that case you can use the concatenate function c().

# Numeric vector with 3 elements
x1=c(2,10,35);
x1;
## [1]  2 10 35
# Character vector with 3 elements
x2=c("Gene","Expression",'Chromosome');
x2;
## [1] "Gene"       "Expression" "Chromosome"
# Logical vector
x3=c(T,F,TRUE,FALSE);
x3;
## [1]  TRUE FALSE  TRUE FALSE
x4=c(x1,44,55);  # x4 will now contain 2, 10, 35, 44, 55
x4;
## [1]  2 10 35 44 55

In this example, x1 is a vector with 3 elements: 2, 10 and 35. Similarly, x2 is a character vector and x3 is a logical vector. Since every element of x2 is a character string, each one must be written within single or double quotes (e.g. "Gene", "Expression", 'Chromosome'). In the case of x4, the first argument is the vector x1, so x4 stores 2, 10, 35 followed by 44 and 55.

Generate regular sequence of numbers using seq() function

Generate regular sequences.

# Generate a sequence of numbers from 0 to 10 incremented by 2
x1=seq(from=0, to=10, by=2);
x1;
## [1]  0  2  4  6  8 10
# Generate a sequence of 30 numbers from 0 to 10
x2=seq(from=0, to=10, length=30);
x2;
##  [1]  0.0000000  0.3448276  0.6896552  1.0344828  1.3793103  1.7241379
##  [7]  2.0689655  2.4137931  2.7586207  3.1034483  3.4482759  3.7931034
## [13]  4.1379310  4.4827586  4.8275862  5.1724138  5.5172414  5.8620690
## [19]  6.2068966  6.5517241  6.8965517  7.2413793  7.5862069  7.9310345
## [25]  8.2758621  8.6206897  8.9655172  9.3103448  9.6551724 10.0000000

R has many built-in functions for common tasks. A function is called by its name followed by (). Inside the parentheses, the parameters the function needs are passed as key=value pairs separated by commas (,). In the first example, seq() generates a sequence of numbers from 0 (from=0) to 10 (to=10), incremented by 2 (by=2). Thus x1 contains 0, 2, 4, 6, 8 and 10. In the second example, we replace the by parameter with length=30 (R matches length to the full parameter name length.out), so the same function generates 30 evenly spaced elements from 0 to 10.
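The relationship between the by and length.out parameters can be checked directly. The following is a small sketch; the variable names first, last, n and step are chosen here for illustration.

```r
first = 0; last = 10; n = 30;
x2 = seq(from = first, to = last, length.out = n);
step = (last - first) / (n - 1);   # spacing implied by length.out
step;                              # 0.3448276, the gap between neighbours
all.equal(x2, first + step * (0:(n - 1)));   # TRUE
seq_len(5);                        # 1 2 3 4 5
```

seq_len(n) is a convenient shortcut when the sequence should simply count from 1 to n.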

Create vector using rep() function

rep() replicates the values of the vector passed as its first argument.

# Create a vector x1 with 3 elements
x1=c(2,3,5);
x1;
## [1] 2 3 5
# Repeat each element of x1, 3 times
x2=rep(x1,each=3);
x2;
## [1] 2 2 2 3 3 3 5 5 5
# Repeat x1, 3 times.
x3=rep(x1, times=3);
x3;
## [1] 2 3 5 2 3 5 2 3 5
# Repeat each element of 1:4 twice, then repeat the whole result 4 times
rep(1:4, each = 2, times = 4);
##  [1] 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4 1 1 2 2 3 3 4 4

x1 is a vector with 3 elements (2, 3, 5). The first argument passed to rep() is the vector to repeat. With each=3, rep() repeats each element of x1 3 times, so x2 stores 2, 2, 2, 3, 3, 3, 5, 5, 5. With times=3, rep() repeats the whole x1 vector 3 times, so x3 stores 2, 3, 5, 2, 3, 5, 2, 3, 5. When both are given, as in the last example, each is applied first and the result is then repeated the number of times given by times.
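The interplay of each and times can be verified with a short sketch. The names a and b are hypothetical; the times vector and length.out forms are further standard options of rep() that the examples above do not use.

```r
x1 = c(2, 3, 5);
# each is applied first, then the result is repeated `times` times
a = rep(1:4, each = 2, times = 4);
b = rep(rep(1:4, each = 2), times = 4);
identical(a, b);               # TRUE
length(a);                     # 4 * 2 * 4 = 32
# times can also be a vector: one repeat count per element
rep(x1, times = c(1, 2, 3));   # 2 3 3 5 5 5
# length.out cuts or extends the result to a fixed length
rep(x1, length.out = 7);       # 2 3 5 2 3 5 2
```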

Create a vector using sample()

sample() takes a random sample of the specified size from the elements of a vector, either with or without replacement.

# Draw 10 random numbers from 1:50 (without replacement)
sample(1:50, 10);
##  [1]  2 49 27 31 16 14 33 26 15 10
# Draw 100 random numbers from 20:30. Since we draw more elements than the 11 available, we must sample with replacement (replace=TRUE)
sample(20:30, 100, replace = TRUE);
##   [1] 21 25 25 27 27 24 28 28 21 28 29 25 29 30 29 21 20 22 23 20 25 24 24
##  [24] 27 30 27 29 21 30 20 29 22 21 29 21 30 21 23 24 24 28 30 21 20 28 29
##  [47] 25 26 22 20 22 23 24 29 24 20 24 26 30 23 24 21 30 27 25 22 27 23 29
##  [70] 21 29 28 29 21 20 29 23 29 20 28 24 23 22 22 28 30 22 22 24 20 27 30
##  [93] 27 25 30 29 25 20 28 24
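Because sample() is random, each run gives different numbers. One common way to make a draw repeatable is set.seed(); the sketch below assumes an arbitrary seed of 42 and hypothetical names s1, s2 and res.

```r
set.seed(42);             # arbitrary seed, chosen only for illustration
s1 = sample(1:50, 10);
set.seed(42);
s2 = sample(1:50, 10);
identical(s1, s2);        # TRUE: same seed, same draw
anyDuplicated(s1);        # 0: without replacement there are no repeats
# Asking for more values than are available, without replacement, fails
res = tryCatch(sample(20:30, 100), error = function(e) "error");
res;                      # "error"
```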

rnorm() function

rnorm() generates random numbers from the normal distribution.

# Draw 50 random values from the standard normal distribution (mean 0, sd 1)
rnorm(50);
##  [1] -0.33947659  1.18421330  1.26449736 -0.80094934 -0.11101353
##  [6] -0.17163330 -0.57014025  0.89615922 -0.16436141  0.78442586
## [11]  0.47038024 -0.25916965 -1.34957804 -0.49907908  1.09746627
## [16]  1.24201865 -0.67916731 -0.36130357  0.76689316  0.09572528
## [21] -0.78032864 -0.74065061  0.53857052  0.85625262 -0.63400637
## [26]  0.56723804  0.45377112  0.47416407 -0.41020812  0.25702847
## [31] -0.48518615  0.33864976 -1.85886543  1.35713175  0.52229644
## [36] -0.11380629  0.36442696 -1.27874689  1.78400211 -0.68902175
## [41]  0.75980679 -0.84037241 -1.58303509  0.13358215  1.22526276
## [46] -0.30302856 -2.10579478 -0.86309893 -0.75388509  0.95585698
# Draw 50 random values from a normal distribution with mean 5 and standard deviation 2
rnorm(50, mean=5, sd=2);
##  [1]  4.591098  4.086869  3.291382  3.236566  4.393003  5.137515  9.624502
##  [8]  4.978015  4.999060  1.951534  3.107941  5.280697  7.120681  2.809229
## [15]  6.328117  4.692529  7.350387  3.825919  6.151406  4.538983  5.030422
## [22]  5.023429  5.909770  4.972700  2.259665  7.044727 11.091493  7.590905
## [29]  8.327381  7.384594  5.434234  5.740811  6.590314  6.496393  6.425664
## [36]  4.455996  7.363049  6.107538  4.787815  5.288657  3.718269  5.204330
## [43]  6.363852  4.539236  2.277308  5.327213  6.291902  5.036022  8.221094
## [50]  6.630806
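With a larger sample, the sample mean and standard deviation come close to the requested parameters. This is only a sketch; the seed, the sample size of 10000 and the names z and w are assumptions made for the illustration.

```r
set.seed(1);                          # arbitrary seed for reproducibility
z = rnorm(10000, mean = 5, sd = 2);
mean(z);                              # close to 5
sd(z);                                # close to 2
# Shifting and scaling standard normal values gives the same sample
set.seed(1);
w = 5 + 2 * rnorm(10000);
all.equal(z, w);                      # TRUE
```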

Task: Now go to the Task page and finish the Vector creation section

Fetching elements from a vector

x=c(10,20,30,40,50,60);
length(x);
## [1] 6
x[1]; # 1st element
## [1] 10
x[4]; # 4th element
## [1] 40
x[2:3]; # 2nd to 3rd element
## [1] 20 30
x[c(1,3,5)]; # 1st, 3rd, 5th element
## [1] 10 30 50
x[3:length(x)]; # 3rd to last element
## [1] 30 40 50 60
x[-1]; # Exclude 1st element
## [1] 20 30 40 50 60
x[-c(1,3)] # Exclude 1st, 3rd element
## [1] 20 40 50 60
x[10]; # Index beyond the length of x: returns NA (missing value)
## [1] NA
# x[1, 3, 5] # Error (incorrect number of dimensions): wrap several indices in c()

Delete element(s) from a vector

x=10:20;
x;
##  [1] 10 11 12 13 14 15 16 17 18 19 20
x=x[-3];
x;
##  [1] 10 11 13 14 15 16 17 18 19 20
x=x[-c(7,3)];
x;
## [1] 10 11 14 15 16 18 19 20
x=x[-c(2,length(x))];
x;
## [1] 10 14 15 16 18 19
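Elements can also be removed by value rather than by position. The sketch below is one possible approach, not part of the original examples; it also shows a pitfall of negative indexing when no position matches. The name idx is hypothetical.

```r
x = 10:20;
x[x != 13];               # drop the value 13, wherever it is
x[!(x %in% c(11, 20))];   # drop several values at once
# Pitfall: an empty negative index selects nothing at all
idx = which(x > 100);     # no element matches, so idx is integer(0)
length(x[-idx]);          # 0, not 11
```

Filtering with a logical condition avoids this pitfall, because a condition that is never TRUE simply keeps every element.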

Add element(s) to existing vector

x=10:20;
x;
##  [1] 10 11 12 13 14 15 16 17 18 19 20
x=c(x,55);
x;
##  [1] 10 11 12 13 14 15 16 17 18 19 20 55
x=c(33,x,77);
x;
##  [1] 33 10 11 12 13 14 15 16 17 18 19 20 55 77
y=seq(100,110,0.5);
y;
##  [1] 100.0 100.5 101.0 101.5 102.0 102.5 103.0 103.5 104.0 104.5 105.0
## [12] 105.5 106.0 106.5 107.0 107.5 108.0 108.5 109.0 109.5 110.0
x=c(x,y);
x;
##  [1]  33.0  10.0  11.0  12.0  13.0  14.0  15.0  16.0  17.0  18.0  19.0
## [12]  20.0  55.0  77.0 100.0 100.5 101.0 101.5 102.0 102.5 103.0 103.5
## [23] 104.0 104.5 105.0 105.5 106.0 106.5 107.0 107.5 108.0 108.5 109.0
## [34] 109.5 110.0
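The last output prints every value with a decimal place because combining integer and numeric values converts the whole vector to numeric. The sketch below shows this, along with the base R function append(), which can insert values at a chosen position.

```r
x = 10:20;
append(x, 55);                   # same result as c(x, 55)
append(x, c(1, 2), after = 3);   # insert 1 and 2 after the 3rd element
# Mixing integer and numeric values promotes the result to numeric
class(10:20);                    # "integer"
class(c(10:20, 100.5));          # "numeric"
```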

Replace elements in existing vector

x=sample(20:30, 5);
x;
## [1] 20 22 21 25 24
x[2]=111111;
x;
## [1]     20 111111     21     25     24
x[c(1,3)]=c(555, 777);
x;
## [1]    555 111111    777     25     24

Task: Now go to the Task page and finish the Fetching vector elements and Vector manipulation sections

Inbuilt functions for numeric vector

x=seq(4, 8, length=10);
x;
##  [1] 4.000000 4.444444 4.888889 5.333333 5.777778 6.222222 6.666667
##  [8] 7.111111 7.555556 8.000000
length(x);
## [1] 10
sort(x);
##  [1] 4.000000 4.444444 4.888889 5.333333 5.777778 6.222222 6.666667
##  [8] 7.111111 7.555556 8.000000
order(x);
##  [1]  1  2  3  4  5  6  7  8  9 10
max(x);
## [1] 8
min(x);
## [1] 4
range(x);
## [1] 4 8
mean(x);
## [1] 6
median(x);
## [1] 6
mode(x); # Storage mode (data type) of x, not the statistical mode
## [1] "numeric"
sd(x);
## [1] 1.345622
var(x);
## [1] 1.8107
quantile(x);
##   0%  25%  50%  75% 100% 
##    4    5    6    7    8
summary(x);
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       4       5       6       6       7       8
sin(x);
##  [1] -0.75680250 -0.96431712 -0.98446429 -0.81332939 -0.48416406
##  [6] -0.06092533  0.37415123  0.73652996  0.95580043  0.98935825
log(x, base=2);
##  [1] 2.000000 2.152003 2.289507 2.415037 2.530515 2.637430 2.736966
##  [8] 2.830075 2.917538 3.000000
log(x, base=10);
##  [1] 0.6020600 0.6478175 0.6892102 0.7269987 0.7617608 0.7939455 0.8239087
##  [8] 0.8519375 0.8782664 0.9030900
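Since x above is already sorted, sort() and order() look uninformative there. The sketch below uses a small unsorted vector (hypothetical values v and w) to show how the two relate, and one possible way to compute a statistical mode, which base R does not provide as a function.

```r
v = c(7, 3, 9, 1);            # hypothetical unsorted values
sort(v);                      # 1 3 7 9
sort(v, decreasing = TRUE);   # 9 7 3 1
order(v);                     # 4 2 1 3: positions of the values in sorted order
v[order(v)];                  # same as sort(v)
# Statistical mode: tabulate the values and pick the most frequent one
w = c(2, 5, 5, 3, 5, 2);
as.numeric(names(which.max(table(w))));   # 5
```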

Correlation using cor()

x=sample(1:100,20);
y=sample(1:100,20);
x;
##  [1] 95 98 78 52  3 77 85 88 72  2  4  7 41 84 83 87 50 15  8 89
y;
##  [1] 56 27 29 26 95  2 87 73 60 65 55 63 96 79  6 15 54 89 94  7
# Available methods: "pearson" (default), "kendall", "spearman"
cor(x,y, method = "spearman");
## [1] -0.437594
cor(x,y, method = "pearson");
## [1] -0.5226678
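To make the two methods less of a black box, the sketch below recomputes them from their definitions, using the x and y values printed above. The names r_pearson and r_spearman are chosen here for illustration.

```r
x = c(95,98,78,52,3,77,85,88,72,2,4,7,41,84,83,87,50,15,8,89);
y = c(56,27,29,26,95,2,87,73,60,65,55,63,96,79,6,15,54,89,94,7);
# Pearson: covariance scaled by both standard deviations
r_pearson = cov(x, y) / (sd(x) * sd(y));
all.equal(r_pearson, cor(x, y, method = "pearson"));     # TRUE
# Spearman: Pearson correlation computed on the ranks
r_spearman = cor(rank(x), rank(y));
all.equal(r_spearman, cor(x, y, method = "spearman"));   # TRUE
```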

Set operations

x;
##  [1] 95 98 78 52  3 77 85 88 72  2  4  7 41 84 83 87 50 15  8 89
y;
##  [1] 56 27 29 26 95  2 87 73 60 65 55 63 96 79  6 15 54 89 94  7
union(x,y);
##  [1] 95 98 78 52  3 77 85 88 72  2  4  7 41 84 83 87 50 15  8 89 56 27 29
## [24] 26 73 60 65 55 63 96 79  6 54 94
intersect(x,y); 
## [1] 95  2  7 87 15 89
setdiff(x,y); # x - y
##  [1] 98 78 52  3 77 85 88 72  4 41 84 83 50  8
setdiff(y,x); # y - x
##  [1] 56 27 29 26 73 60 65 55 63 96 79  6 54 94
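The set functions treat their arguments as sets, so duplicates are dropped from the result. A small sketch with hypothetical vectors a and b, plus the related membership tests from base R:

```r
a = c(1, 2, 2, 3);
b = c(3, 4, 4);
union(a, b);        # 1 2 3 4: duplicates are dropped
intersect(a, b);    # 3
setdiff(a, b);      # 1 2
3 %in% a;           # TRUE: membership test
is.element(5, a);   # FALSE
setequal(c(1, 2), c(2, 1, 1));   # TRUE: order and duplicates are ignored
```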

Arithmetic expressions

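Vector recycling: when vectors of different lengths are combined, the shorter vectors are recycled (their elements are reused from the start) until they match the length of the longest vector. Once all vectors have equal length, the arithmetic operation is performed element by element. R issues a warning when the longer length is not a multiple of the shorter one.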

x=c(3,4,5,6);
y=c(6,7,8,9);
z=c(1,2);
p=c(9,10,11);
x+y;
## [1]  9 11 13 15
x+z;
## [1] 4 6 6 8
x+p;
## Warning in x + p: longer object length is not a multiple of shorter object
## length
## [1] 12 14 16 15
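Recycling can be made explicit with rep() and its length.out parameter. The sketch below reproduces the results above by hand; z_full and p_full are hypothetical names.

```r
x = c(3, 4, 5, 6);
z = c(1, 2);
p = c(9, 10, 11);
z_full = rep(z, length.out = length(x));   # 1 2 1 2
identical(x + z, x + z_full);              # TRUE
p_full = rep(p, length.out = length(x));   # 9 10 11 9: the last cycle is cut short
identical(suppressWarnings(x + p), x + p_full);   # TRUE
x * 2;                                     # 6 8 10 12: a single number is recycled too
```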

Arithmetic operators

x=1:3;
y=6:8;
z=10:12;
x;
## [1] 1 2 3
y;
## [1] 6 7 8
z;
## [1] 10 11 12
x-y;
## [1] -5 -5 -5
x*4;
## [1]  4  8 12
x*y;
## [1]  6 14 24
y/5;
## [1] 1.2 1.4 1.6
y%%5;
## [1] 1 2 3
y^3;
## [1] 216 343 512

Operator precedence

x+2*y+z;
## [1] 23 27 31
x+2*y/z;
## [1] 2.200000 3.272727 4.333333
(x+2)*y+z;
## [1] 28 39 52
(x+2)*(y+z);
## [1]  48  72 100
n=10;
1:n-1;
##  [1] 0 1 2 3 4 5 6 7 8 9
1:(n-1);
## [1] 1 2 3 4 5 6 7 8 9
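The last two results differ because the range operator is applied before subtraction. A few more cases from R's precedence rules, as a sketch:

```r
n = 10;
identical(1:n - 1, (1:n) - 1);   # TRUE: ":" is applied before "-"
-2^2;                            # -4: "^" is applied before unary minus
(-2)^2;                          # 4
2 * 3 %% 2;                      # 2: "%%" is applied before "*"
seq_len(n - 1);                  # 1 to 9, another spelling of 1:(n-1)
```

seq_len() is often preferred over 1:(n-1) because it returns an empty vector when n is 1, whereas 1:0 counts down and yields 1 0.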

Task: Now go to the Task page and finish the Vector arithmetic section

Conditional statements

x=10:30;
x;
##  [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
x > 15;
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
x == 15;
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
x > 15 & x %% 2==0;
##  [1] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE
## [12] FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE
x[x > 15 & x %% 2==0];
## [1] 16 18 20 22 24 26 28 30
x > 15 | x %% 2==0;
##  [1]  TRUE FALSE  TRUE FALSE  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
x[x > 15 | x %% 2==0];
##  [1] 10 12 14 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
# which() returns the indices of the elements for which the condition is TRUE
tempind=which(x > 15 & x %% 2==0);
tempind;
## [1]  7  9 11 13 15 17 19 21
x[tempind];
## [1] 16 18 20 22 24 26 28 30
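Logical vectors such as these can also be summarised directly. The sketch below uses the base R functions sum(), any(), all() and ifelse(); the name cond is hypothetical.

```r
x = 10:30;
cond = x > 15 & x %% 2 == 0;
sum(cond);       # 8: each TRUE counts as 1
any(x > 29);     # TRUE: at least one element satisfies the condition
all(x >= 10);    # TRUE: every element satisfies the condition
ifelse(x[1:4] %% 2 == 0, "even", "odd");   # "even" "odd" "even" "odd"
```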

Check for missing values

x=c(1:5,NA,NA,2:3,NA,NA,3);
is.na(x);
##  [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE
## [12] FALSE
which(is.na(x));
## [1]  6  7 10 11
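Missing values propagate through most calculations, so they usually need to be counted or removed first. A short sketch using the same x:

```r
x = c(1:5, NA, NA, 2:3, NA, NA, 3);
sum(is.na(x));           # 4 missing values
mean(x);                 # NA: any missing value makes the result missing
mean(x, na.rm = TRUE);   # 2.875: NA values are ignored
x[!is.na(x)];            # keep only the observed values
NA == NA;                # NA, which is why is.na() is needed for the test
```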

Check data types and data structures

x1=1:10;
str(x1);
##  int [1:10] 1 2 3 4 5 6 7 8 9 10
x2=c(1.2, 2.3);
str(x2);
##  num [1:2] 1.2 2.3
x3=c("aaa","bbb");
str(x3);
##  chr [1:2] "aaa" "bbb"
x4=c('ccc','ddd');
str(x4);
##  chr [1:2] "ccc" "ddd"
x5=c(T,F);
str(x5);
##  logi [1:2] TRUE FALSE
str(letters);
##  chr [1:26] "a" "b" "c" "d" "e" "f" "g" "h" "i" ...

Implicit Data type conversion

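Conversion order: logical -> integer -> numeric -> character. When values of different types are combined in one vector, all of them are converted to the most general type present.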

x1=c(1, "abc", TRUE);
str(x1);
##  chr [1:3] "1" "abc" "TRUE"
x2=c(1,TRUE);
str(x2);
##  num [1:2] 1 1
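Each step of the conversion order can be observed with class(). The sketch below also shows a common use of the logical-to-number conversion: counting TRUE values.

```r
class(c(TRUE, 2L));          # "integer": logical is promoted to integer
class(c(2L, 2.5));           # "numeric": integer is promoted to numeric
class(c(2.5, "a"));          # "character": numeric is promoted to character
sum(c(TRUE, FALSE, TRUE));   # 2: TRUE is treated as 1, FALSE as 0
TRUE + TRUE;                 # 2
```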

Explicit Data Type Conversion

x=c("1","2","3");
str(x);
##  chr [1:3] "1" "2" "3"
x=as.numeric(x);
str(x);
##  num [1:3] 1 2 3
x=1:5;
str(x);
##  int [1:5] 1 2 3 4 5
x=as.character(x);
str(x);
##  chr [1:5] "1" "2" "3" "4" "5"
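Explicit conversion does not always succeed. The sketch below shows a few edge cases of the base R as.*() functions; the name bad is hypothetical, and suppressWarnings() is used only to hide the warning R raises when a string cannot be converted.

```r
as.integer(3.9);                      # 3: the fractional part is dropped
as.numeric("2.5");                    # 2.5
bad = suppressWarnings(as.numeric(c("1", "abc")));
bad;                                  # 1 NA: "abc" cannot be converted
as.logical(c("TRUE", "T", "yes"));    # TRUE TRUE NA
as.numeric(TRUE);                     # 1
```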