Chapter 5 Data Visualization

5.1 Base Graphics

R has high-level and low-level plotting functions.

5.1.2 Saving plot

Steps: open a graphics device (BMP, JPEG, PNG, or TIFF), draw the plot, then switch off the device.

## png 
##   2
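The steps above can be sketched as follows. This is a minimal example that assumes PNG output; the file name and the plotted data are placeholders and do not come from the original text.

```r
# Hypothetical output path; a temporary file keeps the example self-contained.
out_file <- file.path(tempdir(), "example_plot.png")

# 1. Open a PNG graphics device (bmp(), jpeg() and tiff() work the same way).
png(filename = out_file, width = 800, height = 600)

# 2. Draw the plot; it is written to the file instead of the screen.
plot(1:10, (1:10)^2, main = "Saved plot", xlab = "x", ylab = "x squared")

# 3. Switch off the device so the file is completed and closed.
dev.off()
```

Until dev.off() is called the file may be incomplete, so the last step should not be skipped.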

5.1.3 Add title/subtitle/x-axis label/y-axis label to plot

5.1.6 Explore built-in colors

##  [1] "white"          "aliceblue"      "antiquewhite"   "antiquewhite1" 
##  [5] "antiquewhite2"  "antiquewhite3"  "antiquewhite4"  "aquamarine"    
##  [9] "aquamarine1"    "aquamarine2"    "aquamarine3"    "aquamarine4"   
## [13] "azure"          "azure1"         "azure2"         "azure3"        
## [17] "azure4"         "beige"          "bisque"         "bisque1"       
## [21] "bisque2"        "bisque3"        "bisque4"        "black"         
## [25] "blanchedalmond" "blue"           "blue1"          "blue2"         
## [29] "blue3"          "blue4"          "blueviolet"     "brown"         
## [33] "brown1"         "brown2"         "brown3"         "brown4"        
## [37] "burlywood"      "burlywood1"     "burlywood2"     "burlywood3"    
## [41] "burlywood4"     "cadetblue"      "cadetblue1"     "cadetblue2"    
## [45] "cadetblue3"     "cadetblue4"     "chartreuse"     "chartreuse1"   
## [49] "chartreuse2"    "chartreuse3"
## function (mode = "logical", length = 0L) 
## .Internal(vector(mode, length))
## <bytecode: 0x559bee972fd8>
## <environment: namespace:base>
## [1] "#FF0000FF" "#CCFF00FF" "#00FF66FF" "#0066FFFF" "#CC00FFFF"
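The color names and hex codes above are the kind of output produced by colors() and rainbow(). A short sketch, assuming those two calls (the bar plot at the end is only an illustration added here):

```r
# Names of all built-in colors; show the first 50.
all_cols <- colors()
head(all_cols, 50)

# rainbow(n) returns n evenly spaced hues as hex strings.
pal <- rainbow(5)
pal

# Any of these values can be passed to the col argument of a plot.
barplot(rep(1, 5), col = pal, names.arg = pal, las = 2)
```

Depending on the R version, the hex strings may or may not carry a trailing alpha component such as "FF".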

5.1.14 Low level plotting functions (lines)

lines() draws lines on an existing plot. To draw lines we need the x and y coordinates of the points: pass the x-coordinates of all points as the 1st argument and the y-coordinates as the 2nd argument. lty and lwd can also be used.

plot(x, type = "o", xlab = "Index", ylab = "Expression values", main = "Scatter plot", lwd = 2, col = veccol, pch = 15, lty = 2, cex.lab = 1.2, cex.axis = 1.2, cex = 1.2, cex.main = 1.2, las = 1, ylim = c(0, 150))
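A minimal sketch of adding a line with lines(), using made-up data (y1 and y2 are placeholder vectors, not objects from the original text):

```r
set.seed(1)                      # reproducible placeholder data
y1 <- sample(1:50, 20)           # series drawn by the high-level plot() call
y2 <- sample(1:50, 20)           # series added afterwards with lines()

# High-level call: creates the plot region and draws the first series.
plot(1:20, y1, type = "l", xlab = "Index", ylab = "Value", ylim = c(0, 50))

# Low-level call: x-coordinates first, y-coordinates second.
# lty = 2 gives a dashed line, lwd = 2 doubles the line width.
lines(1:20, y2, col = "red", lty = 2, lwd = 2)
```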

5.1.15 Low level plotting functions (legend)

legend() draws a legend on a plot. Its position is given by the x and y coordinates passed as the 1st and 2nd arguments, and the legend labels are passed as the 3rd argument. The colors of the legend entries are set with the col parameter, while lwd sets the line width.
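As an illustration of the argument order described above, here is a small sketch with two made-up lines; the data and labels are assumptions chosen for the example.

```r
# Two placeholder lines to annotate.
plot(1:10, 1:10, type = "l", col = "blue", lwd = 2,
     xlab = "Index", ylab = "Value", ylim = c(0, 25))
lines(1:10, 2 * (1:10), col = "red", lwd = 2)

# x = 1 and y = 25 place the top-left corner of the legend box;
# the third argument holds the legend labels.
leg <- legend(1, 25, c("y = x", "y = 2x"), col = c("blue", "red"), lwd = 2)
```

legend() also accepts a keyword such as "topleft" in place of the x and y coordinates, which avoids having to pick coordinates by hand.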

5.1.16 Adding texts to existing plot (text)

text() adds text to an existing plot. It needs x- and y-coordinates, which are passed as the 1st and 2nd arguments respectively. The third argument (labels) is the text to be written on the plot. Since we want to write the values of x on the points, the coordinates of the text are exactly the same as those of the points: the x-coordinates are 1 to 50 and the y-coordinates are the values of x itself. The text to be added is also x.

pos=3 places the text above the point. pos=1/2/3/4 means below/left of/above/right of the point.

cex=0.6 decreases the text size.

offset is used to put space between the point and the text.

barplot(A, names.arg = c("Day1", "Day2", "Day3", "Day4", "Day5"), xlab = "Days", ylab = "Revenue", border = "red")
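The text() arguments described in this section can be combined as in the sketch below. The vector x is placeholder data with 10 values rather than the 50 used in the description.

```r
set.seed(1)
x <- sample(1:100, 10)   # placeholder values

plot(x, type = "o", xlab = "Index", ylab = "Value", ylim = c(0, 110))

# x-coordinates are the indices; y-coordinates and labels are the values.
# pos = 3 puts each label above its point, offset = 0.5 adds a small gap,
# and cex = 0.6 shrinks the text.
text(seq_along(x), x, labels = x, pos = 3, offset = 0.5, cex = 0.6)
```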

5.1.17 Explore Line Plot

Create 3 vectors x1, x2 and x3 with 50 elements each and different value ranges. To visualize them as lines, first draw x1 with the high-level plot() command; x2 and x3 can then be added to the existing plot with the low-level lines() command.

Doing only this draws all 3 lines, but x2 and x3 are not visible: plot(x1) fits the y-axis to x1, whose values lie between 0 and 50, while the x2 values lie between 50 and 100 and the x3 values between 100 and 150. So x2 and x3 fall outside the plotting region.

To correctly see x2 and x3, we first set the ylim to 0-200.
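A sketch of the corrected version, with the three vectors generated here as random placeholders in the stated ranges:

```r
set.seed(1)
x1 <- sample(0:50, 50, replace = TRUE)      # values in 0-50
x2 <- sample(50:100, 50, replace = TRUE)    # values in 50-100
x3 <- sample(100:150, 50, replace = TRUE)   # values in 100-150

# Without ylim the axis would be fitted to x1 only; widen it so that
# all three series fall inside the plotting region.
plot(x1, type = "l", col = "black", ylim = c(0, 200),
     xlab = "Index", ylab = "Value")
lines(x2, col = "red")
lines(x3, col = "blue")
```

Instead of a fixed upper limit, ylim = range(x1, x2, x3) would adapt the axis to whatever data is plotted.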

5.1.20 Grouped Bar plots

The argument beside=TRUE converts a stacked bar plot into a grouped bar plot.

space=c(0.1, 1) adds a spacing of 0.1 between the bars within a group and a spacing of 1 between groups.
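The effect of both arguments can be seen in a small sketch. The revenue matrix is hypothetical; each row is a series and each column is a group.

```r
# Hypothetical revenue for two products (rows) over three days (columns).
revenue <- matrix(c(10, 15, 12, 18, 9, 14), nrow = 2,
                  dimnames = list(c("ProductA", "ProductB"),
                                  c("Day1", "Day2", "Day3")))

# beside = TRUE draws the rows side by side instead of stacking them.
# space = c(0.1, 1): 0.1 bar widths between bars in a group, 1 between groups.
mids <- barplot(revenue, beside = TRUE, space = c(0.1, 1),
                col = c("steelblue", "orange"),
                xlab = "Days", ylab = "Revenue",
                legend.text = rownames(revenue))
```

barplot() returns the bar midpoints (here a 2 x 3 matrix), which is handy for placing labels above the bars with text().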

5.1.22 Density plot

The density(x) function computes a kernel density estimate of x, which can be plotted with plot(density(x)).
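For example, with a placeholder sample drawn from a normal distribution (the data is an assumption made for this sketch):

```r
set.seed(1)
x <- rnorm(500, mean = 10, sd = 2)   # placeholder sample

d <- density(x)   # kernel density estimate (Gaussian kernel by default)
plot(d, main = "Density of x", xlab = "x")
```

The returned object stores the evaluation grid in d$x and the estimated density in d$y, so further curves can be added with lines(density(...)).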

5.1.24 Box plots

Create a matrix with 1000 rows and 4 columns. Let's assume that these are the expression values of 1000 genes measured on two days in control and treatment samples. The columns represent Control-Day1, Control-Day2, Treatment-Day1 and Treatment-Day2.

##      x1  x2  x3  x4
## [1,] 30 139 206 370
## [2,] 27 134 269 345
## [3,] 93 162 227 379
## [4,] 40 170 278 380
## [5,] 15 194 258 322
## [6,] 26 142 206 316
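A matrix of this shape and its box plot could be produced roughly as follows. The value ranges are guessed from the rows printed above, and the object name expr_mat is a placeholder.

```r
set.seed(1)
n <- 1000

# One column per condition; the ranges are assumptions based on the output above.
expr_mat <- cbind(x1 = sample(1:100, n, replace = TRUE),
                  x2 = sample(100:200, n, replace = TRUE),
                  x3 = sample(200:300, n, replace = TRUE),
                  x4 = sample(300:400, n, replace = TRUE))
head(expr_mat)

# For a matrix, boxplot() draws one box per column.
bp <- boxplot(expr_mat,
              names = c("Control-Day1", "Control-Day2",
                        "Treatment-Day1", "Treatment-Day2"),
              col = c("grey80", "grey60", "tomato", "firebrick"),
              ylab = "Expression value")
```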

5.1.26 Venn diagram

## 
## Attaching package: 'gplots'
## The following object is masked from 'package:stats':
## 
##     lowess

5.1.27 Heatmap

##          D1    D2    D3    D4    D5
## Gene1  0.59  1.01  0.36 -1.25 -1.11
## Gene2 -0.76 -1.42 -0.09 -0.93 -0.45
## Gene3 -0.37  0.76  0.06 -1.31  0.94
## Gene4 -0.19  1.59 -0.15  1.01  0.74
## Gene5  1.45  0.26  0.46 -0.30 -0.59
## Gene6 -1.24 -0.14  1.15  2.16 -2.72

## Warning in heatmap.2(m1, trace = "none", Colv = FALSE): Discrepancy: Colv
## is FALSE, while dendrogram is `both'. Omitting column dendogram.

## Warning in heatmap.2(m1, trace = "none", Rowv = FALSE): Discrepancy: Rowv
## is FALSE, while dendrogram is `both'. Omitting row dendogram.

The argument (n = 149) defines how many individual colors we want in our palette. The higher the number of individual colors, the smoother the transition will be; 149 should be large enough for a smooth transition. By default, the colors are divided evenly, so that every color in the palette covers an interval of similar size. However, sometimes we want a slightly skewed color range, depending on the data we are analyzing. Our example dataset (m1) ranges from -3 to 3, and we are particularly interested in samples that have a (relatively) high expression: values in the ranges 2 to 3 and -3 to -2. In this case, we can define our color breaks "unevenly".
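One way to define such uneven breaks is sketched below. The three anchor colors and the exact break positions are assumptions; the only fixed requirement is that there is one more break point than there are colors.

```r
# 149 colors interpolated between three anchor colors (an assumed scheme).
my_palette <- colorRampPalette(c("blue", "white", "red"))(n = 149)

# 150 strictly increasing break points, one more than the number of colors.
# About a third of the colors is spent on each of the three ranges, so the
# two narrow outer ranges get a finer color resolution than the wide middle.
col_breaks <- c(seq(-3, -2, length.out = 50),
                seq(-1.99, 1.99, length.out = 50),
                seq(2, 3, length.out = 50))

# With gplots installed, both objects could be passed to heatmap.2(), e.g.
# heatmap.2(m1, trace = "none", col = my_palette, breaks = col_breaks)
```

Keeping the break points sorted and non-overlapping avoids the "unsorted 'breaks' will be sorted before use" warning.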

## Warning in image.default(z = matrix(z, ncol = 1), col = col, breaks =
## tmpbreaks, : unsorted 'breaks' will be sorted before use

5.1.28 Scatter plot Matrices

The pairs() function draws a matrix of scatter plots, which gives a global view of how the data is distributed.

##   x1 x2 x3 x4
## 1 39 35 79 26
## 2 13  3 80 59
## 3 12 81 98 76
## 4 96 67 68  2
## 5 16 83 85 88
## 6 84 55 80 21
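A data frame like the one printed above can be passed directly to pairs(). In this sketch the four columns are random placeholders.

```r
set.seed(1)

# Four placeholder variables with values between 1 and 100.
df <- data.frame(x1 = sample(1:100, 100, replace = TRUE),
                 x2 = sample(1:100, 100, replace = TRUE),
                 x3 = sample(1:100, 100, replace = TRUE),
                 x4 = sample(1:100, 100, replace = TRUE))
head(df)

# One scatter plot for every pair of columns.
pairs(df, main = "Scatter plot matrix", pch = 19, col = "steelblue")
```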

5.1.31 PCA (Principal Component Analysis)

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## Importance of components:
##                           Comp.1    Comp.2     Comp.3      Comp.4
## Standard deviation     1.7083611 0.9560494 0.38308860 0.143926497
## Proportion of Variance 0.7296245 0.2285076 0.03668922 0.005178709
## Cumulative Proportion  0.7296245 0.9581321 0.99482129 1.000000000
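The summary above is consistent with princomp() applied to the four numeric columns of iris using the correlation matrix. Assuming that setup, a minimal sketch is:

```r
head(iris)

# PCA on the four numeric columns; cor = TRUE uses the correlation matrix,
# i.e. the variables are standardized before the analysis.
pca <- princomp(iris[, 1:4], cor = TRUE)
summary(pca)

# Scores on the first two components, colored by species.
plot(pca$scores[, 1:2], col = as.integer(iris$Species), pch = 19)
```

The proportion of variance per component can also be computed by hand as pca$sdev^2 / sum(pca$sdev^2).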

Plot PCA objects using the ggfortify package

## Loading required package: ggplot2

Plot PC1 vs PC2

Plot PC2 and PC3

Draw frame (draws a convex hull for each cluster)

Pass original data for additional features

Draw PCA loadings

Show frame

Show Labels

See the ggfortify package vignette for further details: https://cran.r-project.org/web/packages/ggfortify/vignettes/plot_pca.html

5.1.32 Classical (Metric) Multidimensional Scaling

Multidimensional scaling takes a set of dissimilarities and returns a set of points such that the distances between the points are approximately equal to the dissimilarities.

##          D1    D2    D3    D4    D5
## Gene1  0.59  1.01  0.36 -1.25 -1.11
## Gene2 -0.76 -1.42 -0.09 -0.93 -0.45
## Gene3 -0.37  0.76  0.06 -1.31  0.94
## Gene4 -0.19  1.59 -0.15  1.01  0.74
## Gene5  1.45  0.26  0.46 -0.30 -0.59
## Gene6 -1.24 -0.14  1.15  2.16 -2.72
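The definition of multidimensional scaling given in this section can be checked on a tiny example. The four points below are hypothetical and chosen so that the result is easy to verify; in base R, classical MDS is available as cmdscale().

```r
# Four hypothetical points in the plane.
pts <- matrix(c(0, 0,  3, 0,  0, 4,  3, 4), ncol = 2, byrow = TRUE,
              dimnames = list(paste0("P", 1:4), c("x", "y")))

d   <- dist(pts)            # pairwise Euclidean dissimilarities
fit <- cmdscale(d, k = 2)   # classical MDS into 2 dimensions

# The recovered configuration may be rotated or reflected, but its pairwise
# distances match the input dissimilarities.
all.equal(as.vector(dist(fit)), as.vector(d))

plot(fit, type = "n", xlab = "Coordinate 1", ylab = "Coordinate 2")
text(fit, labels = rownames(fit))
```

For a matrix such as the one printed above, the same two calls apply: dist() on the rows, then cmdscale() on the result.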

5.1.35 Network graphs

## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
## 
##     as_data_frame, groups, union
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union
##     G1 G2 G3 G4 G5 G6 G7 G8 G9 G10
## G1   0  1  0  0  0  0  1  1  0   1
## G2   1  0  1  0  1  1  0  1  0   1
## G3   1  1  0  1  1  1  1  0  0   1
## G4   0  0  0  1  1  1  1  0  0   0
## G5   1  0  1  0  1  0  0  0  1   1
## G6   0  1  1  0  0  1  0  1  0   0
## G7   0  0  0  1  1  0  1  0  0   1
## G8   1  0  1  0  1  0  1  1  1   1
## G9   1  0  1  1  1  0  0  0  0   0
## G10  1  1  0  1  0  1  0  1  1   1
## IGRAPH 7aa6332 UN-- 10 35 -- 
## + attr: name (v/c)
## + edges from 7aa6332 (vertex names):
##  [1] G1--G2  G1--G3  G1--G5  G1--G7  G1--G8  G1--G9  G1--G10 G2--G3 
##  [9] G2--G5  G2--G6  G2--G8  G2--G10 G3--G4  G3--G5  G3--G6  G3--G7 
## [17] G3--G8  G3--G9  G3--G10 G4--G5  G4--G6  G4--G7  G4--G9  G4--G10
## [25] G5--G7  G5--G8  G5--G9  G5--G10 G6--G8  G6--G10 G7--G8  G7--G10
## [33] G8--G9  G8--G10 G9--G10

Explore http://www.r-graph-gallery.com/portfolio/network/ for advanced network graphs.

5.2 Plot using ggplot2

5.2.1 Scatter plot

## # A tibble: 6 x 10
##   carat cut       color clarity depth table price     x     y     z
##   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
## 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
## 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
## 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
## 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
## 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
## 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48

5.2.3 Add geom_smooth() layer, linear modeling

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

5.2.4 Explore aesthetic parameter “col”

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

5.2.5 Assign aes() to individual layer

## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

5.2.6 Explore aesthetic parameter “shape”

## Warning: Using shapes for an ordinal variable is not advised
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually if you must have them.
## Warning: Removed 5445 rows containing missing values (geom_point).

5.2.7 Add axis labels and plot title using labs()

## Warning: Using shapes for an ordinal variable is not advised
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually if you must have them.
## Warning: Removed 5445 rows containing missing values (geom_point).

5.2.8 Change color palette

## Warning: Using shapes for an ordinal variable is not advised
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually if you must have them.
## Warning: Removed 5445 rows containing missing values (geom_point).

5.2.9 Save the ggplot object and then print.

## Warning: Using shapes for an ordinal variable is not advised
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: The shape palette can deal with a maximum of 6 discrete values
## because more than 6 becomes difficult to discriminate; you have 8.
## Consider specifying shapes manually if you must have them.
## Warning: Removed 5445 rows containing missing values (geom_point).

5.2.11 Adjusting the legend title

You can change the legend title. ggplot2 provides different functions depending on the type of legend. For a legend that represents color, where the color attribute is derived from discrete values, use the scale_color_discrete() function. If the legend corresponds to shape and the values are discrete, use scale_shape_discrete(). Other functions include scale_shape_continuous(name = "legend title") and, for the fill attribute, scale_fill_continuous(name = "legend title").
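As a sketch of the discrete color case, the example below uses a random subset of the diamonds data; the subset size and the legend title are arbitrary choices made for illustration.

```r
library(ggplot2)

# Small random subset of diamonds so the plot draws quickly.
set.seed(1)
d <- diamonds[sample(nrow(diamonds), 500), ]

p <- ggplot(d, aes(x = carat, y = price, col = cut)) +
  geom_point() +
  # cut is discrete and mapped to color, so the legend title is set here.
  scale_color_discrete(name = "Quality of cut")

print(p)
```

The same pattern applies to the other scales: the name argument of the matching scale_*() function sets the title of that legend.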