Chapter 8 Descriptive statistics

8.3 Handle missing values

## [1] 11  2 35 46 55
## [1] 11  2 35 46 55 NA
## [1] NA
## [1] 29.8
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     2.0    11.0    35.0    29.8    46.0    55.0       1
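
The commands that produced the output above are not shown. A minimal sketch that would reproduce it, assuming a vector named `x` (the name is an assumption), might look like this:

```r
x <- c(11, 2, 35, 46, 55)
x

x <- c(x, NA)            # append one missing value
x

mean(x)                  # NA: a missing value propagates into the mean
mean(x, na.rm = TRUE)    # drop NA values before averaging
summary(x)               # summary() counts the NA's in a separate column
```

Most base R summary functions (`mean()`, `median()`, `sd()`, `sum()`) accept the same `na.rm = TRUE` argument.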

8.3.1 Shape / Data Distribution

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
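
The rows above match the first six observations of R's built-in `iris` data set. Assuming that is the data being used, they can be displayed as follows (the variable name `first_rows` is only for illustration):

```r
first_rows <- head(iris)   # first six rows of the built-in iris data
first_rows
```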

8.4 Estimate Skewness and Kurtosis

Load the moments library, which provides the skewness() and kurtosis() functions.

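Calculate skewness. Skewness measures how asymmetric a distribution is; a perfectly symmetric distribution has a skewness of 0.

Negative skewness: the distribution is left-skewed (longer left tail), and the mean is typically smaller than the median.

Positive skewness: the distribution is right-skewed (longer right tail), and the mean is typically larger than the median.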

## [1] 0.3117531
## [1] 0.3157671
## [1] -0.2721277
## [1] -0.1019342
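
The four values above are consistent with the moment-based skewness of the four numeric columns of `iris`. As a sketch that does not depend on any package, the same quantity can be computed in base R. The helper name `skew` is hypothetical; with the moments package installed, `skewness(iris$Sepal.Length)` should give the same result.

```r
# Moment-based sample skewness: m3 / m2^(3/2),
# where m2 and m3 are central moments computed by dividing by n.
skew <- function(x) {
  x <- x[!is.na(x)]        # ignore missing values
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_iris <- sapply(iris[, 1:4], skew)   # one value per numeric column
skew_iris
```

The sepal measurements come out slightly right-skewed and the petal measurements slightly left-skewed.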

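Estimate kurtosis. Kurtosis describes the tail shape of the data distribution.

The kurtosis() function in the moments library returns the Pearson (non-excess) kurtosis, for which the normal distribution has a value of 3. Subtracting 3 gives the excess kurtosis, which is zero for the normal distribution. A distribution with zero excess kurtosis has the standard tail shape and is said to be mesokurtic.

Negative excess kurtosis (kurtosis below 3) indicates a thin-tailed data distribution, which is said to be platykurtic.

Positive excess kurtosis (kurtosis above 3) indicates a fat-tailed distribution, which is said to be leptokurtic.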

## [1] 2.426432
## [1] 3.180976
## [1] 1.604464
## [1] 1.663933
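
The values above are on the Pearson scale, where a normal distribution scores 3, so they need 3 subtracted before being read as excess kurtosis. A base R sketch of both versions follows; the helper name `kurt` is hypothetical, and with the moments package installed `kurtosis(iris$Sepal.Length)` should match the first value.

```r
# Moment-based sample kurtosis: m4 / m2^2 (Pearson scale, normal = 3).
kurt <- function(x) {
  x <- x[!is.na(x)]        # ignore missing values
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2
}

kurt_iris   <- sapply(iris[, 1:4], kurt)
excess_iris <- kurt_iris - 3             # excess kurtosis (normal = 0)

kurt_iris
excess_iris
```

On the excess scale only `Sepal.Width` is slightly positive. The other three columns are negative, meaning thinner tails than a normal distribution.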

8.5 Further with Skewness and Kurtosis

Source: http://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm

Many classical statistical tests and intervals depend on normality assumptions. Significant skewness and kurtosis clearly indicate that data are not normal. If a data set exhibits significant skewness or kurtosis (as indicated by a histogram or the numerical measures), what can we do about it?

One approach is to apply some type of transformation to try to make the data normal, or more nearly normal. The Box-Cox transformation is a useful technique for trying to normalize a data set. In particular, taking the log or square root of a data set is often useful for data that exhibit moderate right skewness.
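
To illustrate the effect of such transformations, the sketch below applies a Box-Cox power transform to simulated right-skewed data and compares the skewness before and after. The data are simulated, and the helpers `skew` and `box_cox` are hypothetical names. In practice the power `lambda` is usually estimated from the data, for example with `boxcox()` from the MASS package, rather than fixed by hand.

```r
# Moment-based sample skewness
skew <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Box-Cox power transform for strictly positive data.
# lambda = 0 is the log transform; lambda = 0.5 is a shifted, scaled square root.
box_cox <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

set.seed(1)
x <- rlnorm(500)                        # simulated right-skewed sample

skew_before <- skew(x)
skew_sqrt   <- skew(box_cox(x, 0.5))    # square-root-type transform
skew_log    <- skew(box_cox(x, 0))      # log transform

c(raw = skew_before, sqrt = skew_sqrt, log = skew_log)
```

For this sample the square-root-type transform reduces the skewness and the log transform brings it close to zero, as expected for log-normal data.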

Another approach is to use techniques based on distributions other than the normal. For example, in reliability studies, the exponential, Weibull, and lognormal distributions are typically used as a basis for modeling rather than using the normal distribution.
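
As a rough sketch of this second approach, the code below fits exponential, lognormal, and normal models to simulated positive data by maximum likelihood and compares their log-likelihoods. The data and all variable names are assumptions for illustration. A Weibull fit has no closed-form estimate and is omitted here; `fitdistr()` from the MASS package is one way to obtain it.

```r
set.seed(42)
t_fail <- rlnorm(200, meanlog = 4, sdlog = 1)   # simulated failure times

# Closed-form maximum-likelihood estimates
exp_rate <- 1 / mean(t_fail)                          # exponential rate
ln_mu    <- mean(log(t_fail))                         # lognormal meanlog
ln_sd    <- sqrt(mean((log(t_fail) - ln_mu)^2))       # lognormal sdlog
n_mu     <- mean(t_fail)                              # normal mean
n_sd     <- sqrt(mean((t_fail - n_mu)^2))             # normal sd

# Log-likelihood of each fitted model (higher means a better fit)
loglik <- c(
  exponential = sum(dexp(t_fail, rate = exp_rate, log = TRUE)),
  lognormal   = sum(dlnorm(t_fail, meanlog = ln_mu, sdlog = ln_sd, log = TRUE)),
  normal      = sum(dnorm(t_fail, mean = n_mu, sd = n_sd, log = TRUE))
)
loglik
```

Because the sample was drawn from a lognormal distribution, the lognormal model should fit better than the normal one. With real data the comparison would also need to account for the number of parameters, for example via AIC.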