# Image Compression In 10 Lines of R Code

Posted by Leihua Ye, December 9, 2019

Principal Component Analysis (PCA) is a powerful Machine Learning tool. As an unsupervised learning technique, it excels in dimension reduction and feature extraction.

But did you know that we can also use PCA to compress images?

In this post, I’ll walk through the process and explain how PCA can compress images in 10 lines of R code, with the simple math described at the end.


**# Install packages and load libraries**

```r
# Only these two packages are used in this post.
# install.packages("tidyverse")
# install.packages("imager")
library(tidyverse)  # provides the %>% pipe
library(imager)     # provides as.cimg() for plotting

# Load the dataset: a 100 x 100 x 1000 array. An array generalizes a matrix
# to more than two dimensions. The first two dimensions index the pixels of a
# 100 x 100 black-and-white image of a face; the last dimension indexes the
# 1000 face images. The dataset can be accessed at:
# https://cyberextruder.com/face-matching-data-set-download/
load("faces_array.RData")

# PCA requires a single matrix, so flatten each 100 x 100 image into a
# vector of length 10,000. face_mat holds one image per row (1000 x 10000).
face_mat <- sapply(1:1000, function(i) as.numeric(faces_array[, , i])) %>% t

# To visualize an image we need a matrix again, so convert the
# 10,000-dimensional vector back into a 100 x 100 matrix.
plot_face <- function(image_vector) {
  plot(as.cimg(t(matrix(image_vector, ncol = 100))), axes = FALSE, asp = 1)
}

# Plot one randomly chosen face (a row of face_mat, not a column).
plot_face(face_mat[sample(1000, 1), ])
```

Here, we load the dataset, flatten every image into one row of `face_mat`, and define a helper function, `plot_face`, that turns any 10,000-element vector back into an image for display.
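To see that flattening an image into a vector loses nothing, here is a minimal sketch on a made-up 4 x 4 x 5 array. The names `toy_array`, `toy_mat`, and `rebuilt` are hypothetical stand-ins and are not part of the faces dataset.

```r
# Toy stand-in for faces_array: 5 "images" of 4 x 4 pixels (made-up data).
set.seed(1)
toy_array <- array(runif(4 * 4 * 5), dim = c(4, 4, 5))

# Flatten each image column by column into one row of a matrix.
# sapply() returns one image per column, so t() puts one image per row.
toy_mat <- t(sapply(1:5, function(i) as.numeric(toy_array[, , i])))
dim(toy_mat)   # 5 rows (images), 16 columns (pixels)

# Rebuild image 3 from its row. matrix() also fills column by column,
# so it exactly undoes as.numeric().
rebuilt <- matrix(toy_mat[3, ], ncol = 4)
all.equal(rebuilt, toy_array[, , 3])
```

The extra `t()` inside `plot_face` appears to be there only to orient the picture for plotting, since imager treats the first dimension of a matrix as the horizontal axis.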

**# Check the average face**

```r
face_average = colMeans(face_mat)
plot_face(face_average)
```

To a large extent, we can think of the “average face” as the baseline for all the other images: every face in the dataset is the average face plus or minus some pixel-wise deviations.
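The baseline idea can be checked on a small made-up matrix. This is only a sketch: `toy_mat`, `toy_average`, and `deviations` are hypothetical names, and the random numbers stand in for pixel values.

```r
# Made-up data: 5 "images" with 16 pixels each, one image per row.
set.seed(1)
toy_mat <- matrix(runif(5 * 16), nrow = 5)

# Per-pixel mean across images: the toy "average face" (length 16).
toy_average <- colMeans(toy_mat)

# Subtract the average from every row (sweep() subtracts by default).
deviations <- sweep(toy_mat, 2, toy_average)

# After centering, every pixel column averages to zero (up to rounding).
max(abs(colMeans(deviations)))

# Average plus deviation gives back the original image.
all.equal(toy_average + deviations[2, ], toy_mat[2, ])

# prcomp() with center = TRUE stores the same average in $center.
all.equal(unname(prcomp(toy_mat)$center), toy_average)
```

This is why the average face has to be added back after PCA: the principal components describe only the deviations from it.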

**# The code above doesn’t count toward the 10-line limit. #**

**# And here are our 10 lines of code #**

```r
# Generate PCA results.
# center = TRUE subtracts the average face, so every pixel has mean 0;
# scale. = FALSE leaves the pixel variances as they are (no rescaling to variance 1).
pr.out = prcomp(face_mat, center = TRUE, scale. = FALSE)

# pr.out$sdev: the standard deviations of the principal components;
# (pr.out$sdev)^2: the variances of the principal components
pr.var = (pr.out$sdev)^2

# pve: proportion of variance explained by each principal component
pve = pr.var / sum(pr.var)

# cumulative explained variance
cumulative_pve <- cumsum(pve)

# See the math explanation at the end.
U = pr.out$rotation
Z = t(pr.out$x)

# Let's compress the 232nd face of the dataset: rebuild it from the first
# 10, 50, 100, and 300 principal components, adding the average face back.
par(mfrow = c(1, 5))
plot_face(face_mat[232, ])
for (i in c(10, 50, 100, 300)) {
  plot_face((U[, 1:i] %*% Z[1:i, ])[, 232] + face_average)
}
```

We did it! The results are not too bad. We have the original image on the far left, followed by the four compressed images.
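The post computes `cumulative_pve` but picks the number of components by hand. As a hedged sketch, here is one way it could guide that choice, followed by a rough count of how many numbers a rank-k version of the faces data needs. The synthetic `toy_mat`, the 95% threshold, and the helper `stored_values` are assumptions made for illustration only.

```r
# Made-up stand-in for face_mat: 50 observations of 20 variables,
# built from 3 underlying patterns plus a little noise.
set.seed(42)
patterns <- matrix(rnorm(3 * 20), nrow = 3)
weights  <- matrix(rnorm(50 * 3), nrow = 50)
toy_mat  <- weights %*% patterns + matrix(rnorm(50 * 20, sd = 0.05), nrow = 50)

pr.out <- prcomp(toy_mat, center = TRUE, scale. = FALSE)
pve <- pr.out$sdev^2 / sum(pr.out$sdev^2)
cumulative_pve <- cumsum(pve)

# Smallest number of components whose cumulative share reaches 95%
# (the threshold is an arbitrary choice for this example).
k <- which(cumulative_pve >= 0.95)[1]
k

# Numbers stored for a rank-k version of the faces data in the post:
# k loading vectors, k scores per image, plus the average face.
n_images <- 1000
n_pixels <- 100 * 100
stored_values <- function(k) k * n_pixels + n_images * k + n_pixels

# Share of the original 1000 x 10000 values when keeping 100 components.
stored_values(100) / (n_images * n_pixels)   # 0.111
```

Under this count, keeping 100 components stores about 11% of the original numbers. The saving comes from sharing the loading vectors across all 1000 images, so it applies to the collection as a whole rather than to one image on its own.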

Simple Math Explanations.

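PCA is closely related to the singular value decomposition (SVD) of a matrix. So, **x** = UD(V^T) = **z**(V^T),

where **x** is the centered 1000*10000 matrix (each row is a face with the average face subtracted),

- V: the matrix of eigenvectors (rotation returned by prcomp)
- D: the diagonal matrix of singular values; dividing them by sqrt(n - 1), with n = 1000 images, gives the standard deviations of the principal components (sdev returned by prcomp)

So, **z** = UD: the coordinates of the images in the rotated space (x returned by prcomp).

In other words, we can compress the images using the first k columns of V and the first k columns of **z**:

**x** ≈ **z**[, 1:k] (V[, 1:k])^T

Note that the code above uses different names: its `U` is V here (pr.out$rotation) and its `Z` is the transpose of **z** (t(pr.out$x)), so `U[,1:i] %*% Z[1:i,]` is the transpose of this product, with one reconstructed face per column.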

End of math.
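For readers who want to check these relationships numerically, here is a minimal sketch on a small random matrix. The names `toy_mat`, `x_centered`, `sv`, `approx_k`, and `full` are hypothetical, and k = 3 is an arbitrary choice.

```r
set.seed(7)
toy_mat <- matrix(rnorm(30 * 8), nrow = 30)   # made-up data, 30 x 8
n <- nrow(toy_mat)

# Center the columns, then take the SVD: x_centered = U D V^T.
x_centered <- scale(toy_mat, center = TRUE, scale = FALSE)
sv <- svd(x_centered)
pr.out <- prcomp(toy_mat, center = TRUE, scale. = FALSE)

# sdev equals the singular values divided by sqrt(n - 1).
all.equal(pr.out$sdev, sv$d / sqrt(n - 1))

# The scores z equal U D. Compare absolute values because the sign of
# each component is arbitrary.
all.equal(abs(unname(pr.out$x)), abs(sv$u %*% diag(sv$d)))

# Rank-k approximation: first k columns of z times the transpose of the
# first k columns of V, with the column means added back.
k <- 3
approx_k <- pr.out$x[, 1:k] %*% t(pr.out$rotation[, 1:k])
approx_k <- sweep(approx_k, 2, pr.out$center, "+")

# Using every component recovers the data exactly (up to rounding).
full <- sweep(pr.out$x %*% t(pr.out$rotation), 2, pr.out$center, "+")
all.equal(unname(full), toy_mat)
```

Here each row of `approx_k` is one approximated observation. The loop in the post computes the transposed product, which is why it picks out column 232 rather than row 232.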

