15 Appendix
15.1 Brief introduction to matrices
A vector is an ordered array in 1 dimension. A matrix is an ordered array in 2 dimensions. A tensor is an ordered array in 3 (or more) dimensions.
Matrices are described in terms of rows and columns. A 4 by 5 matrix has 4 rows and 5 columns. Similarly, a vector is a matrix with n rows and 1 column.
15.1.1 Mathematical operations
Matrix addition is simple as long as the matrices are of the same size, so that their rows and columns match. Addition is commutative:
\(A+B = B+A\)
Subtraction, by contrast, is not commutative:
\(A-B \neq B-A\)
Matrix multiplication (which requires the number of columns of A to equal the number of rows of B) is, in general, not commutative:
\(A*B \neq B*A\)
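As a concrete illustration, the following Python sketch (using numpy; the two matrices are arbitrary examples chosen for this appendix) confirms that addition is commutative while matrix multiplication is not:

```python
import numpy as np

# Two arbitrary 2 x 2 matrices used only for illustration
A = np.array([[2.0, 3.0], [1.0, 4.0]])
B = np.array([[0.0, 1.0], [5.0, 2.0]])

print(np.array_equal(A + B, B + A))  # True: addition is commutative
print(np.array_equal(A @ B, B @ A))  # False: multiplication is not
```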
15.1.2 Trace, Determinants, Rank
For a 2 x 2 matrix, \[M=\begin{bmatrix} a & b \\ c & d \end{bmatrix}=\begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}\]
Its transpose is \[M^T =\begin{bmatrix} 2 & 1 \\ 3 & 4 \end{bmatrix}\]
The product \(M^T M\) is given by \[\begin{bmatrix}2^2+1^2 & 2 \times 3 + 1 \times 4\\ 2 \times 3+1 \times 4 & 3^2 +4^2\end{bmatrix}=\begin{bmatrix}5 & 10\\ 10 & 25\end{bmatrix}\]
The eigenvalues \(\lambda\) of a square matrix M are the values for which the determinant of the characteristic matrix \(\lambda I-M\) is zero. For the matrix M above, \[\lambda I-M=\begin{bmatrix} \lambda-2 & -3 \\ -1 & \lambda-4 \end{bmatrix}\] The eigenvalues themselves are derived in the section on eigenvalues and eigenvectors below.
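These quantities can be checked numerically. The following numpy sketch (the call to np.linalg.eigvals is used only to verify the determinant condition) computes the transpose, \(M^TM\) and \(det(\lambda I-M)\):

```python
import numpy as np

M = np.array([[2.0, 3.0], [1.0, 4.0]])

print(M.T)        # transpose: [[2. 1.], [3. 4.]]
print(M.T @ M)    # [[ 5. 10.], [10. 25.]]

# det(lambda * I - M) is (numerically) zero at each eigenvalue of M
for lam in np.linalg.eigvals(M):
    print(lam, np.linalg.det(lam * np.eye(2) - M))
```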
15.1.3 Invertible matrix
An invertible matrix M is a square matrix which, when multiplied by its inverse \(M^{-1}\), results in the identity matrix, whose diagonal elements are ones and whose off-diagonal elements are zeros. The significance of invertible matrices will be seen in the section on collinearity when dealing with regression analysis. A positive semi-definite matrix is defined as a matrix which can be obtained by multiplying a matrix by its transpose (denoted by an upper case T), as in \(A^TA\); it is invertible and has full rank only when it is also positive definite.
The trace of a matrix, Trace(M), is the sum of its diagonal elements; for the matrix M above, Trace(M) = 2 + 4 = 6.
A matrix in which the columns are linearly related is said to be rank deficient. An example is the matrix A below, in which column 1 is a multiple of column 2. \[A=\begin{bmatrix} 2 & 1 \\ 4 & 2 \end{bmatrix}\]
The rank of a matrix is the number of linearly independent columns of that matrix. Given that row rank and column rank are equal, the rank deficiency of a matrix is the difference between the lesser of the number of rows and columns and the rank of the matrix. The matrix A above has rank 1 because its columns are linearly related. A square matrix which cannot be inverted is termed singular.
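A short numpy sketch can verify the trace, determinant, rank and invertibility of the matrices M and A above:

```python
import numpy as np

M = np.array([[2.0, 3.0], [1.0, 4.0]])
A = np.array([[2.0, 1.0], [4.0, 2.0]])  # column 1 is 2 x column 2

print(np.trace(M))               # 6.0
print(np.linalg.det(M))          # 5.0: nonzero, so M is invertible
print(np.linalg.matrix_rank(M))  # 2: full rank

print(np.linalg.det(A))          # ~0: A is singular
print(np.linalg.matrix_rank(A))  # 1: rank deficient
# np.linalg.inv(A) would fail (or be meaningless) because A is not invertible
```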
15.1.4 Vector and matrix norm
The vector norm can be considered the distance of a vector from the origin, and a matrix norm measures the size of a matrix. The Frobenius norm is one such measure of matrix size. For the 2 x 2 matrix M above, it is given by
\(||M||_F^2=\mathrm{trace}(M^T M)=5+25=30\)
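The calculation can be confirmed with numpy, which provides the Frobenius norm directly:

```python
import numpy as np

M = np.array([[2.0, 3.0], [1.0, 4.0]])

# Squared Frobenius norm: sum of squared entries = trace(M^T M)
print(np.linalg.norm(M, 'fro') ** 2)  # 30.0
print(np.trace(M.T @ M))              # 30.0
```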
15.1.5 Sparse matrix
A sparse matrix is a matrix populated mostly by zeros. In a network sense, sparsity implies a lack of cohesion in the network. Inverting a sparse matrix can be challenging because of the reduced rank often associated with this type of matrix, and the solutions require various forms of penalisation.
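A minimal sketch using scipy.sparse (assuming SciPy is available; the matrix itself is an arbitrary illustration) shows how a mostly-zero matrix is stored and how sparse it is:

```python
import numpy as np
from scipy import sparse

# A 5 x 5 matrix populated mostly by zeros (arbitrary illustrative values)
dense = np.zeros((5, 5))
dense[0, 1] = 1.0
dense[3, 4] = 2.0

S = sparse.csr_matrix(dense)     # compressed sparse row storage
print(S.nnz)                     # 2 non-zero entries out of 25
print(S.nnz / np.prod(S.shape))  # density = 0.08
```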
15.1.6 Eigenvalue and Eigenvector
Eigenvectors (characteristic vectors) and eigenvalues (characteristic values) are useful in solving systems of linear equations and provide quicker solutions to tasks such as finding the nth power of a matrix. The eigenvector v of an n x n square matrix M is defined as a non-zero vector such that the product Mv is equal to the product of the scalar eigenvalue \(\lambda\) and v, that is, \(Mv=\lambda v\).
Eigenvector analysis can be used to describe patterns in data. Here, the eigenvector has an interpretation in terms of the direction of the data, and the eigenvalue provides a scaling measure of the length or change of the vector when the two are multiplied. Returning to the nth power of a matrix M: the eigenvectors of \(M^n\) remain unchanged, while the eigenvalues are raised to the nth power.
For a 2 x 2 matrix, the characteristic polynomial is a quadratic in \(\lambda\), so the quadratic formula (or factorisation) can be used to solve for the eigenvalues. For the matrix M above it is given by
\((a-\lambda)(d-\lambda)-bc=\lambda^2 -(2+4)\lambda+(2 \times 4-1 \times 3)=\lambda^2 -\mathrm{trace}(M)\,\lambda+\det(M)\)
Setting this to zero, it reduces to \(\lambda^2-6\lambda+5=(\lambda-5)(\lambda-1)=0\)
The eigenvalues \(\lambda\) of the matrix M are therefore 5 and 1.
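The hand calculation can be verified with numpy's eigendecomposition, which also checks the defining relation \(Mv=\lambda v\):

```python
import numpy as np

M = np.array([[2.0, 3.0], [1.0, 4.0]])

eigenvalues, eigenvectors = np.linalg.eig(M)
print(eigenvalues)  # 5 and 1 (the order may differ)

# Check the defining relation M v = lambda v for each eigenpair
for lam, v in zip(eigenvalues, eigenvectors.T):
    print(np.allclose(M @ v, lam * v))  # True
```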
In terms of applications, the eigendecomposition approach provides an unbiased method for discovering patterns in complex data.
In the section on regression, we show that the eigenvalue can be interpreted in terms of the variance of the data. In geometric terms, the eigenvalue describes the length of its associated eigenvector, and in regression analysis it can be considered the variance of the data along that direction. Eigenvalues are calculated by finding the roots of the characteristic polynomial equation, as illustrated above.
15.2 Matrix approach to regression
In the section below, a matrix approach to regression is provided. This approach helps explain issues encountered when performing regression analyses; in particular, collinearity is easier to explain using matrices. Examples of performing regression analyses and the steps required to check for errors are given.
15.2.1 Linear regression
A matrix description of parameter estimation is given here because it makes multiple regression easier to describe (Smith 1998): \(Y= \beta_1X_1+ \beta_2X_2+\ldots+\beta_jX_j\). A matrix is an array of data arranged as rows and columns. For the imaging data, each column represents an individual voxel and each row refers to a different patient. A vector refers to a matrix with one column.
In matrix form, the multiple regression equation takes the form \(Y= X\beta + E\), where X is the predictor matrix, Y is the dependent (response) vector, \(\beta\) is the vector of regression coefficients and E is the error term (not the intercept). X is an \(n \times j\) matrix with n rows (observations) and j columns (predictors), \(\beta\) is a \(j \times 1\) column vector, and Y and E are \(n \times 1\) column vectors.
Algebraic manipulation of the regression equation above shows that the least squares solution for \(\beta\) is \((X^TX)^{-1}X^TY\). \(X^T\) is the transpose of X, such that the columns of X are now written as rows. The correlation matrix of X is given by \(X^TX\). The solution for \(\beta\) is possible only if the correlation matrix \(X^TX\) can be inverted.
The inverse of a matrix can be found if the matrix is square (the numbers of columns and rows are equal) and its determinant is nonzero, that is, the matrix is not singular (Smith 1998). When a matrix is multiplied by its inverse, the result is the identity matrix, whose diagonal elements are ones and whose remaining elements are zeros. For a rectangular matrix of full column rank, pre-multiplication of the matrix by its Moore-Penrose pseudo-inverse results in an identity matrix.
The terms in the inverse of \(X^TX\) are divided by the determinant of \(X^TX\). For simplicity, the determinant of a 2 x 2 matrix \(A=\left[\begin{array}{cc}a & b\\c & d\end{array}\right]\) is given by \(ad-bc\), and the inverse of A is given by \(\frac{1}{ad-bc}\left[\begin{array}{cc}d & -b\\-c & a\end{array}\right]\). From this equation, it can be seen that the inverse exists only if the determinant is nonzero.
For a correlation matrix \(X^TX\) of the form \(\left[\begin{array}{cc}n & nX_*\\nX_* & nX_*^2\end{array}\right]\), the determinant \(n \times nX_*^2 -nX_* \times nX_*\) is zero, and hence there is no unique solution for this equation. If near collinearity exists and the determinant approaches zero, there are infinitely many combinations of coefficients that give (almost) the same least squares fit. When the matrix is singular, the columns of X are linearly related to each other (collinearity). In this case, the regression coefficients are unstable and their variances are large; further, small changes in the dependent variable lead to fluctuations in the regression solution.
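A minimal Python sketch of the normal-equations solution \((X^TX)^{-1}X^TY\), with simulated data (the variable names and the collinear example are illustrative only), shows both the well-behaved case and the singular case described above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Well-behaved design: intercept plus two independent predictors
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
Y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

beta = np.linalg.inv(X.T @ X) @ X.T @ Y
print(beta)  # close to the true values [1.0, 2.0, -0.5]

# Collinear design: the third column is an exact multiple of the second
X_bad = np.column_stack([np.ones(n), x1, 2.0 * x1])
print(np.linalg.det(X_bad.T @ X_bad))  # ~0: X^T X is singular
# Inverting X_bad.T @ X_bad would fail or give numerically meaningless results
```

In practice, numerically stable routines such as np.linalg.lstsq (based on a matrix decomposition) are preferred to explicitly inverting \(X^TX\).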
15.2.1.1 Least squares
The least squares solution for the parameter \(\beta\) refers to fitting a line (or hyperplane) through the predictors X such that the Euclidean distance between the observed values Y and the expected or predicted values is as small as possible. The metric for the fit is the sum of squared errors SSE (also called the residual sum of squares) and is given by \(SSE=(Y-X\beta)^T(Y-X\beta)\). The variance-covariance matrix of \(\beta\) is given by \(Var(\beta)=\sigma^2(X^TX)^{-1}\).
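Continuing the same approach, the SSE and the variance-covariance matrix of \(\beta\) can be computed as follows (a sketch with simulated data; \(\sigma^2\) is estimated from the residuals):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(scale=0.5, size=n)

beta = np.linalg.inv(X.T @ X) @ X.T @ Y
residuals = Y - X @ beta

SSE = residuals @ residuals                     # sum of squared errors
sigma2_hat = SSE / (n - p)                      # estimate of sigma^2
var_beta = sigma2_hat * np.linalg.inv(X.T @ X)  # variance-covariance of beta

print(SSE)
print(np.sqrt(np.diag(var_beta)))               # standard errors of beta
```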
15.2.1.2 Weighted least squares
In the section above it was not stated explicitly, but the least squares regression model is appropriate when the variances of the errors are uniform (Smith 1998). In this case the variance matrix of the errors is a diagonal matrix with equal diagonal elements. When there are unreliable data or errors in the measurement of some of the data, the variance matrix of the errors contains unequal diagonal elements. The consequence is that the least squares formula leads to instability of the parameter estimates. Weighted least squares regression is similar to least squares regression except that the observations are weighted by \(w\), the inverse of their error variances: \(\beta=(X^TV^{-1}X)^{-1}X^TV^{-1}Y\). The diagonal matrix \(V\) contains the error variances along its diagonal, so that \(V^{-1}\) contains the weights \(w\). These weights downplay the importance of the regions where noise occurs and give appropriate importance to the reliable data regions. The result is a reduction in the variance of the regression coefficients and hence stability in their estimates. Weighted least squares is introduced here because weighted PLS is used in the PLS-PLR model.
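A sketch of weighted least squares under these assumptions (the heteroscedastic errors are simulated and the weights are the inverse error variances):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Heteroscedastic errors: about 20% of observations are much noisier
sigma = np.where(rng.random(n) < 0.2, 5.0, 0.5)
Y = 1.0 + 2.0 * x + rng.normal(scale=sigma)

V_inv = np.diag(1.0 / sigma ** 2)  # weights w = 1 / error variance
beta_wls = np.linalg.inv(X.T @ V_inv @ X) @ X.T @ V_inv @ Y
beta_ols = np.linalg.inv(X.T @ X) @ X.T @ Y

print(beta_ols)  # ordinary least squares: unbiased but higher variance
print(beta_wls)  # weighted estimate: typically closer to [1.0, 2.0]
```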
15.2.1.3 Collinearity
Collinearity, or relatedness among the predictors, is often overlooked in many analyses. It can lead to instability in the regression coefficients. There are several tests for collinearity: the variance inflation factor and the condition index. The variance inflation factor for the jth predictor is \(VIF_j = 1/(1-R_j^2)\), where \(R_j^2\) is obtained by regressing the jth predictor on the remaining predictors. As the predictors become strongly correlated, \(R_j^2\) approaches 1 and the VIF approaches infinity. Collinearity is present if VIF > 10 (Kleinbaum, Kupper, and Muller 1978). Collinearity can also be assessed by measuring the condition index (Phan et al. 2006), given as the ratio between the largest eigenvalue of \(X^TX\) and each eigenvalue
\(CI_i=\sqrt{\frac{\lambda_{max}}{\lambda_i}}\).
Collinearity is present when the condition index is > 30 (Kleinbaum, Kupper, and Muller 1978).
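Both diagnostics can be computed with a few lines of numpy (a sketch; the VIF is obtained by regressing each predictor on the others, and the nearly collinear predictors are simulated):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Variance inflation factor: VIF_j = 1 / (1 - R_j^2)
for j in range(X.shape[1]):
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    print(f"VIF for column {j}: {1.0 / (1.0 - r2):.1f}")  # >10 for x1 and x2

# Condition indices: sqrt(lambda_max / lambda_i) of X^T X
eigvals = np.linalg.eigvalsh(X.T @ X)
print(np.sqrt(eigvals.max() / eigvals))  # the largest index exceeds 30
```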
15.2.1.4 Penalised regression
Penalised or ridge regression is a method used to overcome collinearity among the columns of the predictor variables. From the discussion of collinearity above, an inverse solution can be found by introducing a bias term to the correlation matrix. The effect of this bias term is to restrict the size of the variance of the parameter estimate \(\beta\). The tuning parameter \(\lambda\) is added to the diagonal elements of the matrix to be inverted, \(X^TX\), to encourage non-singularity, giving \(\beta=(X^TX+\lambda I)^{-1}X^TY\). The mean squared error of the parameter estimates decreases as \(\lambda\) increases beyond zero, up to a certain point.
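A minimal ridge regression sketch, adding \(\lambda\) to the diagonal of \(X^TX\) before inversion (the value of \(\lambda\) and the simulated data are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # nearly collinear predictors
X = np.column_stack([x1, x2])
Y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

lam = 1.0                                 # tuning parameter lambda
p = X.shape[1]

beta_ols = np.linalg.inv(X.T @ X) @ X.T @ Y
beta_ridge = np.linalg.inv(X.T @ X + lam * np.eye(p)) @ X.T @ Y

print(beta_ols)    # may be large and of opposite sign (unstable)
print(beta_ridge)  # shrunk towards each other, closer to [1.0, 1.0]
```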
15.2.2 Logistic regression
In logistic regression, there is no algebraic (closed-form) solution for the parameter estimates (\(\beta\) coefficients), and a numerical method such as maximum likelihood estimation is used to determine them. The General Linear Model \(Y=X\beta\) is modified to the form of the Generalised Linear Model (GLIM) by introducing a non-linear link function \(g(\mu)\) relating the mean of the response to the linear predictor, so that the model takes the form \(g(\mu)=X\beta\), or equivalently \(\mu=g^{-1}(X\beta)\).
Consider a binary response (1, 0) modelled as a function of the predictor variables. Let \(p\) be the probability of an event and \(1-p\) the probability of the event not occurring. The odds (OR) of the event are given by \(OR=\frac{p}{1-p}\). The logit transformation takes the form \(logit(p_i)=\ln\left(\frac{p_i}{1-p_i}\right)=\sum_j X_{ij}\beta_j\). The logistic equation takes the form \(p_i=\frac{e^{\sum_j X_{ij}\beta_j}}{1+e^{\sum_j X_{ij}\beta_j}}\).
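A sketch of maximum likelihood estimation for logistic regression using the Newton-Raphson (iteratively reweighted least squares) algorithm; the data are simulated and the stopping tolerance is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # intercept plus one predictor
true_beta = np.array([-0.5, 1.5])
p_true = 1.0 / (1.0 + np.exp(-X @ true_beta))
y = rng.binomial(1, p_true)            # simulated binary response

beta = np.zeros(X.shape[1])
for _ in range(25):                      # Newton-Raphson iterations
    p = 1.0 / (1.0 + np.exp(-X @ beta))  # logistic function
    W = np.diag(p * (1.0 - p))           # working weights
    gradient = X.T @ (y - p)
    hessian = X.T @ W @ X
    step = np.linalg.solve(hessian, gradient)
    beta = beta + step
    if np.abs(step).max() < 1e-8:
        break

print(beta)  # maximum likelihood estimate, close to [-0.5, 1.5]
```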