# 7.3 Constrained Optimization

Recap [Quadratic form] Assume A is symmetric. The associated quadratic form is

x^TAx

Examples

• x^2+y^2 (PD), x^2-y^2 (indefinite), x^2 (PSD), -x^2-0.5y^2 (ND)

• 2x^2+2xy+2y^2, i.e. the matrix \begin{pmatrix}2&1\\1&2\end{pmatrix}; eigenvalues 3, 1 with eigenvectors (1,1)/\sqrt2, (1,-1)/\sqrt2
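The second example can be checked numerically; a minimal NumPy sketch (not part of the notes):

```python
import numpy as np

# Quadratic form 2x^2 + 2xy + 2y^2 corresponds to the symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# eigh returns eigenvalues in ascending order with orthonormal eigenvectors
eigvals, eigvecs = np.linalg.eigh(A)
print(eigvals)   # [1. 3.]
print(eigvecs)   # columns are (1,-1)/sqrt(2) and (1,1)/sqrt(2), up to sign
```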

Recap [Change of Variables] to principal axes

Let A be an n\times n symmetric matrix. Then there exists an orthogonal change of variables x=Py that transforms x^TAx into y^TDy with no cross-product terms.

Example What values does x^2-2y^2 take on the unit circle?

Terminology: Let m=\min\{x^TAx:\|x\|=1\}, M=\max\{x^TAx:\|x\|=1\}.

Theorem 6 Let A be a symmetric matrix with m and M defined as above.

• Then M is the greatest eigenvalue and m is the least eigenvalue of A (as a real number, not in magnitude).

• x^TAx=M when x is a unit eigenvector corresponding to M

• x^TAx=m when x is a unit eigenvector corresponding to m

Example Maximize and minimize 2x^2+2xy+2y^2 on the unit circle. Eigenvectors: (1,1), (1,-1); eigenvalues 3, 1.

Proof Diagonalize A=PDP^{-1} with P orthogonal (P^TP=I) and the eigenvalues in D in decreasing order; set x=Py, so \|x\|=\|y\|. Then x^TAx=y^TDy\le \lambda_1 y^Ty=\lambda_1 x^Tx=\lambda_1 on the unit sphere. Picking x=Pe_1 achieves the bound. Similarly for m.
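Theorem 6 is easy to sanity-check on the running example by sampling the unit circle (a NumPy sketch, not from the notes):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])        # matrix of 2x^2 + 2xy + 2y^2

# Sample the unit circle and evaluate the quadratic form x^T A x
theta = np.linspace(0.0, 2.0 * np.pi, 10_000)
X = np.stack([np.cos(theta), np.sin(theta)])   # unit vectors as columns
q = np.einsum('ij,ik,kj->j', X, A, X)          # x^T A x for each column

M, m = q.max(), q.min()
print(M, m)   # approximately 3 and 1, the extreme eigenvalues
```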

Theorem 8 Let A be a symmetric n\times n matrix with D=\mathrm{diag}(\lambda_1,\dots,\lambda_n) in order, with P=(u_1,\dots,u_n) and P^TP=I such that A=PDP^{-1}.

Then the maximum value of x^TAx subject to the constraints x^Tx=1, x^Tu_1=0,\dots,x^Tu_{k-1}=0 is the eigenvalue \lambda_k, attained at x=u_k.

Example Maximize and minimize 2x^2+2xy+2y^2 (eigenvectors (1,1), (1,-1); eigenvalues 3, 1) subject to \|x\|=1 and x+y=0.
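A sketch of the constrained example in NumPy (same matrix [[2,1],[1,2]] as before; not part of the notes): on the unit circle the constraint x+y=0 leaves only ±(1,-1)/\sqrt2, so Theorem 8 predicts the second eigenvalue.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 2.0]])

# x + y = 0 means x is orthogonal to u1 = (1,1)/sqrt(2); on the unit
# circle only x = ±(1,-1)/sqrt(2) remains, so the value is lambda_2 = 1.
x = np.array([1.0, -1.0]) / np.sqrt(2.0)
value = x @ A @ x
print(value)   # 1.0
```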

Example [Parks and Rec.] Suppose the cost of building x hundred miles of road and improving y acres of parks is constrained by the budget

4x^2+9y^2\le36

Utility is q(x,y)=xy; curves of constant utility are indifference curves.

Renormalize to unit circle.
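One way the renormalization plays out, sketched in NumPy (the substitution and the matrix B below are my working, not spelled out in the notes): with u=x/3, v=y/2 the budget boundary 4x^2+9y^2=36 becomes the unit circle u^2+v^2=1, and the utility becomes the quadratic form q=xy=6uv.

```python
import numpy as np

# q = 6uv is the quadratic form (u,v)^T B (u,v) with
B = np.array([[0.0, 3.0],
              [3.0, 0.0]])

lams, vecs = np.linalg.eigh(B)
q_max = lams[-1]                    # largest eigenvalue = max utility on the circle
u, v = vecs[:, -1]                  # attained at the corresponding unit eigenvector
x, y = 3.0 * abs(u), 2.0 * abs(v)   # back-substitute (taking the + branch)
print(q_max, x, y)                  # 3.0, about 2.121 and 1.414
```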

# 7.4 Singular value decomposition

Seems we can understand symmetric linear maps in terms of their effect on circles along certain axes. What about general (even rectangular) linear maps?

Example Consider

A=\begin{pmatrix} 4 & 11 & 14\\ 8 & 7 & -2 \end{pmatrix}

A maps the unit circle to an ellipse: major axis along (18,6), minor axis along (3,-9).

What can we say about \|Ax\|? Leads naturally to A^TA, which is symmetric.

A^TA=\begin{pmatrix} 80&100&40\\ 100&170&140\\ 40&140&200\end{pmatrix}

Eigenvalues 360, 90, 0.

Eigenvectors v_1=(1/3,2/3,2/3), v_2=(-2/3,-1/3,2/3), v_3=(2/3,-2/3,1/3)

The maximum value of \|Ax\|^2 on the unit sphere is attained at x=v_1. What point does Av_1 reach? What is the maximum value of \|Ax\|?
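These questions can be answered numerically; a NumPy sketch of the example (not part of the notes):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

lams, V = np.linalg.eigh(A.T @ A)   # ascending: 0, 90, 360
v1 = V[:, -1]                       # eigenvector for lambda = 360

print(A @ v1)                       # ±(18, 6), the tip of the major axis
print(np.linalg.norm(A @ v1))       # sqrt(360) = 6*sqrt(10) ≈ 18.97
```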

## Singular Values

Let A be an m\times n matrix.

Definition [Singular value] Square roots of the eigenvalues of A^TA are called singular values of A, written \sigma_i=\sqrt{\lambda_i}. Usually ordered \sigma_1\ge \cdots \ge \sigma_n.

What size is A^TA? (n\times n, so there are n eigenvalues.) Why can we take square roots? Because \lambda_i=v_i^TA^TAv_i=\|Av_i\|^2\ge 0.

Example

• Singular values from examples above?
• When is the second singular value attained? (orthogonal to v_1)

Notice: Av_1\perp Av_2!
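This orthogonality is easy to verify for the example (a NumPy check, not from the notes):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])
v1 = np.array([1.0, 2.0, 2.0]) / 3.0
v2 = np.array([-2.0, -1.0, 2.0]) / 3.0

# Images of orthonormal eigenvectors of A^T A are orthogonal:
# A v1 = (18, 6), A v2 = (3, -9), and (18)(3) + (6)(-9) = 0
ip = (A @ v1) @ (A @ v2)
print(ip)   # 0 (up to roundoff)
```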

Theorem Suppose \{v_1,\dots,v_n\} is an orthonormal basis of \mathbb R^n consisting of eigenvectors of A^T A, arranged so that the corresponding eigenvalues of A^TA satisfy \lambda_1\ge \cdots\ge \lambda_n. Also suppose that A has r nonzero singular values.

Then:

• \{Av_1,\dots,Av_r\} is an orthogonal basis for \mathrm{Col} A.

• \mathrm{rank} A=r.

Proof

• Calculate (Av_i)^T(Av_j)=v_i^T(A^TAv_j)=\lambda_j v_i^Tv_j, which is 0 for i\ne j and \lambda_i for i=j.

• Av_i\ne 0\Leftrightarrow 1\le i\le r.

• So Av_1,\dots,Av_r are linearly independent.

• They are in \mathrm{Col} A--obviously.

• They span \mathrm{Col} A: pick y=Ax\in\mathrm{Col} A, expand x in the full basis \{v_1,\dots,v_n\}; the terms with i>r contribute nothing.

• Thus \mathrm{rank}A =r.

Remark [Numerics] Rank is hard to get numerically. Row echelon form accumulates roundoff. SVD to the rescue! (Also: effective rank) (But: in general not a simple problem.)
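A small NumPy illustration of effective rank (the matrix and noise level are my choices, not from the notes): roundoff-level noise makes a rank-1 matrix technically full rank, but the SVD-based tolerance sees through it.

```python
import numpy as np

# A rank-1 matrix perturbed by tiny roundoff-level noise
rng = np.random.default_rng(0)
B = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
B_noisy = B + 1e-15 * rng.standard_normal((3, 3))

# Row reduction would find 3 nonzero pivots; the SVD tolerance does not
print(np.linalg.svd(B_noisy, compute_uv=False))   # one large sigma, two tiny
print(np.linalg.matrix_rank(B_noisy))             # 1 (effective rank)
```

`matrix_rank` computes the singular values and counts those above a tolerance scaled by σ₁ and machine precision.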

## The Singular Value Decomposition

The previous theorem made it possible to capture the column space of A by applying A to the eigenvectors of A^TA. So: natural to project into eigenspace (of A^TA), apply stretches, land in column space.

Theorem [Singular Value Decomposition] Let A be an m\times n matrix with rank r.

• Then there exists an m\times n matrix \Sigma which is zero outside an r\times r diagonal subblock D, where D=\mathrm{diag}(\sigma_1, \dots,\sigma_r) with \sigma_1\ge\cdots\ge\sigma_r>0.

• There also exists an m\times m orthogonal matrix U

• There also exists an n\times n orthogonal matrix V

such that

A=U\Sigma V^T

Remark [Uniqueness] Any factorization of this shape is called an SVD. \Sigma is unique; U and V are not.
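The theorem can be checked on the running example with NumPy's built-in SVD (a sketch, not part of the notes; note `svd` returns V^T, not V):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])

U, s, Vt = np.linalg.svd(A)          # s holds the singular values, descending
Sigma = np.zeros_like(A)             # m x n, zero outside the diagonal block
Sigma[:len(s), :len(s)] = np.diag(s)

print(s)                             # [sqrt(360), sqrt(90)] ≈ [18.97, 9.49]
print(np.allclose(U @ Sigma @ Vt, A))   # True
```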

Definition [Singular vectors]

• Columns of U: left singular vectors

• Columns of V: right singular vectors

============END OF CLASS=============

Proof

• Let u_i=Av_i/\|Av_i\| for i=1,\dots,r; note \|Av_i\|=\sigma_i.

• Extend to ONB.
• Create orthogonal matrices U and V, sizes match.

• Check that U\Sigma=AV; since V is orthogonal (hence invertible), A=U\Sigma V^T.

Example Create the SVD from the example.
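The construction can be sketched numerically, following the proof recipe on the example (NumPy; not part of the notes):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])
V = np.array([[ 1.0, -2.0,  2.0],
              [ 2.0, -1.0, -2.0],
              [ 2.0,  2.0,  1.0]]) / 3.0           # columns v1, v2, v3
sigma = np.array([np.sqrt(360.0), np.sqrt(90.0)])  # nonzero singular values

# Proof recipe: u_i = A v_i / sigma_i for i = 1..r (here r = 2 = m, so U
# is already 2x2 and needs no extension to an ONB)
U = (A @ V[:, :2]) / sigma
Sigma = np.hstack([np.diag(sigma), np.zeros((2, 1))])

print(np.allclose(U.T @ U, np.eye(2)))   # True: U is orthogonal
print(np.allclose(U @ Sigma @ V.T, A))   # True: A = U Sigma V^T
```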

Example 2 Find an SVD of \begin{pmatrix} 1 & 1\\ 0 & 1\\ -1 & 1\end{pmatrix}. Here A^TA=\mathrm{diag}(2,3), so the singular values are \sqrt3 and \sqrt2.

Example 3 Find an SVD of \begin{pmatrix} 1 & -1\\-2 & 2 \\ 2 & -2\end{pmatrix}. Here A^TA=\begin{pmatrix}9&-9\\-9&9\end{pmatrix}, with eigenvalues 18, 0.

Remark [Numerics] Using A^TA amplifies errors. Not used in real-world matrix code.
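A sketch of why (the specific matrix is my choice, not from the notes): forming A^TA squares the singular values, so a small σ becomes σ², which can fall below machine precision relative to the large entries and vanish entirely.

```python
import numpy as np

eps = 1e-8
A = np.array([[1.0, 1.0],
              [eps, 0.0],
              [0.0, eps]])   # smallest singular value is exactly eps

# Via A^T A: the diagonal is 1 + eps^2 = 1 + 1e-16, which rounds to 1.0
# in double precision -- the information about eps is destroyed.
lams = np.linalg.eigvalsh(A.T @ A)
sigma_min_ata = np.sqrt(max(lams[0], 0.0))

# Direct SVD works on A itself and recovers eps accurately.
sigma_min_svd = np.linalg.svd(A, compute_uv=False)[-1]

print(sigma_min_ata)   # unreliable: ~0 or noise-level garbage
print(sigma_min_svd)   # ~1e-8
```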

## Applications

Definition [Condition number] \kappa(A)=\sigma_1/\sigma_n. Roughly \log_{10}\kappa base-10 digits are lost to cancellation when solving with A.

The SVD gives orthonormal bases for the four fundamental subspaces:

• Col A is spanned by u_i (i\le r) (Thm 9)

• (Col A)^\perp=Nul A^T is spanned by u_i (i>r) (Thm 3, 6.1)

• Nul A is spanned by v_i (i>r)

• Row A=(Nul A)^\perp is spanned by v_i (i\le r)

Then:

• A maps Row A to Col A=Row A^T

• A maps Nul A to 0
• Nul A^T is left over
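The subspace picture can be read off a computed SVD; a NumPy sketch on the running example (not part of the notes):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0,  7.0, -2.0]])
U, s, Vt = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))   # numerical rank, here 2

# Nul A is spanned by the trailing right singular vectors
null_basis = Vt[r:].T
print(np.allclose(A @ null_basis, 0.0))   # True

# Col A is spanned by the leading left singular vectors (here all of R^2)
print(U[:, :r].shape)                      # (2, 2)
```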

Theorem [Invertible Matrix] Let A be an n\times n matrix. All of the following are equivalent:

• (\mathrm{Col} A)^\perp=\{0\}

• (\mathrm{Nul} A)^\perp=\mathbb{R}^n

• Row A=\mathbb{R}^n.

• A has n nonzero singular values.

Definition [Reduced SVD] A=U_rDV_r^T, where U_r, V_r keep only the first r columns of U, V (the zero blocks of \Sigma are chopped off).

Question: How to invert the SVD?

Definition [Moore-Penrose Pseudoinverse] A^+=V_rD^{-1}U_r^T.

Remark [Least squares solutions] Consider the Moore-Penrose solution \hat x=A^+b. Then A\hat x=U_rU_r^Tb is the orthogonal projection of b onto the column space of A, and thus \hat x is a least-squares solution (Thm 10, Sec. 6.3).
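A NumPy sketch using Example 2's matrix (the right-hand side b is my choice, not from the notes):

```python
import numpy as np

A = np.array([[ 1.0, 1.0],
              [ 0.0, 1.0],
              [-1.0, 1.0]])          # Example 2 above; A^T A = diag(2, 3)
b = np.array([1.0, 2.0, 3.0])

# Reduced SVD and the pseudoinverse A+ = V_r D^{-1} U_r^T
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1.0 / s) @ U.T

x_hat = A_pinv @ b
print(x_hat)                                                      # [-1.  2.]
print(np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0]))   # True

# A x_hat is the projection of b onto Col A, i.e. U_r U_r^T b
print(np.allclose(A @ x_hat, U @ U.T @ b))                        # True
```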