A practical decision framework for choosing between LU, QR, SVD, Cholesky, Eigendecomposition, Schur, Jordan, and LDL factorizations. Stop guessing and start picking the right tool for your problem.
Matrix decomposition is the single most important toolkit in numerical linear algebra. Every serious computation -- solving linear systems, fitting models, compressing data, analyzing stability -- relies on factoring matrices into structured pieces. But there are at least eight major decompositions, and choosing the wrong one means wasted computation, numerical instability, or flat-out incorrect results.
The difference is not academic. Choosing Cholesky over general LU when your matrix is symmetric positive definite cuts your computation in half and eliminates the need for pivoting. Using SVD when QR would suffice triples your runtime for no additional benefit. Attempting eigendecomposition on a defective matrix produces garbage unless you switch to Schur form. And applying LU to a rectangular least-squares problem simply does not work -- it requires a square matrix.
This guide gives you a concrete decision framework. We cover when each decomposition applies, how they compare head-to-head on speed, stability, and applicability, what to use for every common application domain, and the mistakes that trip up students and practitioners alike. By the end, you will never have to guess which decomposition to reach for.
The eight decompositions we cover are: LU, QR, SVD, Cholesky, eigendecomposition, Schur, Jordan, and LDLᵀ.
Walk through these questions in order. Each branch leads you to the best decomposition for your situation.

**Step 1: Is your matrix square?**

- No: Non-square matrices eliminate most decompositions. Use QR for least squares or SVD for rank analysis and the pseudoinverse.
- Yes: Continue to Step 2.

**Step 2: Is it symmetric? If so, is it positive definite?**

- Symmetric positive definite: Use Cholesky (A = LLᵀ) -- the fastest option.
- Symmetric but not positive definite: Use LDLᵀ -- handles indefinite symmetric matrices without square roots.
- Not symmetric: Continue to Step 3.

**Step 3: What do you need?**

- Solve Ax = b: LU with partial pivoting (PA = LU) is the standard for solving square linear systems. Especially efficient for multiple right-hand sides.
- Eigenvalues and eigenvectors: If the matrix is diagonalizable, use A = PDP⁻¹. If it may be defective (repeated eigenvalues, missing eigenvectors), use Schur instead.
- Least squares: QR decomposition is the standard for least squares with full column rank. If rank-deficient, switch to SVD.
- Rank, pseudoinverse, or low-rank approximation: SVD reveals the rank, provides the best low-rank approximation (Eckart-Young theorem), and powers PCA, image compression, and recommendation systems.
This flowchart covers the most common scenarios. For edge cases -- defective matrices, theoretical canonical forms, and specialized applications -- read the detailed comparisons below.
Each comparison addresses the key question: when should you pick one decomposition over the other? We cover speed, stability, and applicability.
**LU vs QR** (square systems): LU costs 2n³/3 flops vs QR's 2n³ -- roughly 3x faster for square matrices.

**QR vs SVD** (least squares): QR is the default for full-rank problems; a common pattern is to solve with a QR-based lstsq and fall back to SVD when rank issues are detected.

**Cholesky vs LU** (symmetric positive definite systems): Cholesky costs n³/3 flops vs LU's 2n³/3 -- twice as fast, with half the storage.

All eight major decompositions at a glance. Use this table as a quick reference for matrix requirements, computational cost, and primary applications.
| Decomposition | Form | Matrix Requirements | Complexity | Best For |
|---|---|---|---|---|
| LU | PA = LU | Square, non-singular | 2n³/3 | Solving Ax = b, determinants, matrix inversion |
| QR | A = QR | Any (m ≥ n) | 2mn² | Least squares, eigenvalue algorithms, orthogonalization |
| SVD | A = UΣVᵀ | Any matrix | ~11mn² | Rank, pseudoinverse, PCA, low-rank approximation |
| Cholesky | A = LLᵀ | Symmetric positive definite | n³/3 | SPD systems, optimization, Monte Carlo, Kalman filters |
| Eigen | A = PDP⁻¹ | Square, diagonalizable | ~10n³ | Eigenvalues, stability analysis, matrix powers, PCA |
| Schur | A = QTQ* | Square | ~10n³ | Matrix functions, defective matrices, eigenvalue extraction |
| Jordan | A = PJP⁻¹ | Square (theoretical only) | n/a (not numerically stable) | Theoretical analysis, canonical form, proofs |
| LDLᵀ | A = LDLᵀ | Symmetric | n³/3 | Indefinite symmetric systems, inertia, KKT systems |
A few notes on reading this table. Complexity figures are leading-order flop counts; constant factors and lower-order terms are omitted. "Any matrix" means the decomposition works for rectangular matrices of any dimension. "Diagonalizable" means the matrix must have n linearly independent eigenvectors, which is generic (almost all matrices satisfy this) but not universal.
Not sure which decomposition fits your problem? Find your application below.
You have a coefficient matrix A and a right-hand side vector b, and need to find x. The choice depends entirely on the structure of A.
**LU**: General square systems. The default choice. Compute PA = LU once, then solve Ly = Pb (forward substitution) and Ux = y (back substitution). If you have multiple right-hand sides b₁, b₂, ..., the LU factorization is reused -- only the O(n²) substitution is repeated each time.

**Cholesky**: If A is symmetric positive definite, Cholesky is 2x faster than LU and uses half the storage. Always prefer it when applicable.

**QR**: For ill-conditioned square systems where you need maximum numerical accuracy, QR provides better stability at the cost of roughly 3x more computation.
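As a concrete sketch of the factor-once, solve-many pattern (using SciPy; the matrix values are illustrative, and this particular A happens to be SPD, so Cholesky applies as well):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, cho_factor, cho_solve

# General square system: factor once, reuse for many right-hand sides.
A = np.array([[4.0, 2.0, 1.0],
              [2.0, 5.0, 3.0],
              [1.0, 3.0, 6.0]])
lu, piv = lu_factor(A)                  # PA = LU, O(n^3) once
b1 = np.array([1.0, 2.0, 3.0])
b2 = np.array([4.0, 5.0, 6.0])
x1 = lu_solve((lu, piv), b1)            # O(n^2) per right-hand side
x2 = lu_solve((lu, piv), b2)

# This A is also symmetric positive definite, so Cholesky applies
# and is roughly twice as fast with half the storage.
c, low = cho_factor(A)
x1_chol = cho_solve((c, low), b1)
```

Both routes produce the same solution; the point is that the expensive factorization is shared across all right-hand sides.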
You have more equations than unknowns (m > n) and want the best approximate solution. This is the foundation of linear regression, curve fitting, and data fitting across all of science and engineering.
**QR**: The standard method. Decompose A = QR, then solve Rx = Qᵀb. Fast, numerically stable, and sufficient when A has full column rank.

**SVD**: When A may be rank-deficient or nearly so. SVD gives the minimum-norm least-squares solution and lets you inspect which singular values are too small (numerically zero). The solution is x = VΣ⁺Uᵀb.

**Normal Equations**: Forming AᵀAx = Aᵀb and solving with Cholesky is the fastest approach, but it is numerically less stable because the condition number squares: cond(AᵀA) = cond(A)². Only use this when A is well-conditioned.
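A minimal comparison of the QR and SVD routes, assuming NumPy and a randomly generated full-rank design matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))       # 100 equations, 3 unknowns
b = rng.standard_normal(100)

# QR route: A = QR, then solve the triangular system Rx = Q^T b.
Q, R = np.linalg.qr(A)                  # reduced QR: Q is 100x3, R is 3x3
x_qr = np.linalg.solve(R, Q.T @ b)

# SVD route (what np.linalg.lstsq uses internally): also handles
# rank-deficient A, at a higher cost.
x_svd, *_ = np.linalg.lstsq(A, b, rcond=None)
```

For a full-rank problem the two answers agree to machine precision; the SVD only earns its extra cost when rank deficiency is a real possibility.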
Eigenvalue problems arise in vibration analysis, stability of dynamical systems, quantum mechanics, graph theory (spectral methods), and many areas of physics and engineering.
**Eigendecomposition**: The direct approach: A = PDP⁻¹, where D contains eigenvalues and P contains the corresponding eigenvectors. For symmetric matrices, this is especially clean since P is orthogonal and all eigenvalues are real.

**Schur**: When the matrix may be defective (non-diagonalizable) or when you only need eigenvalues without eigenvectors. The Schur form A = QTQ* always exists and places eigenvalues on the diagonal of the upper triangular matrix T. Most practical eigenvalue algorithms compute the Schur form internally.
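The classic defective matrix [[0, 1], [0, 0]] makes this contrast concrete; a sketch with NumPy and SciPy:

```python
import numpy as np
from scipy.linalg import schur

# A defective matrix: eigenvalue 0 repeated, but only one eigenvector.
A = np.array([[0.0, 1.0],
              [0.0, 0.0]])

# eig runs without error, but the eigenvector matrix P is numerically
# singular, so A = P D P^{-1} cannot actually be formed.
w, P = np.linalg.eig(A)
cond_P = np.linalg.cond(P)              # astronomically large

# The Schur form always exists: A = Q T Q^T with eigenvalues on T's diagonal.
T, Q = schur(A)
```

The huge condition number of P is the numerical symptom of defectiveness; the Schur form sidesteps it entirely.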
Principal Component Analysis is arguably the most widely used dimensionality reduction technique in data science, compressing high-dimensional data while preserving maximum variance.
**SVD**: Compute the SVD of the centered data matrix directly: X = UΣVᵀ. The right singular vectors (columns of V) are the principal components. The singular values give the explained variance. This approach is numerically superior to the covariance method.

**Eigendecomposition**: Applied to the covariance matrix C = XᵀX/(n-1). Eigenvalues give variances, eigenvectors give principal directions. This works but squares the condition number and is less numerically stable than SVD applied directly to the data matrix.
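The two routes can be checked against each other on toy data; a sketch assuming NumPy (the data matrix is synthetic):

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 5))  # correlated data
Xc = X - X.mean(axis=0)                 # center first -- essential for PCA

# SVD route: rows of Vt are the principal components.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_svd = s**2 / (len(X) - 1)           # explained variance per component

# Covariance route: same answer in exact arithmetic, but forming X^T X
# squares the condition number.
C = Xc.T @ Xc / (len(X) - 1)
w, V = np.linalg.eigh(C)                # eigh returns ascending eigenvalues
```

On well-conditioned data the variances match; on ill-conditioned data the covariance route loses small components first.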
The matrix exponential is central to solving linear ordinary differential equations, control theory (state-space models), and continuous-time Markov chains.
**Eigendecomposition**: If A = PDP⁻¹, then e^(At) = Pe^(Dt)P⁻¹, where e^(Dt) simply exponentiates each diagonal entry. Clean and efficient when A is diagonalizable.

**Schur**: For non-diagonalizable or non-normal matrices, use the Schur form A = QTQ*, then e^(At) = Qe^(Tt)Q*. The exponential of the upper triangular T can be computed efficiently using the Parlett recurrence. This Schur-Parlett approach is what general matrix-function routines (like MATLAB's funm) use internally; dedicated exponential routines such as expm typically use scaling-and-squaring with Padé approximants instead.
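A small illustration of the diagonalizable case, checked against SciPy's expm (the 2×2 system matrix is illustrative):

```python
import numpy as np
from scipy.linalg import expm

# A stable linear system x' = Ax with eigenvalues -1 and -2.
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
t = 0.5

# Eigendecomposition route: e^(At) = P e^(Dt) P^{-1}.
w, P = np.linalg.eig(A)
eAt_eig = (P @ np.diag(np.exp(w * t)) @ np.linalg.inv(P)).real

# Production route: scipy.linalg.expm (scaling-and-squaring with Pade),
# which also works for defective and non-normal matrices.
eAt = expm(A * t)
```

For diagonalizable A with well-conditioned P the two agree; when P is ill-conditioned, trust expm.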
Newton-type methods for unconstrained and constrained optimization require solving a linear system at every iteration, where the coefficient matrix is the Hessian (or an approximation of it).
**Cholesky**: When the objective is convex (Hessian is SPD), Cholesky is the fastest and most stable option. This is the standard in interior-point methods and trust-region methods when positive definiteness is guaranteed.

**LDLᵀ**: For non-convex optimization where the Hessian can be indefinite (near saddle points, or for KKT systems in constrained optimization). LDLᵀ with Bunch-Kaufman pivoting handles indefiniteness gracefully and reveals the inertia of the Hessian.

**LU**: General fallback for non-symmetric coefficient matrices or augmented systems that lose symmetry.
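The inertia-revealing property of LDLᵀ can be sketched with scipy.linalg.ldl on a toy indefinite matrix (the values are illustrative):

```python
import numpy as np
from scipy.linalg import ldl

# An indefinite symmetric matrix, e.g. a Hessian near a saddle point.
H = np.array([[2.0, 0.0, 1.0],
              [0.0, -3.0, 0.0],
              [1.0, 0.0, 1.0]])

# Bunch-Kaufman pivoted factorization: H = L D L^T with block-diagonal D.
L, D, perm = ldl(H)

# By Sylvester's law of inertia, D has the same signature as H, so the
# counts of positive/negative eigenvalues are readable from D directly.
eigs_D = np.linalg.eigvalsh(D)
n_pos = int(np.sum(eigs_D > 0))
n_neg = int(np.sum(eigs_D < 0))
```

Here H has two positive and one negative eigenvalue, which an optimizer would read as "not at a minimum: a descent direction exists."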
Signal processing routinely works with correlation matrices, covariance matrices, Toeplitz structures, and subspace methods.
**SVD**: Powers the MUSIC algorithm for spectral estimation, optimal rank-1 beamforming, and subspace-based methods for direction of arrival. The SVD separates signal and noise subspaces cleanly.

**Eigendecomposition**: Applied to autocorrelation matrices for spectral analysis (ESPRIT, MUSIC). For real symmetric covariance matrices, eigendecomposition and SVD produce equivalent results.

**Cholesky**: Whitening transforms use Cholesky: if C = LLᵀ, then L⁻¹x transforms correlated signals into uncorrelated (white) signals. Also used for matched filtering and prewhitening.
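A minimal whitening sketch, assuming NumPy and synthetic correlated signals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Two correlated signals: x = B z with white z, so cov(x) = B B^T.
B = np.array([[2.0, 0.0],
              [1.5, 0.5]])
x = B @ rng.standard_normal((2, 10000))

C = np.cov(x)                           # empirical covariance
L = np.linalg.cholesky(C)               # C = L L^T
w = np.linalg.solve(L, x)               # whitened: cov(w) = L^{-1} C L^{-T} = I
```

Because the same empirical covariance is used for both the factor and the check, the whitened covariance is the identity up to floating-point error.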
Behind the high-level APIs, machine learning algorithms depend heavily on matrix decompositions at their computational core.
**SVD**: Recommendation systems (collaborative filtering via low-rank matrix factorization), latent semantic analysis (LSA/LSI) for natural language processing, image compression, matrix completion (the Netflix Prize approach), and computing pseudoinverses for linear regression.

**Eigendecomposition**: Spectral clustering (eigenvectors of the graph Laplacian), kernel PCA, Google's PageRank algorithm (dominant eigenvector of the link matrix), Laplacian eigenmaps for manifold learning, and the power method in iterative eigenvalue algorithms.

**Cholesky**: Gaussian process regression (inverting the kernel matrix K via K = LLᵀ), sampling from multivariate Gaussian distributions, and preconditioning in iterative solvers. Cholesky also enables efficient log-determinant computation for the GP marginal likelihood.

**QR**: The numerical backbone of linear regression in production libraries like R, statsmodels, and scikit-learn. QR factorization ensures numerically stable coefficient estimates even for near-collinear predictors.
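The Cholesky log-determinant trick mentioned above, sketched on a toy RBF kernel matrix (the kernel choice and jitter value are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 2))

# A toy RBF kernel matrix with diagonal jitter (SPD by construction).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-0.5 * sq) + 1e-6 * np.eye(50)

# log det K via Cholesky: since det K = (prod diag L)^2, the log-determinant
# is a sum of logs of diagonal entries -- no overflow, unlike np.linalg.det.
L = np.linalg.cholesky(K)
logdet = 2.0 * np.sum(np.log(np.diag(L)))
```

The same L is then reused for solves against K, which is why GP libraries factor the kernel matrix exactly once per hyperparameter setting.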
These are the pitfalls that catch students, engineers, and even experienced practitioners off guard when choosing and applying matrix decompositions.
LU decomposition requires a square matrix. If you have an overdetermined system (more equations than unknowns), you cannot apply LU directly to the rectangular matrix A. Students sometimes try to make it work by truncating rows or padding with zeros -- both produce wrong answers.
Not every square matrix is diagonalizable. Defective matrices (where the geometric multiplicity of an eigenvalue is less than its algebraic multiplicity) do not have n linearly independent eigenvectors and cannot be written as A = PDP⁻¹. A classic example is the matrix [[0, 1], [0, 0]] -- it has eigenvalue 0 with algebraic multiplicity 2 but geometric multiplicity 1.
Cholesky decomposition requires the matrix to be both symmetric AND positive definite. A common mistake is applying it to a matrix that is symmetric but has a negative or zero eigenvalue (semi-definite or indefinite). The algorithm will encounter a square root of a negative number and fail -- or worse, silently produce incorrect results in some implementations.
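Because of this failure mode, attempting the factorization is itself the standard cheap test for positive definiteness; a sketch in NumPy (the helper name is ours):

```python
import numpy as np

def is_positive_definite(A):
    """Cheap SPD test: symmetric check plus an attempted Cholesky."""
    if not np.allclose(A, A.T):
        return False                    # must be symmetric first
    try:
        np.linalg.cholesky(A)           # raises LinAlgError if not PD
        return True
    except np.linalg.LinAlgError:
        return False

spd = is_positive_definite(np.array([[2.0, 1.0], [1.0, 2.0]]))       # eigenvalues 1, 3
indef = is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]]))     # eigenvalues 3, -1
```

This is cheaper than computing eigenvalues and fails loudly rather than silently, which is exactly what you want before trusting a Cholesky-based solve.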
Jordan normal form is discontinuous: an infinitesimal perturbation can change the Jordan structure entirely. A 3×3 Jordan block splits into three 1×1 blocks under a perturbation smaller than machine epsilon. Any numerical computation of JNF in floating-point arithmetic is inherently meaningless because the result depends on rounding errors, not on the actual matrix.
SVD is the "Swiss army knife" of decompositions, but it is significantly more expensive than QR. For a standard full-rank least-squares problem, using SVD costs 3-6 times more than QR with no additional benefit. Over-reliance on SVD is a common source of unnecessary computational expense in data pipelines and production systems.
Solving AᵀAx = Aᵀb (the normal equations approach) squares the condition number: cond(AᵀA) = cond(A)². If A has condition number 10⁶, the normal equations have condition number 10¹², and you lose 12 digits of precision in double-precision arithmetic (which only has about 16 digits). You are left with roughly 4 correct digits.
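The squaring is easy to demonstrate by constructing a matrix with known singular values; a sketch assuming NumPy:

```python
import numpy as np

# Build A with singular values 1, 1e-3, 1e-6, so cond(A) = 1e6 by construction.
rng = np.random.default_rng(3)
U, _ = np.linalg.qr(rng.standard_normal((50, 3)))   # orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))    # orthogonal
A = U @ np.diag([1.0, 1e-3, 1e-6]) @ V.T

cond_A = np.linalg.cond(A)              # ~1e6
cond_normal = np.linalg.cond(A.T @ A)   # ~1e12: the squaring in action
```

With cond(AᵀA) near 10¹², a Cholesky solve of the normal equations has only about 4 trustworthy digits, while QR applied to A itself keeps roughly 10.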
Put theory into practice. Enter your matrix and see each decomposition computed step by step with detailed explanations.
Factor a matrix into lower and upper triangular matrices. Essential for solving systems of linear equations efficiently. Open calculator →

Decompose into an orthogonal matrix Q and upper triangular R. Used in least squares regression and eigenvalue algorithms. Open calculator →

The most general matrix decomposition. Factorize any matrix into U, Σ, Vᵀ. Powers recommendation systems and data compression. Open calculator →

Efficient factorization of symmetric positive definite matrices into LLᵀ. Widely used in Monte Carlo simulations and optimization. Open calculator →

Find eigenvalues and eigenvectors. Fundamental to PCA, quantum mechanics, vibration analysis, and stability theory. Open calculator →