Classical Principal Component Analysis

Steps

We go through the following steps to produce a classical principal component analysis.

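1. Load raw spot yield data
2. Truncate the data to create a rectangular matrix
3. Take logarithms of the truncated dataset
4. Take 12-month differences of the logged data
5. De-mean each column of the differenced data
Open question: do we need to standardise the data somehow? Standardising each column to unit variance before the next step is equivalent to using the correlation matrix instead of the covariance matrix.
6. Calculate a covariance matrix (or a correlation matrix, depending on the open question above)
7. Eigenvector and eigenvalue decomposition
8. Dimensionality reduction (sort components by explained variance)
9. Projecting coordinates (project the data using the lower-dimensional representation)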
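
The transformation, decomposition, and projection steps above can be sketched in NumPy. This is only an illustration under stated assumptions: the loading and truncation steps are replaced by a synthetic, already rectangular array of positive yields. The covariance matrix is used because the covariance-versus-correlation question is still open. The 12-month lag and the choice of 3 components are taken from the reconstruction section. The function and variable names (`pca_pipeline`, `W_k`, `scores`) are hypothetical.

```python
import numpy as np

def pca_pipeline(yields, lag=12, k=3):
    """Classical PCA on a rectangular (months x maturities) array of positive yields."""
    log_y = np.log(yields)                     # take logarithms
    diffed = log_y[lag:] - log_y[:-lag]        # 12-month differences
    means = diffed.mean(axis=0)                # column means
    centred = diffed - means                   # de-mean each column
    cov = np.cov(centred, rowvar=False)        # covariance matrix (assumption: not correlation)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigendecomposition of a symmetric matrix
    order = np.argsort(eigvals)[::-1]          # sort components by explained variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    W_k = eigvecs[:, :k]                       # keep the first k loading vectors
    scores = centred @ W_k                     # project onto the lower-dimensional space
    return {
        "means": means,
        "W_k": W_k,
        "scores": scores,
        "explained": eigvals / eigvals.sum(),  # share of variance per component
    }

# Synthetic stand-in for the real spot yield data (hypothetical values).
rng = np.random.default_rng(0)
n_months, n_maturities = 120, 10
log_levels = np.log(0.03) + np.cumsum(
    rng.normal(0.0, 0.02, size=(n_months, n_maturities)), axis=0
)
yields = np.exp(log_levels)

result = pca_pipeline(yields)
print(result["scores"].shape)
print(result["explained"][:3])
```

`np.linalg.eigh` returns eigenvalues in ascending order, which is why the explicit sort is needed before the leading components are selected.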

Reconstruction Steps

To go from principal component scores back to an approximation of the actual yield curve, undo the preprocessing steps in reverse order (the step numbers in parentheses refer to the Steps list above):

1. Project scores back through the loadings (undoes Step 9): $\widetilde{\mathbf{Y}}_c^{\text{approx}} = \mathbf{z}_3\,\mathbf{W}_3^\top$
2. Add the column means back (undoes Step 5): $\widetilde{\mathbf{Y}}^{\text{approx}} = \widetilde{\mathbf{Y}}_c^{\text{approx}} + \vec{\widetilde{y}}$
3. Undo the 12-month differencing (undoes Step 4): add back the actual logged yield from 12 months prior, $y^{\text{approx}}_{s+12,n} = \widetilde{y}^{\text{approx}}_{s,n} + y_{s,n}$. This needs a real anchor value, which cannot be recovered from the PCA output alone.
4. Exponentiate to return to yield units (undoes Step 3): $\text{yield}^{\text{approx}} = e^{y^{\text{approx}}}$

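Only reconstruction step 1 is lossy relative to a full (all-71-component) reconstruction. Reconstruction steps 2 to 4 are exact inverses, so the approximation error introduced by keeping only 3 components carries through steps 2 and 3 unchanged in log units. After the exponentiation in step 4 it becomes a multiplicative error in the yield, because $e^{y + \varepsilon} = e^{y}\,e^{\varepsilon}$.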
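
The reconstruction can be checked numerically with a short NumPy sketch. It is illustrative only: the logged yields are synthetic, the forward preprocessing is repeated so the block is self-contained, and the names (`reconstruct`, `W`, `centred`) are hypothetical. It compares a full reconstruction with a 3-component one and measures the error in log units at the start and at the end of the chain.

```python
import numpy as np

# Synthetic logged yields standing in for the real data (hypothetical values).
rng = np.random.default_rng(1)
n_months, n_maturities, lag = 96, 8, 12
log_y = np.log(0.03) + np.cumsum(
    rng.normal(0.0, 0.02, size=(n_months, n_maturities)), axis=0
)

# Forward preprocessing: 12-month differences, de-meaning, eigendecomposition.
diffed = log_y[lag:] - log_y[:-lag]
means = diffed.mean(axis=0)
centred = diffed - means
eigvals, eigvecs = np.linalg.eigh(np.cov(centred, rowvar=False))
W = eigvecs[:, np.argsort(eigvals)[::-1]]      # loadings sorted by explained variance

def reconstruct(k):
    """Rebuild yields from the first k principal component scores."""
    W_k = W[:, :k]
    scores = centred @ W_k
    centred_approx = scores @ W_k.T            # project back through the loadings
    diffed_approx = centred_approx + means     # add the column means back
    log_approx = diffed_approx + log_y[:-lag]  # add the actual logged yield 12 months prior
    return np.exp(log_approx)                  # exponentiate to return to yield units

actual = np.exp(log_y[lag:])
full = reconstruct(n_maturities)               # all components: should match the data
approx = reconstruct(3)                        # 3 components: lossy

# Error right after the projection step versus error in the final log yields.
log_error_centred = centred @ W[:, :3] @ W[:, :3].T - centred
log_error_final = np.log(approx) - np.log(actual)

print(np.allclose(full, actual))
print(np.allclose(log_error_final, log_error_centred))
print(np.abs(approx / actual - 1.0).max())     # largest relative error in yield units
```

The anchor `log_y[:-lag]` is taken from the actual data, which reflects the point that the differencing cannot be undone from the PCA output alone.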

Reason for each step

PCA works in a vector space. A matrix is a sequence of vectors.
a
b
c
8.