Create Rectangular Matrix

Principal component analysis requires a rectangular matrix from which covariances can be calculated. Consequently, we need the same number of data points for each term.
Examining the raw dataset created in the previous step, we see that data availability is inconsistent across terms.
The latest available observation date varies by term.

In the steps below we ......

Data Boundary by Term

The dataset of spot yields contains gaps, in that the full set of observation dates is not available for every term.

⚠️ ** how do we call the python script in the charts folder ... **

link to overall structure of code

We want to choose a range of observation dates and terms that reduces the need to fill in gaps in the dataset.

We have spot yield data for terms from 0.5 up to 40. The first step in identifying a calibration dataset is to find the first and last data point for each term. This gives an initial idea of the size of the available dataset.
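
A minimal sketch of how the first and last data point per term could be found with pandas. The layout (observation dates as the index, one column per term), the toy values, and the names `spot` and `data_boundaries` are assumptions made for illustration; they are not taken from the project code.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are observation dates, columns are terms.
# NaN marks an observation date with no spot yield for that term.
dates = pd.date_range("2020-01-31", periods=6, freq="D")
spot = pd.DataFrame(
    {
        0.5: [0.010, 0.011, 0.012, 0.013, np.nan, np.nan],
        1.0: [0.012, 0.013, np.nan, 0.015, 0.016, 0.017],
        40.0: [np.nan, np.nan, 0.030, 0.031, 0.032, 0.033],
    },
    index=dates,
)


def data_boundaries(df):
    """Return the first and last non-missing date, and the count, per term."""
    first = pd.Series({term: df[term].first_valid_index() for term in df.columns})
    last = pd.Series({term: df[term].last_valid_index() for term in df.columns})
    return pd.DataFrame(
        {"first_obs": first, "last_obs": last, "n_obs": df.count()}
    )


boundaries = data_boundaries(spot)
print(boundaries)
```

Comparing `n_obs` with the number of observation dates between `first_obs` and `last_obs` also shows which terms have gaps inside their own range, not only at the ends.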

We make a judgement call about which terms (and observation dates) to retain. If there are gaps in the data, we use linear interpolation to fill them.
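
The selection and gap-filling step might look like the sketch below. It assumes the same hypothetical layout (dates as the index, terms as columns); the function name `make_rectangular`, the chosen terms, and the date window are illustrative only. Note that `method="linear"` treats rows as equally spaced; if observation dates are irregular, `method="time"` may be more appropriate.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows are observation dates, columns are terms.
dates = pd.date_range("2020-01-31", periods=6, freq="D")
spot = pd.DataFrame(
    {
        0.5: [0.010, 0.011, 0.012, 0.013, np.nan, np.nan],
        1.0: [0.012, 0.013, np.nan, 0.015, 0.016, 0.017],
        40.0: [np.nan, np.nan, 0.030, 0.031, 0.032, 0.033],
    },
    index=dates,
)


def make_rectangular(df, terms, start, end):
    """Keep the chosen terms and date window, then fill interior gaps."""
    window = df.loc[start:end, terms]
    # limit_area="inside" fills only gaps surrounded by valid values,
    # so nothing is extrapolated beyond a term's first or last data point.
    return window.interpolate(method="linear", limit_area="inside")


# Judgement call (illustrative): drop term 40 and stop at the last date
# for which term 0.5 is still available.
rect = make_rectangular(spot, [0.5, 1.0], dates[0], dates[3])

# The result should contain no missing values before it is used for PCA.
assert not rect.isna().any().any()
print(rect)
```

If the final check fails, the chosen window still extends beyond the first or last data point of at least one retained term and should be narrowed.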