# StumpyMatrixProfiler 

This class detects anomalies/novelties in time series data using the [STUMPY matrix profiling](https://stumpy.readthedocs.io/en/latest/Tutorial_The_Matrix_Profile.html) algorithm and plots the results. The matrix profile records, for each subsequence of the input time series, the distance to its nearest-neighbor subsequence elsewhere in the series. The algorithm slides a window across the time series to compute these distances. A subsequence that corresponds to an anomaly/novelty (a discord) has a larger matrix profile value than subsequences that have a close match elsewhere in the series.
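To make the idea concrete, here is a minimal brute-force sketch of a matrix profile in plain Python. It is illustrative only and is not how STUMPY or this component computes it, since STUMPY uses far more efficient algorithms. The function names, the sample series, and the exclusion zone of `ceil(m / 4)` are assumptions made for this example.

```python
import math


def znorm(window):
    """Z-normalize a window; a constant window maps to all zeros."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window))
    if std == 0:
        return [0.0] * len(window)
    return [(x - mean) / std for x in window]


def naive_matrix_profile(series, m):
    """For each length-m subsequence, return the z-normalized Euclidean
    distance to its nearest neighbor outside a small exclusion zone."""
    windows = [znorm(series[i:i + m]) for i in range(len(series) - m + 1)]
    exclusion = math.ceil(m / 4)  # assumed zone that skips trivial, overlapping matches
    profile = []
    for i, a in enumerate(windows):
        best = math.inf
        for j, b in enumerate(windows):
            if abs(i - j) <= exclusion:
                continue
            best = min(best, math.dist(a, b))
        profile.append(best)
    return profile


# A repeating pattern with one distorted cycle at indices 8 to 11.
series = [0, 1, 2, 1] * 2 + [0, 5, 0, 5] + [0, 1, 2, 1] * 2
profile = naive_matrix_profile(series, m=4)

# The discord is the subsequence with the largest profile value.
discord_start = max(range(len(profile)), key=profile.__getitem__)
print(discord_start, round(profile[discord_start], 3))
```

Windows that repeat exactly elsewhere in the series get a profile value of zero, while the largest value falls on a window that overlaps the distorted cycle.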

## Configuration

### Required Configuration
The Stumpy matrix profiler requires the following configuration: 

- `local_dir`: Location of a local directory to output files generated by this component. 

- `model_target`: The column in the source data that contains the label that the model will try to predict.

### Optional Configuration
The Stumpy matrix profiler has no optional configuration.

### Default Configuration
The Stumpy matrix profiler uses the following default configuration: 

- `stumpy_analysis_image_name`: The file name of the generated image. Defaults to `STUMPY_DISCORD_ANALYSIS.PNG`.

- `stumpy_window_size`: A list of window sizes. Each entry will generate a new plot. Defaults to `[30]`. 

- `stumpy_num_discords`: The total number of discords to include in the analysis. These are the subsequences most likely to be identified as anomalies. Defaults to `3`. 
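As a rough illustration of what `stumpy_num_discords` controls, the sketch below picks the `k` subsequences with the largest matrix profile values. The helper `top_discords` and the profile values are hypothetical; the component's actual selection logic is not shown here and may differ.

```python
def top_discords(profile, k):
    """Return the start indices of the k largest profile values, largest first."""
    ranked = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    return ranked[:k]


# Hypothetical matrix profile values, one per subsequence start index.
profile = [0.2, 0.1, 3.4, 0.3, 2.8, 0.2, 1.9]
print(top_discords(profile, k=3))  # [2, 4, 6]
```

With several entries in `stumpy_window_size`, a selection like this would presumably be repeated once per window size, one plot each.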



## Methods

### check_data
```python
def check_data(self, data, *args, **kwargs):
```
Detects anomalies/novelties in time series data using the STUMPY matrix profiling algorithm and plots the results.

**Arguments**:

- `data` (pandas.DataFrame): Input time series data.

**Returns**:

- `data_report` *(None)*: Stumpy does not generate a report, so this returns `None` to stay consistent with the method signature. 

- `file_path` *(string)*: Path to the exported data check report.

- `checks_status` *(string)*: The status of the checks. Currently this always returns "Pass". 

**Examples:** 
```python
import pandas as pd
from lolpop.component import StumpyMatrixProfiler, StdOutLogger

# define input data
my_data = pd.DataFrame({'ds': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 
                        'y': [4.0, 5.0, 6.0, 1.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]})

#create component configuration
kwargs = {
    "conf" : {
        "config": {
            "local_dir": "/tmp/artifacts",
            "stumpy_window_size": [3], 
            "stumpy_num_discords": 1,
            "model_target": "y",
        },
    },
    "component": {
        "logger": StdOutLogger(),
    }
}

# instantiate the data checker class
data_checker = StumpyMatrixProfiler(**kwargs)

#run the checks
_, file_path, checks_status = data_checker.check_data(data=my_data)

#print report path and checks status
print(f"Data check report saved at {file_path}. Checks status: {checks_status}")
```

### __plot_mp
```python
def __plot_mp(self, axs, m, h, mp, discords, i): 
```
Private method for plotting the STUMPY matrix profile and marking the discovered anomalies/novelties.

**Arguments**:

- `axs` (list of matplotlib axes): List of matplotlib axes on which to plot the matrix profile.
- `m` (int): Subsequence (window) length used to compute the matrix profile.
- `h` (int): Height of box for marking the anomalies/novelties.
- `mp` (numpy array): Matrix profile to be plotted.
- `discords` (list of int): Indices of the discovered anomalies/novelties in the time series data.
- `i` (int): Index in axs list where the plot will be made.

**Returns**:

- None

**Examples:**
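`__plot_mp` is private and is called from `check_data`, so it is not normally invoked directly. The sketch below only illustrates how the `discords`, `m`, and `h` arguments could relate when marking anomalies. `discord_boxes` is a hypothetical helper, not part of lolpop, and the baseline `y = 0` is an assumption. Each tuple follows the `(x, y, width, height)` order that `matplotlib.patches.Rectangle((x, y), width, height)` accepts.

```python
def discord_boxes(discords, m, h):
    """Return (x, y, width, height) for a box covering each discord subsequence."""
    # Each box starts at the discord's index, spans one window (m), and is h tall.
    return [(start, 0, m, h) for start in discords]


boxes = discord_boxes(discords=[8, 42], m=30, h=12.5)
print(boxes)  # [(8, 0, 30, 12.5), (42, 0, 30, 12.5)]
```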



