m66

Data points in a time series are anomalous if the 6th median is 6 standard deviations (six-sigma) from the time series 6th median standard deviation and this persists for x_windows, where x_windows = int(window / 2).

This algorithm finds SIGNIFICANT changepoints in a time series, similar to PELT and Bayesian Online Changepoint Detection; however, it is more robust to instantaneous outliers and more conditionally selective of changepoints.
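As a rough illustration of the description above, the following minimal sketch applies a trailing rolling median six times, takes a rolling standard deviation of the result, and reports runs where that standard deviation exceeds its own mean plus six standard deviations for at least x_windows consecutive points. This is only one reading of the description, not the Skyline implementation (see the custom_algorithm source link below for that); the function names, the shortened windows at the start of the series and the exact form of the threshold are assumptions made for illustration.

```python
from statistics import mean, median, pstdev


def rolling(values, window, func):
    """Apply func to each trailing window (shorter windows at the start)."""
    return [
        func(values[max(0, i + 1 - window):i + 1])
        for i in range(len(values))
    ]


def m66_sketch(values, window=5, nth_median=6, n_sigma=6):
    """Return start indices of persistent six-sigma shifts (illustrative only)."""
    # Assumption: "6th median" means a rolling median applied nth_median times
    smoothed = list(values)
    for _ in range(nth_median):
        smoothed = rolling(smoothed, window, median)
    # Rolling standard deviation of the repeatedly smoothed series
    rolling_std = rolling(smoothed, window, pstdev)
    # Assumption: the six-sigma threshold is taken over the rolling std itself
    threshold = mean(rolling_std) + n_sigma * pstdev(rolling_std)
    x_windows = int(window / 2)
    changepoints = []
    run_start = None
    # The trailing 0.0 is a sentinel that closes a run reaching the last point
    for i, value in enumerate(rolling_std + [0.0]):
        if value > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # Only keep shifts that persisted for at least x_windows points
            if i - run_start >= x_windows:
                changepoints.append(run_start)
            run_start = None
    return changepoints


# A sustained level shift versus a single instantaneous outlier
step = [0.0] * 300 + [10.0] * 300
spike = [0.0] * 300 + [10.0] + [0.0] * 299
step_changepoints = m66_sketch(step)
spike_changepoints = m66_sketch(spike)
print(step_changepoints, spike_changepoints)
```

In this sketch the level shift is reported once, a few points after it happens because each trailing median pass delays it, while the single spike is removed by the medians and reports nothing.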

See the docstrings - https://earthgecko-skyline.readthedocs.io/en/latest/skyline.custom_algorithms.html#module-custom_algorithms.m66

See the custom_algorithm source - https://github.com/earthgecko/skyline/blob/master/skyline/custom_algorithms/m66.py

m66 TCPDBench results

https://github.com/alan-turing-institute/TCPDBench

Seeing as there is a changepoint algorithm benchmarking app, might as well test it. m66 would probably score low, seeing as by design it does not detect all changepoints, only significant changepoints. But it scores better than expected; it places 6th overall.

Heatmap

Best highlighted

Top and bottom 3

Results and rank

Testing m66 with TCPDBench

“If you want to climb the mountain, you must do all the hard things and climb it.”

—Kukuczka

Apart from some lacking docs and deps bugs with TCPDBench, got there in the end.

Build datasets on CentOS 8

Set up TCPD and TCPDBench on a CentOS 8 server

https://github.com/alan-turing-institute/TCPD

https://github.com/alan-turing-institute/TCPDBench

yum install texlive
yum install latexmk

PYTHON_VERSION="3.8.11"
PYTHON_MAJOR_VERSION="3.8"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="TCPDBench-py3811"
cd "${PYTHON_VIRTUALENV_DIR}/projects"
virtualenv --python="${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}/bin/python${PYTHON_MAJOR_VERSION}" "$PROJECT"

cd /opt/python_virtualenv/projects/$PROJECT/
source bin/activate

# dataset
# As per https://github.com/alan-turing-institute/TCPD#using-the-command-line
git clone https://github.com/alan-turing-institute/TCPD
cd TCPD
/opt/python_virtualenv/projects/$PROJECT/bin/"pip${PYTHON_MAJOR_VERSION}" install -r requirements.txt

Installing collected packages: six, urllib3, pytz, python-dateutil, numpy, idna, charset-normalizer, certifi, soupsieve, requests, regex, pyrsistent, pandas, multitasking, lxml, et-xmlfile, chardet, attrs, yfinance, Pillow, openpyxl, jsonschema, diff-match-patch, clevercsv, beautifulsoup4
Successfully installed Pillow-8.3.1 attrs-21.2.0 beautifulsoup4-4.9.3 certifi-2021.5.30 chardet-4.0.0 charset-normalizer-2.0.4 clevercsv-0.7.0 diff-match-patch-20200713 et-xmlfile-1.1.0 idna-3.2 jsonschema-3.2.0 lxml-4.6.3 multitasking-0.0.9 numpy-1.21.1 openpyxl-3.0.7 pandas-1.3.1 pyrsistent-0.18.0 python-dateutil-2.8.2 pytz-2021.1 regex-2021.8.3 requests-2.26.0 six-1.16.0 soupsieve-2.2.1 urllib3-1.26.6 yfinance-0.1.63
(TCPDBench-py3811) [root@server TCPD]

/opt/python_virtualenv/projects/$PROJECT/bin/"pip${PYTHON_MAJOR_VERSION}" list

(TCPDBench-py3811) [root@server TCPD] /opt/python_virtualenv/projects/$PROJECT/bin/"pip${PYTHON_MAJOR_VERSION}" list
Package            Version
------------------ ---------
attrs              21.2.0
beautifulsoup4     4.9.3
certifi            2021.5.30
chardet            4.0.0
charset-normalizer 2.0.4
clevercsv          0.7.0
diff-match-patch   20200713
et-xmlfile         1.1.0
idna               3.2
jsonschema         3.2.0
lxml               4.6.3
multitasking       0.0.9
numpy              1.21.1
openpyxl           3.0.7
pandas             1.3.1
Pillow             8.3.1
pip                21.2.4
pyrsistent         0.18.0
python-dateutil    2.8.2
pytz               2021.1
regex              2021.8.3
requests           2.26.0
setuptools         57.4.0
six                1.16.0
soupsieve          2.2.1
urllib3            1.26.6
wheel              0.37.0
yfinance           0.1.63
(TCPDBench-py3811) [root@server TCPD]

Build datasets

/opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" build_tcpd.py -v collect

(TCPDBench-py3811) [root@server TCPD] /opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" build_tcpd.py -v collect
Running collect action for dataset: apple ... ok
Running collect action for dataset: bee_waggle_6 ... ok
Running collect action for dataset: bitcoin ... ok
Running collect action for dataset: iceland_tourism ... ok
Running collect action for dataset: measles ... ok
Running collect action for dataset: occupancy ... ok
Running collect action for dataset: ratner_stock ... ok
Running collect action for dataset: robocalls ... ok
Running collect action for dataset: scanline_126007 ... ok
Running collect action for dataset: scanline_42049 ... ok
(TCPDBench-py3811) [root@server TCPD]

Check datasets

/opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" ./utils/check_checksums.py -v -c ./checksums.json -d ./datasets

(TCPDBench-py3811) [root@server TCPD] /opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" ./utils/check_checksums.py -v -c ./checksums.json -d ./datasets
Checking apple.json
Checking bank.json
Checking bee_waggle_6.json
Checking bitcoin.json
Checking brent_spot.json
Checking businv.json
Checking centralia.json
Checking children_per_woman.json
Checking co2_canada.json
Checking construction.json
Checking debt_ireland.json
Checking gdp_argentina.json
Checking gdp_croatia.json
Checking gdp_iran.json
Checking gdp_japan.json
Checking global_co2.json
Checking homeruns.json
Checking iceland_tourism.json
Checking jfk_passengers.json
Checking lga_passengers.json
Checking measles.json
Checking nile.json
Checking occupancy.json
Checking ozone.json
Checking quality_control_1.json
Checking quality_control_2.json
Checking quality_control_3.json
Checking quality_control_4.json
Checking quality_control_5.json
Checking rail_lines.json
Checking ratner_stock.json
Checking robocalls.json
Checking run_log.json
Checking scanline_126007.json
Checking scanline_42049.json
Checking seatbelts.json
Checking shanghai_license.json
Checking uk_coal_employ.json
Checking unemployment_nl.json
Checking us_population.json
Checking usd_isk.json
Checking well_log.json
All ok.
(TCPDBench-py3811) [root@server TCPD]

Set up TCPDBench

cd /opt/python_virtualenv/projects/$PROJECT/
source bin/activate
# As per https://github.com/alan-turing-institute/TCPDBench#getting-started
git clone --recurse-submodules https://github.com/alan-turing-institute/TCPDBench
cd TCPDBench
/opt/python_virtualenv/projects/$PROJECT/bin/"pip${PYTHON_MAJOR_VERSION}" install -r ./analysis/requirements.txt

Installing collected packages: sortedcontainers, intervaltree, termcolor, tabulate, scipy, labella, colorama
Successfully installed colorama-0.4.4 intervaltree-3.1.0 labella-0.9.8 scipy-1.7.1 sortedcontainers-2.4.0 tabulate-0.8.9 termcolor-1.1.0
(TCPDBench-py3811) [root@server TCPDBench]

Install m66 requirement

cd /opt/python_virtualenv/projects/$PROJECT/
bin/"pip${PYTHON_MAJOR_VERSION}" install bottleneck

(TCPDBench-py3811) [root@server TCPDBench-py3811] bin/"pip${PYTHON_MAJOR_VERSION}" install bottleneck
Collecting bottleneck
Using cached Bottleneck-1.3.2-cp38-cp38-linux_x86_64.whl
Requirement already satisfied: numpy in ./lib/python3.8/site-packages (from bottleneck) (1.21.1)
Installing collected packages: bottleneck
Successfully installed bottleneck-1.3.2
(TCPDBench-py3811) [root@server TCPDBench-py3811]

Errors on CentOS 8

Latex / texlive error. And seemingly no texlive-standalone package on CentOS 8 :(

cd /opt/python_virtualenv/projects/$PROJECT/TCPDBench
make results
...
...
python ./analysis/scripts/rank_plots.py -i analysis/output/tables/best_cover_uni_full.json -o analysis/output/rankplots/rankplot_best_cover_uni.tex -b max --type best

Warning: Filtering out RBOCPDMS due to insufficient results.

Latexmk: This is Latexmk, John Collins, 18 June 2019, version: 4.65.
Rule 'pdflatex': The following rules & subrules became out-of-date:
    'pdflatex'
------------
Run number 1 of rule 'pdflatex'
------------
------------
Running 'pdflatex  --interaction=nonstopmode -recorder -output-directory="/tmp/tmp16cu66k5"  "/tmp/tmp16cu66k5/labella_text.tex"'
------------
Latexmk: applying rule 'pdflatex'...
This is pdfTeX, Version 3.14159265-2.6-1.40.19 (TeX Live 2018) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
(/tmp/tmp16cu66k5/labella_text.tex
LaTeX2e <2017-04-15>
Babel <3.17> and hyphenation patterns for 3 language(s) loaded.

! LaTeX Error: File `standalone.cls' not found.

Type X to quit or <RETURN> to proceed,
or enter new name. (Default extension: cls)

Enter file name:
! Emergency stop.
<read *>

l.3 \begin
        {document}^^M
!  ==> Fatal error occurred, no output PDF file produced!
Transcript written on /tmp/tmp16cu66k5/labella_text.log.
Latexmk: Missing input file: 'standalone.cls' from line
'! LaTeX Error: File `standalone.cls' not found.'
Collected error summary (may duplicate other messages):
pdflatex: Command for 'pdflatex' gave return code 1
    Refer to '/tmp/tmp16cu66k5/labella_text.log' for details
Latexmk: Use the -f option to force complete processing,
unless error was exceeding maximum runs, or warnings treated as errors.
=== TeX engine is 'pdfTeX'
Latexmk: Errors, so I did not complete making targets

Traceback (most recent call last):
File "./analysis/scripts/rank_plots.py", line 151, in <module>
    main()
File "./analysis/scripts/rank_plots.py", line 145, in main
    make_rank_plot(
File "./analysis/scripts/rank_plots.py", line 90, in make_rank_plot
    tl = TimelineTex(plot_data, options=options)
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/timeline.py", line 509, in __init__
    super().__init__(items, options=options, output_mode="tex")
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/timeline.py", line 145, in __init__
    self.items = self.parse_items(dicts, output_mode=output_mode)
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/timeline.py", line 179, in parse_items
    it = Item(
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/timeline.py", line 97, in __init__
    self.width, self.height = self.get_text_dimensions()
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/timeline.py", line 107, in get_text_dimensions
    width, height = text_dimensions(
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/tex.py", line 139, in text_dimensions
    width, height = get_latex_dims(tex, silent=silent,
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/tex.py", line 106, in get_latex_dims
    compile_latex(fname, tmpdirname, latexmk_options, silent=silent)
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/tex.py", line 94, in compile_latex
    raise (e)
File "/opt/python_virtualenv/projects/TCPDBench-py3811/lib/python3.8/site-packages/labella/tex.py", line 89, in compile_latex
    output = subprocess.check_output(command, stderr=subprocess.STDOUT)
File "/opt/python_virtualenv/versions/3.8.11/lib/python3.8/subprocess.py", line 415, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "/opt/python_virtualenv/versions/3.8.11/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['latexmk', '--pdf', '--outdir=/tmp/tmp16cu66k5', '--interaction=nonstopmode', '/tmp/tmp16cu66k5/labella_text.tex']' returned non-zero exit status 12.
make: *** [Makefile:226: analysis/output/rankplots/rankplot_best_cover_uni.tex] Error 1
(TCPDBench-py3811) [root@server TCPDBench]

Needs LaTeX, so this needs to be done locally.

Trying POCing on Ubuntu 14.04 laptop as CentOS 8 has no texlive-standalone

PYTHON_VERSION="3.8.6"
PYTHON_MAJOR_VERSION="3.8"
PYTHON_VIRTUALENV_DIR="/opt/python_virtualenv"
PROJECT="TCPDBench-py386"
cd "${PYTHON_VIRTUALENV_DIR}/projects"
virtualenv --python="${PYTHON_VIRTUALENV_DIR}/versions/${PYTHON_VERSION}/bin/python${PYTHON_MAJOR_VERSION}" "$PROJECT"

cd /opt/python_virtualenv/projects/$PROJECT/
source bin/activate

# dataset
# As per https://github.com/alan-turing-institute/TCPD#using-the-command-line
git clone https://github.com/alan-turing-institute/TCPD
cd TCPD
/opt/python_virtualenv/projects/$PROJECT/bin/"pip${PYTHON_MAJOR_VERSION}" install -r requirements.txt

# Self signed ssl cert errors
/opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" build_tcpd.py -v collect

# (TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPD$ /opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" build_tcpd.py -v collect
# Running collect action for dataset: apple ... ok
# Running collect action for dataset: bee_waggle_6 ... Error occurred (URLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1124)'))) when trying to download zip. Retrying in 5 seconds <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# Error occurred (URLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1124)'))) when trying to download zip. Retrying in 5 seconds <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>
# Error occurred (URLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1124)'))) when trying to download zip. Retrying in 5 seconds <_io.TextIOWrapper name='<stderr>' mode='w' encoding='utf-8'>

Due to openssl version on Ubuntu 14.04

Use the datasets that were built on CentOS 8

# rsync datasets from server
rsync -avz --exclude __pycache__/ -e 'ssh -o "StrictHostKeyChecking=no" -i  -l root -ax -o ClearAllForwardings=yes' server:/opt/python_virtualenv/projects/TCPDBench-py3811/TCPD/datasets/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPD/datasets/

# verify OK
(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPD$ /opt/python_virtualenv/projects/$PROJECT/bin/"python${PYTHON_MAJOR_VERSION}" ./utils/check_checksums.py -v -c ./checksums.json -d ./datasets
Checking apple.json
Checking bank.json
Checking bee_waggle_6.json
Checking bitcoin.json
Checking brent_spot.json
Checking businv.json
Checking centralia.json
Checking children_per_woman.json
Checking co2_canada.json
Checking construction.json
Checking debt_ireland.json
Checking gdp_argentina.json
Checking gdp_croatia.json
Checking gdp_iran.json
Checking gdp_japan.json
Checking global_co2.json
Checking homeruns.json
Checking iceland_tourism.json
Checking jfk_passengers.json
Checking lga_passengers.json
Checking measles.json
Checking nile.json
Checking occupancy.json
Checking ozone.json
Checking quality_control_1.json
Checking quality_control_2.json
Checking quality_control_3.json
Checking quality_control_4.json
Checking quality_control_5.json
Checking rail_lines.json
Checking ratner_stock.json
Checking robocalls.json
Checking run_log.json
Checking scanline_126007.json
Checking scanline_42049.json
Checking seatbelts.json
Checking shanghai_license.json
Checking uk_coal_employ.json
Checking unemployment_nl.json
Checking us_population.json
Checking usd_isk.json
Checking well_log.json
All ok.
(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPD$

Run on Ubuntu machine

cd /opt/python_virtualenv/projects/TCPDBench-py386/
# As per https://github.com/alan-turing-institute/TCPDBench#getting-started
git clone --recurse-submodules https://github.com/alan-turing-institute/TCPDBench
cd TCPDBench
/opt/python_virtualenv/projects/$PROJECT/bin/"pip${PYTHON_MAJOR_VERSION}" install -r ./analysis/requirements.txt

# Install m66 requirement
cd /opt/python_virtualenv/projects/$PROJECT/
bin/"pip${PYTHON_MAJOR_VERSION}" install bottleneck

cd /opt/python_virtualenv/projects/$PROJECT/TCPDBench
make results
# OK apart from one

Copy datasets on Ubuntu machine

# First, obtain the Turing Change Point Dataset and follow the instructions
# provided there. Copy the dataset files to a datasets directory in this
# repository
mkdir /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets
rsync -avz /opt/python_virtualenv/projects/TCPDBench-py386/TCPD/datasets/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets/

Move existing abed_results

cd /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench
mv abed_results old_abed_results

(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench$ mv abed_results old_abed_results

Install abed with error

cd /opt/python_virtualenv/projects/TCPDBench-py386
bin/pip3.8 install abed

# ERROR
...
gcc -pthread _configtest.o -lvt.mpi -o _configtest
/usr/bin/ld: cannot find -lvt.mpi
collect2: error: ld returned 1 exit status
failure.
...
_configtest.c:2:17: fatal error: mpi.h: No such file or directory
#include <mpi.h>
                ^
compilation terminated.
failure.
removing: _configtest.c _configtest.o
error: Cannot compile MPI programs. Check your configuration!!!
----------------------------------------
ERROR: Failed building wheel for mpi4py
Failed to build mpi4py
ERROR: Could not build wheels for mpi4py which use PEP 517 and cannot be installed directly
(TCPDBench-py386)

Much googling, considered docker route, no easier unless you forked and cloned your repo with Dockerfile …

Reading mpi4py docs

sudo apt-get install openmpi-bin openmpi-doc libopenmpi-dev

Success

cd /opt/python_virtualenv/projects/TCPDBench-py386
bin/pip3.8 install abed
...
Successfully installed Fabric3-1.14.post1 abed-0.1.2 backports.lzma-0.0.14 bcrypt-3.2.0 bz2file-0.98 cryptography-3.4.7 dominate-2.6.0 gitdb-4.0.7 gitpython-3.1.18 mpi4py-3.1.0 paramiko-2.7.2 progressbar-2.5 pynacl-1.4.0 smmap-4.0.0 tqdm-4.62.0
(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386$

Jupyter. Let us look at the data so that m66 can be fitted to work with it.

cd /opt/python_virtualenv/projects/TCPDBench-py386
source bin/activate
bin/pip3.8 install jupyter
jupyter notebook &

https://github.com/earthgecko/skyline/blob/v5.0.0-alpha/tests/20210814.POC.task4236.test.m66.with.TCPDBench.ipynb.py

Got it to run eventually, analysed just m66 and zero, and got results, but cannot plot them with make results.

Latex breaks

Perhaps to update you need to run all

(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench$ abed status
There are 41664 tasks left to be done, out of 41664 tasks defined.
(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench$

And …

> You may want to run these experiments in parallel on a large number of cores, as the expected runtime is on the order of 21 days on a single core. Once this command starts running the experiments you will see result files appear in the staging directory.
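To put those numbers in perspective, a back-of-the-envelope estimate can be sketched as below. It assumes that the quoted 21 single-core days covers roughly the same set of tasks that abed status reports, that every task costs about the same, and that the work scales perfectly across cores. None of these assumptions is confirmed by the TCPDBench docs, so treat the figures as an order of magnitude only.

```python
TASKS = 41664            # from the abed status output above
SINGLE_CORE_DAYS = 21    # from the quoted TCPDBench note above
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_days(cores, single_core_days=SINGLE_CORE_DAYS):
    """Wall-clock days on `cores` cores, assuming perfect parallel scaling."""
    return single_core_days / cores


# Average time per task if every task cost the same (an assumption)
seconds_per_task = SINGLE_CORE_DAYS * SECONDS_PER_DAY / TASKS
days_on_two_cores = estimated_days(2)

print('about %.1f seconds per task' % seconds_per_task)
print('about %.1f days on 2 cores' % days_on_two_cores)
```

Under these assumptions a two-core laptop would still need over a week for the full run, which is why only a subset of the algorithms was analysed here.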

https://github.com/alan-turing-institute/AnnotateChange

See if we can plot what the others scored at least.

cd /opt/python_virtualenv/projects/TCPDBench-py386
source bin/activate
git clone https://github.com/alan-turing-institute/AnnotateChange
cd AnnotateChange

/opt/python_virtualenv/projects/TCPDBench-py386/bin/pip3.8 install -r requirements.txt

cp .env.example .env.development
sed -i 's/DB_TYPE=mysql/DB_TYPE=sqlite3/g' .env.development

./flask.sh db upgrade

./flask.sh admin add --auto-confirm-email

A bit flaky, looks like it emails the annotations to the user.

So from annotations it looks like they had users [‘6’, ‘7’, ‘8’, ‘9’, ‘10’, ‘12’, ‘13’, ‘14’]

They did the annotations, and not all of them annotated all the time series. I can see they used AnnotateChange to do that as well.

m66 is a LEGIT changepoint detection algorithm!

rm -rf analysis/output/
# I think this is the first time it has run properly with best
make -k results
rm -rf analysis/output
rsync -az --exclude best_m66/ --exclude default_m66/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results.original/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results/
make -k results
rm -rf analysis/output/
make -k results
rm -rf analysis/output/
make -k results
rm -rf stagedir/0
abed reload_tasks
mpiexec -np 2 abed local
rsync -az --exclude best_m66/ --exclude default_m66/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results.original/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results/
make -k results
rm -rf abed_results/
rm -rf analysis/output/
abed reload_tasks
mpiexec -np 2 abed local
rsync -az --exclude best_m66/ --exclude default_m66/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results.original/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results/
make -k results
rm -rf abed_results/
rm -rf analysis/output/
abed reload_tasks
mpiexec -np 2 abed local
rsync -az --exclude best_m66/ --exclude default_m66/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results.original/ /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/abed_results/
make -k results
../bin/pip3.8 install pdflatex
../bin/pip3.8 install xhtml2pdf
history | tail -n 50
(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench$ date
Sun Aug 15 21:28:11 BST 2021
(TCPDBench-py386) gary@mc11:/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench$

results

"""
best_cover_avg
m66         0.795455
bocpd       0.789368
segneigh    0.783592
binseg      0.780245
amoc        0.745989
bocpdms     0.743791
pelt        0.725448
ecp         0.720113
kcpa        0.625981
zero        0.578767
prophet     0.576222
cpnp        0.552106
wbs         0.428036
rfpop       0.414154

best_f1_avg
bocpd       0.879675
binseg      0.855901
segneigh    0.854902
m66         0.841738
amoc        0.798912
ecp         0.796599
pelt        0.787211
kcpa        0.683167
cpnp        0.665980
zero        0.662375
bocpdms     0.620328
prophet     0.534355
wbs         0.532729
rfpop       0.530942

default_cover_avg
binseg      0.705799
amoc        0.701605
pelt        0.688798
segneigh    0.676410
bocpd       0.636019
bocpdms     0.633351
rbocpdms    0.628623
zero        0.582727
m66         0.582623
prophet     0.539869
cpnp        0.535341
ecp         0.522748
rfpop       0.392427
wbs         0.330206
kcpa        0.061955

default_f1_avg
binseg      0.744400
pelt        0.709992
amoc        0.703711
bocpd       0.689622
segneigh    0.675545
zero        0.668967
m66         0.648660
cpnp        0.606694
ecp         0.597710
bocpdms     0.507137
rfpop       0.499453
prophet     0.487742
rbocpdms    0.446714
wbs         0.411706
kcpa        0.111007
"""

Test params in their algorithms

make venv numpy before scipy

But make venv throws an error, because numpy needs to be installed before scipy.

Building wheels for collected packages: scipy
Building wheel for scipy (setup.py) ... error
ERROR: Command errored out with exit status 1:
command: /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/rbocpdms/venv/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py'"'"'; __file__='"'"'/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' bdist_wheel -d /tmp/pip-wheel-xdsqoeuc
    cwd: /tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/
Complete output (9 lines):
/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py:114: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp
Traceback (most recent call last):
    File "<string>", line 1, in <module>
    File "/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py", line 474, in <module>
    setup_package()
    File "/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py", line 450, in setup_package
    from numpy.distutils.core import setup
ModuleNotFoundError: No module named 'numpy'
----------------------------------------
ERROR: Failed building wheel for scipy
Running setup.py clean for scipy
ERROR: Command errored out with exit status 1:
command: /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/rbocpdms/venv/bin/python -u -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py'"'"'; __file__='"'"'/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' clean --all
    cwd: /tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b
Complete output (11 lines):
/tmp/pip-install-qro7kaiy/scipy_d9e7a8590f554c5c851b4993f896117b/setup.py:114: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
    import imp

`setup.py clean` is not supported, use one of the following instead:

    - `git clean -xdf` (cleans all files)
    - `git clean -Xdf` (cleans all versioned files, doesn't touch
                        files that aren't checked into the git repo)

Add `--force` to your command to use it anyway if you must (unsupported).

----------------------------------------
ERROR: Failed cleaning build dir for scipy
Failed to build scipy
Installing collected packages: six, pytz, python-dateutil, pyparsing, py, pluggy, numpy, more-itertools, kiwisolver, cycler, attrs, atomicwrites, scipy, pytest, matplotlib
    Running setup.py install for scipy ... \^canceled
ERROR: Operation cancelled by user
WARNING: You are using pip version 21.2.3; however, version 21.2.4 is available.
You should consider upgrading via the '/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/rbocpdms/venv/bin/python -m pip install --upgrade pip' command.
make: *** [execs/python/rbocpdms/venv] Error 1

Test params bug

No bugs.

They type the arguments and the make_param_dict function works as desired in their algorithms.

gary@mc11:~$ tail /tmp/TCPDBench.bocpdms.debug.parameters.txt
parameters: {'intensity': 100.0, 'prior_a': 100.0, 'prior_b': 1.0, 'threshold': 100, 'use_timeout': True, 'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4}
detector = run_bocpdms(mat, {'intensity': 100.0, 'prior_a': 100.0, 'prior_b': 1.0, 'threshold': 100, 'use_timeout': True, 'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4})
agrs: Namespace(input='/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets/well_log.json', intensity=200.0, output=None, prior_a=0.01, prior_b=100.0, threshold=100, use_timeout=True)
defaults: {'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4}
parameters: {'intensity': 200.0, 'prior_a': 0.01, 'prior_b': 100.0, 'threshold': 100, 'use_timeout': True, 'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4}
detector = run_bocpdms(mat, {'intensity': 200.0, 'prior_a': 0.01, 'prior_b': 100.0, 'threshold': 100, 'use_timeout': True, 'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4})
agrs: Namespace(input='/opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets/well_log.json', intensity=50.0, output=None, prior_a=0.01, prior_b=0.01, threshold=100, use_timeout=True)
defaults: {'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4}
parameters: {'intensity': 50.0, 'prior_a': 0.01, 'prior_b': 0.01, 'threshold': 100, 'use_timeout': True, 'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4}
detector = run_bocpdms(mat, {'intensity': 50.0, 'prior_a': 0.01, 'prior_b': 0.01, 'threshold': 100, 'use_timeout': True, 'S1': 1, 'S2': 1, 'intercept_grouping': None, 'prior_mean_scale': 0, 'prior_var_scale': 1, 'lower_AR': 1, 'upper_AR': 4})
gary@mc11:~$
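The pattern visible in that debug output (typed command line arguments merged with a fixed set of defaults, with the input and output arguments left out) can be sketched as follows. This is a hypothetical reconstruction for illustration, not the TCPDBench code: the parser, the function name make_param_dict_sketch and the merge order are assumptions; only the argument names and default values are taken from the log above.

```python
import argparse

# Fixed settings copied from the "defaults:" line of the debug output above
DEFAULTS = {
    'S1': 1, 'S2': 1, 'intercept_grouping': None,
    'prior_mean_scale': 0, 'prior_var_scale': 1,
    'lower_AR': 1, 'upper_AR': 4,
}


def parse_args(argv):
    """Parse and type the command line arguments (hypothetical parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-i', '--input', required=True)
    parser.add_argument('-o', '--output')
    parser.add_argument('--intensity', type=float)
    parser.add_argument('--prior-a', type=float)
    parser.add_argument('--prior-b', type=float)
    parser.add_argument('--threshold', type=int)
    parser.add_argument('--use-timeout', action='store_true')
    return parser.parse_args(argv)


def make_param_dict_sketch(args, defaults):
    """Merge typed arguments with defaults, dropping the I/O arguments."""
    parameters = {
        key: value for key, value in vars(args).items()
        if key not in ('input', 'output')
    }
    # Assumption: command line values take precedence over the defaults
    for key, value in defaults.items():
        parameters.setdefault(key, value)
    return parameters


args = parse_args([
    '-i', 'well_log.json', '--intensity', '200', '--prior-a', '0.01',
    '--prior-b', '100', '--threshold', '100', '--use-timeout',
])
parameters = make_param_dict_sketch(args, DEFAULTS)
print(parameters)
```

Because argparse applies type=float and type=int while parsing, the string '200' from the command line arrives in the parameters as 200.0, which matches the typed values seen in the log.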

abed no source in shell

I had to pass . instead of source, otherwise the abed tasks failed.

[2021-08-17 08:21:33] Executing: 'source ./opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/bocpdms/venv/bin/activate && /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/bocpdms/venv/bin/python /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/cpdbench_bocpdms.py -i /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets/well_log.json --intensity 50 --prior-a 1.0 --prior-b 1.0 --threshold 100 --use-timeout'
Error: There was an error executing: 'source ./opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/bocpdms/venv/bin/activate && /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/bocpdms/venv/bin/python /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/cpdbench_bocpdms.py -i /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets/well_log.json --intensity 100 --prior-a 0.01 --prior-b 0.01 --threshold 100 --use-timeout'. Here is the error:
/bin/sh: 1: source: not found
Error: There was an error executing: 'source ./opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/bocpdms/venv/bin/activate && /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/bocpdms/venv/bin/python /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/execs/python/cpdbench_bocpdms.py -i /opt/python_virtualenv/projects/TCPDBench-py386/TCPDBench/datasets/well_log.json --intensity 50 --prior-a 1.0 --prior-b 1.0 --threshold 100 --use-timeout'. Here is the error:

#    "best_bocpdms": (
#        "source {execdir}/python/bocpdms/venv/bin/activate && python {execdir}/python/cpdbench_bocpdms.py -i {datadir}/{dataset}/{dataset}.json --intensity {intensity} --prior-a {prior_a} --prior-b {prior_b} --threshold 100 --use-timeout"
#    ),
    "best_bocpdms": (
        ". {execdir}/python/bocpdms/venv/bin/activate && python {execdir}/python/cpdbench_bocpdms.py -i {datadir}/{dataset}/{dataset}.json --intensity {intensity} --prior-a {prior_a} --prior-b {prior_b} --threshold 100 --use-timeout"
    ),
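The error above comes from /bin/sh, which on Debian and Ubuntu is typically dash and has no source builtin, whereas the POSIX . (dot) command works in both dash and bash. The following minimal sketch shows the same pattern from Python, assuming the commands are run through /bin/sh as the log suggests (subprocess with shell=True does this on POSIX systems). The file name, the DEMO_VENV variable and the helper function are made up for the demonstration.

```python
import os
import subprocess
import tempfile


def run_with_env_file(env_file, command):
    """Run command in /bin/sh after loading env_file with the POSIX '.' builtin."""
    full_command = '. "%s" && %s' % (env_file, command)
    return subprocess.run(
        full_command, shell=True, capture_output=True, text=True)


with tempfile.TemporaryDirectory() as tmpdir:
    # Stand-in for a virtualenv activate script (hypothetical content)
    activate = os.path.join(tmpdir, 'activate')
    with open(activate, 'w') as fh:
        fh.write('DEMO_VENV=bocpdms\n')
    result = run_with_env_file(activate, 'echo "$DEMO_VENV"')
    dot_output = result.stdout.strip()
    dot_returncode = result.returncode

print(dot_output)
```

Using . keeps the command strings portable across whichever shell /bin/sh points to, so the same abed configuration works on both the CentOS and Ubuntu machines.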

Heatmaps of results

import pandas as pd
algorithms_results = {
    'best_cover_avg': {
        'm66': 0.795455,
        'bocpd': 0.789368,
        'segneigh': 0.783592,
        'binseg': 0.780245,
        'amoc': 0.745989,
        'bocpdms': 0.743791,
        'pelt': 0.725448,
        'ecp': 0.720113,
        'kcpa': 0.625981,
        'zero': 0.578767,
        'prophet': 0.576222,
        'cpnp': 0.552106,
        'wbs': 0.428036,
        'rfpop': 0.414154},
    'best_f1_avg': {
        'bocpd': 0.879675,
        'binseg': 0.855901,
        'segneigh': 0.854902,
        'm66': 0.841738,
        'amoc': 0.798912,
        'ecp': 0.796599,
        'pelt': 0.787211,
        'kcpa': 0.683167,
        'cpnp': 0.665980,
        'zero': 0.662375,
        'bocpdms': 0.620328,
        'prophet': 0.534355,
        'wbs': 0.532729,
        'rfpop': 0.530942},
    'default_cover_avg': {
        'binseg': 0.705799,
        'amoc': 0.701605,
        'pelt': 0.688798,
        'segneigh': 0.676410,
        'bocpd': 0.636019,
        'bocpdms': 0.633351,
        'rbocpdms': 0.628623,
        'zero': 0.582727,
        'm66': 0.582623,
        'prophet': 0.539869,
        'cpnp': 0.535341,
        'ecp': 0.522748,
        'rfpop': 0.392427,
        'wbs': 0.330206,
        'kcpa': 0.061955},
    'default_f1_avg': {
        'binseg': 0.744400,
        'pelt': 0.709992,
        'amoc': 0.703711,
        'bocpd': 0.689622,
        'segneigh': 0.675545,
        'zero': 0.668967,
        'm66': 0.648660,
        'cpnp': 0.606694,
        'ecp': 0.597710,
        'bocpdms': 0.507137,
        'rfpop': 0.499453,
        'prophet': 0.487742,
        'rbocpdms': 0.446714,
        'wbs': 0.411706,
        'kcpa': 0.111007,
    }
}

df = pd.DataFrame.from_dict(algorithms_results, orient='columns')
df.style.background_gradient(cmap='Greens').set_properties(**{'font-size': '12px'})

df.style.highlight_max(color='lightblue', axis=0)

def highlight_top3(s):
    result = []
    is_large = s.nlargest(3).values
    is_small = s.nsmallest(3).values
    for i in s:
        if i in is_large:
            result.append('background-color: lightgreen')
        elif i in is_small:
            result.append('background-color: #FFCCCB')
        else:
            result.append('')
    return result
df.style.apply(highlight_top3)

# results and rank
algorithms_results_and_rank = algorithms_results.copy()
for score in list(algorithms_results.keys()):
    # Order the algorithms by their score for this metric, best first
    sorted_algos = sorted(
        algorithms_results[score], key=algorithms_results[score].get,
        reverse=True)
    # Rank 1 is the highest scoring algorithm
    rank_dict = {algo: index + 1 for index, algo in enumerate(sorted_algos)}
    metric = '%s rank' % score
    algorithms_results_and_rank[metric] = rank_dict
df = pd.DataFrame.from_dict(algorithms_results_and_rank, orient='columns')
df.style.background_gradient(cmap='Greens').background_gradient(cmap='Blues_r', subset=['best_cover_avg rank','best_f1_avg rank','default_cover_avg rank','default_f1_avg rank']).set_properties(**{'font-size': '12px'})
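One way to arrive at an overall placing from the four metrics is to average each algorithm's rank across them. The sketch below does that in plain Python using the same scores as above. Averaging ranks is an assumption about how to aggregate, not necessarily how the 6th overall placing quoted earlier was derived, and rank_scores is a helper name introduced here. Algorithms without a result for every metric (rbocpdms) are left out of the overall ordering.

```python
algorithms_results = {
    'best_cover_avg': {
        'm66': 0.795455, 'bocpd': 0.789368, 'segneigh': 0.783592,
        'binseg': 0.780245, 'amoc': 0.745989, 'bocpdms': 0.743791,
        'pelt': 0.725448, 'ecp': 0.720113, 'kcpa': 0.625981,
        'zero': 0.578767, 'prophet': 0.576222, 'cpnp': 0.552106,
        'wbs': 0.428036, 'rfpop': 0.414154},
    'best_f1_avg': {
        'bocpd': 0.879675, 'binseg': 0.855901, 'segneigh': 0.854902,
        'm66': 0.841738, 'amoc': 0.798912, 'ecp': 0.796599,
        'pelt': 0.787211, 'kcpa': 0.683167, 'cpnp': 0.665980,
        'zero': 0.662375, 'bocpdms': 0.620328, 'prophet': 0.534355,
        'wbs': 0.532729, 'rfpop': 0.530942},
    'default_cover_avg': {
        'binseg': 0.705799, 'amoc': 0.701605, 'pelt': 0.688798,
        'segneigh': 0.676410, 'bocpd': 0.636019, 'bocpdms': 0.633351,
        'rbocpdms': 0.628623, 'zero': 0.582727, 'm66': 0.582623,
        'prophet': 0.539869, 'cpnp': 0.535341, 'ecp': 0.522748,
        'rfpop': 0.392427, 'wbs': 0.330206, 'kcpa': 0.061955},
    'default_f1_avg': {
        'binseg': 0.744400, 'pelt': 0.709992, 'amoc': 0.703711,
        'bocpd': 0.689622, 'segneigh': 0.675545, 'zero': 0.668967,
        'm66': 0.648660, 'cpnp': 0.606694, 'ecp': 0.597710,
        'bocpdms': 0.507137, 'rfpop': 0.499453, 'prophet': 0.487742,
        'rbocpdms': 0.446714, 'wbs': 0.411706, 'kcpa': 0.111007},
}


def rank_scores(scores):
    """Map each algorithm to its rank, 1 being the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {algo: index + 1 for index, algo in enumerate(ordered)}


ranks = {m: rank_scores(s) for m, s in algorithms_results.items()}
# Only compare algorithms that have a result for every metric
common = set.intersection(*(set(s) for s in algorithms_results.values()))
mean_rank = {
    algo: sum(ranks[m][algo] for m in ranks) / len(ranks) for algo in common
}
# Sort by mean rank, using the name to break ties deterministically
overall = sorted(mean_rank, key=lambda algo: (mean_rank[algo], algo))
m66_place = overall.index('m66') + 1
print(overall)
print('m66 mean rank %.2f, place %d' % (mean_rank['m66'], m66_place))
```

With these scores m66 has a mean rank of 5.25 and comes out 6th, behind binseg, bocpd, segneigh, amoc and pelt.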