Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Binseg incorrect for small data, l2 loss and min_size=1 #242

Closed
tdhock opened this issue Mar 23, 2022 · 4 comments
Closed

Binseg incorrect for small data, l2 loss and min_size=1 #242

tdhock opened this issue Mar 23, 2022 · 4 comments

Comments

@tdhock
Copy link

tdhock commented Mar 23, 2022

Hi @deepcharles thanks for putting your implementation of binary segmentation online. I am trying to use it to compute the full path of binary segmentation models, but the predict method is giving me an exception for large numbers of breakpoints, and incorrect changepoints for small numbers of breakpoints. Here is a small data example:

import ruptures as rpt
import numpy as np
rpt.version.version
data_list = [0.0, 0.1, 1.2, 1.0]
N_data = len(data_list)
data_mat = np.array(data_list).reshape(N_data,1)
algo = rpt.Binseg(model="l2",min_size=1,jump=1).fit(data_mat)
computed_break_dict={}
for n_bkps in range(N_data):
    try:
        result = algo.predict(n_bkps=n_bkps)
    except Exception as e:
        result = e
    computed_break_dict[n_bkps] = result
computed_break_dict
expected_break_dict = {
    0:[4],
    1:[2,4],
    2:[2,3,4],
    3:[1,2,3,4]
    }

In this case there are four data points. I expected the first change to be after the second data point, etc (as shown in expected_break_dict above). Instead I got the results in computed_break_dict, which are not expected:

  • I expected the result for n_bkps=1 to be a list of two segment ends, but I only got one.
  • I expected the results for n_bkps=2 or 3 to be a list of segment ends, but I got BadSegmentationParameters exceptions.
    Am I using Binseg correctly, and is this a bug? If I am not using it correctly, can you please explain the correct way?
    Thanks in advance!
    Here is the output I got from running the code above on my system:
Python 3.7.6 | packaged by conda-forge | (default, Jun  1 2020, 18:11:50) [MSC v.1916 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import ruptures as rpt
>>> import numpy as np
>>> rpt.version.version
'1.1.6'
>>> data_list = [0.0, 0.1, 1.2, 1.0]
>>> N_data = len(data_list)
>>> data_mat = np.array(data_list).reshape(N_data,1)
>>> algo = rpt.Binseg(model="l2",min_size=1,jump=1).fit(data_mat)
>>> computed_break_dict={}
>>> for n_bkps in range(N_data):
...     try:
...         result = algo.predict(n_bkps=n_bkps)
...     except Exception as e:
...         result = e
...     computed_break_dict[n_bkps] = result
>>> computed_break_dict
{0: [4], 1: [4], 2: BadSegmentationParameters(), 3: BadSegmentationParameters()}
>>> expected_break_dict = {
...     0:[4],
...     1:[2,4],
...     2:[2,3,4],
...     3:[1,2,3,4]
...     }
@deepcharles
Copy link
Owner

Hi,

Thanks for raising this issue. Here, the min_size (minimum segment length) is not handled correctly, resulting in bad segmentations without raising a BadSegmentation error. We will correct it.

@oboulant
Copy link
Collaborator

Hi @tdhock ,

Indeed, and we have to address it !

But, just so you know, if you use a min_size >= 2, then you have the expected results for the given min_size.

Here, the misbehaviour is due to the fact that, from the user perspective you think you have a min_size to 1, but in really it is overwritten by the minimum value for a given cost (here L2).
See here for the minimum value for L2 cost.
See here for the place where the set value (you set value to 1) is overwritten to 2.

Hope it helps !

@tdhock
Copy link
Author

tdhock commented Mar 25, 2022

ok, but for the l2 cost (square loss change in normal mean, constant variance) the min_size can be 1, so I still expect a valid segmentation should be returned (not an exception).

@tdhock tdhock changed the title Binseg incorrect for small data Binseg incorrect for small data, l2 loss and min_size=1 Mar 31, 2022
@oboulant
Copy link
Collaborator

Closing the issue since it has been addressed in #255

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants