Problem Description
In this paper, we introduced a new methodology for calculating multi-sequence metrics called MSAS. We should add the MSAS-related metrics to SDMetrics so that users with sequential data can use them for evaluation.
Expected behavior
Add a metric called StatisticMSAS that performs the MSAS algorithm for a given statistic.
Data compatibility: 1 ID column (representing the sequence key), and 1 continuous column (datetime or numerical)
Parameters:
(required) real_data: A tuple of 2 pandas.Series objects. The first represents the sequence key of the real data and the second represents a continuous column of data.
(required) synthetic_data: A tuple of 2 pandas.Series objects. The first represents the sequence key of the synthetic data and the second represents a continuous column of data.
(optional) statistic: A string representing the statistic function to use when computing MSAS. One of:
(default) 'mean': The arithmetic mean
'median': The median value
'std': The standard deviation
'min': The min value
'max': The max value
Output: A score in range [0, 1] -- 0 being the worst and 1 being the best
How does it work? The sequence key determines which continuous values belong to which sequence. This metric computes a statistic for all sequences in the real and synthetic data, and then compares those distributions.
Calculate the statistic value of each sequence in the real data (call this distribution D_r)
Calculate the statistic value of each sequence in the synthetic data (call this distribution D_s)
Apply the KSComplement metric to compare the two distributions (D_r, D_s). Return this score.
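The steps above can be sketched roughly as follows. This is only an illustration of the proposed behavior, not the final SDMetrics implementation: the function names `statistic_msas` and `ks_complement` are hypothetical, `ks_complement` is a stand-in that assumes KSComplement is one minus the two-sample Kolmogorov-Smirnov statistic, and the sketch handles numerical columns only (a datetime column would first need to be converted to numbers).

```python
import numpy as np
import pandas as pd

# Statistic names listed in this issue; each is a valid pandas aggregation name.
SUPPORTED_STATISTICS = ('mean', 'median', 'std', 'min', 'max')


def ks_complement(real_values, synthetic_values):
    """Stand-in for KSComplement (assumption): 1 - two-sample KS statistic."""
    real_sorted = np.sort(np.asarray(real_values, dtype=float))
    synthetic_sorted = np.sort(np.asarray(synthetic_values, dtype=float))
    # Evaluate both empirical CDFs at every observed point.
    points = np.concatenate([real_sorted, synthetic_sorted])
    cdf_real = np.searchsorted(real_sorted, points, side='right') / len(real_sorted)
    cdf_synthetic = (
        np.searchsorted(synthetic_sorted, points, side='right') / len(synthetic_sorted)
    )
    # The KS statistic is the largest gap between the two empirical CDFs.
    return float(1.0 - np.max(np.abs(cdf_real - cdf_synthetic)))


def statistic_msas(real_data, synthetic_data, statistic='mean'):
    """Hypothetical sketch of StatisticMSAS for a numerical column."""
    if statistic not in SUPPORTED_STATISTICS:
        raise ValueError(f'statistic must be one of {SUPPORTED_STATISTICS}')

    real_keys, real_values = real_data
    synthetic_keys, synthetic_values = synthetic_data

    # Steps 1 and 2: one statistic value per sequence (D_r and D_s).
    # Grouping by the raw key array avoids index-alignment surprises.
    d_r = real_values.groupby(real_keys.to_numpy()).agg(statistic)
    d_s = synthetic_values.groupby(synthetic_keys.to_numpy()).agg(statistic)

    # Step 3: compare the two distributions. NaNs (for example, 'std' of a
    # one-row sequence) are dropped here, which is a choice made for this sketch.
    return ks_complement(d_r.dropna(), d_s.dropna())


# Example: two real sequences ('a', 'b') compared against themselves.
real = (pd.Series(['a', 'a', 'b', 'b']), pd.Series([1.0, 3.0, 5.0, 7.0]))
print(statistic_msas(real, real, statistic='mean'))  # identical data scores 1.0
```

Note that the sequence keys are only used to group rows within each dataset. The real and synthetic keys do not need to match each other, because the comparison is between the distributions of per-sequence statistics.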