Steps to reproduce

data.sql (for some reason, I was unable to run Soda with fewer rows, so the table has 18):

    create table Employee (
        id int primary key,
        name varchar(255)
    );
    insert into Employee (id, name) values (1, 'Alice');
    insert into Employee (id, name) values (2, 'Bob');
    insert into Employee (id, name) values (3, 'Alice');
    insert into Employee (id, name) values (11, 'Alice');
    insert into Employee (id, name) values (12, 'Bob');
    insert into Employee (id, name) values (13, 'Alice');
    insert into Employee (id, name) values (21, 'Alice');
    insert into Employee (id, name) values (22, 'Bob');
    insert into Employee (id, name) values (23, 'Alice');
    insert into Employee (id, name) values (31, 'Alice');
    insert into Employee (id, name) values (32, 'Bob');
    insert into Employee (id, name) values (33, 'Alice');
    insert into Employee (id, name) values (41, 'Alice');
    insert into Employee (id, name) values (42, 'Bob');
    insert into Employee (id, name) values (43, 'Alice');
    insert into Employee (id, name) values (51, 'Alice');
    insert into Employee (id, name) values (52, 'Bob');
    insert into Employee (id, name) values (53, 'Alice');
The checks:

    checks for Employee:
      - row_count = 18
      - distribution_difference(name) < 0.05:
          method: chi_square
          distribution reference file: ./distribution.yaml

with distribution.yaml:

    dataset: employee
    column: name
    distribution_type: categorical
    distribution_reference:
      weights:
        - 0.7
        - 0.3
      bins:
        - Alice
        - Bob
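Expected behavior

The chi_square statistic is close to zero, since there are 12 Alice rows and 6 Bob rows (about 0.67 and 0.33 of the 18 rows), which is close to the reference weights of 0.7 and 0.3.

Actual behavior

The statistic value is high (~0.6).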
Misc

When I swap the order of the weights but leave the bins unchanged, the statistic is OK.
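For reference, the expectation can be checked by hand. The sketch below computes the textbook Pearson chi-square statistic for the observed counts (12 Alice, 6 Bob) against the reference weights, once as written in distribution.yaml and once with the weights swapped. It is a plain Python calculation, not Soda code: the function names are made up for illustration, and the report does not state whether Soda compares the statistic or the p-value against the threshold, so both are printed.

```python
import math


def chi_square_statistic(observed, weights):
    """Pearson chi-square statistic of observed counts against reference weights."""
    total = sum(observed.values())
    stat = 0.0
    for category, count in observed.items():
        expected = total * weights[category]  # expected count under the reference
        stat += (count - expected) ** 2 / expected
    return stat


def p_value_two_categories(stat):
    """Chi-square survival function for 1 degree of freedom (two categories only)."""
    return math.erfc(math.sqrt(stat / 2.0))


# Counts taken from data.sql: 12 rows named Alice, 6 rows named Bob.
observed = {"Alice": 12, "Bob": 6}

# Weights paired with bins in the order given in distribution.yaml.
as_written = {"Alice": 0.7, "Bob": 0.3}
# Same weights with the order swapped (hypothetical, mirrors the Misc note).
swapped = {"Alice": 0.3, "Bob": 0.7}

for label, weights in (("as written", as_written), ("swapped", swapped)):
    stat = chi_square_statistic(observed, weights)
    p = p_value_two_categories(stat)
    print(f"{label}: statistic={stat:.4f}, p-value={p:.4f}")
```

With the weights as written, the statistic comes out at roughly 0.095. With the weights swapped, it is roughly 11.5. These numbers only show what the standard formula gives for this data; they say nothing about what Soda computes internally.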
CLOUD-8980