DataEval Change Log

v0.78.0

🌟 Feature Release

bff82522 - Add collate function and convert packaged datasets to MAITE protocols
- Changes all dataset utility classes to use MAITE protocol formats (MNIST, CIFAR10, and VOCDetection)
- Adds collate to aggregate (and encode) MAITE datasets into images/embeddings, targets, and metadata

🛠️ Improvements and Enhancements

d9e0f8b0 - Enforce embeddings on functions/methods that take embedding inputs

v0.77.1

🛠️ Improvements and Enhancements

9a420f7d - Update Assess the data space tutorial to fit JATIC DR-2.3
3ab63f3e - Integrate clusterer speed improvements with numba

v0.77.0

🌟 Feature Release

a1974e41 - Add global config module to control default device and max processes

👾 Fixes

c5ca814d - Enforce unit interval in OOD detector and coverage metric
41c4437b - CoverageOutput attributes renamed for clarity

Attributes renamed:
- indices -> uncovered_indices
- radii -> critical_value_radii
- critical_value -> coverage_radius
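Code written against the old attribute names needs updating. The snippet below is a minimal sketch of one way to bridge the rename; `COVERAGE_OUTPUT_RENAMES` and `get_coverage_attr` are hypothetical helpers and not part of DataEval, and `SimpleNamespace` objects stand in for real `CoverageOutput` instances.

```python
from types import SimpleNamespace

# Old -> new attribute names, copied from the rename list above.
COVERAGE_OUTPUT_RENAMES = {
    "indices": "uncovered_indices",
    "radii": "critical_value_radii",
    "critical_value": "coverage_radius",
}


def get_coverage_attr(output, old_name):
    """Read an attribute by its pre-rename name (hypothetical helper).

    Prefers the new attribute name and falls back to the old one, so the
    same caller works on either side of the rename.
    """
    new_name = COVERAGE_OUTPUT_RENAMES[old_name]
    if hasattr(output, new_name):
        return getattr(output, new_name)
    return getattr(output, old_name)


# Stand-ins for output objects produced before and after the rename.
old_style = SimpleNamespace(indices=[3, 7], radii=[0.1, 0.4], critical_value=0.25)
new_style = SimpleNamespace(
    uncovered_indices=[3, 7], critical_value_radii=[0.1, 0.4], coverage_radius=0.25
)
```

With either object, `get_coverage_attr(obj, "indices")` yields the same list, which may be useful while a codebase supports versions on both sides of the rename.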
99631a94 - Fix ax.hist on small ranges in NumPy 2.1+

v0.76.1

🛠️ Improvements and Enhancements

a8a4cd4f - Remove merge from preprocess and address metadata array length inconsistencies
f8061eca - Add option to return dropped keys from metadata utility functions
a4ddbed1 - Add pandas dependency to all extras option
5b05981e - Expose dropped keys from nested lists and inconsistent keys in metadata merge and preprocess

📝 Miscellaneous

a20766ec - Updates to documentation
961ad923 - Miscellaneous docs changes

v0.76.0

🌟 Feature Release

4647edca - Expose flatten metadata function and update docstring

🛠️ Improvements and Enhancements

27d34a0c - Incorporating NAWCAD feedback to improve the documentation for the stat functions, outliers class and coverage class

📝 Miscellaneous

c9998971 - Switch themes to sphinx-immaterial, enable graphviz and restructure documentation
3dedde8f - Adding templates for auto generation of docs
8aaa89f3 - add deep dive prototype
d9d902f3 - Allow for float type bounding boxes
9bc0f910 - Add additional code coverage tests
15f1ae84 - Add logging to output metadata decorator
01cef92b - Split conftest for tests and doctests
adc8e293 - Publish MR docs and code coverage to deployment environments
a3fc1f6c - Moves document link to body to match other header titles
69892fd3 - Visibility enhancements to BalanceOutput.plot() heatmap
b6ab03a6 - Simplify docker build script for docs

v0.75.0

🌟 Feature Release

3aa12cb3 - Refactor bias metadata helpers

Metadata preprocessing functions have been moved from dataeval.metrics.bias.metadata_preprocessing to dataeval.utils.metadata.
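Callers that must run against releases on both sides of this move can resolve the module at import time. The following is a rough sketch using only the standard library; `import_first_available` and `METADATA_MODULE_CANDIDATES` are hypothetical names, and only the two module paths come from the note above.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that imports successfully."""
    errors = []
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )


# New location first, then the pre-move location as a fallback.
METADATA_MODULE_CANDIDATES = [
    "dataeval.utils.metadata",
    "dataeval.metrics.bias.metadata_preprocessing",
]
```

Calling `import_first_available(METADATA_MODULE_CANDIDATES)` returns whichever module the installed release provides. Function names inside the module may also differ between releases, so this only covers the path change.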

🛠️ Improvements and Enhancements

ed98b6b1 - Return empty string for hashes on too small images

pchash now returns an empty string when asked to compute a perceptual hash for images or chips that are too small to hash meaningfully. Duplicates also ignores empty perceptual hashes to avoid false positive detections.

b144fa1c - Change torch to be required dependency

PyTorch is now a required dependency, and the torch extra is no longer needed for full functionality.

📝 Miscellaneous

6e4474b2 - Refactor utils and fix associated docstrings, documentation and notebooks
ff87cee6 - Update documentation and CI pipelines to comply with SDP DR-3
aa7d9205 - Updated README.md format, added tagline and cdao funding acknowledgment
82559846 - Replace manual markdown files with autoapi generated rst files

v0.74.2

📝 Miscellaneous

e7a284de - Update dataset split unit tests
f8731a44 - Add initial logging framework and unit test
771dc1d1 - Add conda tests to pipeline
2d9fd55a - Update RTD yaml to use uv for installation
0ab99a7f - Initial prototyping of underspecification tests

v0.74.1

📝 Miscellaneous

102664de - Remove tensorflow from project
e782dad1 - Refactor OutputMetadata and clean up set_metadata decorator
80aae3a6 - Just use KSOutput as a MappingOutput instance instead of extracting the dict attribute it no longer has.
b738e01f - Allow docker cmds within dev container
16839b46 - Add MappingOutput class
e2cfda94 - Made metadata_tools/ks_compare compatible with new KSOutput class.

v0.74.0

🌟 Feature Release

73c1e1be - Implement PyTorch AutoEncoder based OOD detector

Adds an initial PyTorch-based autoencoder OOD detector, available when DataEval is installed with the torch extra.

🛠️ Improvements and Enhancements

70794b5f - Moved discretization of metadata out of bias functions

📝 Miscellaneous

4d94e602 - Added test assertions for how_to notebooks
7723e242 - Introduce Pytorch OOD detector, with its new training procedure, into OOD howto notebook.
f5ac4bdd - Added new KSOutput class and adapted tests and other functions accordingly
3a01a81a - Introduce new Pytorch OOD detection into prototype metadata demo notebooks.
dc155554 - Fix torch gmm functions and enable tests
a715c1ef - Adjust docs to incorporate new metadata function
0719bad0 - Update dependencies to remove hdbscan

v0.73.1

👾 Fixes

cac3e2b8 - Fixes drift with pre-processing and shuffles MNIST by default

📝 Miscellaneous

bacbd0e7 - Use build script specifically for docs
0a87e912 - docker build for docs only
671b60a5 - Prototype function to infer whether a 1D sample is continuous or discrete
d0b8004a - Use explicit re namespace for compile, search, sub, and MULTILINE
502ca2df - Change to nox for automation test scripts
5b46ebea - Add new bias functional tests and set groundwork for rediscretization

v0.73.0

🌟 Feature Release

e055acf0 - Metadata utility function to merge, extend and flatten metadata
95b28ae1 - Adjust bias plotting functions to return figure

📝 Miscellaneous

532f92a2 - Rewrite minimum spanning tree and Clusterer using numba for a large speedup
7377e012 - Switch jobs to use uv and tox natively
7af75016 - Add lazyloading for tensorflow modules

v0.72.2

🛠️ Improvements and Enhancements

ba52ef2e - Refactor away _internal module

📝 Miscellaneous

6e55451c - Integration of distribution compare and OOD MI metadata tools (continued)
e4f82173 - Streamlined tests
ac8fe3ee - Fix type mismatch on training AEGMM
6289c7d0 - Add plotting helper functions to diversity and balance
14d0cfd4 - Integration of low-level metadata drift/OOD exploration functions

v0.72.1

📝 Miscellaneous

32ba1f29 - Data split tests
76f73770 - Updated glossary and other files to use new style of links
20efd27e - Add support for Python 3.12

v0.72.0

🌟 Feature Release

14ef382c - Update dependencies for conda compatibility

v0.71.1

🛠️ Improvements and Enhancements

97849b01 - Update support for tensorflow >=2.16 with explicit keras v2

📝 Miscellaneous

85bafa30 - Swap brightness and darkness
96a30ad0 - Make optional checks more granular
55ca81d6 - Use native int for dict keys for Outliers
639e140b - silence warnings for docs and doctest

v0.71.0

🌟 Feature Release

cdae8a17 - Parallelize existing stats metric functions and introduce dedicated channelstats function

Running statistical analysis functions takes significant time on large datasets. Because individual images can be analyzed independently, we introduced parallel processing with the multiprocessing library to reduce processing times.

Affected functions:
- datasetstats
- dimensionstats
- hashstats
- pixelstats
- visualstats

Additionally, channelstats was added, which performs the functionality of datasetstats but only for the functions that support per-channel stat calculation: pixelstats and visualstats.
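The per-image parallelism described above can be illustrated with a small standard-library sketch. This is not DataEval's implementation: `pixel_summary` and `summarize_images` are hypothetical names, and images are plain nested lists here rather than arrays.

```python
from multiprocessing import Pool


def pixel_summary(image):
    """Compute a few statistics for one image given as rows of pixel values."""
    pixels = [value for row in image for value in row]
    return {
        "min": min(pixels),
        "max": max(pixels),
        "mean": sum(pixels) / len(pixels),
    }


def summarize_images(images, processes=1):
    """Apply pixel_summary to each image, optionally across worker processes."""
    if processes <= 1:
        # Sequential path: no worker processes are started.
        return [pixel_summary(image) for image in images]
    with Pool(processes=processes) as pool:
        # Pool.map preserves input order, so results line up with images.
        return pool.map(pixel_summary, images)
```

Each image is processed independently, so the work splits cleanly across processes. When using `processes > 1` in a script, call `summarize_images` from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.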

📝 Miscellaneous

552668a0 - Update EDA part 1 tutorial with miscellaneous changes

v0.70.1

🛠️ Improvements and Enhancements

d1cdcda5 - API changes with supporting documentation updates

📝 Miscellaneous

5ecd4d3a - expose datasets API
6c19bba7 - Make sufficiency args more permissive
1bc2d067 - Improving MNIST class
d23b3461 - Extract small-scope reusable functions from tools made for prototype Associate[Drift|OOD]withMetadataTutorial notebooks.
5bea9512 - remove tf-iogcs-fs

v0.70.0

🌟 Feature Release

71e7ff06 - Integrate labelstats function
f40bf0e4 - Redesign stats functions for expansion to per-box, per-channel, and boxratiostats

🛠️ Improvements and Enhancements

72390edc - Change input format of balance and diversity to be the same as parity

👾 Fixes

f598c46a - Update pytorch to 2.2.0+

📝 Miscellaneous

b8f0d502 - Create copy on to_numpy by default
04a71337 - Fix CI docs job to load on build
9286f5e8 - Skip or rework MNIST based unit tests
704f44e3 - Investigate the use of metadata to help explain observed dataset drifts and OOD examples
e25f84f3 - Expose SufficiencyOutput and move class methods to output class
742a084c - Adding algorithm compatibility/requirements table
7ce85be7 - Misc concept documentation

v0.69.4

📝 Miscellaneous

7bca6ed4 - Unified all MNIST and MNIST corrupt datasets to a single internal MNIST class
66ad1c92 - new drift detector: multivariate domain classifier

v0.69.3

📝 Miscellaneous

6745e39d - Document: Class Label Statistical Independence and Coverage Documentation
1f7689ac - Adding bias tutorial (parity-balance-diversity)

v0.69.2

📝 Miscellaneous

f7d5bac3 - Adds stats for bounding boxes
18be58a3 - Adding label stats
809d1d7a - Always produce p-val and distance metrics for drift
5cd7c205 - Improving imagestats and channelstats functions
b379d44c - Add dataset splitting features
80b68a73 - Use regex to replace markdown links
1d99455a - Tag LKG at the correct commit SHA
ad0e368b - Always run tasks

v0.69.1

📝 Miscellaneous

d9068a2c - Fix release and changelog script

v0.69.0

📝 Miscellaneous

63ab70d7 - Remove automatic update of documentation notebooks

v0.68.0

🌟 Feature Release

47b48e14 - Allow Duplicates and Outliers detectors to take in multiple StatsOutput objects

📝 Miscellaneous

65d8f3de - Combine classwise bias metric outputs with non-classwise
ccfd72ef - Adding clustering/coverage tutorial
6d09d710 - Add CONTRIBUTING.md
72387d9c - Updated version replacement script to include cache files
5285f01b - Prototype Performance Estimation
3ae16116 - concept pages for balance and diversity, rescale Simpson diversity
3e16a905 - Switching documentation themes to the pydata theme

v0.67.0

🌟 Feature Release

a0b04800 - Refactor DataEval functions and classes and update documentation
- Changes DataEval functions and classes to be more hierarchical in modules:
  - detectors
    - drift (DriftCVM, DriftKS, DriftMMD, DriftUncertainty)
    - linters (Clusterer, Duplicates, Outliers)
    - ood (OOD_AE, OOD_AEGMM, OOD_LLR, OOD_VAE, OOD_VAEGMM)
  - flags (ImageStat)
  - metrics
    - bias (balance, coverage, diversity, parity)
    - estimators (ber, divergence, uap)
    - stats (imagestats, channelstats)
  - workflows (Sufficiency)
- Backends have been moved from models to tensorflow and torch
- Renamed following classes:
  - Linter -> Outliers
  - parity -> label_parity
  - parity_metadata -> parity
  - DriftOutput -> DriftBaseOutput
  - DriftUnivariateOutput -> DriftOutput
- Miscellaneous fixes:
  - Documentation updated
  - Streamlined optional import checks in the __init__.py tree
  - Fixed misspelling in glossary

👾 Fixes

84aae760 - balance test cleanup

📝 Miscellaneous

6d09d710 - Add CONTRIBUTING.md
72387d9c - Updated version replacement script to include cache files
5285f01b - Prototype Performance Estimation
3ae16116 - concept pages for balance and diversity, rescale Simpson diversity
3e16a905 - Switching documentation themes to the pydata theme
d50d9cd1 - Update Landing Page
2fd7fa59 - Author drift detection tutorial
49b5af42 - Use uv instead of pyenv for python deployment
0f6eb6b0 - Pin notebooks on release to specific version
4f101a4e - Adjust imagestats and channelstats reference guides to new format
0ee82ede - Only build data image in main pipeline
7b84ceb5 - Improve test coverage
d3c5258a - Add StatsOutput as input type for linter and duplicates
cf73393a - Updates drift reference guides and concept page
4ce5cdf7 - Adjust model reference guides to new format
17195a2b - Adjust parity reference guides to new format
e9761b4d - Adjust out of distribution reference guides to new format
eaf707a7 - Adjust uap reference guide to new format
335ac3be - Adjust sufficiency reference guide to new format
3a866f01 - Change Optional[Type] to Type | None per 3.10+ standards

v0.66.0

🌟 Feature Release

a0b04800 - Refactor DataEval functions and classes and update documentation
- Changes DataEval functions and classes to be more hierarchical in modules:
  - detectors
    - drift (DriftCVM, DriftKS, DriftMMD, DriftUncertainty)
    - linters (Clusterer, Duplicates, Outliers)
    - ood (OOD_AE, OOD_AEGMM, OOD_LLR, OOD_VAE, OOD_VAEGMM)
  - flags (ImageStat)
  - metrics
    - bias (balance, coverage, diversity, parity)
    - estimators (ber, divergence, uap)
    - stats (imagestats, channelstats)
  - workflows (Sufficiency)
- Backends have been moved from models to tensorflow and torch
- Renamed following classes:
  - Linter -> Outliers
  - parity -> label_parity
  - parity_metadata -> parity
  - DriftOutput -> DriftBaseOutput
  - DriftUnivariateOutput -> DriftOutput
- Miscellaneous fixes:
  - Documentation updated
  - Streamlined optional import checks in the __init__.py tree
  - Fixed misspelling in glossary

🛠️ Improvements and Enhancements

5f730baa - Refactor ImageStats and ChannelStats as metric functions

👾 Fixes

84aae760 - balance test cleanup
3ebd278c - handle float-type categorical variables in balance metric
066b7153 - Fixes modzscore to account for division by 0

📝 Miscellaneous

d50d9cd1 - Update Landing Page
2fd7fa59 - Author drift detection tutorial
49b5af42 - Use uv instead of pyenv for python deployment
0f6eb6b0 - Pin notebooks on release to specific version
4f101a4e - Adjust imagestats and channelstats reference guides to new format
0ee82ede - Only build data image in main pipeline
7b84ceb5 - Improve test coverage
d3c5258a - Add StatsOutput as input type for linter and duplicates
cf73393a - Updates drift reference guides and concept page
4ce5cdf7 - Adjust model reference guides to new format
17195a2b - Adjust parity reference guides to new format
e9761b4d - Adjust out of distribution reference guides to new format
eaf707a7 - Adjust uap reference guide to new format
335ac3be - Adjust sufficiency reference guide to new format
3a866f01 - Change Optional[Type] to Type | None per 3.10+ standards
fe1e292d - Use output dataclass with metadata
b3f6a027 - Unify handling of image reshaping

v0.65.0

🛠️ Improvements and Enhancements

5f730baa - Refactor ImageStats and ChannelStats as metric functions

👾 Fixes

3ebd278c - handle float-type categorical variables in balance metric
066b7153 - Fixes modzscore to account for division by 0

📝 Miscellaneous

fe1e292d - Use output dataclass with metadata
b3f6a027 - Unify handling of image reshaping

v0.64.0

🌟 Feature Release

bea0446c - Torch Dataset Reader

🛠️ Improvements and Enhancements

eda88822 - Refactor metrics

📝 Miscellaneous

a4b8e919 - Created new documentation issue templates
1028d082 - Remove is_arraylike function
dbcecec6 - Refactored read_dataset to handle common dataset returns
61b1f854 - Updated Workflow Landing Page
cf96c7f2 - Run doctest in CI pipeline
ecfcf89b - Adjusted notebooks to work on google colab and added environment requirements
5f863782 - Update remaining metric output to NamedTuple
e58f4dba - Add metadata parity documentation
6319a1d4 - Adding Duplicates concept
787545f5 - Adding ImageStats and ChannelStats concept document
7826405c - Update Data Cleaning concept
50047116 - Change to Semantic Versioning
9e43399c - Bayes Error Rate - explanation documentation
266ad738 - Updated BER docstrings with NDArray, shapes, and examples

v0.63.0

🛠️ Improvements and Enhancements

3225cf18 - Convert remaining metrics and detectors to ArrayLike
5d88b82a - Add Torch and Tensorflow interop through ArrayLike protocol and to_numpy converter
d3342275 - Refactor linter and duplicates to call evaluate with data
65d5aaa8 - Refactor metrics to call evaluate with data

v0.61.0

🛠️ Improvements and Enhancements

cd59debb - Release DataEval v0.61.0!

DAML is now officially rebranded as DataEval! New name, same great camel flavor.

v0.56.0

🌟 Feature Release

64416675 - Update clusterer class and documentation
- Clusterer detector released
This class assists in exploratory data analysis of unlabeled data by identifying duplicates and outliers. Additional information on usage is available in our documentation.

v0.55.0

🌟 Feature Release

278b4dc1 - Release Linter, Duplicates, ImageStats, ChannelStats and Parity

The Linter and Duplicates detectors and the ImageStats, ChannelStats, and Parity metrics are now released. Existing metrics have also been moved into modules (detectors and workflows) that better reflect their functionality.
- detectors
  - Drift detectors: DriftCVM, DriftKS, DriftMMD, DriftUncertainty and supporting classes
  - Out-of-distribution detectors: OOD_AE, OOD_AEGMM, OOD_LLR, OOD_VAE, OOD_VAEGMM and supporting classes
  - Linter
  - Duplicates
- metrics
  - BER
  - Divergence
  - Parity
  - ImageStats
  - ChannelStats
  - UAP
- workflows
  - Sufficiency

v0.54.0

🛠️ Improvements and Enhancements

58263ac7 - Move niter param to evaluate and calculate and retain curve coefficients in output dictionary

This change extends the output of the Sufficiency metric to include the learning-curve coefficients for each measure/class. These parameters were previously recalculated on every call to project and plot. They are now provided as a Dict[str, np.ndarray] under the _CURVE_PARAMS_ key in the output dictionary.

v0.53.0

🌟 Feature Release

322fc830 - Add parameter k to BER estimator for KNN to enable k>1 for better consistency with ground truth in certain cases

v0.52.0

🛠️ Improvements and Enhancements

07b12ac2 - Fully integrate outlier detection

Outlier Detection API has been changed. Additional details are available in our documentation.

v0.51.0

🌟 Feature Release

2ed88a07 - Implement Drift Detection Metrics

This change adds four drift detection metrics for detecting potential drift in a dataset:
- Kolmogorov-Smirnov
- Cramér-von Mises
- Maximum Mean Discrepancy
- Classifier Uncertainty

The conceptual source is Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift, and the implementation is derived from Alibi-Detect v0.11.4.

v0.45.0

🚧 Deprecations and Removals

5cc48bec - Divergence metric naming corrected to HP Divergence

Divergence metric output now returns a dictionary of { "divergence": float, "error": int } instead of { "dpdivergence": float, "error": int }. Code, documentation and tutorials have been updated to the correct nomenclature of HP (Henze-Penrose) divergence.
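Code that reads the divergence value by key is affected by this rename. A small sketch of a reader that accepts either dictionary shape follows; `read_divergence` is a hypothetical helper, and the sample dictionaries hold made-up values purely for illustration.

```python
def read_divergence(result):
    """Return the divergence value from an output dict of either format."""
    if "divergence" in result:
        # Key used from v0.45.0 onward.
        return result["divergence"]
    # Key used before the rename.
    return result["dpdivergence"]


# Illustrative values only; these are not real metric results.
old_output = {"dpdivergence": 0.5, "error": 3}
new_output = {"divergence": 0.5, "error": 3}
```

Both `read_divergence(old_output)` and `read_divergence(new_output)` return the same value, so downstream code does not need to branch on the installed version.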

v0.44.6

🌟 Feature Release

41b20d3a - Add rules for release label pipeline workflow and merge request release template

🛠️ Improvements and Enhancements

7ee53c9c - Update Divergence default to MST

v0.44.2

🛠️ Improvements and Enhancements

1468aa5c - Switch to markdown and updated docs

v0.43.0

🛠️ Improvements and Enhancements

670a0db5 - Add support for classwise Sufficiency metrics
b96ee099 - Have sufficiency train and eval functions take indices and batch size instead of a DataLoader

v0.42.2

🛠️ Improvements and Enhancements

5225c491 - Change output classes to dictionaries
45040682 - Make Sufficiency a stateful class and revise SufficiencyOutput
7c5fdcff - Pass method as a parameter to determine metric algorithm to use
2e883f6d - Add better optimizer to find global minimum
c3c78680 - Expose AETrainer to public API to use model multiple times after training

👾 Fixes

93564b95 - Updating pyproject.toml and lock file to set dependency less than numpy 2.0

v0.42.0

🛠️ Improvements and Enhancements

601cfae8 - Sufficiency Plotting of Multiple Metrics during one run
3d68a6f1 - Add parameter to plot function for optional file output

🚧 Deprecations and Removals

a6ce3e72 - Remove UAP_MST metric

v0.40.2

🛠️ Improvements and Enhancements

f3eddaed - Flavor 2 - Remove models from metrics entirely

v0.40.1

🚧 Deprecations and Removals

db888bb7 - Remove usage of DamlDataset for ARiA metrics

v0.38.1

🛠️ Improvements and Enhancements

42617f43 - Enable GPU functionality in pytorch features

v0.38.0

🌟 Feature Release

c9b5116e - ARiA Autoencoder as PyTorch Model

🛠️ Improvements and Enhancements

8fe97232 - Add export_model functionality and improve test coverage
42cc77ea - Add empirical upper bound to UAP metric output

👾 Fixes

636dfdaf - update project with version metadata

v0.36.1

🌟 Feature Release

7d1a599f - Implement the uap class

v0.36.0

🛠️ Improvements and Enhancements

0799523b - Object detection model training

v0.29.0

🌟 Feature Release

166df3b0 - Implement Dataset Sufficiency Metric

🛠️ Improvements and Enhancements

5c4e6e06 - Use convolutional autoencoder for BER and Divergence metrics

👾 Fixes

c78e5502 - Sufficiency typecheck bugfix

v0.28.5

🛠️ Improvements and Enhancements

9d1c354c - Add fit_dataset, format_dataset to DpDivergence & BER

v0.28.4

👾 Fixes

c39e009e - Fix typecheck issues found with pyright-1.1.333

v0.26.13

🌟 Feature Release

949e09bd - Add kNN BER implementation

v0.26.10

🛠️ Improvements and Enhancements

dab0a8ff - Handle MST edge cases

v0.26.4

🛠️ Improvements and Enhancements

bf31996f - BER lower bound capability

v0.25.11

🛠️ Improvements and Enhancements

dfe0bddb - Add support for python 3.11

v0.25.4

🛠️ Improvements and Enhancements

2ca285cc - update BER metric to return a dataclass instead of dict

v0.25.3

👾 Fixes

67f08b27 - Fix: Alibi-detect-models-have-fixed-architecture-shapes

v0.25.2

🛠️ Improvements and Enhancements

db4adaff - 69 convert metric output dictionary to dataclass

v0.24.8

🌟 Feature Release

79614577 - Implement Multiclass MST version of BER

v0.24.6

🌟 Feature Release

2ad9fed5 - Implement BER estimate

v0.23.1

🌟 Feature Release

99d2fd22 - Implement outlier detection metrics using the alibi-detect VAE method

v0.23.0

🌟 Feature Release

85eb2c1f - Implement outlier detection metrics using the alibi-detect auto-encoder method
