🌟 Feature Release
bff82522
- Add collate function and convert packaged datasets to MAITE protocols
  - Changes all dataset utility classes to use MAITE protocol formats (MNIST, CIFAR10, and VOCDetection)
  - Adds collate to aggregate (and encode) MAITE datasets into images/embeddings, targets, and metadata
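The collate entry above can be illustrated with a minimal sketch. It assumes each dataset item is an (image, target, datum_metadata) tuple, as in MAITE's dataset protocol, and that an optional encoder maps each image to an embedding. `collate_sketch` and its `encode` parameter are hypothetical names, not DataEval's actual signature.

```python
from typing import Any, Callable, Optional, Sequence

import numpy as np


def collate_sketch(
    dataset: Sequence[tuple[Any, Any, dict]],
    encode: Optional[Callable[[np.ndarray], np.ndarray]] = None,
) -> tuple[np.ndarray, list[Any], list[dict]]:
    """Aggregate (image, target, metadata) items into three collections (hypothetical helper)."""
    images, targets, metadata = [], [], []
    for image, target, datum_metadata in dataset:
        array = np.asarray(image)
        # Optionally encode each image into an embedding vector.
        images.append(encode(array) if encode is not None else array)
        targets.append(target)
        metadata.append(datum_metadata)
    # Stack images/embeddings into one array; assumes they all share a shape.
    return np.stack(images), targets, metadata
```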
🛠️ Improvements and Enhancements
d9e0f8b0
- Enforce embeddings on functions/methods that take embedding inputs
🛠️ Improvements and Enhancements
9a420f7d
- Update "Assess the data space" tutorial to fit JATIC DR-2.3
3ab63f3e
- Integrate clusterer speed improvements with numba
🌟 Feature Release
a1974e41
- Add global config module to control default device and max processes
👾 Fixes
c5ca814d
- Enforce unit interval in OOD detector and coverage metric
41c4437b
- CoverageOutput attributes renamed for clarity. Attributes renamed:
  - indices -> uncovered_indices
  - radii -> critical_value_radii
  - critical_value -> coverage_radius
99631a94
- Fix ax.hist on small ranges in NumPy 2.1+
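The unit-interval enforcement listed above amounts to validating inputs before they reach the OOD detector or coverage metric. Below is a minimal sketch, assuming a simple range check that raises ValueError. The helper name `check_unit_interval` is hypothetical, and DataEval's own validation and error message may differ.

```python
import numpy as np


def check_unit_interval(data: np.ndarray) -> np.ndarray:
    """Raise if any value lies outside [0, 1] (hypothetical validation helper)."""
    array = np.asarray(data)
    # Only check non-empty inputs; an empty array has no min/max.
    if array.size and (array.min() < 0 or array.max() > 1):
        raise ValueError("Data must be on the unit interval [0, 1].")
    return array
```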
🛠️ Improvements and Enhancements
a8a4cd4f
- Remove merge from preprocess and address metadata array length inconsistencies
f8061eca
- Add option to return dropped keys from metadata utility functions
a4ddbed1
- Add pandas dependency to the all extras option
5b05981e
- Expose dropped keys from nested lists and inconsistent keys in metadata merge and preprocess
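To show what flattening nested metadata and reporting dropped keys involves, here is a minimal sketch. It flattens each datum's nested dictionary into underscore-joined keys and merges the results into per-key lists. Keys missing from some datums, or holding nested lists, are dropped and reported. `flatten_metadata_sketch`, the `_` separator and these drop rules are assumptions for illustration, not DataEval's actual API.

```python
from typing import Any


def _flatten(d: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Recursively flatten nested dicts, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in d.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(_flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def flatten_metadata_sketch(datums: list[dict[str, Any]]) -> tuple[dict[str, list[Any]], set[str]]:
    """Merge per-datum metadata into column lists; return the columns and dropped keys (hypothetical)."""
    flat = [_flatten(d) for d in datums]
    all_keys: set[str] = set().union(*flat)
    merged: dict[str, list[Any]] = {}
    dropped: set[str] = set()
    for key in sorted(all_keys):
        values = [f[key] for f in flat if key in f]
        # Drop keys missing from some datums (inconsistent) or holding nested lists.
        if len(values) != len(flat) or any(isinstance(v, list) for v in values):
            dropped.add(key)
        else:
            merged[key] = values
    return merged, dropped
```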
📝 Miscellaneous
a20766ec
- Updates to documentation
961ad923
- Miscellaneous docs changes
🌟 Feature Release
4647edca
- Expose flatten metadata function and update docstring
🛠️ Improvements and Enhancements
27d34a0c
- Incorporating NAWCAD feedback to improve the documentation for the stat functions, outliers class and coverage class
📝 Miscellaneous
c9998971
- Switch themes to sphinx-immaterial, enable graphviz and restructure documentation
3dedde8f
- Adding templates for auto generation of docs
8aaa89f3
- Add deep dive prototype
d9d902f3
- Allow for float type bounding boxes
9bc0f910
- Add additional code coverage tests
15f1ae84
- Add logging to output metadata decorator
01cef92b
- Split conftest for tests and doctests
adc8e293
- Publish MR docs and code coverage to deployment environments
a3fc1f6c
- Moves document link to body to match other header titles
69892fd3
- Visibility enhancements to BalanceOutput.plot() heatmap
b6ab03a6
- Simplify docker build script for docs
🌟 Feature Release
3aa12cb3
- Refactor bias metadata helpers. Metadata preprocessing functions have been moved from dataeval.metrics.bias.metadata_preprocessing to dataeval.utils.metadata.
🛠️ Improvements and Enhancements
ed98b6b1
- Return empty string for hashes on too-small images. pchash now returns an empty string when asked to perform perceptual hashing on images or chips that are too small to hash meaningfully. Duplicates also ignores empty perceptual hashes to avoid false-positive detections.
b144fa1c
- Change torch to be a required dependency. PyTorch is now a required dependency, and the torch extra is no longer needed for full functionality.
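The Duplicates rule for empty hashes can be sketched as follows. Images are grouped by perceptual hash, and empty strings are skipped. Without that skip, every too-small image would share the same empty hash and be flagged as a duplicate of the others. This is a minimal sketch: `group_by_hash` is a hypothetical helper, and the hashes are assumed to be precomputed strings.

```python
from collections import defaultdict


def group_by_hash(hashes: list[str]) -> list[list[int]]:
    """Return groups of indices sharing a non-empty hash (hypothetical helper)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for index, value in enumerate(hashes):
        if not value:
            # Empty hash: image was too small to hash meaningfully, so skip it.
            continue
        groups[value].append(index)
    # Only hashes shared by two or more images indicate duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]
```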
📝 Miscellaneous
6e4474b2
- Refactor utils and fix associated docstrings, documentation and notebooks
ff87cee6
- Update documentation and CI pipelines to comply with SDP DR-3
aa7d9205
- Updated README.md format, added tagline and CDAO funding acknowledgment
82559846
- Replace manual markdown files with autoapi generated rst files
📝 Miscellaneous
e7a284de
- Update dataset split unit tests
f8731a44
- Add initial logging framework and unit test
771dc1d1
- Add conda tests to pipeline
2d9fd55a
- Update RTD yaml to use uv for installation
0ab99a7f
- Initial prototyping of underspecification tests
📝 Miscellaneous
102664de
- Remove tensorflow from project
e782dad1
- Refactor OutputMetadata and clean up set_metadata decorator
80aae3a6
- Just use KSOutput as a MappingOutput instance instead of extracting the dict attribute it no longer has.
b738e01f
- Allow docker cmds within dev container
16839b46
- Add MappingOutput class
e2cfda94
- Made metadata_tools/ks_compare compatible with new KSOutput class.
🌟 Feature Release
73c1e1be
- Implement PyTorch autoencoder-based OOD detector. Adds an initial PyTorch-based autoencoder OOD detector, available when installed with the torch extra.
🛠️ Improvements and Enhancements
70794b5f
- Moved discretization of metadata out of bias functions
📝 Miscellaneous
4d94e602
- Added test assertions for how_to notebooks
7723e242
- Introduce PyTorch OOD detector, with its new training procedure, into OOD how-to notebook.
f5ac4bdd
- Added new KSOutput class and adapted tests and other functions accordingly
3a01a81a
- Introduce new PyTorch OOD detection into prototype metadata demo notebooks.
dc155554
- Fix torch gmm functions and enable tests
a715c1ef
- Adjust docs to incorporate new metadata function
0719bad0
- Update dependencies to remove hdbscan
👾 Fixes
cac3e2b8
- Fixes drift with pre-processing and shuffles MNIST by default
📝 Miscellaneous
bacbd0e7
- Use build script specifically for docs
0a87e912
- docker build for docs only
671b60a5
- Prototype function to infer whether a 1D sample is continuous or discrete
d0b8004a
- Use explicit re namespace for compile, search, sub, and MULTILINE
502ca2df
- Change to nox for automation test scripts
5b46ebea
- Add new bias functional tests and set groundwork for rediscretization
🌟 Feature Release
e055acf0
- Metadata utility function to merge, extend and flatten metadata
95b28ae1
- Adjust bias plotting functions to return figure
📝 Miscellaneous
532f92a2
- Minimum spanning tree and Clusterer are rewritten using numba for a large speedup
7377e012
- Switch jobs to use uv and tox natively
7af75016
- Add lazyloading for tensorflow modules
🛠️ Improvements and Enhancements
ba52ef2e
- Refactor away _internal module
📝 Miscellaneous
6e55451c
- Integration of distribution compare and OOD MI metadata tools (continued)
e4f82173
- Streamlined tests
ac8fe3ee
- Fix type mismatch on training AEGMM
6289c7d0
- Add plotting helper functions to diversity and balance
14d0cfd4
- Integration of low-level metadata drift/OOD exploration functions
📝 Miscellaneous
32ba1f29
- Data split tests
76f73770
- Updated glossary and other files to use new style of links
20efd27e
- Add support for Python 3.12
🌟 Feature Release
14ef382c
- Update dependencies for conda compatibility
🛠️ Improvements and Enhancements
97849b01
- Update support for tensorflow >=2.16 with explicit keras v2
📝 Miscellaneous
85bafa30
- Swap brightness and darkness
96a30ad0
- Make optional checks more granular
55ca81d6
- Use native int for dict keys for Outliers
639e140b
- silence warnings for docs and doctest
🌟 Feature Release
cdae8a17
- Parallelize existing stats metric functions and introduce dedicated channelstats function. Statistical analysis functions take significant time to run on large datasets. Because individual images can be analyzed independently, these functions now use the multiprocessing library to process images in parallel and reduce run times. Affected functions:
  - datasetstats
  - dimensionstats
  - hashstats
  - pixelstats
  - visualstats
  Additionally, channelstats was added, which performs the functionality of datasetstats but only for the functions that support per-channel stat calculation: pixelstats and visualstats.
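Per-channel statistics of the kind channelstats reports can be sketched by reducing over the spatial axes only. This minimal sketch assumes channel-first (C, H, W) images and computes only a mean and standard deviation per channel. `per_channel_stats` is a hypothetical helper; DataEval's functions compute a broader set of pixel and visual statistics. Because each image is processed independently, a loop like this parallelizes naturally, for example with multiprocessing.Pool.map.

```python
import numpy as np


def per_channel_stats(images: list[np.ndarray]) -> dict[str, list[np.ndarray]]:
    """Compute per-channel mean and std for each (C, H, W) image (hypothetical helper)."""
    means, stds = [], []
    for image in images:
        pixels = np.asarray(image, dtype=np.float64)
        # Reduce over height and width, keeping one value per channel.
        means.append(pixels.mean(axis=(1, 2)))
        stds.append(pixels.std(axis=(1, 2)))
    return {"mean": means, "std": stds}
```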
📝 Miscellaneous
552668a0
- Update EDA part 1 tutorial with miscellaneous changes
🛠️ Improvements and Enhancements
d1cdcda5
- API changes with supporting documentation updates
📝 Miscellaneous
5ecd4d3a
- expose datasets API
6c19bba7
- Make sufficiency args more permissive
1bc2d067
- Improving MNIST class
d23b3461
- Extract small-scope reusable functions from tools made for prototype Associate[Drift|OOD]withMetadataTutorial notebooks.
5bea9512
- remove tf-iogcs-fs
🌟 Feature Release
71e7ff06
- Integrate labelstats function
f40bf0e4
- Redesign stats functions for expansion to per-box, per-channel, and boxratiostats
🛠️ Improvements and Enhancements
72390edc
- Change input format of balance and diversity to be the same as parity
👾 Fixes
f598c46a
- Update pytorch to 2.2.0+
📝 Miscellaneous
b8f0d502
- Create copy on to_numpy by default
04a71337
- Fix CI docs job to load on build
9286f5e8
- Skip or rework MNIST based unit tests
704f44e3
- Investigate the use of metadata to help explain observed dataset drifts and OOD examples
e25f84f3
- Expose SufficiencyOutput and move class methods to output class
742a084c
- Adding algorithm compatibility/requirements table
7ce85be7
- Misc concept documentation
📝 Miscellaneous
7bca6ed4
- Unified all MNIST and MNIST corrupt datasets to a single internal MNIST class
66ad1c92
- new drift detector: multivariate domain classifier
📝 Miscellaneous
6745e39d
- Document: Class Label Statistical Independence and Coverage Documentation
1f7689ac
- Adding bias tutorial (parity-balance-diversity)
📝 Miscellaneous
f7d5bac3
- Adds stats for bounding boxes
18be58a3
- Adding label stats
809d1d7a
- Always produce p-val and distance metrics for drift
5cd7c205
- Improving imagestats and channelstats functions
b379d44c
- Add dataset splitting features
80b68a73
- Use regex to replace markdown links
1d99455a
- Tag LKG at the correct commit SHA
ad0e368b
- Always run tasks
📝 Miscellaneous
d9068a2c
- Fix release and changelog script
📝 Miscellaneous
63ab70d7
- Remove automatic update of documentation notebooks
🌟 Feature Release
47b48e14
- Allow Duplicates and Outliers detectors to take in multiple StatsOutput objects
📝 Miscellaneous
65d8f3de
- Combine classwise bias metric outputs with non-classwise
ccfd72ef
- Adding clustering/coverage tutorial
6d09d710
- Add CONTRIBUTING.md
72387d9c
- Updated version replacement script to include cache files
5285f01b
- Prototype Performance Estimation
3ae16116
- concept pages for balance and diversity, rescale Simpson diversity
3e16a905
- Switching documentation themes to the pydata theme
🌟 Feature Release
a0b04800
- Refactor DataEval functions and classes and update documentation
  - Changes DataEval functions and classes to be more hierarchical in modules:
    - detectors
      - drift (DriftCVM, DriftKS, DriftMMD, DriftUncertainty)
      - linters (Clusterer, Duplicates, Outliers)
      - ood (OOD_AE, OOD_AEGMM, OOD_LLR, OOD_VAE, OOD_VAEGMM)
    - flags (ImageStat)
    - metrics
      - bias (balance, coverage, diversity, parity)
      - estimators (ber, divergence, uap)
      - stats (imagestats, channelstats)
    - workflows (Sufficiency)
  - Backends have been moved from models to tensorflow and torch
  - Renamed the following classes:
    - Linter -> Outliers
    - parity -> label_parity
    - parity_metadata -> parity
    - DriftOutput -> DriftBaseOutput
    - DriftUnivariateOutput -> DriftOutput
  - Miscellaneous fixes:
    - Documentation updated
    - Streamlined optional import checks in the __init__.py tree
    - Fixed misspelling in glossary
👾 Fixes
84aae760
- balance test cleanup
📝 Miscellaneous
6d09d710
- Add CONTRIBUTING.md
72387d9c
- Updated version replacement script to include cache files
5285f01b
- Prototype Performance Estimation
3ae16116
- concept pages for balance and diversity, rescale Simpson diversity
3e16a905
- Switching documentation themes to the pydata theme
d50d9cd1
- Update Landing Page
2fd7fa59
- Author drift detection tutorial
49b5af42
- Use uv instead of pyenv for python deployment
0f6eb6b0
- Pin notebooks on release to specific version
4f101a4e
- Adjust imagestats and channelstats reference guides to new format
0ee82ede
- Only build data image in main pipeline
7b84ceb5
- Improve test coverage
d3c5258a
- Add StatsOutput as input type for linter and duplicates
cf73393a
- Updates drift reference guides and concept page
4ce5cdf7
- Adjust model reference guides to new format
17195a2b
- Adjust parity reference guides to new format
e9761b4d
- Adjust out of distribution reference guides to new format
eaf707a7
- Adjust uap reference guide to new format
335ac3be
- Adjust sufficiency reference guide to new format
3a866f01
- Change Optional[Type] to Type | None per 3.10+ standards
🌟 Feature Release
a0b04800
- Refactor DataEval functions and classes and update documentation
  - Changes DataEval functions and classes to be more hierarchical in modules:
    - detectors
      - drift (DriftCVM, DriftKS, DriftMMD, DriftUncertainty)
      - linters (Clusterer, Duplicates, Outliers)
      - ood (OOD_AE, OOD_AEGMM, OOD_LLR, OOD_VAE, OOD_VAEGMM)
    - flags (ImageStat)
    - metrics
      - bias (balance, coverage, diversity, parity)
      - estimators (ber, divergence, uap)
      - stats (imagestats, channelstats)
    - workflows (Sufficiency)
  - Backends have been moved from models to tensorflow and torch
  - Renamed the following classes:
    - Linter -> Outliers
    - parity -> label_parity
    - parity_metadata -> parity
    - DriftOutput -> DriftBaseOutput
    - DriftUnivariateOutput -> DriftOutput
  - Miscellaneous fixes:
    - Documentation updated
    - Streamlined optional import checks in the __init__.py tree
    - Fixed misspelling in glossary
🛠️ Improvements and Enhancements
5f730baa
- Refactor ImageStats and ChannelStats as metric functions
👾 Fixes
84aae760
- balance test cleanup
3ebd278c
- handle float-type categorical variables in balance metric
066b7153
- Fixes modzscore to account for division by 0
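The modified z-score divides by the median absolute deviation (MAD). The MAD is zero whenever more than half the values equal the median. Below is a minimal sketch of one way to guard that case, assuming the zero MAD is replaced with machine epsilon. The helper name and the epsilon substitution are assumptions, not necessarily DataEval's exact fix; the 0.6745 constant is the standard modified z-score scaling.

```python
import numpy as np


def modzscore_sketch(values: np.ndarray) -> np.ndarray:
    """Modified z-score 0.6745 * |x - median| / MAD with a zero-MAD guard (sketch)."""
    x = np.asarray(values, dtype=np.float64)
    abs_diff = np.abs(x - np.median(x))
    mad = np.median(abs_diff)
    # Avoid division by zero when most values equal the median (assumed epsilon fallback).
    if mad == 0:
        mad = np.finfo(np.float64).eps
    return 0.6745 * abs_diff / mad
```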
📝 Miscellaneous
d50d9cd1
- Update Landing Page
2fd7fa59
- Author drift detection tutorial
49b5af42
- Use uv instead of pyenv for python deployment
0f6eb6b0
- Pin notebooks on release to specific version
4f101a4e
- Adjust imagestats and channelstats reference guides to new format
0ee82ede
- Only build data image in main pipeline
7b84ceb5
- Improve test coverage
d3c5258a
- Add StatsOutput as input type for linter and duplicates
cf73393a
- Updates drift reference guides and concept page
4ce5cdf7
- Adjust model reference guides to new format
17195a2b
- Adjust parity reference guides to new format
e9761b4d
- Adjust out of distribution reference guides to new format
eaf707a7
- Adjust uap reference guide to new format
335ac3be
- Adjust sufficiency reference guide to new format
3a866f01
- Change Optional[Type] to Type | None per 3.10+ standards
fe1e292d
- Use output dataclass with metadata
b3f6a027
- Unify handling of image reshaping
🛠️ Improvements and Enhancements
5f730baa
- Refactor ImageStats and ChannelStats as metric functions
👾 Fixes
3ebd278c
- handle float-type categorical variables in balance metric
066b7153
- Fixes modzscore to account for division by 0
📝 Miscellaneous
fe1e292d
- Use output dataclass with metadata
b3f6a027
- Unify handling of image reshaping
🌟 Feature Release
bea0446c
- Torch Dataset Reader
🛠️ Improvements and Enhancements
eda88822
- Refactor metrics
📝 Miscellaneous
a4b8e919
- Created new documentation issue templates
1028d082
- Remove is_arraylike function
dbcecec6
- Refactored read_dataset to handle common dataset returns
61b1f854
- Updated Workflow Landing Page
cf96c7f2
- Run doctest in CI pipeline
ecfcf89b
- Adjusted notebooks to work on Google Colab and added environment requirements
5f863782
- Update remaining metric output to NamedTuple
e58f4dba
- Add metadata parity documentation
6319a1d4
- Adding Duplicates concept
787545f5
- Adding ImageStats and ChannelStats concept document
7826405c
- Update Data Cleaning concept
50047116
- Change to Semantic Versioning
9e43399c
- Bayes Error Rate - explanation documentation
266ad738
- Updated BER docstrings with NDArray, shapes, and examples
🛠️ Improvements and Enhancements
3225cf18
- Convert remaining metrics and detectors to ArrayLike
5d88b82a
- Add Torch and Tensorflow interop through ArrayLike protocol and to_numpy converter
d3342275
- Refactor linter and duplicates to call evaluate with data
65d5aaa8
- Refactor metrics to call evaluate with data
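The ArrayLike interop and to_numpy converter can be sketched as a function that accepts anything array-like, unwraps framework tensors, and returns a NumPy array. The sketch copies by default, as another entry in this changelog ("Create copy on to_numpy by default") describes. The duck-typed checks for `.detach()`/`.cpu()` (PyTorch) and `.numpy()` (TensorFlow eager tensors) are assumptions about how such a converter might work, and `to_numpy_sketch` is not DataEval's implementation.

```python
from typing import Any

import numpy as np


def to_numpy_sketch(array: Any, copy: bool = True) -> np.ndarray:
    """Convert an array-like object to a NumPy array (hypothetical helper)."""
    if isinstance(array, np.ndarray):
        return array.copy() if copy else array
    # Assumed duck typing for PyTorch tensors: detach from autograd, move to CPU.
    if hasattr(array, "detach") and hasattr(array, "cpu"):
        array = array.detach().cpu()
    # Assumed duck typing for framework tensors that expose .numpy().
    if hasattr(array, "numpy"):
        array = array.numpy()
    # np.array copies by default; np.asarray avoids a copy when possible.
    return np.array(array) if copy else np.asarray(array)
```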
🛠️ Improvements and Enhancements
cd59debb
- Release DataEval v0.61.0! DAML is now officially rebranded as DataEval! New name, same great camel flavor.
🌟 Feature Release
64416675
- Update clusterer class and documentation. Clusterer detector released. This class assists in exploratory data analysis of unlabeled data by identifying duplicates and outliers. Additional information on usage is available in our documentation.
🌟 Feature Release
278b4dc1
- Release Linter, Duplicates, ImageStats, ChannelStats and Parity. The Linter and Duplicates detectors and the ImageStats, ChannelStats, and Parity metrics are now released. The existing metrics have also been moved into different modules (detectors and workflows) that better reflect their functionality.
  - detectors
    - Drift detectors: DriftCVM, DriftKS, DriftMMD, DriftUncertainty and supporting classes
    - Out-of-distribution detectors: OOD_AE, OOD_AEGMM, OOD_LLR, OOD_VAE, OOD_VAEGMM and supporting classes
    - Linter
    - Duplicates
  - metrics
    - BER
    - Divergence
    - Parity
    - ImageStats
    - ChannelStats
    - UAP
  - workflows
    - Sufficiency
🛠️ Improvements and Enhancements
58263ac7
- Move niter param to evaluate, and calculate and retain curve coefficients in output dictionary. This change enhances the output of the Sufficiency metric to provide the learning-curve coefficients for each measure/class when running the metric. These parameters were previously recalculated on each call to project and plot. The parameters are provided as a Dict[str, np.ndarray] under the _CURVE_PARAMS_ key in the output dictionary.
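Retaining the curve coefficients means the learning curve is fit once and then reused by project and plot. Below is a minimal sketch of such a fit. It assumes a two-parameter power law error(n) = a * n**(-b), fit by least squares in log-log space. This is a simplified stand-in: the Sufficiency metric's actual model and optimizer may differ, and `fit_power_law`, `project` and the "error" measure name are hypothetical.

```python
import numpy as np


def fit_power_law(steps: np.ndarray, errors: np.ndarray) -> np.ndarray:
    """Fit errors ~ a * steps**(-b); return np.array([a, b]) (simplified sketch)."""
    # log(error) = log(a) - b * log(n) is linear in log(n).
    slope, intercept = np.polyfit(np.log(steps), np.log(errors), deg=1)
    return np.array([np.exp(intercept), -slope])


def project(params: np.ndarray, steps) -> np.ndarray:
    """Evaluate the stored curve at new dataset sizes without refitting."""
    a, b = params
    return a * np.asarray(steps, dtype=np.float64) ** (-b)


# Fit once and keep the coefficients per measure, keyed as the changelog describes.
steps = np.array([100, 200, 400, 800])
output = {"_CURVE_PARAMS_": {"error": fit_power_law(steps, 2.0 * steps ** -0.5)}}
```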
🌟 Feature Release
322fc830
- Add parameter k to BER estimator for KNN to enable k>1 for better consistency with ground truth in certain cases
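A kNN-based BER estimator builds on the leave-one-out error of a k-nearest-neighbor classifier. Below is a minimal sketch of that error term with a configurable k. It assumes Euclidean distance and majority voting, with ties going to the smallest label. `knn_loo_error` is hypothetical, and converting this error into the estimator's upper and lower BER bounds is not shown.

```python
import numpy as np


def knn_loo_error(embeddings: np.ndarray, labels: np.ndarray, k: int = 1) -> float:
    """Leave-one-out k-nearest-neighbor misclassification rate (sketch)."""
    x = np.asarray(embeddings, dtype=np.float64)
    y = np.asarray(labels)
    # Pairwise Euclidean distances; exclude each point from its own neighbors.
    dists = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    errors = 0
    for i, idx in enumerate(neighbors):
        values, counts = np.unique(y[idx], return_counts=True)
        # Majority vote; np.argmax picks the smallest label on ties.
        if values[np.argmax(counts)] != y[i]:
            errors += 1
    return errors / len(y)
```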
🛠️ Improvements and Enhancements
07b12ac2
- Fully integrate outlier detection. The outlier detection API has been changed. Additional details are available in our documentation.
🌟 Feature Release
2ed88a07
- Implement drift detection metrics. This change adds 4 types of drift detection metrics that detect potential drift in a dataset:
  - Kolmogorov-Smirnov
  - Cramér-von Mises
  - Maximum Mean Discrepancy
  - Classifier Uncertainty
The conceptual source is derived from Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift and the implementation is derived from Alibi-Detect v0.11.4.
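The Kolmogorov-Smirnov approach can be illustrated with a feature-wise two-sample test. Drift is flagged when any per-feature p-value falls below a Bonferroni-corrected threshold. Below is a minimal sketch using scipy.stats.ks_2samp. The function name, the 0.05 default and the Bonferroni correction are assumptions for illustration, and DataEval's DriftKS options and output fields may differ.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_drift(reference: np.ndarray, test: np.ndarray, p_val: float = 0.05) -> dict:
    """Feature-wise KS drift test with Bonferroni correction (sketch)."""
    ref = np.asarray(reference, dtype=np.float64).reshape(len(reference), -1)
    new = np.asarray(test, dtype=np.float64).reshape(len(test), -1)
    n_features = ref.shape[1]
    results = [ks_2samp(ref[:, i], new[:, i]) for i in range(n_features)]
    distances = np.array([r.statistic for r in results])
    p_vals = np.array([r.pvalue for r in results])
    # Bonferroni: compare each p-value against p_val / n_features.
    threshold = p_val / n_features
    return {
        "is_drift": bool((p_vals < threshold).any()),
        "threshold": threshold,
        "p_vals": p_vals,
        "distances": distances,
    }
```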
🚧 Deprecations and Removals
5cc48bec
- Divergence metric naming corrected to HP Divergence. The Divergence metric output now returns a dictionary of { "divergence": float, "error": int } instead of { "dpdivergence": float, "error": int }. Code, documentation and tutorials have been updated to the correct nomenclature of HP (Henze-Penrose) divergence.
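The HP (Henze-Penrose) divergence can be estimated from a minimum spanning tree (MST) over the pooled samples. The estimate counts the MST edges that connect points from different samples, then computes max(0, 1 - (M + N) / (2MN) * edges), where M and N are the sample sizes. Below is a minimal sketch that returns the same { "divergence", "error" } shape described above, using SciPy's minimum_spanning_tree. `hp_divergence_mst` is hypothetical and is not DataEval's implementation. SciPy treats zero entries as missing edges, so the sketch assumes no exact duplicate points.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform


def hp_divergence_mst(data_a: np.ndarray, data_b: np.ndarray) -> dict:
    """Estimate HP divergence from cross-sample MST edges (sketch)."""
    a = np.asarray(data_a, dtype=np.float64)
    b = np.asarray(data_b, dtype=np.float64)
    m, n = len(a), len(b)
    pooled = np.vstack([a, b])
    labels = np.concatenate([np.zeros(m), np.ones(n)])
    # MST over the complete Euclidean graph of the pooled samples.
    mst = minimum_spanning_tree(squareform(pdist(pooled))).tocoo()
    # Count MST edges whose endpoints come from different samples.
    errors = int(np.sum(labels[mst.row] != labels[mst.col]))
    divergence = max(0.0, 1.0 - (m + n) / (2 * m * n) * errors)
    return {"divergence": divergence, "error": errors}
```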
🌟 Feature Release
41b20d3a
- Add rules for release label pipeline workflow and merge request release template
🛠️ Improvements and Enhancements
7ee53c9c
- Update Divergence default to MST
🛠️ Improvements and Enhancements
1468aa5c
- Switch to markdown and updated docs
🛠️ Improvements and Enhancements
670a0db5
- Add support for classwise Sufficiency metrics
b96ee099
- Have sufficiency train and eval functions take indices and batch size instead of a DataLoader
🛠️ Improvements and Enhancements
5225c491
- Change output classes to dictionaries
45040682
- Make Sufficiency a stateful class and revise SufficiencyOutput
7c5fdcff
- Pass method as a parameter to determine metric algorithm to use
2e883f6d
- Add better optimizer to find global minimum
c3c78680
- Expose AETrainer to public API to use model multiple times after training
👾 Fixes
93564b95
- Updating pyproject.toml and lock file to set dependency less than numpy 2.0
🛠️ Improvements and Enhancements
601cfae8
- Sufficiency Plotting of Multiple Metrics during one run
3d68a6f1
- Add parameter to plot function for optional file output
🚧 Deprecations and Removals
a6ce3e72
- Remove UAP_MST metric
🛠️ Improvements and Enhancements
f3eddaed
- Flavor 2 - Remove models from metrics entirely
🚧 Deprecations and Removals
db888bb7
- Remove usage of DamlDataset for ARiA metrics
🛠️ Improvements and Enhancements
42617f43
- Enable GPU functionality in pytorch features
🌟 Feature Release
c9b5116e
- ARiA Autoencoder as PyTorch Model
🛠️ Improvements and Enhancements
8fe97232
- Add export_model functionality and improve test coverage
42cc77ea
- Add empirical upper bound to UAP metric output
👾 Fixes
636dfdaf
- update project with version metadata
🌟 Feature Release
7d1a599f
- Implement the uap class
🛠️ Improvements and Enhancements
0799523b
- Object detection model training
🌟 Feature Release
166df3b0
- Implement Dataset Sufficiency Metric
🛠️ Improvements and Enhancements
5c4e6e06
- Use convolutional autoencoder for BER and Divergence metrics
👾 Fixes
c78e5502
- Sufficiency typecheck bugfix
🛠️ Improvements and Enhancements
9d1c354c
- Add fit_dataset, format_dataset to DpDivergence & BER
👾 Fixes
c39e009e
- Fix typecheck issues found with pyright-1.1.333
🌟 Feature Release
949e09bd
- Add kNN BER implementation
🛠️ Improvements and Enhancements
dab0a8ff
- Handle MST edge cases
🛠️ Improvements and Enhancements
bf31996f
- BER lower bound capability
🛠️ Improvements and Enhancements
dfe0bddb
- Add support for python 3.11
🛠️ Improvements and Enhancements
2ca285cc
- update BER metric to return a dataclass instead of dict
👾 Fixes
67f08b27
- Fix: Alibi-Detect models have fixed architecture shapes
🛠️ Improvements and Enhancements
db4adaff
- 69 convert metric output dictionary to dataclass
🌟 Feature Release
79614577
- Implement Multiclass MST version of BER
🌟 Feature Release
2ad9fed5
- Implement BER estimate
🌟 Feature Release
99d2fd22
- Implement outlier detection metrics using the alibi-detect VAE method
🌟 Feature Release
85eb2c1f
- Implement outlier detection metrics using the alibi-detect auto-encoder method