What is expected
The ColumnSynthesizer is expected to model each column independently.
For numerical or datetime sdtypes, it should learn a univariate Gaussian mixture model (GMM) during fit. Then during sample, it can create data from that model.
For categorical or boolean sdtypes, it should learn the frequencies of each category. Then during sample, it can create data using those frequencies as weights.
For other sdtypes (such as id, pii, etc.), it can simply use the RegexGenerator or AnonymizedFaker to generate values from scratch (no learning is expected).
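The two learned cases above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names (`fit_frequencies`, `sample_frequencies`, `fit_gmm_1d`, `sample_gmm_1d`) are hypothetical, the GMM is fitted with a bare-bones EM loop from the standard library, and none of it reflects how the synthesizer is actually implemented.

```python
import math
import random
from collections import Counter


def fit_frequencies(values):
    """Learn the relative frequency of each category."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def sample_frequencies(freqs, n, rng):
    """Draw n categories, using the learned frequencies as weights."""
    categories = list(freqs)
    weights = [freqs[c] for c in categories]
    return rng.choices(categories, weights=weights, k=n)


def _normal_pdf(x, mean, std):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def fit_gmm_1d(values, k=2, iterations=50):
    """Fit a k-component univariate GMM with plain EM (illustrative only)."""
    xs = sorted(values)
    n = len(xs)
    spread = (xs[-1] - xs[0]) or 1.0
    # Initialise means at evenly spaced quantiles, with equal weights.
    means = [xs[int((i + 0.5) * n / k)] for i in range(k)]
    stds = [spread / (2 * k)] * k
    weights = [1.0 / k] * k
    min_std = 1e-3 * spread  # floor that stops a component from collapsing
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * _normal_pdf(x, m, s)
                    for w, m, s in zip(weights, means, stds)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            if rj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = rj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / rj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / rj
            stds[j] = max(math.sqrt(var), min_std)
    return list(zip(weights, means, stds))


def sample_gmm_1d(components, n, rng):
    """Pick a component by weight, then draw from its Gaussian."""
    weights = [w for w, _, _ in components]
    samples = []
    for _ in range(n):
        _, mean, std = rng.choices(components, weights=weights, k=1)[0]
        samples.append(rng.gauss(mean, std))
    return samples


rng = random.Random(0)

# Categorical column: learn frequencies, then sample with them as weights.
colors = ["red"] * 6 + ["blue"] * 3 + ["green"]
color_freqs = fit_frequencies(colors)
synthetic_colors = sample_frequencies(color_freqs, 5, rng)

# Numerical column: two made-up clusters, fitted with a 2-component GMM.
ages = ([rng.gauss(25, 2) for _ in range(200)]
        + [rng.gauss(60, 3) for _ in range(200)])
components = fit_gmm_1d(ages, k=2)
synthetic_ages = sample_gmm_1d(components, 5, rng)
```

A datetime column could presumably reuse the same GMM path after converting timestamps to numbers, but that conversion is left out of this sketch.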
How does this synthesizer know which type is which? It should use the provided metadata as the source of truth.
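As a rough sketch of what "use the metadata" could look like, the snippet below picks a per-column strategy purely from the declared sdtype and never inspects the data. The metadata dict is a simplified, assumed shape (column name mapped to an `sdtype` entry), and `choose_strategy` and `plan_columns` are hypothetical helper names, not part of any library.

```python
# Assumed, simplified metadata: each column declares its sdtype.
METADATA = {
    "columns": {
        "user_id": {"sdtype": "id"},
        "age": {"sdtype": "numerical"},
        "signup_date": {"sdtype": "datetime"},
        "plan": {"sdtype": "categorical"},
        "is_active": {"sdtype": "boolean"},
        "email": {"sdtype": "pii"},
    }
}

GMM_SDTYPES = {"numerical", "datetime"}
FREQUENCY_SDTYPES = {"categorical", "boolean"}


def choose_strategy(sdtype):
    """Map a declared sdtype to the modeling strategy described in the issue."""
    if sdtype in GMM_SDTYPES:
        return "gmm"
    if sdtype in FREQUENCY_SDTYPES:
        return "frequencies"
    return "generate"  # id, pii, etc.: values are generated, nothing is learned


def plan_columns(metadata):
    """Build a column -> strategy plan from the metadata alone."""
    return {
        name: choose_strategy(spec["sdtype"])
        for name, spec in metadata["columns"].items()
    }


plan = plan_columns(METADATA)
```

The point of the design is that the data's dtypes never enter the decision: a column is modeled the way the metadata says, even if its values look like another type.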
What is actually observed
Similar to the UniformSynthesizer (see #248), this synthesizer just lets the RDT HyperTransformer decide each column's sdtype based on the data.
It should be referencing the metadata, since the metadata is the source of truth.
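To illustrate why the two approaches can disagree, here is a small hypothetical example. `infer_sdtype_from_values` is a deliberately naive stand-in for data-based detection; it is not the HyperTransformer's actual logic, and the column names and values are made up.

```python
def infer_sdtype_from_values(values):
    """Naive, hypothetical data-based guess (not RDT's real detection)."""
    if all(isinstance(v, bool) for v in values):
        return "boolean"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool)
           for v in values):
        return "numerical"
    return "categorical"


data = {
    "zip_code": [2139, 94105, 10001],        # integers, but really labels
    "user_id": ["u-001", "u-002", "u-003"],  # strings, but really an id
    "age": [31, 45, 27],
}

# What the user declared in the metadata (assumed for this example).
declared = {"zip_code": "categorical", "user_id": "id", "age": "numerical"}

# Columns where the data-based guess contradicts the declared sdtype.
mismatches = {
    name: (infer_sdtype_from_values(values), declared[name])
    for name, values in data.items()
    if infer_sdtype_from_values(values) != declared[name]
}
```

In this made-up case, the integer-coded `zip_code` would be treated as numerical and `user_id` as categorical, which is the kind of divergence that following the metadata avoids.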
npatki changed the title from "The IndependentSynthesizer should follow the sdtypes in the metadata (not the data's dtypes)" to "The ColumnSynthesizer should follow the sdtypes in the metadata (not the data's dtypes)" on Jan 8, 2025