SynthPop

SynthPop generates synthetic tabular data with Gaussian copulas.

Motivation

We want to model the joint distribution of {X, y} so that we can draw additional samples from it. Extra samples from a statistically similar distribution could (a) reduce overfitting or (b) preserve privacy, by producing a dataset with similar statistical properties without revealing the ground-truth records.

Example

You have a few samples from the following distribution.

With SynthPop, you can generate more samples from that distribution by (a) fitting a Gaussian copula to those observations and (b) drawing samples from the fitted copula.

import numpy as np

from SynthPop import Copula

data = np.load("data.npy")  # ground truth of 100 samples

generator = Copula()
generator.fit(data)  # fit a Gaussian copula so it has a similar distribution
x1, x2 = generator.sample(k=1000)  # draw as many samples as you need

Training a downstream model on the original data together with these synthetic samples can often improve its performance.
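To make the fit and sample steps above more concrete, here is a minimal sketch of a Gaussian copula written with NumPy and SciPy. It is not SynthPop's implementation: the function names `fit_gaussian_copula` and `sample_gaussian_copula` are hypothetical, the toy data is generated on the spot, and the choice of empirical (rank-based) marginals is an assumption made for illustration.

```python
import numpy as np
from scipy import stats


def fit_gaussian_copula(data):
    """Return the sorted marginals and the latent correlation matrix."""
    n, _ = data.shape
    # Map each column to uniform scores in (0, 1) via its ranks (empirical CDF).
    u = stats.rankdata(data, axis=0) / (n + 1)
    # Push the uniforms through the inverse normal CDF to get latent Gaussians.
    z = stats.norm.ppf(u)
    corr = np.corrcoef(z, rowvar=False)
    return np.sort(data, axis=0), corr


def sample_gaussian_copula(sorted_marginals, corr, k, seed=None):
    """Draw k synthetic rows that follow the fitted dependence structure."""
    rng = np.random.default_rng(seed)
    n, d = sorted_marginals.shape
    # Sample correlated latent Gaussians, then map them back to uniforms.
    z = rng.multivariate_normal(np.zeros(d), corr, size=k)
    u = stats.norm.cdf(z)
    # Invert each empirical marginal by interpolating its sorted values.
    grid = np.arange(1, n + 1) / (n + 1)
    cols = [np.interp(u[:, j], grid, sorted_marginals[:, j]) for j in range(d)]
    return np.column_stack(cols)


# Toy data (hypothetical): a skewed feature and a noisy copy of it.
rng = np.random.default_rng(0)
x1 = rng.exponential(size=200)
x2 = x1 + rng.normal(scale=0.5, size=200)
toy = np.column_stack([x1, x2])

marginals, corr = fit_gaussian_copula(toy)
synthetic = sample_gaussian_copula(marginals, corr, k=1000, seed=1)
```

Because the marginals are inverted by interpolation, every synthetic value in this sketch stays within the observed range of its column. A parametric marginal would be needed to extrapolate beyond it.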

