Purpose

Provide a script that makes it easy to generate a TPC-H data set and upload the output files to an Amazon S3 bucket.

The generated files are split into chunks so that services with parallel load capabilities, such as Amazon Redshift, AWS Glue, and Amazon EMR, can load them in parallel.
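As a rough illustration of what chunked generation looks like, the sketch below prints the `dbgen` invocations that would produce a data set in several pieces. It is a dry run that only echoes the commands. The `-s` (scale factor), `-C` (number of chunks), `-S` (which chunk to build), and `-f` (overwrite) flags are standard `dbgen` options, but the `gen_commands` helper and the `SCALE` and `CHUNKS` values are assumptions made for this example and are not taken from `run.sh`.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one dbgen command per chunk.
# SCALE and CHUNKS are illustrative values, not defaults from run.sh.
SCALE=1
CHUNKS=4

# gen_commands <scale> <chunks>
# Echoes the dbgen command for each step from 1 to <chunks>.
gen_commands() {
  step=1
  while [ "$step" -le "$2" ]; do
    echo "./dbgen -f -s $1 -C $2 -S $step"
    step=$((step + 1))
  done
}

gen_commands "$SCALE" "$CHUNKS"
```

Running the printed commands from inside the `tpch-dbgen` directory would write chunk files such as `lineitem.tbl.1` through `lineitem.tbl.4`; small tables such as `nation` and `region` are not split.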

The files generated for each table are placed under their own S3 prefix, which is necessary for certain services such as Redshift Spectrum.
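The per-table layout can be sketched as a mapping from each chunk file name to an S3 URI. The example below is a dry run that prints `aws s3 cp` commands rather than executing them. The bucket name, the `tpch` key prefix, and the `s3_uri_for` helper are hypothetical and chosen only for illustration; the actual layout used by `run.sh` may differ.

```shell
#!/usr/bin/env bash
# Dry-run sketch: map dbgen output files to per-table S3 prefixes.
# BUCKET and PREFIX are hypothetical placeholders, not values from run.sh.
BUCKET="my-example-bucket"
PREFIX="tpch"

# s3_uri_for <file>
# dbgen names its output <table>.tbl or <table>.tbl.<n>, so the table
# name is everything before the first dot in the file name.
s3_uri_for() {
  local file table
  file="$(basename "$1")"
  table="${file%%.*}"
  printf 's3://%s/%s/%s/%s\n' "$BUCKET" "$PREFIX" "$table" "$file"
}

# Print the upload commands instead of running them.
for f in lineitem.tbl.1 lineitem.tbl.2 orders.tbl.1; do
  echo "aws s3 cp $f $(s3_uri_for "$f")"
done
```

With this layout, every `lineitem` chunk lands under `s3://my-example-bucket/tpch/lineitem/`, so a service that reads all objects under one prefix as a single table sees only that table's files.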

Instructions

git clone https://github.com/electrum/tpch-dbgen

cd tpch-dbgen

make

cd ..

git clone https://github.com/matwerber1/tpch-dbgen-to-aws-s3

cd tpch-dbgen-to-aws-s3
./run.sh
