This Dataflow program takes an input file path as an argument and generates a separate output file for each of the following:
- Counts of all words
- Word(s) with the lowest count
- Word(s) with the highest count
- Sum of all word counts

To run the pipeline:
- Clone this project as a Maven project.
- Build the project.
- Execute the Dataflow pipeline with the following command:
mvn compile exec:java -Dexec.mainClass=hello.projects.dataflowwc.dataflowwc.WordCountApp -Dexec.args="--runner=DataflowRunner --inputFile= --outputFileBucketName= --gcpTempLocation=<temp location in GCP bucket, e.g. gs://bucketName/temp/> --tempLocation=<temp location in GCP bucket, e.g. gs://bucketName/temp/> --region=us-east1 --project="
Alternatively, you can create a template using the same command as above without exec, and use that template to invoke Dataflow from a Cloud Function, Pub/Sub, etc.
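
To make the four outputs listed at the top concrete, here is a minimal plain-Java sketch of the same computations on an in-memory string. It is only an illustration and not the project's `WordCountApp` or its Beam transforms. The class name `WordStatsSketch`, the method names, and the tokenization rule (lower-casing and splitting on non-letter characters) are all assumptions, and "sum of all words" is read here as the sum of all per-word counts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the project's WordCountApp.
final class WordStatsSketch {

    // Counts of all words. Lower-casing and splitting on non-letter
    // characters are assumptions; the real pipeline may tokenize differently.
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1L, Long::sum);
            }
        }
        return counts;
    }

    // All words whose count equals the given target (ties are kept).
    static List<String> wordsWithCount(Map<String, Long> counts, long target) {
        List<String> words = new ArrayList<>();
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            if (entry.getValue() == target) {
                words.add(entry.getKey());
            }
        }
        return words;
    }

    // Word(s) with the lowest count; empty list for empty input.
    static List<String> lowest(Map<String, Long> counts) {
        if (counts.isEmpty()) {
            return new ArrayList<>();
        }
        return wordsWithCount(counts, Collections.min(counts.values()));
    }

    // Word(s) with the highest count; empty list for empty input.
    static List<String> highest(Map<String, Long> counts) {
        if (counts.isEmpty()) {
            return new ArrayList<>();
        }
        return wordsWithCount(counts, Collections.max(counts.values()));
    }

    // Sum of all per-word counts, i.e. the total number of words seen.
    static long totalWords(Map<String, Long> counts) {
        long total = 0;
        for (long count : counts.values()) {
            total += count;
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("the cat and the dog and the bird");
        System.out.println("counts  = " + counts);
        System.out.println("lowest  = " + lowest(counts));
        System.out.println("highest = " + highest(counts));
        System.out.println("total   = " + totalWords(counts));
    }
}
```

In the actual pipeline each of these four results is written to its own output file; the sketch only prints them.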