alignment command
Nacho edited this page Jun 30, 2015 · 1 revision
The 'alignment' command lets you process BAM sequence files either in a local scenario or on a Hadoop cluster.
Assuming you are in the hpg-bigdata folder, type the following command to see the available alignment sub-commands for the Hadoop scenario:
$ build/bin/hpg-bigdata.sh alignment
Usage: hpg-bigdata.sh alignment <subcommand> [options]
Subcommands:
convert Converts BAM files to different big data formats such as Avro and Parquet
stats Compute some stats for a file containing alignments according to the GA4GH/Avro model
depth Compute the depth (or coverage) for a given file containing alignments according to the GA4GH/Avro model
For a local scenario, use the script hpg-bigdata-local.sh:
$ build/bin/hpg-bigdata-local.sh alignment
Usage: hpg-bigdata-local.sh alignment <subcommand> [options]
Subcommands:
convert Converts BAM files to different big data formats such as Avro
The convert subcommand converts BAM files to big data formats such as Avro and Parquet, following the GA4GH schema models. In the local scenario, only Avro is available.
Hadoop scenario:
$ build/bin/hpg-bigdata.sh alignment convert -h
Usage: hpg-bigdata.sh alignment convert [options]
Options:
* -i, --input STRING HDFS input file in BAM format [null]
--to-parquet To save the output file in Parquet format [false]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
* -o, --output STRING HDFS output file to store the BAM alignments according to the GA4GH/Avro model [null]
-h, --help This parameter prints this help [false]
--conf STRING Set the configuration file [null]
-v, --verbose BOOLEAN This parameter set the level of the logging [false]
-x, --compression STRING Accepted values: snappy, deflate, bzip2, xz, null. Default: snappy [snappy]
Example:
$ hadoop fs -mkdir /test
$ hadoop fs -copyFromLocal build/data/test.bam /test
$ hadoop fs -ls /test
Found 1 items
-rw-r--r-- 1 jtarraga supergroup 11755 2015-06-30 16:32 /test/test.bam
$ hadoop fs -mkdir /out
$ build/bin/hpg-bigdata.sh alignment convert -i /test/test.bam -o /out/test.bam.avro --to-parquet
...
...
$ hadoop fs -ls /out/test.bam.avro
Found 4 items
-rw-r--r-- 1 jtarraga supergroup 0 2015-06-30 16:33 /out/test.bam.avro/_SUCCESS
-rw-r--r-- 1 jtarraga supergroup 32608 2015-06-30 16:33 /out/test.bam.avro/part-r-00000.avro
-rw-r--r-- 1 jtarraga supergroup 552 2015-06-30 16:33 /out/test.bam.avro/part-r-00000.avro.header
drwxr-xr-x - jtarraga supergroup 0 2015-06-30 16:33 /out/test.bam.avro/to-parquet
$ hadoop fs -ls /out/test.bam.avro/to-parquet
Found 4 items
-rw-r--r-- 1 jtarraga supergroup 0 2015-06-30 16:33 /out/test.bam.avro/to-parquet/_SUCCESS
-rw-r--r-- 1 jtarraga supergroup 16217 2015-06-30 16:33 /out/test.bam.avro/to-parquet/_common_metadata
-rw-r--r-- 1 jtarraga supergroup 21021 2015-06-30 16:33 /out/test.bam.avro/to-parquet/_metadata
-rw-r--r-- 1 jtarraga supergroup 37268 2015-06-30 16:33 /out/test.bam.avro/to-parquet/part-m-00000.snappy.parquet
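The empty _SUCCESS files in the listings above are the markers that Hadoop MapReduce writes when a job commits its output. A wrapper script might check for them before using the results. The following is a minimal sketch, assuming the hadoop client is on the PATH; the function name conversion_succeeded is hypothetical and not part of hpg-bigdata.

```shell
# Hypothetical helper, not part of hpg-bigdata.
# Returns 0 if the given HDFS output directory contains a _SUCCESS marker.
conversion_succeeded() {
    out_dir=$1
    # `hadoop fs -test -e` exits with status 0 when the path exists.
    hadoop fs -test -e "${out_dir}/_SUCCESS"
}

# Usage, with the paths from the example above:
# conversion_succeeded /out/test.bam.avro && echo "Avro output complete"
# conversion_succeeded /out/test.bam.avro/to-parquet && echo "Parquet output complete"
```

Checking both directories separately is deliberate here, since the Avro and Parquet outputs each carry their own marker in the listing.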
Local scenario:
$ build/bin/hpg-bigdata-local.sh alignment convert -h
Usage: hpg-bigdata-local.sh alignment convert [options]
Options:
--conf STRING Set the configuration file [null]
-x, --compression STRING Accepted values: snappy, deflate, bzip2, xz. Default: snappy [snappy]
-v, --verbose BOOLEAN This parameter set the level of the logging [false]
-h, --help This parameter prints this help [false]
* -i, --input STRING Local input file in BAM format [null]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
--to-bam Convert back to BAM fomat. In this case, the input file has to be saved in the GA4GH/Avro model, and the output file will be in BAM format [false]
* -o, --output STRING Local output file to store the BAM alignments according to the GA4GH/Avro model [null]
Example:
$ mkdir /tmp/out
$ build/bin/hpg-bigdata-local.sh alignment convert -i build/data/test.bam -o /tmp/out/test.bam.avro
$ ls -ltr /tmp/out/test.bam.avro
-rw-rw-r-- 1 jtarraga jtarraga 20348 jun 30 16:37 /tmp/out/test.bam.avro
In a local scenario, you can convert an Avro file back to BAM using the --to-bam option:
$ build/bin/hpg-bigdata-local.sh alignment convert -i /tmp/out/test.bam.avro -o /tmp/out/test.bam.avro.bam --to-bam
$ ls -lt build/data/test.bam /tmp/out/test.bam.avro.bam
-rw-rw-r-- 1 jtarraga jtarraga 11779 jun 30 16:39 /tmp/out/test.bam.avro.bam
-rw-rw-r-- 1 jtarraga jtarraga 11755 jun 30 15:29 build/data/test.bam
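The two files in the listing differ in size (11779 versus 11755 bytes), so a byte-for-byte comparison would not match. To check a round trip, comparing the alignment records is more informative. The sketch below assumes samtools is installed and relies on `samtools view` printing records without the header; the function name same_alignments is hypothetical, and this page does not state whether the records come back identical.

```shell
# Hypothetical helper, not part of hpg-bigdata; assumes samtools is installed.
# Exits 0 if both BAM files contain the same alignment records (header ignored),
# 1 if the records differ, 2 if temporary files could not be created.
same_alignments() {
    tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
    samtools view "$1" > "$tmp_a" &&
        samtools view "$2" > "$tmp_b" &&
        cmp -s "$tmp_a" "$tmp_b"
    status=$?
    rm -f "$tmp_a" "$tmp_b"
    return "$status"
}

# Usage, with the paths from the example above:
# same_alignments build/data/test.bam /tmp/out/test.bam.avro.bam && echo "records match"
```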
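The stats subcommand computes statistics for a file containing alignments stored according to the GA4GH/Avro model.
Hadoop scenario: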
$ build/bin/hpg-bigdata.sh alignment stats -h
Usage: hpg-bigdata.sh alignment stats [options]
Options:
* -o, --output STRING Local output directory to save stats results in JSON format [null]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
-h, --help This parameter prints this help [false]
--conf STRING Set the configuration file [null]
* -i, --input STRING HDFS input file containing alignments stored accordin
Example:
$ mkdir /tmp/out-bam-stats
$ build/bin/hpg-bigdata.sh alignment stats -i /out/test.bam.avro/part-r-00000.avro -o /tmp/out-bam-stats/
...
...
$ ls -ltr /tmp/out-bam-stats/
total 8
-rw-r--r-- 1 jtarraga jtarraga 4562 jun 30 16:43 stats.json
$ cat /tmp/out-bam-stats/stats.json
{"num_mapped": 176, "num_unmapped": 0, "num_paired": 176, "num_mapped_first": 88, "num_mapped_second": 88, "num_mismatches": 151, "nu...
...
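To pull a single counter out of stats.json from a script, a small sed filter is enough for the flat `"key": number` pairs visible in the excerpt above. This is a sketch only: stats_value is a hypothetical name, and the rest of the file (truncated above) may contain nested values for which a real JSON parser would be a better fit.

```shell
# Hypothetical helper: prints the integer value of a key in stats.json.
# Assumes the flat `"key": number` layout visible in the excerpt.
stats_value() {
    key=$1
    file=$2
    # The closing quote after the key keeps "num_mapped" from
    # matching longer keys such as "num_mapped_first".
    sed -n 's/.*"'"$key"'": *\([0-9][0-9]*\).*/\1/p' "$file"
}

# Usage, with the path from the example above:
# stats_value num_mapped /tmp/out-bam-stats/stats.json
```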
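The depth subcommand computes the depth (or coverage) for a file containing alignments stored according to the GA4GH/Avro model.
Hadoop scenario: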
$ build/bin/hpg-bigdata.sh alignment depth -h
Usage: hpg-bigdata.sh alignment depth [options]
Options:
* -o, --output STRING Local output directory to save stats results in a text file [null]
-L, --log-level STRING Set the level log, values: debug, info, warning, error, fatal [info]
-h, --help This parameter prints this help [false]
--conf STRING Set the configuration file [null]
* -i, --input STRING HDFS input file containing alignments stored accordin
Example:
$ mkdir /tmp/out-bam-depth
$ build/bin/hpg-bigdata.sh alignment depth -i /out/test.bam.avro/part-r-00000.avro -o /tmp/out-bam-depth/
...
...
$ ls -ltr /tmp/out-bam-depth/
total 5088
-rw-r--r-- 1 jtarraga jtarraga 5208096 jun 30 16:47 depth.txt
$ head /tmp/out-bam-depth/depth.txt
1 2080000 0
1 2080001 0
1 2080002 0
1 2080003 0
1 2080004 0
1 2080005 0
1 2080006 0
1 2080007 0
1 2080008 0
1 2080009 0
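Since depth.txt holds one line per position, a summary is usually easier to read than the raw file. The following awk sketch assumes the three whitespace-separated columns are sequence name, position and depth, which is inferred from the sample above rather than documented here; depth_summary is a hypothetical name.

```shell
# Hypothetical helper: summarises a depth file with three whitespace-separated
# columns, assumed here to be sequence name, position and depth.
depth_summary() {
    awk '
        # Count every position, accumulate depth, and count covered positions.
        { n++; sum += $3; if ($3 > 0) covered++ }
        END {
            if (n == 0) { print "positions=0"; exit }
            printf "positions=%d covered=%d mean_depth=%.2f\n", n, covered, sum / n
        }
    ' "$1"
}

# Usage, with the path from the example above:
# depth_summary /tmp/out-bam-depth/depth.txt
```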