Java code for reading HBase exported sequence files

We use the HBase export utility to take a daily dump/backup of our HBase tables. The dump is then copied to S3 to make it available to the data science (DS) team. The DS team works with a subset of the data, and that subset is not defined by a time range, so we need to read the dump and split it into segregated data that the DS team can use. Data segregation can be achieved with Spark, but as a startup we can't afford to run a continuous Spark job just for data ingestion. We need something standalone that can run on spot instances and does not depend on Spark.

The sequence files are generated by the HBase export utility shown below:

hbase org.apache.hadoop.hbase.mapreduce.Driver export <tablename> <destination-dir>

The code below was tested with sequence files generated on HBase 1.3.x and 1.4.x.

Add dependencies

compile group: 'org.apache.hbase', name: 'hbase-client', version: '1.4.2'
compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '2.4.0'
compile group: 'org.apache.hadoop', name: 'hadoop-core', version: '1.2.1'
compile group: 'org.apache.hbase', name: 'hbase-mapreduce', version: '2.1.0'

Gist

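import java.io.IOException;
import java.util.NavigableMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeysPublic;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.serializer.WritableSerialization;

Configuration conf = new Configuration();
// Register ResultSerialization alongside the existing serializations so the
// reader can deserialize the HBase Result values stored in the file.
conf.setStrings(CommonConfigurationKeysPublic.IO_SERIALIZATIONS_KEY, conf.get(CommonConfigurationKeysPublic.IO_SERIALIZATIONS_KEY),
       ResultSerialization.class.getName(),
       WritableSerialization.class.getName()
);
Path inputPath = new Path("./hbase-sequence-file-reader/src/main/resources/sequencefiles/part-m-00000");

// try-with-resources closes the reader even when reading fails part-way
try (SequenceFile.Reader reader = new SequenceFile.Reader(FileSystem.get(conf), inputPath, conf)) {
   // The export writes row keys as ImmutableBytesWritable
   WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
   Result result = null;

   while (reader.next(key)) {
       String skey = Bytes.toString(((ImmutableBytesWritable) key).get());
       result = (Result) reader.getCurrentValue(result);
       // Qualifier -> value map for column family "d"
       NavigableMap<byte[], byte[]> resultMap = result.getFamilyMap(Bytes.toBytes("d"));
       System.out.println(skey);
       resultMap.forEach((k, v) -> System.out.println(Bytes.toString(k) + " " + Bytes.toString(v)));
   }
} catch (IOException | IllegalAccessException | InstantiationException e) {
   e.printStackTrace();
}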

The code is available in the ReadHBaseSequenceFile.java file; the code below is not tested.

java -jar hbase-sequence-file-reader-1.0-SNAPSHOT.jar
1
address.city hyderabad
address.pincode 500081
age 27
name Chetana
2
address.city hyderabad
address.pincode 500084
age 25
name Nilesh
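
Once rows are decoded into a row key and a qualifier/value map, as in the output above, the segregation step for the DS team can be done in plain Java without Spark. The following is a minimal sketch, not the project's actual implementation: the class name RowSegregator, the method segregate, the "unknown" bucket, and the choice of address.pincode as the grouping column are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups decoded rows into buckets by the value of one column.
public class RowSegregator {

    /**
     * Returns bucket value -> list of row keys. Rows that do not contain
     * the column are collected under the (assumed) bucket name "unknown".
     */
    public static Map<String, List<String>> segregate(
            Map<String, Map<String, String>> rows, String column) {
        Map<String, List<String>> buckets = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String bucket = row.getValue().getOrDefault(column, "unknown");
            buckets.computeIfAbsent(bucket, b -> new ArrayList<>()).add(row.getKey());
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Sample rows copied from the output shown above
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();

        Map<String, String> first = new TreeMap<>();
        first.put("address.city", "hyderabad");
        first.put("address.pincode", "500081");
        first.put("age", "27");
        first.put("name", "Chetana");
        rows.put("1", first);

        Map<String, String> second = new TreeMap<>();
        second.put("address.city", "hyderabad");
        second.put("address.pincode", "500084");
        second.put("age", "25");
        second.put("name", "Nilesh");
        rows.put("2", second);

        // prints {500081=[1], 500084=[2]}
        System.out.println(segregate(rows, "address.pincode"));
    }
}
```

In a real reader, the inner maps would be filled from result.getFamilyMap(...) inside the read loop, and each bucket would be written to its own output location instead of being kept in memory.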

