Java code for reading HBase exported sequence

We used the HBase export utility to take daily dumps (backups) of HBase tables. The dump data is then ingested into S3 to make it available to the data science (DS) team. Because the DS team works with a subset of the data, and that subset is not defined by a time range, we need to read the dump data and segregate it into a form the DS team can use. Data segregation can be done with Spark, but as a startup we can’t afford to run a continuous Spark job just for data ingestion. We need something standalone that can run on spot instances and does not depend on Spark.

The sequence files are generated by the HBase export utility shown below:

hbase org.apache.hadoop.hbase.mapreduce.Driver export <tablename> <destination-dir>
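
The reader code in this document opens a single file named part-m-00000, but an export can leave more than one such file in the destination directory. The following is a minimal sketch, assuming the export output has already been copied to a local directory, of listing those files in name order so that each can be passed to the reader in turn. The `PartFileLister` class, the `listPartFiles` method, the `part-m-*` glob, and the default directory name are all illustrative assumptions, not part of the original project.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PartFileLister {

    /**
     * Returns the regular files directly under exportDir whose names match
     * "part-m-*", sorted by name. The glob is an assumption based on the
     * part-m-00000 file name used by the reader code.
     */
    public static List<Path> listPartFiles(Path exportDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(exportDir, "part-m-*")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    parts.add(p);
                }
            }
        }
        Collections.sort(parts);
        return parts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local directory; pass the real one as the first argument.
        Path exportDir = Paths.get(args.length > 0 ? args[0] : "sequencefiles");
        for (Path part : listPartFiles(exportDir)) {
            System.out.println(part);
        }
    }
}
```

Files that do not match the glob, such as job marker files or hidden checksum files, are skipped. If the directory does not exist, `listPartFiles` throws an `IOException`.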

The code below was tested with sequence files generated on HBase 1.3.x and 1.4.x.

Add dependencies
compile group: 'org.apache.hbase', name: 'hbase-client', version: '1.4.2'
compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '2.4.0'
compile group: 'org.apache.hadoop', name: 'hadoop-core', version: '1.2.1'
compile group: 'org.apache.hbase', name: 'hbase-mapreduce', version: '2.1.0'
Gist
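import java.io.IOException;
import java.util.NavigableMap;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeysPublic;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.serializer.WritableSerialization;

// Register ResultSerialization so the reader can deserialize HBase Result values.
Configuration conf = new Configuration();
conf.setStrings(CommonConfigurationKeysPublic.IO_SERIALIZATIONS_KEY,
       conf.get(CommonConfigurationKeysPublic.IO_SERIALIZATIONS_KEY),
       ResultSerialization.class.getName(),
       WritableSerialization.class.getName()
);

try {
   FileSystem fs = FileSystem.get(conf);
   Path inputPath = new Path("./hbase-sequence-file-reader/src/main/resources/sequencefiles/part-m-00000");

   // try-with-resources closes the reader even if reading fails part-way.
   try (SequenceFile.Reader reader = new SequenceFile.Reader(fs, inputPath, conf)) {
       // The export utility writes each row key as an ImmutableBytesWritable.
       WritableComparable key = (WritableComparable) reader.getKeyClass().newInstance();
       Result result = null;

       while (reader.next(key)) {
           // copyBytes() returns only the valid key bytes, not the whole backing array.
           String skey = Bytes.toString(((ImmutableBytesWritable) key).copyBytes());
           result = (Result) reader.getCurrentValue(result);
           // Latest value of every qualifier in column family "d".
           NavigableMap<byte[], byte[]> resultMap = result.getFamilyMap(Bytes.toBytes("d"));
           System.out.println(skey);
           resultMap.forEach((k, v) -> System.out.println(Bytes.toString(k) + " " + Bytes.toString(v)));
       }
   }
} catch (IOException | IllegalAccessException | InstantiationException e) {
   e.printStackTrace();
}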
The code is available in the ReadHBaseSequenceFile.java file; the code below is not tested.
java -jar hbase-sequence-file-reader-1.0-SNAPSHOT.jar
1
address.city hyderabad
address.pincode 500081
age 27
name Chetana
2
address.city hyderabad
address.pincode 500084
age 25
name Nilesh
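
The stated goal is to segregate the dump into subsets for the DS team. The following is a minimal sketch of one way that could look once each row has been decoded into strings, as the reader loop does when printing: group row keys by the value of one column. The `RowSegregator` class, its methods, and the choice of grouping column are hypothetical. The source does not say how the subsets are actually defined, and the sample rows simply repeat the output shown above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowSegregator {

    /**
     * Groups row keys by the value stored under the given column qualifier.
     * Rows that lack the column are skipped.
     */
    public static Map<String, List<String>> groupByColumn(
            Map<String, Map<String, String>> rows, String qualifier) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, Map<String, String>> row : rows.entrySet()) {
            String value = row.getValue().get(qualifier);
            if (value != null) {
                groups.computeIfAbsent(value, v -> new ArrayList<>()).add(row.getKey());
            }
        }
        return groups;
    }

    /** The two rows from the sample output, keyed by row key. */
    public static Map<String, Map<String, String>> sampleRows() {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("1", row("hyderabad", "500081", "27", "Chetana"));
        rows.put("2", row("hyderabad", "500084", "25", "Nilesh"));
        return rows;
    }

    private static Map<String, String> row(String city, String pincode, String age, String name) {
        Map<String, String> columns = new TreeMap<>();
        columns.put("address.city", city);
        columns.put("address.pincode", pincode);
        columns.put("age", age);
        columns.put("name", name);
        return columns;
    }

    public static void main(String[] args) {
        // prints {hyderabad=[1, 2]}
        System.out.println(groupByColumn(sampleRows(), "address.city"));
        // prints {500081=[1], 500084=[2]}
        System.out.println(groupByColumn(sampleRows(), "address.pincode"));
    }
}
```

In a real run the inner maps would be filled from `result.getFamilyMap(...)` inside the reader loop instead of from `sampleRows()`, and each group could be written to its own output file.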
