
Hive XSL UDTF Home

jameskrobinson edited this page Jun 7, 2021 · 4 revisions

The Hive XSL UDTF is a proof-of-concept project that explores the capabilities of Hive UDTFs (user-defined table-generating functions) and tests whether a UDTF, in combination with an XSL parser, can solve a number of common issues found when processing complex XML in Hive.

In its current form, it allows a Hive user to do the following:

  • Transform complex XML into tabular data using a user-supplied transformation schema;
  • Transform complex XML into name-value pairs;
  • Transform XML to JSON, JSON to XML and JSON to CSV, assuming the source XML/JSON is stored in Hive;
  • Transform XML into CSV without having to supply a CSV schema;
  • Transform any XML into tabular data, deriving a schema at query time using an XSD. I refer to this as dynamic schema on read.
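
To make the name-value-pair idea above concrete, here is a minimal sketch in plain Python. It is not the UDTF's implementation: the project uses XSL inside Hive, whereas this uses the standard-library `xml.etree.ElementTree` purely for illustration. The function name `xml_to_name_value_pairs`, the path format and the sample trade XML are all hypothetical.

```python
import xml.etree.ElementTree as ET


def xml_to_name_value_pairs(xml_text):
    """Flatten XML into (path, value) pairs, one per attribute and leaf element.

    Illustrative only: the path format is an assumption, not the UDTF's output.
    """
    root = ET.fromstring(xml_text)
    pairs = []

    def walk(elem, path):
        # Each attribute becomes a pair keyed as <path>/@<attribute name>.
        for name, value in elem.attrib.items():
            pairs.append((f"{path}/@{name}", value))
        children = list(elem)
        if not children:
            # Leaf element: emit its trimmed text (empty string if it has none).
            pairs.append((path, (elem.text or "").strip()))
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, f"/{root.tag}")
    return pairs


# Hypothetical sample document, not taken from any of the tested formats.
SAMPLE = (
    '<trade id="T1"><party>ACME</party>'
    "<leg><ccy>USD</ccy><notional>1000000</notional></leg></trade>"
)

for name, value in xml_to_name_value_pairs(SAMPLE):
    print(name, value)
```

Each pair would correspond to one output row, which is the shape a table-generating function emits. Note that this sketch ignores namespaces and repeated sibling elements, both of which matter for real-world formats such as ISO 20022.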

It has been tested with Finastra Summit XML, SWIFT ISO 20022 XML and ISDA's FpML. It will probably work with FpML derivatives (MxML, SWML, etc.) with minimal modification.

Though it works as expected, please keep in mind:

  • It's a proof-of-concept and thus a bit rough around the edges!
  • It's a work in progress: see the backlog page for the things I plan to do, and their priority.

To gain an understanding of the UDTF, and the motivations behind it, please read through the following pages:

Project background

Example 1 - basic usage

Example 2 - name value pairs

Example 3 - working with JSON

Example 4 - going schema-less with CSV

Example 5 - the holy grail... dynamic schema on read

Readme.md