A plugin for Pentaho Data Integration (Kettle) that adds support for DuckDB in the table input/output step.
This project was born out of experimentation with Apache HOP and the realization that it supported DuckDB out of the box, while recent versions of Kettle did not 😥
After a weekend of tinkering with the Kettle SDK and drawing inspiration from the DuckDB plugin in Apache HOP, this plugin brings the same functionality to Kettle.
Current and past releases can be found on the Releases page. Each release includes pre-compiled packages (zip files) containing the necessary JAR file(s) for installation, compiled for a specific version of PDI and DuckDB (see release notes). Download the most recent release that matches your version of Kettle.
📝 Releases may not always keep pace with the latest versions of DuckDB or Pentaho Community Edition. If you need a more recent version of DuckDB, see the Build Instructions for compiling your own package.
- Unpack the zip file into the plugins directory of your local Kettle install (\data-integration\plugins).
- The zip file should include the plugin and the necessary JDBC driver for DuckDB, with the following structure:
    data-integration\
        plugins\
            duckdb-kettle-plugin-x-x-x\
                lib\
                    duckdb_jdbc-x.x.x.jar
                duckdb-kettle-plugin-x-x-x.jar
- Restart Kettle (Spoon).
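If the connection type does not show up after the restart, it can help to confirm that the files landed where Kettle expects them. The following is a minimal sketch, not part of the plugin: `PluginLayoutCheck` and its methods are hypothetical names, and it assumes the plugin jar sits at the top level of the plugin folder with the JDBC driver under `lib`, as in the structure listed above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not shipped with the plugin) that checks the unpacked layout. */
final class PluginLayoutCheck {

    /** Returns true if dir exists and holds at least one regular file matching the glob. */
    static boolean hasFile(Path dir, String glob) throws IOException {
        if (!Files.isDirectory(dir)) {
            return false;
        }
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(dir, glob)) {
            for (Path candidate : matches) {
                if (Files.isRegularFile(candidate)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Assumed layout: plugin jar at the top level, DuckDB JDBC driver under lib/. */
    static boolean isInstalled(Path pluginDir) throws IOException {
        return hasFile(pluginDir, "duckdb-kettle-plugin-*.jar")
                && hasFile(pluginDir.resolve("lib"), "duckdb_jdbc-*.jar");
    }

    public static void main(String[] args) throws IOException {
        // Pass the unpacked plugin folder as the first argument.
        Path pluginDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(isInstalled(pluginDir)
                ? "Layout looks complete"
                : "Layout is incomplete");
    }
}
```

Run it with the path to the unpacked `duckdb-kettle-plugin-x-x-x` folder as the only argument. It only checks file names, so it cannot tell whether the jar versions match your Kettle install.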
After installing the plugin, DuckDB should be available as a connection type from a table input or output step, similar to a SQLite connection.
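As with SQLite, a DuckDB connection points at a database file. For orientation, the DuckDB JDBC driver uses URLs of the form `jdbc:duckdb:` followed by the file path, and a bare `jdbc:duckdb:` opens an in-memory database. The sketch below only illustrates that URL shape: `DuckDbJdbcUrl` is a hypothetical helper, and how this plugin builds its URL internally is not described here.

```java
/** Hypothetical helper that illustrates the DuckDB JDBC URL format. */
final class DuckDbJdbcUrl {

    /** URL prefix used by the DuckDB JDBC driver. */
    static final String PREFIX = "jdbc:duckdb:";

    /** Builds a URL for a database file, or an in-memory URL when the path is blank. */
    static String forDatabase(String databasePath) {
        if (databasePath == null || databasePath.trim().isEmpty()) {
            return PREFIX; // no path: in-memory database
        }
        return PREFIX + databasePath.trim();
    }

    public static void main(String[] args) {
        System.out.println(forDatabase("/tmp/example.duckdb")); // jdbc:duckdb:/tmp/example.duckdb
        System.out.println(forDatabase(""));                    // jdbc:duckdb:
    }
}
```

The path `/tmp/example.duckdb` is a placeholder. In Spoon you would normally enter only the database file path in the connection dialog.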
Before building the plugin, ensure you have the following installed and configured on your local machine:
- Maven, version 3+
- Java JDK 11 (or OpenJDK)
- The Pentaho Maven settings.xml file in your home .m2 directory
- Clone this repository locally:
git clone https://github.com/forgineer/duckdb-kettle-plugin.git
- Review the pom.xml file and update the <kettle.version> and <duckdb.version> properties to match the Pentaho Data Integration (Kettle) and DuckDB versions you intend to use. Verify each version on the Maven repository.
    <properties>
        <!-- Pentaho Data Integration (Kettle) Version -->
        <!-- Ex: 9.3.0.0-428, 9.4.0.0-343, etc. -->
        <kettle.version>9.4.0.0-343</kettle.version>
        <!-- DuckDB Version (JDBC driver) -->
        <!-- Ex: 0.10.2, 1.0.0, etc. -->
        <duckdb.version>1.1.0</duckdb.version>
        ...
    </properties>
- Update the jar file name of the JDBC driver in the main source (DuckDBDatabaseMeta.java) to match the version found in the pom.xml file. Again, verify the file name and version on the Maven repository or directly from DuckDB.
    @Override
    public String[] getUsedLibraries() {
        // The version in this file name must match <duckdb.version> in pom.xml
        return new String[] {"duckdb_jdbc-1.1.0.jar"};
    }
- Package the jar file:
mvn package
This will create a jar and a zip file inside the target directory:
- duckdb-kettle-plugin-x.x.x.jar
- duckdb-kettle-plugin-x.x.x.zip
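Because the DuckDB version appears in two places (the `<duckdb.version>` property in pom.xml and the jar name returned by `getUsedLibraries()`), it is easy to update one and forget the other. The sketch below shows one way to compare them before packaging. `VersionSyncCheck` is a hypothetical helper that is not part of this repository, and it reads the property with a simple regular expression rather than a real XML parser, which is an assumption that holds only for a plain pom.xml like the one shown above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-build check: does the driver jar name match the POM version? */
final class VersionSyncCheck {

    // Simplification: a regex is enough for a flat <duckdb.version> property.
    private static final Pattern DUCKDB_VERSION =
            Pattern.compile("<duckdb\\.version>\\s*([^<\\s]+)\\s*</duckdb\\.version>");

    /** Extracts the DuckDB version from the POM text, if the property is present. */
    static Optional<String> duckdbVersion(String pomXml) {
        Matcher matcher = DUCKDB_VERSION.matcher(pomXml);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    /** Driver jar naming used in this README: duckdb_jdbc-<version>.jar */
    static String expectedDriverJar(String version) {
        return "duckdb_jdbc-" + version + ".jar";
    }

    /** True when the jar name used in the source matches the POM version. */
    static boolean inSync(String pomXml, String jarNameInSource) {
        return duckdbVersion(pomXml)
                .map(version -> expectedDriverJar(version).equals(jarNameInSource))
                .orElse(false);
    }

    public static void main(String[] args) {
        String pom = "<properties><duckdb.version>1.1.0</duckdb.version></properties>";
        System.out.println(inSync(pom, "duckdb_jdbc-1.1.0.jar")); // true
        System.out.println(inSync(pom, "duckdb_jdbc-1.0.0.jar")); // false
    }
}
```

In practice you would read the real pom.xml from disk and pass in the string from `DuckDBDatabaseMeta.java`. The inline POM text here only keeps the example self-contained.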
We welcome contributions to the duckdb-kettle-plugin project. Before submitting a pull request, please:
- Raise an issue to discuss the proposed changes.
- Ensure that the issue is clear and concise, and that we've discussed and agreed on the changes.