Flytekit python plugins

All flytekitplugins maintained by the core team are added here. Plugins do not have to live in this repository, but it is a good starting place.

Currently available plugins

| Plugin | Installation | Description | Version | Type |
|---|---|---|---|---|
| AWS Sagemaker Training | `pip install flytekitplugins-awssagemaker` | Installs SDK to author Sagemaker built-in and custom training jobs in python | PyPI version | Backend |
| Hive Queries | `pip install flytekitplugins-hive` | Installs SDK to author Hive Queries that can be executed on a configured hive backend using Flyte backend plugin | PyPI version | Backend |
| K8s distributed PyTorch Jobs | `pip install flytekitplugins-kfpytorch` | Installs SDK to author Distributed pyTorch Jobs in python using Kubeflow PyTorch Operator | PyPI version | Backend |
| K8s native tensorflow Jobs | `pip install flytekitplugins-kftensorflow` | Installs SDK to author Distributed tensorflow Jobs in python using Kubeflow Tensorflow Operator | PyPI version | Backend |
| Papermill based Tasks | `pip install flytekitplugins-papermill` | Execute entire notebooks as Flyte Tasks and pass inputs and outputs between them and python tasks | PyPI version | Flytekit-only |
| Pod Tasks | `pip install flytekitplugins-pod` | Installs SDK to author Pods in python. These pods can have multiple containers, use volumes and have non exiting side-cars | PyPI version | Flytekit-only |
| spark | `pip install flytekitplugins-spark` | Installs SDK to author Spark jobs that can be executed natively on Kubernetes with a supported backend Flyte plugin | PyPI version | Backend |
| AWS Athena Queries | `pip install flytekitplugins-athena` | Installs SDK to author queries executed on AWS Athena | PyPI version | Backend |
| DOLT | `pip install flytekitplugins-dolt` | Read & write dolt data sets and use dolt tables as native types | PyPI version | Flytekit-only |
| Pandera | `pip install flytekitplugins-pandera` | Use Pandera schemas as native Flyte types, which enable data quality checks. | PyPI version | Flytekit-only |
| SQLAlchemy | `pip install flytekitplugins-sqlalchemy` | Write queries for any database that supports SQLAlchemy | PyPI version | Flytekit-only |

Have a Plugin Idea?

Please file an issue

Development

Flyte plugins are structured as micro-libs and can be authored in an independent repository. The plugins maintained by the core team live in this repository, which makes them easy to discover. Here are some tips for authoring plugins:

  1. The folder name has to be flytekit-*, e.g. flytekit-hive. If you want to group plugins for a specific service, include the service in the name, e.g. flytekit-aws-athena.

  2. Flytekit plugins use a concept called namespace packages, so the package structure is very important. Use the following python package structure:

    flytekit-myplugin/
       - README.md
       - setup.py
       - flytekitplugins/
           - myplugin/
              - __init__.py
       - tests
           - __init__.py
    

    NOTE the inner package flytekitplugins DOES NOT have an __init__.py file.

  3. The published packages have to be named as flytekitplugins-{package-name}, where {package-name} is a unique identifier for the plugin.

  4. The setup.py has the following template. You can simply copy and paste it, then edit the TODO sections:

from setuptools import setup

# TODO put the plugin name here
PLUGIN_NAME = "<plugin-name e.g. pandera>"

# TODO decide if the plugin is regular or `data`
# for regular plugins
microlib_name = f"flytekitplugins-{PLUGIN_NAME}"
# For data/persistence plugins
# microlib_name = f"flytekitplugins-data-{PLUGIN_NAME}"

# TODO add additional requirements
plugin_requires = ["flytekit>=0.21.3,<1.0.0", "<other requirements>"]

__version__ = "0.0.0+develop"

setup(
    name=microlib_name,
    version=__version__,
    author="flyteorg",
    author_email="admin@flyte.org",
    # TODO Edit the description
    description="My awesome plugin.....",
    # TODO alter the last part of the following URL
    url="https://github.com/flyteorg/flytekit/tree/master/plugins/flytekit-...",
    long_description=open("README.md").read(),
    long_description_content_type="text/markdown",
    namespace_packages=["flytekitplugins"],
    packages=[f"flytekitplugins.{PLUGIN_NAME}"],
    install_requires=plugin_requires,
    license="apache2",
    python_requires=">=3.7",
    classifiers=[
        "Intended Audience :: Science/Research",
        "Intended Audience :: Developers",
        "License :: OSI Approved :: Apache Software License",
        "Programming Language :: Python :: 3.7",
        "Programming Language :: Python :: 3.8",
        "Topic :: Scientific/Engineering",
        "Topic :: Scientific/Engineering :: Artificial Intelligence",
        "Topic :: Software Development",
        "Topic :: Software Development :: Libraries",
        "Topic :: Software Development :: Libraries :: Python Modules",
    ],
    # TODO OPTIONAL
    # FOR Plugins where auto-loading on installation is desirable, please uncomment this line and ensure that the
    # __init__.py has the right modules available to be loaded, or point to the right module
    # entry_points={"flytekit.plugins": [f"{PLUGIN_NAME}=flytekitplugins.{PLUGIN_NAME}"]},
)
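
The template's `packages=[f"flytekitplugins.{PLUGIN_NAME}"]` line relies on the namespace layout from tip 2. The following is a minimal sketch of why that layout works, using only the standard library: it builds two throwaway projects that share a namespace directory with no `__init__.py` and imports both through that one namespace. The namespace `demo_nsplugins`, the plugin names `alpha` and `beta`, and the helper functions are hypothetical stand-ins chosen for illustration; nothing here is part of flytekit.

```python
import importlib
import sys
import tempfile
from pathlib import Path
from typing import Dict, List


def make_microlib(root: Path, namespace: str, plugin: str) -> Path:
    """Create <root>/flytekit-<plugin>/<namespace>/<plugin>/__init__.py."""
    project = root / f"flytekit-{plugin}"
    package = project / namespace / plugin
    package.mkdir(parents=True)
    # Only the inner plugin package gets an __init__.py; the shared
    # namespace directory is deliberately left without one.
    (package / "__init__.py").write_text(f"NAME = {plugin!r}\n")
    return project


def load_plugins(namespace: str, plugins: List[str]) -> Dict[str, str]:
    """Build one project per plugin, then import them all via one namespace."""
    with tempfile.TemporaryDirectory() as tmp:
        projects = [make_microlib(Path(tmp), namespace, p) for p in plugins]
        paths = [str(p) for p in projects]
        sys.path[:0] = paths  # each project root acts like a separate install
        try:
            importlib.invalidate_caches()
            return {
                p: importlib.import_module(f"{namespace}.{p}").NAME
                for p in plugins
            }
        finally:
            for path in paths:
                sys.path.remove(path)


if __name__ == "__main__":
    # "demo_nsplugins" is a hypothetical namespace used only for this demo.
    print(load_plugins("demo_nsplugins", ["alpha", "beta"]))
```

If either project added an `__init__.py` to the namespace directory, that project would claim the whole namespace and the other plugin would no longer be importable, which is why the NOTE in tip 2 matters.
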
  1. Each plugin should have a README.md that describes how to install it and includes a simple example.

  2. Each plugin should have its own tests package. NOTE: unlike flytekitplugins, the tests package does have an __init__.py file.

  3. In some cases you might want to auto-load some of your modules when the plugin is installed. This is especially true for data plugins and type plugins. In such cases, you can add a special directive to setup.py that instructs flytekit to load the prescribed modules automatically. The following is an excerpt from the flytekit-data-fsspec plugin's setup.py:

setup(
    entry_points={"flytekit.plugins": [f"{PLUGIN_NAME}=flytekitplugins.{PLUGIN_NAME}"]},
)
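
To make the auto-loading directive more concrete, here is a rough sketch of how a host library could discover and import everything registered under an entry-point group. It uses the standard library's `importlib.metadata` (Python 3.8+) and is only an illustration of the general mechanism; it is not flytekit's actual loader, and the function name `load_entry_point_group` is a hypothetical helper.

```python
from importlib.metadata import entry_points
from typing import Any, Dict


def load_entry_point_group(group: str = "flytekit.plugins") -> Dict[str, Any]:
    """Import every object registered under `group` and return it by name.

    Assumption: this mirrors the general entry-point mechanism only; the real
    flytekit loading code may differ.
    """
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable interface.
        selected = eps.select(group=group)
    else:
        # Python 3.8 and 3.9 return a plain dict keyed by group name.
        selected = eps.get(group, [])
    # ep.load() imports the module (or attribute) the entry point names.
    return {ep.name: ep.load() for ep in selected}
```

Calling `load_entry_point_group()` in an environment with no plugins installed returns an empty dict; with a plugin installed that declares the directive above, the plugin's module is imported as a side effect of `ep.load()`.
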
  1. Examples:
  • Example of a simple python task that adds python-side functionality only: flytekit-greatexpectations
  • Example of a TypeTransformer or type plugin: flytekit-pandera. These plugins add new types to Flyte, tell Flyte how to transform them, and add features through types. Remember, Flyte is a multi-language system, and type transformers allow marshalling between flytekit, the backend, and other languages.
  • Example of a TaskTemplate plugin, which also allows plugin writers to supply a prebuilt container for runtime: flytekit-sqlalchemy
  • Example of a SQL backend plugin, where the actual query invocation is done by a backend plugin: flytekit-snowflake
  • Example of a meta plugin that can wrap other tasks: flytekit-papermill
  • Example of a plugin that modifies the execution command: flytekit-spark or flytekit-aws-sagemaker
  • Example that allows executing the user container with some other context modifications: flytekit-kf-tensorflow
  • Example of a persistence plugin that allows data to be stored in different persistence layers: flytekit-data-fsspec

Refer to this Blog to understand the idea of microlibs

Unit tests

Plugins should have their own unit tests.