feat: Update README with Docker Image Deployment Instructions and Enhance Release Details #39

Merged
gbsDojo merged 2 commits into main from feat/update_sdk on Aug 26, 2024

@gbsDojo (Collaborator) commented on Aug 23, 2024

Description:
This pull request updates README.md with new sections and revisions that give clearer guidance for the dojo-beam-transforms project. It includes the following key changes:

  1. Docker Image Deployment Instructions:

    • Added a new section explaining the benefits of saving Docker images to a registry, with step-by-step instructions for building, tagging, and pushing them.
    • The example provided uses Google Cloud's Artifact Registry, but the instructions also cover other common registries such as Docker Hub, Amazon ECR, and Azure Container Registry.
  2. Release Details:

    • Expanded the section on dependency versions for Release 1.0.0, clearly highlighting the Apache Beam SDK version (2.58.1) and the compatible Python versions (3.8, 3.9, and 3.10).
    • These updates give users the information they need to configure their environments and dependencies correctly.
  3. Apache Beam SDK Update:

    • Updated the project to use Apache Beam SDK version 2.58.1. This version includes several improvements but comes with specific warnings and known issues:
      • Kafka IO Warning: Offsets are not committed when using io.kafka.ReadFromKafka configured with commit_offset_in_finalize. This issue is tracked in Apache Beam GitHub issue #32196.
      • Data Corruption Issue: Pipelines that read data from Cloud Storage using GcsIO, either directly or through BigQueryIO, may experience data corruption on Apache Beam Python SDK versions 2.53.0 to 2.58.1. This issue is detailed in Apache Beam GitHub issue #32169.
    • According to the Google Cloud documentation, version 2.58.1 will be deprecated on August 16, 2025.
  4. Table of Contents:

    • Generated and included a table of contents for easier navigation within the README file, allowing users to quickly find relevant sections.
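The build, tag, and push flow from item 1 can be sketched in Python as a list of CLI calls. This is a minimal sketch, not the README's actual instructions: it assumes Google Cloud's Artifact Registry (image references of the form `LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG`) and a Dockerfile at the repository root. The helper names and the location, project, repository, and image values are hypothetical placeholders.

```python
import shlex
import subprocess
from typing import List


def artifact_registry_image(location: str, project: str, repository: str,
                            image: str, tag: str = "latest") -> str:
    """Build an Artifact Registry image reference.

    Artifact Registry Docker images are addressed as
    LOCATION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE:TAG.
    """
    return f"{location}-docker.pkg.dev/{project}/{repository}/{image}:{tag}"


def deployment_commands(local_image: str, remote_image: str,
                        context_dir: str = ".") -> List[List[str]]:
    """Return the CLI calls for the build -> tag -> push flow, in order."""
    registry_host = remote_image.split("/", 1)[0]
    return [
        # Register gcloud as a Docker credential helper for this registry host.
        ["gcloud", "auth", "configure-docker", registry_host],
        # Build the image from the Dockerfile in context_dir.
        ["docker", "build", "-t", local_image, context_dir],
        # Give the local image its registry-qualified name.
        ["docker", "tag", local_image, remote_image],
        # Upload the tagged image to the registry.
        ["docker", "push", remote_image],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own project settings.
    remote = artifact_registry_image(
        "us-central1", "my-project", "beam-images", "dojo-beam-transforms", "1.0.0"
    )
    run(deployment_commands("dojo-beam-transforms:1.0.0", remote))
```

The sketch defaults to a dry run so the commands can be reviewed before anything is pushed. For Docker Hub, Amazon ECR, or Azure Container Registry, only the registry-qualified image name and the authentication step would change; the build, tag, and push calls stay the same.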
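The version constraints and known issues listed in items 2 and 3 can be checked before launching a pipeline. The sketch below uses a hypothetical helper that is not part of dojo-beam-transforms. It hard-codes only the versions, issue numbers, and deprecation date stated in this PR. Because the PR does not give an affected version range for the Kafka warning, the sketch assumes it applies to 2.58.1 only.

```python
import re
import sys
from datetime import date
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple

# Values taken from this PR's description of Release 1.0.0.
SUPPORTED_PYTHON = {(3, 8), (3, 9), (3, 10)}
PINNED_BEAM = (2, 58, 1)
GCSIO_AFFECTED = ((2, 53, 0), (2, 58, 1))  # inclusive range, apache/beam#32169
BEAM_2_58_1_DEPRECATION = date(2025, 8, 16)


def parse_version(text: str) -> Tuple[int, int, int]:
    """Extract MAJOR.MINOR.PATCH, ignoring suffixes such as 'rc1' or '.dev0'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def check_environment(python_version: Tuple[int, int], beam_version: str,
                      today: date) -> List[str]:
    """Return human-readable warnings for the given environment."""
    warnings = []
    if tuple(python_version) not in SUPPORTED_PYTHON:
        warnings.append(
            f"Python {python_version[0]}.{python_version[1]} is not a "
            "Release 1.0.0 version (3.8, 3.9, 3.10)"
        )
    beam = parse_version(beam_version)
    if beam != PINNED_BEAM:
        warnings.append(f"apache-beam {beam_version} differs from pinned 2.58.1")
    low, high = GCSIO_AFFECTED
    if low <= beam <= high:
        warnings.append("GcsIO reads (direct or via BigQueryIO) may hit the "
                        "data corruption issue, see apache/beam#32169")
    if beam == (2, 58, 1):
        # Assumption: the PR only states the Kafka issue for 2.58.1.
        warnings.append("ReadFromKafka with commit_offset_in_finalize does not "
                        "commit offsets, see apache/beam#32196")
        if today >= BEAM_2_58_1_DEPRECATION:
            warnings.append("apache-beam 2.58.1 was deprecated on 2025-08-16")
    return warnings


if __name__ == "__main__":
    try:
        installed = version("apache-beam")
    except PackageNotFoundError:
        print("apache-beam is not installed")
    else:
        for warning in check_environment(sys.version_info[:2], installed, date.today()):
            print("WARNING:", warning)
```

Passing the Python version, Beam version, and date as arguments keeps the check testable without an actual Beam installation. Only the `__main__` block reads the live environment.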

Rationale:
These updates enhance the usability and clarity of the project documentation, ensuring that users are well-informed about the deployment process and potential issues with the Apache Beam SDK. The README now provides comprehensive instructions for Docker image deployment and highlights critical information about the dependencies and versioning used in the dojo-beam-transforms project.

@gbsDojo added the documentation and enhancement labels on Aug 23, 2024
@gbsDojo changed the title from "Updating SDK to 2.58.1" to "feat: Update README with Docker Image Deployment Instructions and Enhance Release Details" on Aug 23, 2024
@gbsDojo merged commit 0498cd1 into main on Aug 26, 2024
4 checks passed
@gbsDojo deleted the feat/update_sdk branch on Aug 26, 2024 at 16:34
Labels: documentation, enhancement

2 participants