Refactor Harvest Reporter to be a property of HarvestSource #5073

Closed
3 of 7 tasks
btylerburton opened this issue Feb 6, 2025 · 0 comments
Assignees
Labels
H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0

Comments

@btylerburton
Contributor

btylerburton commented Feb 6, 2025

User Story

Currently in the reporter we document:

  from datetime import datetime, timezone

  # `results` is a summary tallied once the job has finished processing records
  job_status = {
      "status": "complete",
      "date_finished": datetime.now(timezone.utc),
      "records_added": results["action"]["create"],
      "records_updated": results["action"]["update"],
      "records_deleted": results["action"]["delete"],
      "records_ignored": results["action"][None],  # no action taken
      "records_errored": results["status"]["error"],
      "records_validated": results["validity"][True],
  }

To better track the results of a harvest job, datagovteam wants to make the reporter a property of the harvest_source and have it count the results of each operation as it happens, rather than building a summary once the job is complete.

This ensures:

  1. We accurately count deletes. They are currently misreported because the record has already been deleted by the time the report is generated, so it is no longer there to be counted.
  2. We can still create a report when a critical exception fails the job.
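To illustrate point 1, here is a minimal, hypothetical sketch (not the harvester's actual code; the `records` dict and its `action` field are assumptions) of why a summary computed after processing can undercount deletes, while a counter incremented at the moment of deletion does not.

```python
from collections import Counter

# Hypothetical in-memory records keyed by identifier, each with a planned action.
records = {
    "a": {"action": "create"},
    "b": {"action": "update"},
    "c": {"action": "delete"},
}

live_counts = Counter()  # counted at the moment each action happens

for rid in list(records):  # iterate over a copy of the keys so we can delete
    action = records[rid]["action"]
    live_counts[action] += 1  # count before the record disappears
    if action == "delete":
        del records[rid]  # the deleted record is now gone from memory

# Summary computed after processing, from whatever records remain.
after_the_fact = Counter(r["action"] for r in records.values())

print(after_the_fact["delete"])  # 0: the deleted record no longer exists
print(live_counts["delete"])     # 1
```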

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

  • GIVEN I am running a harvest
    WHEN a record is processed
    THEN the change in its state is counted as it happens
    AND at report time, we report the counts already collected rather than computing them at the end of the job.

Background

[Any helpful contextual notes or links to artifacts/evidence, if needed]

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

  • Create a new reporter object with the documented attributes above
  • Make it a property of harvest_source
  • Update the attributes as the harvest source is processed
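A minimal sketch of the steps above, assuming "property" means an attribute on the source object. `HarvestReporter`, `HarvestSource`, `record_action`, `record_error`, `record_valid`, and `to_job_status` are hypothetical names, not the project's API; the output keys mirror the existing `job_status` dict.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class HarvestReporter:
    """Hypothetical reporter holding running counts for one harvest job."""
    added: int = 0
    updated: int = 0
    deleted: int = 0
    ignored: int = 0
    errored: int = 0
    validated: int = 0

    def record_action(self, action: Optional[str]) -> None:
        # Mirrors the keys of results["action"]; None means the record was ignored.
        if action == "create":
            self.added += 1
        elif action == "update":
            self.updated += 1
        elif action == "delete":
            self.deleted += 1
        elif action is None:
            self.ignored += 1
        else:
            raise ValueError(f"unknown action: {action!r}")

    def record_error(self) -> None:
        self.errored += 1

    def record_valid(self) -> None:
        self.validated += 1

    def to_job_status(self, status: str = "complete") -> dict:
        # Same keys as the current job_status dict, read from running counts.
        return {
            "status": status,
            "date_finished": datetime.now(timezone.utc),
            "records_added": self.added,
            "records_updated": self.updated,
            "records_deleted": self.deleted,
            "records_ignored": self.ignored,
            "records_errored": self.errored,
            "records_validated": self.validated,
        }


@dataclass
class HarvestSource:
    """Hypothetical stand-in for harvest_source; only the reporter is shown."""
    name: str
    reporter: HarvestReporter = field(default_factory=HarvestReporter)


# Usage: counts are updated as each record is handled, not computed at the end.
source = HarvestSource("example")
source.reporter.record_action("create")
source.reporter.record_action(None)
source.reporter.record_valid()
status = source.reporter.to_job_status()
```

Using `default_factory` gives each source its own reporter instance, so counts from one harvest never leak into another.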

Added:

  • Update harvest job as records are processed
  • Add percentage complete to jobs display page
  • Update job with stats even in case of critical error
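The added items could be sketched roughly as follows. This is a hypothetical loop, not the runner's real code: `run_harvest`, `process_record`, and `update_job` are assumed names, and it assumes a `ValueError` is a per-record failure while any other exception is critical. A `try`/`finally` writes the final stats to the job even when a critical error propagates.

```python
from datetime import datetime, timezone
from typing import Callable


def run_harvest(
    records: list,
    process_record: Callable[[dict], str],
    update_job: Callable[[dict], None],
) -> None:
    """Hypothetical harvest loop that pushes running stats to the job record."""
    total = len(records)
    counts = {"processed": 0, "errored": 0}
    status = "complete"
    try:
        for record in records:
            try:
                process_record(record)
            except ValueError:
                # Assumed per-record failure: count it and keep going.
                counts["errored"] += 1
            counts["processed"] += 1
            # Update the job as records are processed, with percent complete.
            update_job({
                **counts,
                "percent_complete": round(100 * counts["processed"] / total, 1),
            })
    except Exception:
        # A critical error fails the job, but we still report what was counted.
        status = "error"
        raise
    finally:
        update_job({
            **counts,
            "status": status,
            "date_finished": datetime.now(timezone.utc),
        })


# Usage: the third record triggers a critical error partway through the job.
updates = []


def process(record: dict) -> str:
    if record.get("bad"):
        raise ValueError("invalid record")
    if record.get("fatal"):
        raise RuntimeError("database unavailable")
    return "create"


try:
    run_harvest([{}, {"bad": True}, {"fatal": True}, {}], process, updates.append)
except RuntimeError:
    pass
```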
@btylerburton btylerburton added the H2.0/Harvest-Runner Harvest Source Processing for Harvesting 2.0 label Feb 6, 2025
@btylerburton btylerburton moved this to 🏗 In Progress [8] in data.gov team board Feb 10, 2025
@btylerburton btylerburton self-assigned this Feb 12, 2025
@btylerburton btylerburton moved this from 🏗 In Progress [8] to 👀 Needs Review [2] in data.gov team board Feb 27, 2025
@btylerburton btylerburton moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Feb 28, 2025
@btylerburton btylerburton moved this from ✔ Done to 👀 Needs Review [2] in data.gov team board Feb 28, 2025
@btylerburton btylerburton moved this from 👀 Needs Review [2] to ✔ Done in data.gov team board Feb 28, 2025
@btylerburton btylerburton closed this as completed by moving to ✔ Done in data.gov team board Feb 28, 2025
@tdlowden tdlowden moved this from ✔ Done to 🗄 Closed in data.gov team board Mar 12, 2025
Projects
Status: 🗄 Closed
Development

No branches or pull requests

1 participant