Give up trying to pin a CID after a given time threshold #812

Open
mbommerez opened this issue Dec 17, 2021 · 2 comments
Labels
topic/pot Issues handled by PT. topic/psa

Comments

mbommerez commented Dec 17, 2021

If you send a pinning request for a CID that doesn't exist, or whose providing node is offline, the pin stays stuck in "queued" status.
We should abandon the operation once a given time threshold has passed.

Update

This can be taken care of by the Pinning API in Elastic Provider, when it takes over from Cluster.

Impact

  • (Infra) Decrease load on Cluster, and with it resource usage
  • (Biz) Reduce the chance of an overwhelmed cluster in the near future
  • (User) If we land this, hanging requests are cleaned up automatically, so users have less housekeeping to do to "clean up" their requests.

Acceptance Criteria

  • After a given time threshold giveUpThreshold, the Cluster should stop trying to retrieve and pin a given CID if there are no more recent PinningRequests or Uploads for the same CID.
  • PinningRequests created before giveUpThreshold should report a failed status if there are no more recent PinningRequests.
  • PinningRequests created after giveUpThreshold should report their effective status, based on cluster state.
  • Ability to clean up existing PinningRequests.
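A minimal TypeScript sketch of the status rule in the criteria above, assuming giveUpThreshold is a duration (default 1 day) subtracted from the current time to get a cutoff. `effectiveStatus`, the `PinningRequest` shape and the precomputed `latestActivityForCid` (newest PinningRequest or Upload for the CID) are hypothetical names, and treating an already-pinned CID as never failing is an assumption the criteria don't spell out. The status strings are the ones defined by the IPFS Pinning Service API.

```typescript
// PSA status values as defined by the IPFS Pinning Service API spec.
type PinStatus = "queued" | "pinning" | "pinned" | "failed";

// Hypothetical minimal shape of a stored PinningRequest.
interface PinningRequest {
  cid: string;
  created: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

function effectiveStatus(
  request: PinningRequest,
  clusterStatus: PinStatus, // status already translated from cluster state
  latestActivityForCid: Date, // hypothetical: newest PinningRequest or Upload for request.cid
  now: Date,
  giveUpThresholdMs: number = DAY_MS,
): PinStatus {
  // Assumption: a CID that is already pinned is never reported as failed.
  if (clusterStatus === "pinned") return "pinned";

  const cutoffMs = now.getTime() - giveUpThresholdMs;
  const createdBeforeCutoff = request.created.getTime() < cutoffMs;
  const hasRecentActivity = latestActivityForCid.getTime() >= cutoffMs;

  // Created before the cutoff, with no newer request or upload for the CID: give up.
  if (createdBeforeCutoff && !hasRecentActivity) return "failed";

  // Otherwise report the effective status based on cluster state.
  return clusterStatus;
}
```

A request created after the cutoff always has `latestActivityForCid` at or after the cutoff, so it falls through to the cluster status, which matches the third criterion without a separate branch.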

Notes

  • What happens if there's a pinning request for CID_A that has "expired", but a chunked upload for the same CID_A exists? There are two possible scenarios:
    • A chunked upload is in progress
    • A chunked upload has, in practice, failed
  • Consider removing nonexistent CIDs from the content table.
  • The suggested value for giveUpThreshold is 1 day. It could be even smaller; let's parametrise it so it's easy to update.
  • At the moment, cluster can report transient failed states. I wonder whether those should be reflected in PSA statuses at all.
    We should consider never sending a failed status until the threshold is reached.
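The last two notes could be combined roughly as follows: read the threshold from configuration with a 1-day default, and hide cluster errors from PSA users until the threshold has passed. This is a sketch under assumptions: `parseGiveUpThreshold` and `toPsaStatus` are hypothetical helpers, the raw value would come from something like a hypothetical `GIVE_UP_THRESHOLD_MS` environment variable, and `ClusterStatus` is a simplified subset of the ipfs-cluster tracker statuses, not the full set.

```typescript
type PsaStatus = "queued" | "pinning" | "pinned" | "failed";
// Simplified subset of ipfs-cluster tracker statuses (assumption; the real set is larger).
type ClusterStatus = "pin_queued" | "pinning" | "pinned" | "pin_error";

const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Parse the threshold from a raw config value; fall back to 1 day if missing or invalid.
function parseGiveUpThreshold(raw: string | undefined): number {
  const ms = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(ms) && ms > 0 ? ms : ONE_DAY_MS;
}

function toPsaStatus(
  cluster: ClusterStatus,
  createdMs: number,
  nowMs: number,
  thresholdMs: number,
): PsaStatus {
  if (cluster === "pinned") return "pinned";
  // Past the threshold, anything not pinned is given up on.
  if (nowMs - createdMs > thresholdMs) return "failed";
  // Before the threshold, a cluster error is treated as transient: keep reporting "queued".
  return cluster === "pinning" ? "pinning" : "queued";
}
```

With this mapping, "failed" is only ever sent once the threshold is reached, so a cluster that recovers from a transient error never flips a request back from failed to pinned.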
mbommerez added kind/bug A bug in existing code (including security flaws) need/triage Needs initial labeling and prioritization labels Dec 17, 2021
flea89 mentioned this issue Dec 17, 2021
mbommerez added the topic/pot Issues handled by PT. label Jan 20, 2022
mbommerez (Author)

To be discussed with @alanshaw

dchoi27 added P2 Medium: Good to have, but can wait until someone steps up stack/write-services and removed need/triage Needs initial labeling and prioritization labels Mar 21, 2022
mbommerez (Author)

Discussed with @alanshaw @flea89 @francois-potato.

Everything that cannot be pinned is added to a separate queue that keeps growing, and in the meantime cluster keeps retrying the pin. This isn't an immediate concern, but in the future cluster might fall over if the queue grows too large.

We need to find a way for these CIDs to be dropped from cluster.

We need to define the threshold (i.e. after how long, not after how many attempts). We also need a way to surface this to the user, such as a perma-failed status.
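One way to pick which CIDs to drop is a periodic sweep keyed on time since the last request, as agreed above, rather than on retry count. This is a minimal sketch: `TrackedCid`, `selectCidsToDrop` and its fields are hypothetical, `lastRequestedMs` is assumed to be the newest PinningRequest or Upload for the CID, and `uploadInProgress` is an assumed way to keep CIDs with an active chunked upload alive (detecting a chunked upload that has silently failed is still an open question from the notes). The actual unpin call to cluster and the perma-failed marking are left out.

```typescript
// Hypothetical per-CID tracking record.
interface TrackedCid {
  cid: string;
  pinned: boolean;
  lastRequestedMs: number; // newest PinningRequest or Upload for this CID (assumed)
  uploadInProgress: boolean; // assumed flag for an active chunked upload
}

// Select CIDs cluster should stop trying to pin: not pinned, no upload in
// progress, and no request or upload for longer than the time threshold.
function selectCidsToDrop(
  tracked: TrackedCid[],
  nowMs: number,
  thresholdMs: number,
): string[] {
  return tracked
    .filter(
      (t) =>
        !t.pinned &&
        !t.uploadInProgress &&
        nowMs - t.lastRequestedMs > thresholdMs,
    )
    .map((t) => t.cid);
}
```

The returned CIDs would then be removed from cluster's queue and their PinningRequests marked with the perma-failed status, so the queue stops growing without manual cleanup.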

flea89 changed the title Pinning non existent CID blocks request in queue Give up trying to pin after a given threshold May 11, 2022
flea89 changed the title Give up trying to pin after a given threshold Give up pinning provided CID after a given threshold May 11, 2022
mbommerez changed the title Give up pinning provided CID after a given threshold Give up pinning provided CID after a given time threshold Jul 21, 2022
mbommerez changed the title Give up pinning provided CID after a given time threshold Give up trying to pin a CID after a given time threshold Jul 21, 2022
mbommerez removed pi/psa-follow-up stack/write-services kind/bug A bug in existing code (including security flaws) P2 Medium: Good to have, but can wait until someone steps up labels Jul 21, 2022