Fallback Mechanism in Unified PanDA Queues
Unified PanDA Queues (also known as Grand-Unified Queues, GUQs) allow production and analysis activities to share the same PanDA queues. This improves resource utilization because fairshare allocations between production and analysis are adjusted dynamically. Analysis jobs benefit through better access to computing resources, which increases their efficiency and success rate.
When a unified PanDA queue has a problem with its primary write endpoint (e.g., write_lan/0 is down), the pilot can automatically fall back to an alternative endpoint such as write_lan/1. This is particularly useful for queues that store job output on a remote Rucio Storage Element (RSE), e.g. VP queues.
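A minimal Python sketch of the fallback idea, assuming the endpoints are tried in a fixed order and that any exception raised by the transfer counts as a failure. The names `stage_out_with_fallback`, `StageOutError` and `fake_transfer` are hypothetical and not part of the pilot code; the endpoint strings are only labels.

```python
from typing import Callable, List, Sequence


class StageOutError(Exception):
    """Hypothetical error raised when every candidate endpoint has failed."""


def stage_out_with_fallback(endpoints: Sequence[str],
                            transfer: Callable[[str], None]) -> str:
    """Try each endpoint in order and return the first one that succeeds."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            transfer(endpoint)  # hypothetical transfer callable
            return endpoint
        except Exception as exc:  # any failure moves on to the next endpoint
            errors.append(f"{endpoint}: {exc}")
    raise StageOutError("all endpoints failed: " + "; ".join(errors))


def fake_transfer(endpoint: str) -> None:
    """Simulate write_lan/0 being down while write_lan/1 works."""
    if endpoint == "write_lan/0":
        raise ConnectionError("endpoint down")


print(stage_out_with_fallback(["write_lan/0", "write_lan/1"], fake_transfer))
# prints: write_lan/1
```

Collecting the per-endpoint errors means that, if every endpoint fails, the final exception still reports why each attempt failed.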
The pilot dynamically selects the appropriate destination based on:
- Job type:
  - For analysis jobs, the pilot considers the write_lan_analysis and write_lan activities.
  - For production jobs, the default selection is write_lan.
- Fallback logic:
  - If the preferred write_lan/0 endpoint is unavailable, the pilot automatically tries write_lan/1.
  - This helps prevent job failures caused by temporary storage or network issues.
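The selection rules above can be sketched as follows. This assumes write_lan_analysis is preferred over write_lan for analysis jobs and that endpoint indices are tried in ascending order (write_lan/0, then write_lan/1). The function names and the "analysis"/"production" job-type labels are hypothetical.

```python
from typing import List


def stage_out_activities(job_type: str) -> List[str]:
    """Return the storage activities considered for a job, in order of preference."""
    if job_type == "analysis":
        # Analysis jobs: analysis-specific activity first, then the generic one
        return ["write_lan_analysis", "write_lan"]
    # Production (and any other job type): default to write_lan
    return ["write_lan"]


def fallback_order(activity: str, n_endpoints: int = 2) -> List[str]:
    """List endpoints for one activity: index 0 is preferred, then 1, and so on."""
    return [f"{activity}/{i}" for i in range(n_endpoints)]


print(stage_out_activities("analysis"))    # ['write_lan_analysis', 'write_lan']
print(stage_out_activities("production"))  # ['write_lan']
print(fallback_order("write_lan"))         # ['write_lan/0', 'write_lan/1']
```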
One important restriction of the fallback mechanism is that job.nucleus (the originally intended destination of the output data) is excluded from the alternative stage-out destinations.
This exclusion is managed through the "Associated Pilot Copytools" configuration in the Computing Resource Information Catalog (CRIC). Each PanDA queue has its own CRIC settings, which define the copy tools (data transfer utilities) that may be used to transfer job output.
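A sketch of the nucleus exclusion, assuming the fallback candidates are available as an ordered list of destination names. `fallback_destinations` and the example names are hypothetical; in the pilot itself the exclusion is driven by the CRIC configuration, not by a helper like this.

```python
from typing import List, Optional


def fallback_destinations(candidates: List[str], nucleus: Optional[str]) -> List[str]:
    """Return the alternative stage-out destinations, excluding the job's nucleus.

    The original order is kept; if nucleus is None, nothing is removed.
    """
    return [dest for dest in candidates if dest != nucleus]


# Hypothetical destination names, for illustration only
print(fallback_destinations(["NUCLEUS_A", "SITE_B", "SITE_C"], "NUCLEUS_A"))
# prints: ['SITE_B', 'SITE_C']
```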