What is a one sentence summary of your feature request?
We have not found any built-in “retry on error” functionality: a failed action is placed in an error list and must be retried manually, which increases operational overhead.
Please describe your idea in detail. What is your problem, why do you feel this idea is the best solution, etc.
Currently, when an action fails, it is simply placed in an error list that must be reprocessed manually by the operations teams.
This lack of an automatic retry mechanism increases operational workload, raises the risk of oversight, and makes it more difficult to isolate recurring technical errors.
Problem / Pain point
No standard “retry on error” mechanism for failed actions
Proliferation of manual retry tasks
Difficulty keeping processing reliable and stable, especially when too many actions are concentrated in a single orchestration flow
Increased risk of business incidents if a retry is forgotten
Desired functional behaviour
Ability to define, per job or flow type, a configurable retry policy (number of attempts, delay between attempts, maximum retry duration).
Differentiated handling of technical errors (e.g. temporary unavailability of a target system) versus functional errors (e.g. invalid or inconsistent data).
Clear logging of all retry attempts (timestamp, status, cause of the initial error, result of each attempt).
Centralised view of actions currently being retried and of those that have definitively failed after all attempts have been exhausted.
Integration with existing monitoring and alerting mechanisms (e.g. alerts when retry thresholds are exceeded).
Behaviour aligned with the planned job redesign and segmentation of orchestration flows, so that technical errors are easier to isolate and handle.
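To illustrate the behaviour requested above, here is a minimal sketch in Python of a retry policy that retries technical errors, stops immediately on functional errors, and records every attempt. It is only an illustration of the idea, not a proposal for the product's actual API: the names `RetryPolicy`, `TechnicalError`, `FunctionalError` and `run_with_retry` are hypothetical, as are the default values.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


class TechnicalError(Exception):
    """Hypothetical: transient failure, e.g. target system unavailable."""


class FunctionalError(Exception):
    """Hypothetical: non-transient failure, e.g. invalid or inconsistent data."""


@dataclass
class RetryPolicy:
    # Illustrative defaults; in the request these would be set per job or flow type.
    max_attempts: int = 3
    delay_seconds: float = 5.0
    max_duration_seconds: float = 60.0


@dataclass
class RetryResult:
    succeeded: bool = False
    attempts: List[dict] = field(default_factory=list)  # one log entry per attempt


def run_with_retry(
    action: Callable[[], None],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> RetryResult:
    """Run `action`, retrying only technical errors within the policy limits."""
    result = RetryResult()
    start = clock()
    for attempt in range(1, policy.max_attempts + 1):
        entry = {"attempt": attempt, "timestamp": time.time()}
        try:
            action()
        except FunctionalError as exc:
            # Retrying cannot fix bad data: log the cause and stop at once.
            entry.update(status="functional_error", cause=str(exc))
            result.attempts.append(entry)
            return result
        except TechnicalError as exc:
            entry.update(status="technical_error", cause=str(exc))
            result.attempts.append(entry)
        else:
            entry["status"] = "success"
            result.attempts.append(entry)
            result.succeeded = True
            return result
        # Stop if attempts are exhausted or the next wait would exceed the time budget.
        elapsed = clock() - start
        if attempt == policy.max_attempts:
            break
        if elapsed + policy.delay_seconds > policy.max_duration_seconds:
            break
        sleep(policy.delay_seconds)
    return result  # definitively failed: candidate for the centralised view and alerting
```

In this sketch, a result with `succeeded` set to false is what would feed the centralised view of definitively failed actions, and the `attempts` list is the per-attempt log (timestamp, status, cause) that monitoring and alerting could consume.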
How do you currently solve the challenges you have by not having this feature?
With manual workarounds, such as an operational bulk-retry script that reprocesses resources in error to reduce the manual workload for operations teams.