Retry Operations
Retries enable workflows to automatically recover from transient technical failures by reattempting failed operations after a delay.
Error Types: Technical vs Business
It's important to distinguish between:
- Technical errors – Infrastructure-level problems like network timeouts or service unavailability. These are typically transient and can be retried.
- Business errors – Domain-specific conditions like invalid input or failed validations. These require explicit handling and should not be retried.
Use error handling for managing business errors.
How Retry Works
When the WorkflowInstance#wakeup
is called and the underlying operation fails:
- Without retry: the error is propagated to the caller.
- With retry: the error is swallowed, and a future wakeup is registered within
KnockerUpper
based on retry logic.
Retry Strategies
1. Simple Retry with Fixed Delay
val doSomething: WIO[Any, Nothing, MyState] = ???
val withRetry = doSomething
.retryIn {
case _: TimeoutException => Duration.ofMinutes(2)
case _: UnknownHostException => Duration.ofMinutes(15)
}
2. Advanced Retry with Custom Logic
val doSomething: WIO[Any, Nothing, MyState] = ???
val withRetry = doSomething
.retry { (error, state, now) =>
error match {
case _: TimeoutException => IO.pure(Some(now.plus(Duration.ofMinutes(2))))
case _ => IO.pure(None) // Don't retry other errors
}
}
Caveats
Retries Are Stateless
Currently, Workflow-level retries are stateless—they don’t track attempt counts, elapsed time or any other information about executed retries. This means you can't directly express rules like:
- "Retry at most 5 times"
- "Stop after 1 day"
- "Increase backoff time exponentially"
To support such logic, you can:
- Use custom persistent state to track retry metadata and query it in the retry handler.
- Manually clear the scheduled wakeup time via the
KnockerUpper
to stop future retries.
If you see it as a major limitation, please reach out.
Choose the Right Layer
Use workflow-level retries for retry schedules spanning minutes to hours or days
For short-lived retries (e.g., retrying within milliseconds or seconds), prefer handling them directly inside the IO
operation using libraries like cats-retry
.