Retrying Operations

Retries enable workflows to automatically recover from transient technical failures by reattempting failed operations after a delay.

Error Types: Technical vs Business

It's important to distinguish between:

Technical errors – Infrastructure-level problems like network timeouts or service unavailability. These are typically transient and can be retried.
Business errors – Domain-specific conditions like invalid input or failed validations. These require explicit handling and should not be retried.

Use error handling for managing business errors.
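The distinction above can be sketched in plain Scala. This is only an illustration under stated assumptions: `ErrorClassification`, `ValidationFailed`, `isTransient` and `validateAmount` are hypothetical names that are not part of the library, and the set of exceptions treated as transient is just an example.

```scala
import java.net.UnknownHostException
import java.util.concurrent.TimeoutException

object ErrorClassification {
  // Hypothetical business error, modelled as a plain value rather than an exception.
  final case class ValidationFailed(reason: String)

  // Technical failures assumed to be transient and therefore worth retrying.
  def isTransient(error: Throwable): Boolean = error match {
    case _: TimeoutException     => true
    case _: UnknownHostException => true
    case _                       => false
  }

  // Business rule: returns Left for invalid input and never throws,
  // so a retry mechanism has nothing to react to.
  def validateAmount(amount: BigDecimal): Either[ValidationFailed, BigDecimal] =
    if (amount > 0) Right(amount)
    else Left(ValidationFailed(s"amount must be positive, got $amount"))
}
```

Keeping business errors as values means they flow through explicit error handling, while only thrown infrastructure failures reach the retry logic.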

How Retry Works

When WorkflowInstance#wakeup is called and the underlying operation fails:

Without retry: the error is propagated to the caller.
With retry: the error is swallowed, and a future wakeup is registered with KnockerUpper according to the retry logic.
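The two outcomes can be modelled with a toy simulation. This is a simplified sketch, not the library's implementation: `WakeupModel`, `ToyKnockerUpper` and this `wakeup` signature are hypothetical stand-ins, and the sketch assumes that an error not matched by the retry policy is propagated as if no retry were configured.

```scala
import java.time.{Duration, Instant}
import scala.collection.mutable
import scala.util.{Failure, Success, Try}

object WakeupModel {
  // Toy stand-in for KnockerUpper: remembers the next wakeup per workflow id.
  final class ToyKnockerUpper {
    val scheduled: mutable.Map[String, Instant] = mutable.Map.empty
    def register(id: String, at: Instant): Unit = scheduled.update(id, at)
  }

  // Runs `operation`; on failure either rethrows (no retry) or schedules a wakeup.
  def wakeup(
      id: String,
      now: Instant,
      knockerUpper: ToyKnockerUpper,
      retryIn: Option[PartialFunction[Throwable, Duration]]
  )(operation: () => Unit): Unit =
    Try(operation()) match {
      case Success(_) => ()
      case Failure(error) =>
        retryIn.flatMap(_.lift(error)) match {
          // With retry: the error is swallowed and a future wakeup is registered.
          case Some(delay) => knockerUpper.register(id, now.plus(delay))
          // Without retry (or no matching rule): the error reaches the caller.
          case None => throw error
        }
    }
}
```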

Retry Strategies

1. Simple Retry with Fixed Delay

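// Imports assumed for the exception and duration types used below.
import java.net.UnknownHostException
import java.time.Duration
import java.util.concurrent.TimeoutException

val doSomething: WIO[Any, Nothing, MyState] = WIO.pure(MyState(1)).autoNamed

// Each matched error maps to a fixed delay before the next attempt.
val withRetry = doSomething
  .retryIn {
    case _: TimeoutException     => Duration.ofMinutes(2)
    case _: UnknownHostException => Duration.ofMinutes(15)
  }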

Rendering Outputs

Model

{
  "base" : {
    "meta" : {
      "name" : "Do Something",
      "error" : null
    },
    "_type" : "Pure"
  },
  "_type" : "Retried"
}

2. Advanced Retry with Custom Logic

val doSomethingFull: WIO[Any, Nothing, MyState] = WIO.pure(MyState(1)).autoNamed

// The handler receives the error, the current state, and the current time,
// and returns the time of the next attempt (None means no retry).
val withRetryFull = doSomethingFull
  .retry { (error, state, now) =>
    error match {
      case _: TimeoutException => IO.pure(Some(now.plus(Duration.ofMinutes(2))))
      case _                   => IO.pure(None) // Don't retry other errors
    }
  }

Caveats

Retries Are Stateless

Currently, workflow-level retries are stateless: they don't track attempt counts, elapsed time, or any other information about previous retries. This means you can't directly express rules like:

"Retry at most 5 times"
"Stop after 1 day"
"Increase backoff time exponentially"

To support such logic, you can:

Use custom persistent state to track retry metadata and query it in the retry handler.
Manually clear the scheduled wakeup time via the KnockerUpper to stop future retries.

If you see this as a major limitation, please reach out.
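As a rough sketch of the first workaround, the decision logic can be written as a pure function over metadata you keep in your own state. Everything here is an assumption for illustration: `StatefulRetry`, `RetryMeta`, `nextRetry`, `recordFailure` and the limits are hypothetical, and how the metadata is updated and persisted within a workflow is deliberately left out.

```scala
import java.time.{Duration, Instant}

object StatefulRetry {
  // Hypothetical retry metadata kept in the workflow's own persistent state.
  final case class RetryMeta(attempts: Int, firstFailureAt: Option[Instant])

  val maxAttempts: Int     = 5                     // "Retry at most 5 times"
  val maxElapsed: Duration = Duration.ofDays(1)    // "Stop after 1 day"
  val baseDelay: Duration  = Duration.ofMinutes(1) // first backoff step

  // Decides when to retry next, or None to give up.
  // The delay doubles with every recorded attempt: 1, 2, 4, 8, 16 minutes.
  def nextRetry(meta: RetryMeta, now: Instant): Option[Instant] = {
    val startedAt = meta.firstFailureAt.getOrElse(now)
    val elapsed   = Duration.between(startedAt, now)
    if (meta.attempts >= maxAttempts) None
    else if (elapsed.compareTo(maxElapsed) >= 0) None
    else Some(now.plus(baseDelay.multipliedBy(1L << meta.attempts)))
  }

  // Records one more failed attempt; persisting the result is up to the caller.
  def recordFailure(meta: RetryMeta, now: Instant): RetryMeta =
    RetryMeta(meta.attempts + 1, meta.firstFailureAt.orElse(Some(now)))
}
```

A retry handler that has access to such metadata could return the result of `nextRetry` as the time of the next attempt.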

Choose the Right Layer

Use workflow-level retries for retry schedules spanning minutes, hours, or days.

For short-lived retries (e.g., retrying within milliseconds or seconds), prefer handling them directly inside the IO operation using libraries like cats-retry.
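To illustrate what an in-operation retry looks like, here is a minimal hand-rolled sketch using only the Scala standard library. It stands in for what a library such as cats-retry would provide and does not use that library's API; `ShortRetry` and `retrying` are hypothetical names, and the blocking `Thread.sleep` is a simplification that an effect-based implementation would avoid.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Try}

object ShortRetry {
  // Runs `operation`, retrying up to `maxRetries` extra times on failure
  // and doubling the pause between attempts.
  @tailrec
  def retrying[A](maxRetries: Int, delayMillis: Long)(operation: () => A): Try[A] =
    Try(operation()) match {
      case Failure(_) if maxRetries > 0 =>
        Thread.sleep(delayMillis)
        retrying(maxRetries - 1, delayMillis * 2)(operation)
      case result => result // success, or failure with no retries left
    }
}
```

Because the whole loop finishes within a single execution of the operation, the workflow never sees the intermediate failures and no wakeup has to be scheduled.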
