
Error Handling


Introduction

This article describes how errors are handled in ProcessFactorial.


Execution Engine

The Execution Engine applies tiered retries based on the type of error that occurs.

Unrecoverable Errors

If the Execution Engine encounters an unrecoverable error, it will disable the affected Factorial Flow or Trigger. This stops the Execution Engine from repeatedly attempting to run a process that will never succeed.

Examples of unrecoverable errors are:

  • Missing or incorrect metadata on the target Data Store. For example, attempting CRUD on a table or field that does not exist or is a different data type from what is configured in the Factorial Flow or Trigger.
  • Authentication failures due to expired authentication keys, such as App Registration, within the connection strings. This includes connection strings targeting a Data Store, Azure Storage Account or MongoDB.
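The gating logic can be pictured with a short sketch. This is illustrative only: the exception names and the flow/disable mechanics below are hypothetical, not ProcessFactorial APIs.

```python
# Hypothetical error types; ProcessFactorial's real exception names are not documented here.
class MetadataMismatchError(Exception):
    """Target table/field is missing, or has a different data type than configured."""

class AuthenticationExpiredError(Exception):
    """Authentication keys (e.g. App Registration) in a connection string have expired."""

UNRECOVERABLE = (MetadataMismatchError, AuthenticationExpiredError)

def handle_error(error, flow):
    """Disable the flow/trigger for unrecoverable errors; signal a retry otherwise."""
    if isinstance(error, UNRECOVERABLE):
        flow["enabled"] = False
        # The disable reason is what the Portal's Environment screens would surface.
        flow["disabled_reason"] = str(error)
        return "disabled"
    return "retry"

flow = {"name": "example-flow", "enabled": True}
outcome = handle_error(MetadataMismatchError("field 'amount' not found"), flow)
```

A transient error (for example, a timeout) would instead return `"retry"` and leave the flow enabled, handing the failure to the retry tiers described below.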

The reason why a Factorial Flow or Trigger has been disabled can be seen within the ProcessFactorial Portal via the Environment screens.

MongoDB 429s

MongoDB has a built-in capability to handle 429 (Too Many Requests) errors. We recommend enabling it in the first instance, as it helps spread the load during bursts. These errors will not be reported, as they are handled internally by MongoDB.

For more information see:

  • MongoDB - Burst Capacity
  • MongoDB - Server Side Retry

Platform Retries

If the platform still encounters errors after the 429 protection above, it will retry using Exponential Backoff with Jitter, combined with a Circuit Breaker pattern and Policy Wrapping.

What this means is that the platform will automatically retry failed operations using a progressively increasing delay (exponential backoff), with a small random variation (jitter) to avoid overwhelming the service. If failures continue beyond a threshold, the circuit breaker opens, temporarily blocking listening to the Service Bus Queue to further protect the upstream systems. Once stability is detected, it resumes operation. These retry and circuit breaker policies are combined (wrapped) into a unified execution strategy to ensure resilience without overloading any component.
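The combined policies can be sketched as follows. The platform itself likely uses a .NET resilience library; this Python sketch only illustrates the pattern, and every name in it (the breaker parameters, `send_request`, the thresholds) is an assumption, not the actual implementation.

```python
import random
import time

class CircuitOpenError(Exception):
    """Raised when the circuit breaker is open and calls are being blocked."""

class CircuitBreaker:
    """Opens after `threshold` consecutive failures; allows a trial call after `cooldown` seconds."""
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; blocking call")
            self.opened_at = None  # cooldown elapsed: half-open, allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success closes the circuit
        return result

def retry_with_jitter(fn, attempts=4, base=0.5, cap=8.0):
    """Exponential backoff with full jitter: sleep a random amount up to base * 2**attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except CircuitOpenError:
            raise  # do not keep retrying while the circuit is open
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))

# "Policy wrapping": the retry policy wraps the circuit-breaker policy,
# which in turn wraps the actual operation.
breaker = CircuitBreaker()

def send_request():
    # Placeholder for a real call, e.g. a Service Bus or data-store operation.
    return "ok"

result = retry_with_jitter(lambda: breaker.call(send_request))
```

The jitter matters because many workers retrying on the same exponential schedule would otherwise hit the struggling service in synchronized waves; randomizing the delay spreads those retries out.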

These errors will be written to App Insights and to the Process Execution Report.

These errors should be monitored via App Insights, especially if they are 429s. If the 429s are coming from MongoDB, then see MongoDB - Scaling Options.

Errors covered:

  • Rate Limiting
  • Timeouts
  • Service Availability (5xx)
  • Network Errors (Sockets, DNS, Connection drops)
  • Transient Errors
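The categories above amount to a retryability check before the backoff policy is applied. The sketch below uses illustrative HTTP status codes and Python exception types as stand-ins; the platform's actual classification is internal and may differ.

```python
import socket

# Rate limiting (429) and service availability (5xx) responses are retryable.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}
# Timeouts, DNS failures, and dropped connections are retryable network errors.
RETRYABLE_EXCEPTIONS = (TimeoutError, socket.gaierror, ConnectionError)

def is_retryable(status=None, exc=None):
    """Return True if the failure falls into one of the transient categories listed above."""
    if status is not None and status in RETRYABLE_STATUS:
        return True
    if exc is not None and isinstance(exc, RETRYABLE_EXCEPTIONS):
        return True
    return False
```

Anything outside these categories (for example, a 404 from a missing table) falls through to the unrecoverable-error handling described earlier.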

This pattern applies to all communication between Azure components. This includes service-to-service communication, for example, when the Execution Engine (npobp_exec) talks to the Translation Layer - Dataverse (npobp_trdataverse), or when any service talks to any data store, whether the target Data Store, Azure Data Table or Azure Cosmos Mongo DB.