Error Handling
permalink: customerhostederrorhandling
Introduction
This article describes how errors are handled in ProcessFactorial.
Execution Engine
The Execution Engine uses a number of tiered retries based on the type of error that occurs.
Unrecoverable Errors
If the Execution Engine encounters an unrecoverable error, it will disable the affected Factorial Flow or Trigger. This stops the Execution Engine from repeatedly attempting to run a process that will never succeed.
Examples of unrecoverable errors are:
- Missing or incorrect metadata on the target Data Store. For example, attempting CRUD on a table or field that does not exist or is of a different data type from what is configured in the Factorial Flow or Trigger.
- Authentication failure due to expired authentication keys, such as an App Registration, within the connection strings. This includes connection strings targeting a Data Store, Azure Storage Account, or MongoDB.
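The disable-on-unrecoverable-error behaviour can be sketched as follows. This is a minimal illustration, not the ProcessFactorial implementation: `MetadataMismatchError`, `AuthenticationExpiredError`, `run_flow`, and `disable_flow` are all hypothetical names.

```python
# Hypothetical sketch of how an execution engine might react to an
# unrecoverable error by disabling the offending Flow or Trigger.
class MetadataMismatchError(Exception):
    """Target table/field missing or of a different data type."""

class AuthenticationExpiredError(Exception):
    """Connection-string credentials (e.g. App Registration keys) expired."""

# Errors that no amount of retrying can fix.
UNRECOVERABLE = (MetadataMismatchError, AuthenticationExpiredError)

def run_flow(flow, execute, disable_flow):
    """Execute a flow; on an unrecoverable error, disable it with a reason."""
    try:
        execute(flow)
    except UNRECOVERABLE as err:
        # Disabling stops the engine from retrying a process that can
        # never succeed; the reason is then visible in the Portal.
        disable_flow(flow, reason=str(err))
```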
The reason why a Factorial Flow or Trigger has been disabled can be seen within the ProcessFactorial Portal via the Environment screens.
MongoDB 429s
MongoDB has a built-in capability to handle 429 (Too Many Requests) errors. We recommend that this be turned on in the first instance, as it helps spread the load during bursts. These errors will not be reported, as they are handled internally by MongoDB.
For more information see:
- MongoDB - Burst Capacity
- MongoDB - Server Side Retry
Platform Retries
If the platform still encounters errors after the 429 protection above, it will retry using Exponential Backoff with Jitter, combined with a Circuit Breaker pattern and Policy Wrapping.
What this means is that the platform will automatically retry failed operations using a progressively increasing delay (exponential backoff), with a small random variation (jitter) to avoid overwhelming the service. If failures continue beyond a threshold, the circuit breaker opens, temporarily blocking listening to the Service Bus Queue to further protect the upstream systems. Once stability is detected, it resumes operation. These retry and circuit breaker policies are combined (wrapped) into a unified execution strategy to ensure resilience without overloading any component.
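The combined strategy described above can be sketched as follows. This is a minimal illustration of the pattern, not the platform's actual code; the class and function names, thresholds, and delays are all assumptions chosen for the example.

```python
import random
import time

class CircuitOpenError(Exception):
    """Raised when the circuit breaker is open and calls are blocked."""

class CircuitBreaker:
    """Minimal circuit breaker: opens after `threshold` consecutive
    failures and blocks calls until `cooldown` seconds have passed."""
    def __init__(self, threshold=5, cooldown=30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def before_call(self):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; call blocked")
            # Cooldown elapsed: half-open, allow a probe call through.
            self.opened_at = None
            self.failures = 0

    def record(self, success):
        if success:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = time.monotonic()

def call_with_resilience(operation, breaker, max_attempts=4, base_delay=0.5):
    """Policy wrap: exponential backoff with jitter inside a circuit breaker."""
    for attempt in range(max_attempts):
        breaker.before_call()
        try:
            result = operation()
            breaker.record(success=True)
            return result
        except Exception:
            breaker.record(success=False)
            if attempt == max_attempts - 1:
                raise
            # Exponential backoff (0.5s, 1s, 2s, ...) plus random jitter
            # so many callers do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            time.sleep(delay)
```

Once the breaker opens, further calls fail fast with `CircuitOpenError` until the cooldown expires, protecting the upstream system while it recovers.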
These errors will be written to App Insights and to the Process Execution Report.
These errors should be monitored via App Insights, especially if they are 429s. If the 429s are coming from MongoDB, then see MongoDB - Scaling Options.
Errors covered:
- Rate Limiting
- Timeouts
- Service Availability (5xx)
- Network Errors (Sockets, DNS, Connection drops)
- Transient Errors
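A retry policy like the one above needs a predicate for deciding which of the failures listed above are worth retrying. The sketch below is illustrative only: the function name and the exact set of status codes are assumptions, not the platform's configuration.

```python
# Hypothetical classification of failures into retryable vs non-retryable,
# mirroring the error categories listed above.
RETRYABLE_STATUS_CODES = {429, 500, 502, 503, 504}  # rate limiting + 5xx

def is_retryable(status_code=None, exception=None):
    """Return True when the failure is transient and worth retrying."""
    if status_code is not None:
        return status_code in RETRYABLE_STATUS_CODES
    if exception is not None:
        # Network-level failures: timeouts, sockets, DNS, dropped
        # connections (all OSError subclasses in Python).
        return isinstance(exception, (TimeoutError, ConnectionError, OSError))
    return False
```

A 404, by contrast, would fall into the unrecoverable category described earlier, since retrying a request for a resource that does not exist cannot succeed.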
This pattern applies to all communication between Azure components. This includes service-to-service communication, for example, when the Execution Engine (npobp_exec) talks to the Translation Layer - Dataverse (npobp_trdataverse), and any service talking to any data store, whether the target Data Store, Azure Data Table, or Azure Cosmos DB for MongoDB.