| Internet-Draft | Async Job Problem Details | February 2026 |
| Ratnawat | Expires 30 August 2026 | [Page] |
HTTP APIs that process work asynchronously need a standard way to report job failures. "Problem Details for HTTP APIs" (RFC 9457) provides the envelope; this document defines extension members that fill it with asynchronous-job-specific context.¶
Eight extension members are specified: "jobId", "jobStatus", "submittedAt", "completedAt", "retryable", "retryAfter", "processingStage", and "correlationId". A ninth member, "results", supports batch operations. Together they let a server describe which job failed, when, at what pipeline stage, and whether a retry is advisable -- in a single, machine-readable JSON (RFC 8259) object that works equally well in an HTTP response body, a message-broker payload, or a webhook callback.¶
Although the primary motivation is structured error reporting for failed jobs, the extension members are equally useful for communicating successful job outcomes (e.g., a COMPLETED status with timing information).¶
This document does NOT redefine how to submit, poll, or cancel asynchronous jobs; those mechanics are already covered by "HTTP Semantics" (RFC 9110; 202 Accepted), "Prefer Header for HTTP" (RFC 7240; respond-async), and emerging IETF work on long-running operations. Instead, it focuses exclusively on the structured reporting gap that remains after a job reaches a terminal state.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 30 August 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Asynchronous job processing is a pervasive pattern in HTTP APIs. When a server cannot fulfil a request within a single request-response cycle, it accepts the work, processes it in the background, and reports the outcome later.¶
The IETF has standardized the mechanics of this pattern well:¶
What has NOT been standardized is the structure of error reports when asynchronous jobs fail.¶
[RFC9457] provides a general-purpose envelope for HTTP API errors ("Problem Details"), including the ability to define extension members for domain-specific context. However, no specification defines reusable extension members for the asynchronous job domain -- leaving every API to invent its own.¶
This document defines exactly those extension members.¶
The design principle is transport independence: the same Problem Details object -- with the same extension members -- can appear in an HTTP response, a Kafka message, a webhook POST body, or a Server-Sent Event. This is critical because asynchronous job results often travel through non-HTTP transports where HTTP headers like "Retry-After" are unavailable.¶
To avoid duplicating existing and emerging IETF work, this document explicitly does NOT define:¶
This document fills a single, specific gap: structured error context for async job failures, expressed as [RFC9457] extension members.¶
The following table maps each aspect of the asynchronous job lifecycle to the standard that covers it, and identifies where this document contributes.¶
| Lifecycle Aspect | Covered By | This Document |
|---|---|---|
| Accepting a job | RFC 9110 (202) | No (uses as-is) |
| Client async preference | RFC 7240 | No (uses as-is) |
| Status polling link | RFC 8288 | No (uses as-is) |
| Idempotent submission | I-D.ietf-httpapi-idempotency-key | No (uses as-is) |
| Rate-limited polling | I-D.ietf-httpapi-ratelimit-headers | No (uses as-is) |
| Error envelope format | RFC 9457 | No (extends) |
| Job failure context | (none) | YES — defines extension members |
| Transport-independent retry semantics | (none) | YES — defines retryable, retryAfter |
| Batch partial failure structure | (none) | YES — defines results array |
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document uses the terms "extension member", "problem type", and "problem details object" as defined in [RFC9457].¶
JSON [RFC8259] is used for all examples and the normative schema. Whitespace in JSON examples is insignificant and is included for readability only.¶
[RFC9457] defines five standard members for problem details objects: "type", "title", "status", "detail", and "instance". It explicitly allows extension members for domain-specific data (Section 3.2 of [RFC9457]).¶
However, [RFC9457] does not define any reusable extension members. Each API defines its own, leading to fragmentation. For the asynchronous job domain -- one of the most common API patterns -- this fragmentation is particularly costly because:¶
A key challenge specific to asynchronous jobs is that the failure report may not travel over HTTP at all. Consider:¶
The "Retry-After" header field ([RFC9110] Section 10.2.3) is transport-bound: it exists only in HTTP responses. When a job failure is instead communicated via Kafka, SSE, gRPC, or a webhook request, there is no response header to carry the retry signal, and it is lost.¶
This document solves this by defining "retryable" and "retryAfter" as JSON members inside the Problem Details object -- making them available regardless of transport.¶
The following table surveys how four major platforms represent async job failures today. No two use the same set of field names.¶
| Concept | AWS Step Functions | Azure LRO | Google AIP-151 | Stripe |
|---|---|---|---|---|
| Job identifier | executionArn | id | name | id |
| Status field | status | status | done + error | status |
| Status values | RUNNING, SUCCEEDED, FAILED, TIMED_OUT, ABORTED | Succeeded, Failed, Canceled | done: true/false | pending, succeeded, failed |
| Submission time | startDate | createdDateTime | metadata.createTime | created |
| Completion time | stopDate | lastUpdatedDateTime | metadata.endTime | N/A |
| Error structure | error, cause (strings) | error: {code, message} | google.rpc.Status | error: {type, message} |
| Retry guidance | N/A | retryAfter | N/A | N/A |
| Failure stage | N/A | N/A | N/A | N/A |
| Error format | Custom JSON | Custom JSON | google.rpc.Status | Custom JSON |
Key observations:¶
This section defines eight extension members for [RFC9457] problem details objects. They are organized into four groups:¶
All extension members are OPTIONAL in any given problem details object. A server MAY include any subset of them. A client MUST NOT assume any particular member is present and MUST gracefully handle its absence.¶
When present, these members MUST appear at the top level of the problem details JSON object, alongside the standard [RFC9457] members ("type", "title", "status", "detail", "instance").¶
Although defined as [RFC9457] extension members, these JSON member names and semantics MAY also be used in non-problem-details JSON objects (e.g., successful job status responses). When used outside a problem details context, the [RFC9457] standard members ("type", "title", "status", "detail", "instance") are not required. However, when reporting failures, "application/problem+json" SHOULD be used per [RFC9457].¶
In all non-HTTP transport contexts (message brokers, webhooks, SSE), the "status" member represents the HTTP status code that WOULD have been used if the failure had been reported in a synchronous HTTP response. It is not the HTTP status code of the delivery mechanism.¶
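The placement rules above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name `build_failure_report` and its parameters are hypothetical, and the example values reuse this document's illustrative "api.example.com" URIs. The extension members sit at the top level next to the standard [RFC9457] members, and "status" carries the equivalent synchronous HTTP status code rather than the status of whatever delivers the report.

```python
import json

# The eight extension member names defined in this document.
EXTENSION_MEMBERS = (
    "jobId", "jobStatus", "submittedAt", "completedAt",
    "retryable", "retryAfter", "processingStage", "correlationId",
)

def build_failure_report(type_uri, title, equivalent_status, detail=None, **extensions):
    """Return a problem details dict with extension members at the top level.

    `equivalent_status` is the HTTP status code that would have been used in a
    synchronous response; it is not the status of the delivery mechanism.
    (Hypothetical helper, for illustration only.)
    """
    unknown = set(extensions) - set(EXTENSION_MEMBERS)
    if unknown:
        raise ValueError(f"not an async-job extension member: {sorted(unknown)}")
    problem = {"type": type_uri, "title": title, "status": equivalent_status}
    if detail is not None:
        problem["detail"] = detail
    # Every extension member is OPTIONAL: omit members that were not supplied.
    problem.update({k: v for k, v in extensions.items() if v is not None})
    return problem

report = build_failure_report(
    "https://api.example.com/problems/rendering-failed",
    "Document Rendering Failed",
    500,
    jobId="550e8400-e29b-41d4-a716-446655440000",
    jobStatus="FAILED",
    processingStage="rendering",
    retryAfter=None,  # unknown, so the member is left out entirely
)

# The same serialized object can serve as an HTTP body, a broker message
# value, or a webhook payload.
payload = json.dumps(report).encode("utf-8")
```

Because absent members are simply omitted rather than set to null, a consumer written against the "MUST gracefully handle its absence" rule needs no special case for placeholder values.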
To provide interoperability guidance while preserving flexibility, this document defines two conformance levels:¶
Individual member definitions below use "RECOMMENDED" to indicate members that SHOULD be present for Full Conformance. This is not a contradiction with the OPTIONAL preamble above: all members remain OPTIONAL in the [RFC9457] sense, but Full Conformance requires a specific subset.¶
A string that uniquely identifies the asynchronous job.¶
Example:¶
"jobId": "550e8400-e29b-41d4-a716-446655440000"¶
A string indicating the current state of the job. Registered values are defined in Section 4.¶
Example:¶
"jobStatus": "FAILED"¶
An [RFC3339] timestamp indicating when the server accepted the job for processing.¶
Example:¶
"submittedAt": "2026-02-26T10:00:00Z"¶
An [RFC3339] timestamp indicating when the job reached a terminal state.¶
Example:¶
"completedAt": "2026-02-26T10:00:05Z"¶
A boolean indicating whether the client SHOULD retry the job submission with the same input.¶
Semantics:¶
Example:¶
"retryable": true¶
A non-negative integer indicating the number of seconds the client SHOULD wait before retrying the job submission.¶
Constraints:¶
Example:¶
"retryAfter": 30¶
A string identifying the stage in the server's processing pipeline at which the job failed or completed.¶
Recommended stage names (servers MAY define others):¶
| Stage | Description |
|---|---|
| validation | Input validation against schema or business rules failed. |
| authorization | The job's security context was insufficient for the requested operation. |
| queuing | The job could not be placed onto the processing queue (broker unavailable, queue full). |
| processing | General processing failure (use more specific stages when possible). |
| rendering | Template or document rendering failed. |
| conversion | Format conversion failed (e.g., HTML to PDF, image transcoding). |
| storage | The result could not be persisted (disk, object store, database). |
| delivery | The result could not be delivered to the callback URL or output channel. |
Example:¶
"processingStage": "rendering"¶
A string containing a client-supplied identifier from the originating request, used for distributed tracing.¶
Example:¶
"correlationId": "4bf92f3577b34da6a3ce929d0e0e4736"¶
| Status Value | Description |
|---|---|
| ACCEPTED | The job has been accepted but processing has not yet started. |
| PROCESSING | The job is actively being processed. |
| COMPLETED | The job finished successfully; the result is available. |
| FAILED | The job encountered an error. The "detail" member SHOULD describe the failure. |
| CANCELLED | The job was cancelled before completion. |
| TIMED_OUT | The job exceeded the server's maximum allowed processing duration. |
| COMPLETED_WITH_ERRORS | (Batch only) Some items succeeded and some failed. See Section 7. |
Terminal states: COMPLETED, FAILED, CANCELLED, TIMED_OUT, COMPLETED_WITH_ERRORS. Once a job reaches a terminal state, its "jobStatus" MUST NOT change.¶
Non-terminal states: ACCEPTED, PROCESSING.¶
Servers MAY define additional status values for their domain (e.g., "VALIDATING", "RENDERING", "AWAITING_APPROVAL").¶
To reduce the risk of interoperability issues from typos or inconsistent naming, server-defined status values SHOULD use UPPER_SNAKE_CASE to match the registered values, and SHOULD be documented in the API specification.¶
Clients that encounter an unrecognized status value SHOULD treat it as non-terminal (equivalent to "PROCESSING") unless the value is documented by the API as terminal.¶
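A client-side check for the rule above might look like the following Python sketch. The function name `is_terminal` and the `api_terminal` parameter (the set of extra values that a particular API documents as terminal) are illustrative assumptions; the registered values come from the table in this section.

```python
# Registered status values from this document.
TERMINAL = frozenset(
    {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT", "COMPLETED_WITH_ERRORS"}
)
NON_TERMINAL = frozenset({"ACCEPTED", "PROCESSING"})

def is_terminal(job_status: str, api_terminal: frozenset = frozenset()) -> bool:
    """Return True only for values known to be terminal.

    `api_terminal` holds server-defined values that the API documentation
    marks as terminal (hypothetical parameter). Anything unrecognized is
    treated like "PROCESSING", so the client keeps waiting.
    """
    return job_status in TERMINAL or job_status in api_terminal

print(is_terminal("TIMED_OUT"))                                   # True
print(is_terminal("AWAITING_APPROVAL"))                           # False
print(is_terminal("REJECTED", api_terminal=frozenset({"REJECTED"})))  # True
```

Defaulting unknown values to non-terminal errs on the side of continued polling; a client would normally pair this with an overall deadline so that an undocumented terminal value cannot cause it to wait forever.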
A core design goal of this specification is that the extension members work identically regardless of how the problem details object reaches the consumer. This section provides guidance for four common transports.¶
This is the primary context anticipated by [RFC9457]. The problem details object appears in the body of an HTTP response with Content-Type "application/problem+json".¶
When used in HTTP responses, servers SHOULD also set:¶
Example context: A client polls a job status resource and receives a failure report.¶
When a job result is published to a message broker (e.g., Apache Kafka, RabbitMQ, Amazon SQS), the problem details object is serialized as the message value.¶
In this context:¶
Example context: A PDF generation service publishes a failure to a Kafka result topic. The consumer reads the Problem Details object from the message value.¶
When a server delivers a job result by POSTing to a client-supplied callback URL, the request body SHOULD be the problem details object with Content-Type "application/problem+json".¶
In this context:¶
When a server pushes job status updates via Server-Sent Events (SSE) [W3C.SSE], the problem details object is serialized as the "data" field of an event.¶
The "event" field of each such event SHOULD be "job-failed" (or a similarly descriptive event type).¶
Example:¶
event: job-failed
data: {"type":"https://api.example.com/problems/rendering-failed",
data: "title":"Document Rendering Failed","status":500,
data: "jobId":"550e8400","jobStatus":"FAILED",
data: "processingStage":"rendering"}¶
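On the consuming side, the event above could be decoded roughly as follows. This Python sketch handles only the "event" and "data" fields of a single event block and assumes the block has already been split off the stream; `parse_sse_event` is a hypothetical helper, not part of any library. Consecutive "data" lines are joined with a newline, which is harmless here because whitespace between JSON tokens is insignificant.

```python
import json

def parse_sse_event(block: str) -> dict:
    """Parse one SSE event block into {"event": ..., "data": <decoded JSON>}."""
    event_type = "message"  # default event type when no "event" field is sent
    data_lines = []
    for line in block.splitlines():
        if not line or line.startswith(":"):
            continue  # skip blank lines and comment lines
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # a single leading space after the colon is dropped
        if field == "event":
            event_type = value
        elif field == "data":
            data_lines.append(value)
    # Multiple data lines form one payload, joined by newlines.
    return {"event": event_type, "data": json.loads("\n".join(data_lines))}

raw = (
    'event: job-failed\n'
    'data: {"type":"https://api.example.com/problems/rendering-failed",\n'
    'data: "title":"Document Rendering Failed","status":500,\n'
    'data: "jobId":"550e8400","jobStatus":"FAILED",\n'
    'data: "processingStage":"rendering"}\n'
)
parsed = parse_sse_event(raw)
print(parsed["event"], parsed["data"]["processingStage"])
```

A production client would normally rely on an existing SSE library for reconnection and "id" handling; the sketch is only meant to show that the problem details object survives the line splitting unchanged.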
This section documents how the extension members interact with each referenced RFC, to help implementers compose them correctly.¶
This document extends [RFC9457] by defining reusable extension members per Section 3.2 of that specification. All standard [RFC9457] members ("type", "title", "status", "detail", "instance") retain their original semantics.¶
Notably:¶
[RFC9110] Section 15.3.3 defines 202 Accepted as indicating that the request has been accepted for processing but the processing has not been completed. It notes that there is no facility in HTTP for resending a status code from an asynchronous operation.¶
This document does not change the semantics of 202. The extension members defined here apply to the failure report, not to the acceptance response.¶
[RFC7240] Section 4.1 defines the "respond-async" preference token. This document does not change its semantics.¶
When a server honors "respond-async" and later the job fails, the failure report (returned from the job status resource or delivered via another transport) SHOULD include the extension members defined in this document.¶
Note: The extension members defined here are useful regardless of whether the originating request included "Prefer: respond-async". A server that always processes requests asynchronously (without requiring the Prefer header) SHOULD still use these extension members when reporting job outcomes.¶
If an API documents its async job failure schemas using AsyncAPI [ASYNCAPI] specifications (commonly authored in YAML), the specification file SHOULD be served with Content-Type "application/yaml" per [RFC9512] when exposed via HTTP.¶
Servers that return 202 Accepted SHOULD include a "Link" header with a relation type that points to the job status resource. Servers MAY use the registered "status" relation type [RFC8631] or a URI-based extension relation type as described in Section 2.1.2 of [RFC8288].¶
When a completed or failed job's status response includes a link to the result resource, the server SHOULD use a URI-based extension relation type as described in Section 2.1.2 of [RFC8288] (e.g., "https://api.example.com/rels/job-result") or an existing registered relation type such as "related" [IANA.LINK-RELATIONS]. Servers MUST NOT use unregistered short-form relation types.¶
When job failure reports are delivered via webhooks, servers MAY sign the webhook request using [RFC9421] to allow the recipient to verify the report's authenticity and integrity.¶
When job results are delivered via message brokers, [RFC9421] does not apply (it is HTTP-specific). Broker-level security mechanisms (e.g., TLS, SASL) SHOULD be used instead.¶
[RFC6585] defines 429 Too Many Requests. This applies to the job status polling endpoint, NOT to job failure reports.¶
If a client polls too frequently, the server MAY return 429 with a "Retry-After" header. This is distinct from the "retryAfter" extension member, which governs job resubmission timing.¶
Implementers MUST NOT confuse these two retry signals:¶
The "jobId" member SHOULD be a UUID per [RFC9562] to ensure global uniqueness and unguessability. UUIDv7 is RECOMMENDED for new implementations because it embeds a timestamp, enabling natural chronological ordering of jobs.¶
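For illustration, a UUIDv7 value can be assembled by hand from the field layout in [RFC9562]: a 48-bit Unix timestamp in milliseconds, the 4-bit version, 12 random bits, the 2-bit variant, and 62 further random bits. The Python sketch below does exactly that; `new_job_id` is a hypothetical name, and the sketch makes no attempt to keep identifiers monotonic within a single millisecond. Where the platform already ships a UUIDv7 generator (for example `uuid.uuid7()` in recent Python releases), that should be preferred.

```python
import os
import time
import uuid

def new_job_id() -> str:
    """Return a UUIDv7 string built from the RFC 9562 field layout (sketch)."""
    unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")   # 80 random bits
    rand_a = (rand >> 68) & 0xFFF                  # 12 bits
    rand_b = rand & ((1 << 62) - 1)                # 62 bits

    value = (unix_ms & ((1 << 48) - 1)) << 80      # bits 127..80: timestamp
    value |= 0x7 << 76                             # bits 79..76: version 7
    value |= rand_a << 64                          # bits 75..64: rand_a
    value |= 0b10 << 62                            # bits 63..62: variant
    value |= rand_b                                # bits 61..0: rand_b
    return str(uuid.UUID(int=value))

job_id = new_job_id()
print(job_id)
```

Because the timestamp occupies the most significant bits, identifiers created in different milliseconds sort chronologically as plain strings. Note that the embedded timestamp is visible to anyone who sees the identifier.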
When a job fails at the "validation" stage, the "detail" member MAY reference specific input fields using JSON Pointer [RFC6901] syntax (e.g., "Field /templateData/customerName is required").¶
For richer validation error reporting, implementers MAY combine these extension members with validation-specific extensions (such as a "violations" array using JSON Pointer field references), but defining such extensions is outside the scope of this document.¶
When a client resubmits a failed job (guided by "retryable": true), it SHOULD include an "Idempotency-Key" header per [I-D.ietf-httpapi-idempotency-key-header] to prevent duplicate processing if the resubmission is received more than once.¶
The "jobId" from the failed job SHOULD NOT be reused as the idempotency key, because the retry is a new job submission.¶
The RateLimit header fields defined in [I-D.ietf-httpapi-ratelimit-headers] apply to HTTP request rates for any endpoint, including job status polling endpoints.¶
The "retryAfter" extension member defined in this document applies to job resubmission intervals. These are orthogonal concerns.¶
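As a sketch of how a consumer might act on the resubmission signal (as opposed to polling limits), the Python function below turns "retryable" and "retryAfter" into a wait time. An absent "retryable" is treated as false, matching the schema default in this document. The lower and upper bounds (`floor_seconds`, `ceiling_seconds`) are client-side assumptions chosen for the example, not values defined here; the floor guards against a server that advertises a zero-second retry interval.

```python
def retry_delay(problem: dict, floor_seconds: int = 1, ceiling_seconds: int = 3600):
    """Return seconds to wait before resubmitting, or None if no retry is advised.

    floor_seconds and ceiling_seconds are illustrative client policy values.
    """
    if problem.get("retryable") is not True:
        return None  # absent or false: do not resubmit automatically
    hint = problem.get("retryAfter")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(hint, bool) or not isinstance(hint, int) or hint < 0:
        hint = floor_seconds  # missing or malformed hint: fall back to the floor
    return max(floor_seconds, min(hint, ceiling_seconds))

print(retry_delay({"retryable": True, "retryAfter": 30}))   # 30
print(retry_delay({"retryable": True, "retryAfter": 0}))    # 1
print(retry_delay({"retryable": False, "retryAfter": 30}))  # None
```

A real client would additionally cap the number of attempts; that counter is omitted here to keep the sketch focused on interpreting the two members.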
For batch operations that produce multiple outputs, this document defines the "results" extension member: an array of per-item outcome objects.¶
Each object in the array MUST contain:¶
Each object MAY additionally contain:¶
For large batches (hundreds or thousands of items), including every item in the "results" array may produce an impractically large payload. Servers SHOULD apply the following strategies:¶
Define a maximum number of items in "results" (documented in the API specification) and provide a link to a paginated resource for the complete result set. Example:¶
"detail": "150 of 10000 items failed. First 50 shown.",
"results": [ "... 50 items ..." ]¶
With a "Link" header or body link to the full result set.¶
This document does not define a pagination mechanism for "results"; pagination is left to the API specification.¶
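The capping strategy described above could be implemented along these lines. The Python sketch is illustrative only: `cap_results` and its `limit` default are hypothetical, it reports only unsuccessful items, and it counts every item whose status is not "COMPLETED" as a failure for the purposes of the summary.

```python
def cap_results(results: list, total_items: int, limit: int = 50) -> dict:
    """Return "detail" and a truncated "results" array for a large batch.

    Assumption: only unsuccessful items are reported, and any status other
    than "COMPLETED" counts as a failure in the summary text.
    """
    failed = [item for item in results if item.get("status") != "COMPLETED"]
    shown = failed[:limit]
    detail = f"{len(failed)} of {total_items} items failed."
    if len(shown) < len(failed):
        detail += f" First {len(shown)} shown."
    return {"detail": detail, "results": shown}

# 10000 items, of which the first 150 failed.
batch = [
    {"itemId": f"item-{n}", "status": "FAILED" if n < 150 else "COMPLETED"}
    for n in range(10000)
]
capped = cap_results(batch, total_items=len(batch))
print(capped["detail"])        # 150 of 10000 items failed. First 50 shown.
print(len(capped["results"]))  # 50
```

The link to the complete result set is deliberately left out, since its form depends on the pagination mechanism chosen by the API.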
When a batch contains both successes and failures:¶
The following JSON Schema [JSON-SCHEMA] defines the extension members introduced by this document. It is designed to be composed (via "allOf" or "$ref") with the Problem Details schema from Appendix A of [RFC9457].¶
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$id": "https://example.com/schemas/async-job-problem-details",
"title": "Async Job Problem Details Extensions",
"description": "Extension members for RFC 9457 Problem Details objects that describe asynchronous job outcomes.",
"type": "object",
"properties": {
"jobId": {
"type": "string",
"description": "Unique identifier for the async job.",
"examples": ["550e8400-e29b-41d4-a716-446655440000"]
},
"jobStatus": {
"type": "string",
"description": "Current state of the async job. Registered values are ACCEPTED, PROCESSING, COMPLETED, FAILED, CANCELLED, TIMED_OUT, and COMPLETED_WITH_ERRORS. Servers MAY define additional values per Section 4.3.",
"examples": ["FAILED", "COMPLETED", "TIMED_OUT"]
},
"submittedAt": {
"type": "string",
"format": "date-time",
"description": "RFC 3339 timestamp (UTC) when the job was accepted for processing.",
"examples": ["2026-02-26T10:00:00Z"]
},
"completedAt": {
"type": "string",
"format": "date-time",
"description": "RFC 3339 timestamp (UTC) when the job reached a terminal state.",
"examples": ["2026-02-26T10:00:05Z"]
},
"retryable": {
"type": "boolean",
"default": false,
"description": "Whether the client should retry the job with the same input."
},
"retryAfter": {
"type": "integer",
"minimum": 0,
"description": "Seconds to wait before retrying. Only meaningful when retryable is true."
},
"processingStage": {
"type": "string",
"description": "The pipeline stage at which the failure occurred.",
"examples": ["validation", "rendering", "conversion"]
},
"correlationId": {
"type": "string",
"description": "Client-supplied correlation identifier from the originating request.",
"examples": ["4bf92f3577b34da6a3ce929d0e0e4736"]
},
"results": {
"type": "array",
"description": "Per-item outcomes for batch operations.",
"items": {
"type": "object",
"required": ["itemId", "status"],
"properties": {
"itemId": {
"type": "string",
"description": "Identifier for the batch item."
},
"status": {
"type": "string",
"enum": [
"COMPLETED",
"FAILED",
"CANCELLED",
"TIMED_OUT"
],
"description": "Terminal outcome for this item."
},
"detail": {
"type": "string",
"description": "Human-readable outcome description."
},
"retryable": {
"type": "boolean",
"description": "Whether this item can be retried."
},
"processingStage": {
"type": "string",
"description": "Stage at which this item failed."
}
},
"additionalProperties": true
}
}
},
"additionalProperties": true
}¶
Note: The "jobStatus" property does not use a JSON Schema "enum" constraint because Section 4.3 allows servers to define additional status values. Validators that wish to restrict values to the registered set MAY add an "enum" constraint locally, but SHOULD accept unrecognized values gracefully in production.¶
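For environments where a full JSON Schema validator is not available, the type constraints of the schema can be approximated by hand. The Python sketch below is a deliberate simplification and not a substitute for schema validation: it ignores the "date-time" format, rejects integer-valued floats such as 30.0 that JSON Schema would accept, and checks only the required members of each "results" item. The names `EXPECTED_TYPES` and `check_extension_members` are hypothetical.

```python
EXPECTED_TYPES = {
    "jobId": str, "jobStatus": str, "submittedAt": str, "completedAt": str,
    "retryable": bool, "retryAfter": int, "processingStage": str,
    "correlationId": str, "results": list,
}
ITEM_STATUSES = frozenset({"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"})

def check_extension_members(obj: dict) -> list:
    """Return a list of human-readable violations (empty when none are found)."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in obj:
            continue  # every member is optional
        value = obj[name]
        # bool is a subclass of int in Python; JSON true is not an integer.
        bool_as_int = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or bool_as_int:
            errors.append(f"{name}: expected {expected.__name__}")
        elif name == "retryAfter" and value < 0:
            errors.append("retryAfter: must be >= 0")
    results = obj.get("results")
    if isinstance(results, list):
        for index, item in enumerate(results):
            if not isinstance(item, dict):
                errors.append(f"results[{index}]: expected object")
                continue
            if not isinstance(item.get("itemId"), str):
                errors.append(f"results[{index}].itemId: required string")
            status = item.get("status")
            if not isinstance(status, str) or status not in ITEM_STATUSES:
                errors.append(f"results[{index}].status: not a terminal item status")
    return errors

print(check_extension_members({"jobStatus": "AWAITING_APPROVAL", "retryAfter": 30}))  # []
print(check_extension_members({"retryAfter": -1}))
```

Consistent with the note above, "jobStatus" is checked only for being a string, so server-defined values pass unchanged.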
The extension members defined here can reveal internal architecture details. Servers MUST consider what information is appropriate for their audience:¶
The "correlationId" is client-supplied and MUST be treated as untrusted input.¶
If job identifiers are sequential or predictable, an attacker can enumerate job status resources belonging to other clients.¶
A malicious or compromised server could return "retryable": true with "retryAfter": 0 to induce clients to retry in a tight loop, creating a self-inflicted denial of service.¶
Clients MUST enforce:¶
The "submittedAt" and "completedAt" timestamps reveal processing duration. In sensitive contexts, this could leak information about:¶
Servers operating in high-security environments MAY omit these members or round them to reduce precision.¶
In multi-tenant systems where a batch may contain items belonging to different authorization domains, the "results" array MUST only include items that the requesting client is authorized to view. Servers MUST NOT leak information about other tenants' items through the "results" array, the "detail" summary, or the item count.¶
The extension members defined here may constitute or contain personally identifiable information (PII):¶
Servers SHOULD evaluate the privacy implications of including these members in responses that may be logged, cached, or forwarded through intermediaries. In jurisdictions with data protection regulations (e.g., GDPR, CCPA), operators SHOULD ensure that problem details objects containing PII are treated as personal data for retention and access control purposes.¶
Servers SHOULD NOT include these extension members in responses that are cacheable or served to unauthenticated clients unless the information is already public.¶
This document defines extension members for [RFC9457] problem details objects per the extension mechanism in Section 3.2 of [RFC9457]. As [RFC9457] does not establish a registry for extension members, the members defined in this document ("jobId", "jobStatus", "submittedAt", "completedAt", "retryable", "retryAfter", "processingStage", "correlationId", and "results") are specified here and do not require IANA registration.¶
Implementers that define additional extension members for async job problem details SHOULD choose names that do not conflict with the members defined in this document.¶
This document requests that IANA create a "Problem Details Async Job Status Values" registry with the initial entries from Table 4 (Section 4.1).¶
Registration policy: Specification Required [RFC8126].¶
Each registration MUST include:¶
The designated expert(s) SHOULD verify that proposed values use UPPER_SNAKE_CASE, do not duplicate the semantics of existing registered values, and are clearly documented as either terminal or non-terminal.¶
This document does not register any problem type URIs. The "type" member in the examples uses illustrative URIs under the "api.example.com" domain.¶
Servers SHOULD define their own problem type URIs under a domain they control. When async job extension members are present, the "type" member SHOULD NOT be "about:blank"; a specific URI helps consumers distinguish async job failures from generic HTTP errors. See [RFC9457] Section 4.2.1 for guidance on using "about:blank".¶
All examples are complete problem details objects that validate against the schema in Section 8.¶
A document generation API processed a PDF request asynchronously. The rendering stage failed. The client discovers this by polling the job status resource.¶
HTTP context:¶
GET /api/v1/documents/jobs/550e8400-e29b-41d4-a716-446655440000 HTTP/1.1
Host: api.example.com
HTTP/1.1 200 OK
Content-Type: application/problem+json¶
{
"type": "https://api.example.com/problems/rendering-failed",
"title": "Document Rendering Failed",
"status": 500,
"detail": "Template 'invoice-v2' contains an unclosed element at line 87",
"instance": "/api/v1/documents/jobs/550e8400-e29b-41d4-a716-446655440000",
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"jobStatus": "FAILED",
"submittedAt": "2026-02-26T10:00:00Z",
"completedAt": "2026-02-26T10:00:03Z",
"retryable": false,
"processingStage": "rendering",
"correlationId": "4bf92f3577b34da6a3ce929d0e0e4736"
}¶
Note: The HTTP status is 200 (the status check succeeded). The "status": 500 inside the object indicates the equivalent synchronous failure code. This distinction is intentional: the client's request to check the job status was successful (HTTP 200); the job itself failed (Problem Details status 500). Client libraries that interpret "application/problem+json" as an error signal SHOULD inspect the HTTP status code first; a 200 response with Problem Details content indicates a successfully retrieved failure report, not a request failure.¶
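The ordering described in the note (HTTP status code first, media type second) might be expressed as follows. This Python sketch is an assumption-laden illustration: `interpret_status_response` and the "outcome" labels it returns are hypothetical names, and the function expects the caller to have already read the status code, Content-Type value, and body from whatever HTTP client is in use.

```python
import json

PROBLEM_JSON = "application/problem+json"

def interpret_status_response(http_status: int, content_type: str, body: bytes) -> dict:
    """Classify a job status response, checking the HTTP status code first."""
    media_type = content_type.split(";", 1)[0].strip().lower()
    is_problem = media_type == PROBLEM_JSON
    if not 200 <= http_status <= 299:
        # The status request itself failed; this says nothing about the job.
        return {
            "outcome": "request-failed",
            "http_status": http_status,
            "problem": json.loads(body) if is_problem else None,
        }
    document = json.loads(body)
    if is_problem:
        # Successfully retrieved report about a failed job.
        return {"outcome": "job-failed", "report": document}
    return {"outcome": "job-status", "report": document}

result = interpret_status_response(
    200,
    "application/problem+json",
    b'{"status": 500, "jobStatus": "FAILED", "processingStage": "rendering"}',
)
print(result["outcome"], result["report"]["status"])  # job-failed 500
```

Keeping "request-failed" separate from "job-failed" lets a caller retry the poll in the first case without mistaking it for a failed job.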
A report generation job exceeded the 300-second limit. The server indicates the failure is transient and suggests retrying after 60 seconds.¶
{
"type": "https://api.example.com/problems/job-timed-out",
"title": "Job Processing Timed Out",
"status": 504,
"detail": "Job exceeded maximum processing time of 300s",
"jobId": "7c9e6679-7425-40de-944b-e07fc1f90ae7",
"jobStatus": "TIMED_OUT",
"submittedAt": "2026-02-26T09:00:00Z",
"completedAt": "2026-02-26T09:05:00Z",
"retryable": true,
"retryAfter": 60,
"processingStage": "processing"
}¶
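The two timestamps in this example are exactly 300 seconds apart, which a consumer can verify with a short calculation. The Python sketch below is not a complete [RFC3339] parser: it rewrites a trailing "Z" as "+00:00" because `datetime.fromisoformat()` accepts "Z" only from Python 3.11 onward. The helper names are illustrative.

```python
from datetime import datetime

def parse_rfc3339(value: str) -> datetime:
    """Parse an RFC 3339 timestamp of the form used in this document's examples."""
    if value.endswith(("Z", "z")):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)

def processing_seconds(problem: dict):
    """Return elapsed seconds, or None when either timestamp is absent."""
    submitted = problem.get("submittedAt")
    completed = problem.get("completedAt")
    if submitted is None or completed is None:
        return None
    return (parse_rfc3339(completed) - parse_rfc3339(submitted)).total_seconds()

timed_out = {
    "jobStatus": "TIMED_OUT",
    "submittedAt": "2026-02-26T09:00:00Z",
    "completedAt": "2026-02-26T09:05:00Z",
}
print(processing_seconds(timed_out))  # 300.0
```

Returning None when a timestamp is missing keeps the helper in line with the rule that clients must tolerate the absence of any extension member.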
A PDF generation service publishes a failure to a Kafka result topic. No HTTP headers are available in this transport, so retry guidance can travel only inside the problem details object. Here "retryable" is false, and "retryAfter" is therefore omitted.¶
Kafka message:¶
Topic: com.example.pdf-job.result.v1
Key: d4735e3a-265e-16d0-8f24-2de10e933e80
Headers: content-type=application/problem+json
Value:¶
{
"type": "https://api.example.com/problems/conversion-failed",
"title": "PDF Conversion Failed",
"status": 502,
"detail": "iText HTML-to-PDF conversion failed: malformed CSS at line 15",
"jobId": "d4735e3a-265e-16d0-8f24-2de10e933e80",
"jobStatus": "FAILED",
"submittedAt": "2026-02-26T16:30:00Z",
"completedAt": "2026-02-26T16:30:02Z",
"retryable": false,
"processingStage": "conversion",
"correlationId": "monthly-report-2026-02"
}¶
Note: The "retryAfter" member is absent because the failure is not retryable. The "status": 502 indicates the equivalent HTTP status; it is not an HTTP response code in this context.¶
A data export service delivers a batch result via webhook.¶
POST /webhooks/export-results HTTP/1.1
Host: client.example.com
Content-Type: application/problem+json
Content-Digest: sha-256=:BASE64DIGEST:
Signature: sig1=:BASE64SIGNATURE:
Signature-Input: sig1=("content-type" "content-digest");keyid="server-key-1";created=1772108100¶
{
"type": "https://api.example.com/problems/batch-partial",
"title": "Data Export Partially Failed",
"status": 207,
"detail": "8 of 10 records exported successfully",
"jobId": "export-batch-20260226",
"jobStatus": "COMPLETED_WITH_ERRORS",
"submittedAt": "2026-02-26T12:00:00Z",
"completedAt": "2026-02-26T12:15:00Z",
"results": [
{
"itemId": "rec-007",
"status": "FAILED",
"detail": "Record exceeds maximum size (10MB)",
"retryable": false
},
{
"itemId": "rec-009",
"status": "FAILED",
"detail": "Downstream storage timeout",
"retryable": true,
"processingStage": "storage"
}
]
}¶
Note: Only failed items are included in "results" to reduce payload size. The webhook request itself uses HTTP Message Signatures [RFC9421] for authenticity.¶
A Server-Sent Event stream delivers a failure notification.¶
event: job-failed
id: 550e8400-e29b-41d4-a716-446655440000
data: {"type":"https://api.example.com/problems/rendering-failed",
data: "title":"Document Rendering Failed","status":500,
data: "jobId":"550e8400","jobStatus":"FAILED",
data: "submittedAt":"2026-02-26T10:00:00Z",
data: "completedAt":"2026-02-26T10:00:03Z",
data: "retryable":false,"processingStage":"rendering"}¶
Three certificate generation requests; two succeed, one fails.¶
{
"type": "https://api.example.com/problems/batch-partial",
"title": "Batch Processing Partially Failed",
"status": 207,
"detail": "2 of 3 certificates generated successfully",
"jobId": "batch-a1b2c3d4",
"jobStatus": "COMPLETED_WITH_ERRORS",
"submittedAt": "2026-02-26T14:00:00Z",
"completedAt": "2026-02-26T14:00:12Z",
"results": [
{
"itemId": "cert-001",
"status": "COMPLETED"
},
{
"itemId": "cert-002",
"status": "FAILED",
"detail": "Required field 'recipientName' missing",
"retryable": false,
"processingStage": "validation"
},
{
"itemId": "cert-003",
"status": "COMPLETED"
}
]
}¶
The extension members are not limited to failure reporting. A server MAY include them in a successful job status response to provide timing and identification context.¶
GET /api/v1/documents/jobs/a1b2c3d4-5678-90ab-cdef-1234567890ab HTTP/1.1
Host: api.example.com
HTTP/1.1 200 OK
Content-Type: application/json
Link: <https://api.example.com/api/v1/documents/results/a1b2c3d4>;
rel="https://api.example.com/rels/job-result"¶
{
"jobId": "a1b2c3d4-5678-90ab-cdef-1234567890ab",
"jobStatus": "COMPLETED",
"submittedAt": "2026-02-26T10:00:00Z",
"completedAt": "2026-02-26T10:00:04Z",
"correlationId": "invoice-batch-2026-02-26"
}¶
Note: The Content-Type is "application/json", not "application/problem+json", because this is not an error response. The extension members defined in this document are JSON members that MAY appear in any JSON object; they are not restricted to [RFC9457] problem details objects. However, when reporting failures, "application/problem+json" SHOULD be used per [RFC9457].¶
A report generation service publishes a transient failure to a Kafka result topic, demonstrating the transport-independence value of the "retryAfter" member.¶
Kafka message:¶
Topic: com.example.report-job.result.v1
Key: e5f6a7b8-9012-3456-7890-abcdef123456
Headers: content-type=application/problem+json
Value:¶
{
"type": "https://api.example.com/problems/downstream-unavailable",
"title": "Downstream Service Temporarily Unavailable",
"status": 503,
"detail": "Data warehouse connection pool exhausted; service is expected to recover within 60 seconds",
"jobId": "e5f6a7b8-9012-3456-7890-abcdef123456",
"jobStatus": "FAILED",
"submittedAt": "2026-02-26T18:00:00Z",
"completedAt": "2026-02-26T18:00:01Z",
"retryable": true,
"retryAfter": 60,
"processingStage": "processing",
"correlationId": "weekly-report-2026-w09"
}¶
Note: In an HTTP context, the server would also include a "Retry-After: 60" header. In this Kafka context, the "retryAfter" member is the only way to convey the retry interval to the consumer.¶
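Because the message body is the only carrier of retry guidance here, a consumer can derive its retry delay from the payload alone. The sketch below is one possible reading, not a normative algorithm: the function name and the "default_delay" parameter are hypothetical, and treating an absent "retryable" member as "do not retry" is an assumption.

```python
from typing import Any, Optional


def retry_delay_seconds(problem: dict[str, Any],
                        default_delay: int = 30) -> Optional[int]:
    """Return how long to wait before retrying, or None for no retry.

    Assumptions: a missing "retryable" member means no retry, and
    default_delay (hypothetical) applies when the job is retryable
    but carries no usable "retryAfter" value.
    """
    if not problem.get("retryable", False):
        return None
    retry_after = problem.get("retryAfter")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if (isinstance(retry_after, int)
            and not isinstance(retry_after, bool)
            and retry_after >= 0):
        return retry_after
    return default_delay
```

For the Kafka message above, the function would return 60, the value of the "retryAfter" member.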
This appendix provides an expanded comparison of async job error reporting across platforms, illustrating the fragmentation that this specification addresses.¶
AWS Step Functions represents failures as:¶
{
"executionArn": "arn:aws:states:...",
"status": "FAILED",
"error": "States.TaskFailed",
"cause": "Lambda function threw an exception",
"startDate": "2026-02-26T10:00:00.000Z",
"stopDate": "2026-02-26T10:00:05.000Z"
}¶
Mapping to this specification: "executionArn" → "jobId", "status" → "jobStatus", "error"+"cause" → "detail", "startDate" → "submittedAt", "stopDate" → "completedAt". No retry guidance. No processing stage. Not RFC 9457.¶
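The mapping above can be written as a small adapter. This is a sketch for illustration: the function name is hypothetical, joining "error" and "cause" with a colon is one arbitrary choice, and the status value is passed through unchanged, although a real adapter would likely need to translate platform-specific status values.

```python
from typing import Any


def from_step_functions(execution: dict[str, Any]) -> dict[str, Any]:
    """Map a Step Functions style failure record to the members
    defined in this document, following the mapping in the text."""
    return {
        "jobId": execution["executionArn"],
        # Passed through as-is; value translation is out of scope here.
        "jobStatus": execution["status"],
        # "error" and "cause" are combined into one human-readable string.
        "detail": f'{execution["error"]}: {execution["cause"]}',
        "submittedAt": execution["startDate"],
        "completedAt": execution["stopDate"],
    }
```

The source record carries no retry guidance or stage information, so the result has no "retryable", "retryAfter", or "processingStage" member.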
Google uses a Protobuf-based "Operation" resource:¶
{
"name": "operations/abc123",
"done": true,
"error": {
"code": 3,
"message": "Invalid template",
"details": []
}
}¶
Mapping: "name" → "jobId", "done" → terminal state check, "error" → google.rpc.Status (not RFC 9457). No "submittedAt"/"completedAt". No retry guidance. No processing stage.¶
Azure uses a polling pattern in which the status response looks like:¶
{
"id": "job-xyz",
"status": "Failed",
"error": {
"code": "RenderingFailed",
"message": "Template syntax error at line 42"
},
"startTime": "2026-02-26T10:00:00Z",
"endTime": "2026-02-26T10:00:05Z"
}¶
Mapping: "id" → "jobId", "status" → "jobStatus", "error" → "detail", "startTime" → "submittedAt", "endTime" → "completedAt". The retry interval is sometimes conveyed only in the "Retry-After" HTTP header. No processing stage. Not RFC 9457.¶
Stripe represents a failed payment as:¶
{
"id": "pi_abc123",
"status": "failed",
"last_payment_error": {
"type": "card_error",
"message": "Your card was declined"
},
"created": 1740567600
}¶
Mapping: "id" → "jobId", "status" → "jobStatus", "last_payment_error" → "detail", "created" → "submittedAt". No completion time. No retry guidance. No processing stage. Unix timestamp, not RFC 3339. Not RFC 9457.¶
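Bridging a Unix timestamp such as "created" to the RFC 3339 form used by "submittedAt" is a one-line conversion. The helper below is a sketch with a hypothetical name; it assumes the input is whole seconds since the epoch and always emits UTC with a "Z" suffix.

```python
from datetime import datetime, timezone


def unix_to_rfc3339(seconds: int) -> str:
    """Convert Unix epoch seconds to an RFC 3339 UTC timestamp."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```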
Defining a new media type (e.g., "application/async-job-problem+json") was considered and rejected because:¶
The "Retry-After" HTTP header field [RFC9110] already exists. The "retryAfter" JSON member was added because:¶
The presence of "retryAfter" alone could imply retryability. A separate boolean was added because:¶
Processing pipelines vary too widely across APIs to define a fixed enum. A free-form string with recommended values (Table 3) provides flexibility while encouraging consistency through convention.¶
The W3C Trace Context specification [W3C.TRACE-CONTEXT] is the de facto standard for distributed tracing headers. Referencing it provides interoperability with OpenTelemetry, Jaeger, Zipkin, and other tracing systems without requiring this specification to define its own tracing format.¶
APIs that use AsyncAPI [ASYNCAPI] to document message-driven interfaces can reference this specification in their schemas.¶
Example AsyncAPI schema fragment:¶
components:
schemas:
PdfJobFailedEvent:
description: >
PDF generation failure report per
draft-ratnawat-httpapi-async-problem-details.
allOf:
- $ref: '#/components/schemas/ProblemDetails'
- type: object
required: [jobId, jobStatus, submittedAt]
properties:
jobId:
type: string
format: uuid
jobStatus:
type: string
enum: [FAILED, TIMED_OUT, CANCELLED]
submittedAt:
type: string
format: date-time
completedAt:
type: string
format: date-time
retryable:
type: boolean
retryAfter:
type: integer
minimum: 0
processingStage:
type: string
correlationId:
type: string¶
CloudEvents [CLOUDEVENTS] is a CNCF specification for describing events in a standard way. It defines context attributes including "id", "source", "type", and "subject" that overlap conceptually with some members defined here:¶
The specifications are complementary, not competing:¶
When using CloudEvents as the transport envelope for async job failure notifications, the problem details object (with the extension members defined here) SHOULD be the CloudEvents "data" payload. The CloudEvents "datacontenttype" SHOULD be "application/problem+json".¶
Example CloudEvents + Problem Details:¶
{
"specversion": "1.0",
"id": "evt-550e8400",
"source": "/api/v1/documents/generate",
"type": "com.example.job.failed",
"datacontenttype": "application/problem+json",
"data": {
"type": "https://api.example.com/problems/rendering-failed",
"title": "Document Rendering Failed",
"status": 500,
"jobId": "550e8400-e29b-41d4-a716-446655440000",
"jobStatus": "FAILED",
"processingStage": "rendering",
"retryable": false
}
}¶
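A producer could build an envelope like the one above with a helper along these lines. This is a minimal sketch: the function name and its parameters are hypothetical, only the CloudEvents attributes shown in the example are set, and a random UUID is used for the event "id" purely for illustration.

```python
import uuid
from typing import Any


def wrap_in_cloudevent(problem: dict[str, Any],
                       source: str,
                       event_type: str) -> dict[str, Any]:
    """Wrap a problem details object as the "data" of a CloudEvent."""
    return {
        "specversion": "1.0",
        # Any identifier unique per source works; a UUID is one option.
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "datacontenttype": "application/problem+json",
        "data": problem,
    }
```

The problem details object is carried unchanged, so a consumer that already understands the extension members needs no CloudEvents-specific handling beyond unwrapping "data".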
APIs documented with the OpenAPI Specification [OPENAPI] can reference the JSON Schema defined in Section 8 to describe async job failure responses.¶
Example OpenAPI schema fragment (YAML):¶
components:
schemas:
AsyncJobProblemDetails:
description: >
RFC 9457 Problem Details extended with async job context
per draft-ratnawat-httpapi-async-problem-details.
allOf:
- $ref: '#/components/schemas/ProblemDetails'
- $ref: 'https://example.com/schemas/async-job-problem-details'
paths:
/api/v1/jobs/{jobId}:
get:
summary: Check async job status
responses:
'200':
description: >
Job status. Content-Type is application/json for
non-terminal and COMPLETED states, and
application/problem+json for failed states.
content:
application/problem+json:
schema:
$ref: '#/components/schemas/AsyncJobProblemDetails'¶
A client that receives a problem details object with some but not all of the extension members defined here SHOULD process the members that are present and ignore the absence of others. For example, a response with "jobId" and "jobStatus" but without "submittedAt" is valid and useful.¶
Per [RFC9457] Section 3.2, consumers MUST ignore extension members they do not understand. This ensures forward compatibility: future revisions of this document may define additional members without breaking existing clients.¶
As specified in Section 4.3, clients that encounter an unrecognized "jobStatus" value SHOULD treat it as non-terminal (equivalent to "PROCESSING"). This provides graceful degradation when a server uses custom status values.¶
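The client behavior described in this section can be sketched as follows. The names are hypothetical, the terminal set is assembled from the status values that appear in this document's examples rather than copied from Section 4.3, and defaulting a missing "jobStatus" to "PROCESSING" is an assumption.

```python
from typing import Any

# Terminal values seen in this document's examples (assumed complete
# only for the purposes of this sketch).
TERMINAL_STATUSES = frozenset({"COMPLETED", "FAILED", "TIMED_OUT", "CANCELLED"})


def is_terminal(job_status: str) -> bool:
    """Unrecognized values are treated as non-terminal."""
    return job_status in TERMINAL_STATUSES


def describe(report: dict[str, Any]) -> str:
    """Summarize a report, tolerating absent extension members."""
    status = report.get("jobStatus", "PROCESSING")
    parts = [f"job={report.get('jobId', 'unknown')}", f"status={status}"]
    if "submittedAt" in report:
        parts.append(f"submitted={report['submittedAt']}")
    parts.append("terminal" if is_terminal(status) else "keep-polling")
    return " ".join(parts)
```

A report whose "jobStatus" is a custom value such as "ARCHIVING" is summarized with "keep-polling", which mirrors the graceful degradation described above.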
[RFC9457] defines both JSON and XML representations for problem details. This document defines extension members for the JSON representation only. Servers that need to produce XML problem details objects may map the extension members to XML elements using the conventions in [RFC9457] Section 4, but this document does not define a normative XML mapping.¶
The "detail" member is human-readable text. Per [RFC9457] Section 3.1.4, the language of "detail" and "title" is determined by proactive content negotiation (the "Accept-Language" request header field).¶
Servers that serve multilingual audiences SHOULD respect the "Accept-Language" header from the originating request when generating the "detail" text for asynchronous failure reports. When the failure report is delivered via a non-HTTP transport where language negotiation is not available, servers SHOULD use English (en) as the default language and MAY include a "Content-Language" metadata field if the transport supports it.¶
The extension members defined in this document were informed by real-world experience building production asynchronous document generation services that process requests via Apache Kafka. The challenge of reporting structured errors across synchronous HTTP and asynchronous messaging boundaries motivated this work.¶
The author thanks the IETF HTTP API Working Group (httpapi) for their ongoing work on [RFC9457], [I-D.ietf-httpapi-idempotency-key-header], and [I-D.ietf-httpapi-ratelimit-headers] -- which collectively address the mechanics of async API patterns and provided the foundation for this complementary specification.¶
Google's AIP-151 (Long-Running Operations) [GOOGLE-AIP-151] influenced the status value design, though this specification chose RFC 9457 as the envelope rather than gRPC/Protobuf.¶
The CloudEvents specification [CLOUDEVENTS] and the OpenAPI Specification [OPENAPI] informed the transport-independence design principle and the integration guidance in the appendices.¶