httpapi G. Ratnawat Internet-Draft IMTF Intended status: Standards Track 26 February 2026 Expires: 30 August 2026 Problem Details for Asynchronous Job Failures draft-ratnawat-httpapi-async-problem-details-00 Abstract HTTP APIs that process work asynchronously need a standard way to report job failures. "Problem Details for HTTP APIs" (RFC 9457) provides the envelope; this document defines extension members that fill it with asynchronous-job-specific context. Eight extension members are specified: "jobId", "jobStatus", "submittedAt", "completedAt", "retryable", "retryAfter", "processingStage", and "correlationId". A ninth member, "results", supports batch operations. Together they let a server describe which job failed, when, at what pipeline stage, and whether a retry is advisable -- in a single, machine-readable JSON (RFC 8259) object that works equally well in an HTTP response body, a message-broker payload, or a webhook callback. Although the primary motivation is structured error reporting for failed jobs, the extension members are equally useful for communicating successful job outcomes (e.g., a COMPLETED status with timing information). This document does NOT redefine how to submit, poll, or cancel asynchronous jobs; those mechanics are already covered by "HTTP Semantics" (RFC 9110) (202 Accepted), "Prefer Header for HTTP" (RFC 7240) (respond-async), and emerging IETF work on long-running operations. Instead, it focuses exclusively on the structured reporting gap that remains after a job reaches a terminal state. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Ratnawat Expires 30 August 2026 [Page 1] Internet-Draft Async Job Problem Details February 2026 Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 30 August 2026. Copyright Notice Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 1.1. What This Document Is NOT . . . . . . . . . . . . . . . . 5 1.2. Relationship to Existing Standards . . . . . . . . . . . 5 1.3. Notational Conventions . . . . . . . . . . . . . . . . . 6 1.4. Terminology . . . . . . . . . . . . . . . . . . . . . . . 7 2. Problem Statement . . . . . . . . . . . . . . . . . . . . . . 7 2.1. The Gap in RFC 9457 . . . . . . . . . . . . . . . . . . . 7 2.2. The Transport-Independence Problem . . . . . . . . . . . 8 2.3. Survey of Current Practice . . . . . . . . . . . . . . . 8 3. Extension Members . . . . . . . . . . . . . . . . . . . . . . 10 3.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1.1. Conformance Levels . . . . . . . . . . . . . . . . . 11 3.2. "jobId" . . . . . . . . . . . . . . . . . . . . . . . . . 11 3.3. "jobStatus" . . . . . . . . . . . . . . . . . . . . . . . 12 3.4. "submittedAt" . . . . . . . . . . . . . . . . . . . . . . 12 3.5. "completedAt" . . . . . . . . . . . . . . . . . . . . . . 12 3.6. "retryable" . . . . . . . . . . . . . . . . . . . . . . . 13 3.7. "retryAfter" . . . . . . . . . . . . . . . . . . . . . . 14 3.8. "processingStage" . . . . . . . . . . . . . . . . . . . . 15 3.9. "correlationId" . . . . . . . . . . . . . . . . . . . . . 16 4. Job Status Registry . . . . . . . . . . . . . . . . . . . . . 17 4.1. Initial Values . . . . . . . . . . . . . . . . . . . . . 17 4.2. Terminal vs. Non-Terminal States . . . . . . . . . . . . 18 4.3. Extensibility . . . . . . . . . . . . . . . . . . . . . . 18 5. Transport Contexts . . . . . . . . . . . . . . . . . . . . . 18 Ratnawat Expires 30 August 2026 [Page 2] Internet-Draft Async Job Problem Details February 2026 5.1. HTTP Response Bodies . . . . . . . . . . . . . . . . . . 18 5.2. Message Broker Payloads . . . . . . . . . . . . . . . . . 19 5.3. Webhook Callbacks . . . . . . . . . . . . . . . . . . . . 19 5.4. Server-Sent Events . . . . . . . . . . . . . . . . . . . 20 6. Interaction with Existing Standards . . . . . . . . . . . . . 20 6.1. RFC 9457 — Problem Details for HTTP APIs . . . . . . . . 20 6.2. RFC 9110 — HTTP Semantics (202 Accepted) . . . . . . . . 21 6.3. RFC 7240 — Prefer Header (respond-async) . . . . . . . . 21 6.4. RFC 9512 — YAML Media Type . . . . . . . . . . . . . . . 21 6.5. RFC 8288 — Web Linking . . . . . . . . . . . . . . . . . 21 6.6. RFC 9421 — HTTP Message Signatures . . . . . . . . . . . 22 6.7. RFC 6585 — Additional HTTP Status Codes (429) . . . . . . 22 6.8. RFC 9562 — UUIDs . . . . . . . . . . . . . . . . . . . . 22 6.9. RFC 6901 — JSON Pointer . . . . . . . . . . . . . . . . . 22 6.10. draft-ietf-httpapi-idempotency-key-header . . . . . . . . 23 6.11. draft-ietf-httpapi-ratelimit-headers . . . . . . . . . . 23 7. Batch Operations . . . . . . . . . . . . . . . . . . . . . . 23 7.1. The "results" Extension Member . . . . . . . . . . . . . 23 7.1.1. Payload Size Considerations . . . . . . . . . . . . . 24 7.2. Partial Failure Semantics . . . . . . . . . . . . . . . . 24 8. JSON Schema . . . . . . . . . . . . . . . . . . . . . . . . . 24 9. Security Considerations . . . . . . . . . . . . . . . . . . . 27 9.1. Information Exposure . . . . . . . . . . . . . . . . . . 27 9.2. Correlation ID Injection . . . . . . . . . . . . . . . . 27 9.3. Job Identifier Enumeration . . . . . . . . . . . . . . . 27 9.4. Retry Amplification . . . . . . . . . . . . . . . . . . . 28 9.5. Timing Side Channels . . . . . . . . . . . . . . . . . . 28 9.6. Batch Results Information Scoping . . . . . . . . . . . . 28 9.7. Privacy Considerations . . . . . . . . . . . . . . . . . 28 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 29 10.1. Problem Details Extension Members . . . . . . . . . . . 29 10.2. Job Status Value Registry . . . . . . . . . . . . . . . 29 10.3. Problem Type URIs . . . . . . . . . . . . . . . . . . . 30 11. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 30 11.1. HTTP Response: Document Rendering Failure . . . . . . . 30 11.2. HTTP Response: Job Timeout with Retry . . . . . . . . . 31 11.3. Kafka Message: PDF Conversion Failure . . . . . . . . . 32 11.4. Webhook Callback: Export Completed with Errors . . . . . 32 11.5. Server-Sent Event: Processing Stage Update . . . . . . . 33 11.6. Batch: Partial Failure . . . . . . . . . . . . . . . . . 33 11.7. HTTP Response: Successful Job Completion . . . . . . . . 34 11.8. Kafka Message: Transient Failure with Retry Guidance . . 35 12. References . . . . . . . . . . . . . . . . . . . . . . . . . 35 12.1. Normative References . . . . . . . . . . . . . . . . . . 36 12.2. Informative References . . . . . . . . . . . . . . . . . 36 Appendix A. Comparison with Industry Patterns . . . . . . . . . 38 A.1. AWS Step Functions . . . . . . . . . . . . . . . . . . . 38 A.2. Google AIP-151 (Long-Running Operations) . . . . . . . . 39 Ratnawat Expires 30 August 2026 [Page 3] Internet-Draft Async Job Problem Details February 2026 A.3. Azure Long-Running Operations . . . . . . . . . . . . . . 39 A.4. Stripe . . . . . . . . . . . . . . . . . . . . . . . . . 40 Appendix B. Design Decisions . . . . . . . . . . . . . . . . . . 40 B.1. Why Extension Members, Not a New Media Type? . . . . . . 40 B.2. Why JSON Members for Retry Instead of Only Headers? . . . 40 B.3. Why an Explicit "retryable" Boolean? . . . . . . . . . . 41 B.4. Why "processingStage" Is a String, Not an Enum? . . . . . 41 B.5. Why "correlationId" References W3C Trace Context? . . . . 41 Appendix C. AsyncAPI Integration . . . . . . . . . . . . . . . . 41 Appendix D. Relationship to CloudEvents . . . . . . . . . . . . 42 Appendix E. OpenAPI Integration . . . . . . . . . . . . . . . . 43 Appendix F. Interoperability Considerations . . . . . . . . . . 44 F.1. Handling Partial Member Sets . . . . . . . . . . . . . . 44 F.2. Handling Unknown Extension Members . . . . . . . . . . . 44 F.3. Handling Unrecognized Job Status Values . . . . . . . . . 44 F.4. XML Representation . . . . . . . . . . . . . . . . . . . 45 F.5. Internationalization of "detail" . . . . . . . . . . . . 45 Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . 45 Author's Address . . . . . . . . . . . . . . . . . . . . . . . . 45 1. Introduction Asynchronous job processing is a pervasive pattern in HTTP APIs. When a server cannot fulfil a request within a single request- response cycle, it accepts the work, processes it in the background, and reports the outcome later. The IETF has standardized the mechanics of this pattern well: * [RFC9110] Section 15.3.3 defines the "202 Accepted" status code for requests accepted for processing. * [RFC7240] Section 4.1 defines the "respond-async" preference token that lets clients opt into asynchronous handling. * [RFC8288] defines link relations that can point to status polling resources. What has NOT been standardized is the structure of error reports when asynchronous jobs fail. [RFC9457] provides a general-purpose envelope for HTTP API errors ("Problem Details"), including the ability to define extension members for domain-specific context. However, no specification defines reusable extension members for the asynchronous job domain -- leaving every API to invent its own. This document defines exactly those extension members. Ratnawat Expires 30 August 2026 [Page 4] Internet-Draft Async Job Problem Details February 2026 The design principle is transport independence: the same Problem Details object -- with the same extension members -- can appear in an HTTP response, a Kafka message, a webhook POST body, or a Server-Sent Event. This is critical because asynchronous job results often travel through non-HTTP transports where HTTP headers like "Retry- After" are unavailable. 1.1. What This Document Is NOT To avoid duplicating existing and emerging IETF work, this document explicitly does NOT define: * How to submit an asynchronous job. Use POST with "Prefer: respond-async" [RFC7240] and "202 Accepted" [RFC9110]. * How to poll for job status. Use the URI from the "Location" header of the 202 response, or a link relation pointing to the job status resource [RFC8288]. * How to cancel a running job. This is operation-specific. * Idempotency semantics for job submission. See [I-D.ietf-httpapi-idempotency-key-header]. * Rate-limiting headers for status polling. See [I-D.ietf-httpapi-ratelimit-headers] and [RFC6585] (429). * A new media type. This document uses "application/problem+json" as defined by [RFC9457]. This document fills a single, specific gap: structured error context for async job failures, expressed as [RFC9457] extension members. 1.2. Relationship to Existing Standards The following table maps each aspect of the asynchronous job lifecycle to the standard that covers it, and identifies where this document contributes. Ratnawat Expires 30 August 2026 [Page 5] Internet-Draft Async Job Problem Details February 2026 +=====================+==================================+==========+ |Lifecycle Aspect |Covered By |This | | | |Document | +=====================+==================================+==========+ |Accepting a job |RFC 9110 (202) |No (uses | | | |as-is) | +---------------------+----------------------------------+----------+ |Client async |RFC 7240 |No (uses | |preference | |as-is) | +---------------------+----------------------------------+----------+ |Status polling link |RFC 8288 |No (uses | | | |as-is) | +---------------------+----------------------------------+----------+ |Idempotent submission|I-D.ietf-httpapi-idempotency-key |No (uses | | | |as-is) | +---------------------+----------------------------------+----------+ |Rate-limited polling |I-D.ietf-httpapi-ratelimit-headers|No (uses | | | |as-is) | +---------------------+----------------------------------+----------+ |Error envelope format|RFC 9457 |No | | | |(extends) | +---------------------+----------------------------------+----------+ |Job failure context |(none) |YES — | | | |defines | | | |extension | | | |members | +---------------------+----------------------------------+----------+ |Transport-independent|(none) |YES — | |retry semantics | |defines | | | |retryable,| | | |retryAfter| +---------------------+----------------------------------+----------+ |Batch partial failure|(none) |YES — | |structure | |defines | | | |results | | | |array | +---------------------+----------------------------------+----------+ Table 1: Relationship to Existing Standards 1.3. Notational Conventions The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here. Ratnawat Expires 30 August 2026 [Page 6] Internet-Draft Async Job Problem Details February 2026 This document uses the terms "extension member", "problem type", and "problem details object" as defined in [RFC9457]. JSON [RFC8259] is used for all examples and the normative schema. Whitespace in JSON examples is insignificant and is included for readability only. 1.4. Terminology job: A unit of work that a server accepts for asynchronous processing in response to a client request. job status resource: An HTTP resource whose representation describes the current state of a job. Typically the target of the "Location" header returned in a "202 Accepted" response. terminal state: A job status from which no further transitions are possible (COMPLETED, FAILED, CANCELLED, TIMED_OUT, COMPLETED_WITH_ERRORS). non-terminal state: A job status from which the job may still transition (ACCEPTED, PROCESSING, and any server-defined intermediate states). originating request: The HTTP request that caused the job to be created. 2. Problem Statement 2.1. The Gap in RFC 9457 [RFC9457] defines five standard members for problem details objects: "type", "title", "status", "detail", and "instance". It explicitly allows extension members for domain-specific data (Section 3.2 of [RFC9457]). However, [RFC9457] does not define any reusable extension members. Each API defines its own, leading to fragmentation. For the asynchronous job domain -- one of the most common API patterns -- this fragmentation is particularly costly because: * Monitoring tools cannot parse job failure context across APIs. * Client libraries must handle bespoke error shapes per service. * API designers spend time reinventing the same fields. Ratnawat Expires 30 August 2026 [Page 7] Internet-Draft Async Job Problem Details February 2026 2.2. The Transport-Independence Problem A key challenge specific to asynchronous jobs is that the failure report may not travel over HTTP at all. Consider: * A PDF generation service publishes job results to a Kafka topic. There are no HTTP headers; the entire context must be in the message payload. * A webhook delivers the result via a POST callback. The "Retry- After" header applies to the webhook delivery, not to the original job. * A Server-Sent Event stream pushes status updates. The event data field carries JSON; HTTP headers are from the SSE connection, not the job. The "Retry-After" header field ([RFC9110] Section 10.2.3) is transport-bound: it only exists in HTTP responses. When a job failure must be communicated via Kafka, SSE, gRPC, or a webhook, the retry signal is lost. This document solves this by defining "retryable" and "retryAfter" as JSON members inside the Problem Details object -- making them available regardless of transport. 2.3. Survey of Current Practice The following table surveys how four major platforms represent async job failures today. No two use the same field names. Ratnawat Expires 30 August 2026 [Page 8] Internet-Draft Async Job Problem Details February 2026 +==========+============+===================+===================+==========+ |Concept |AWS Step |Azure LRO |Google AIP-151 |Stripe | | |Functions | | | | +==========+============+===================+===================+==========+ |Job |executionArn|id |name |id | |identifier| | | | | +----------+------------+-------------------+-------------------+----------+ |Status |status |status |done + error |status | |field | | | | | +----------+------------+-------------------+-------------------+----------+ |Status |RUNNING, |Succeeded, Failed, |done: true/false |pending, | |values |SUCCEEDED, |Cancelled | |succeeded,| | |FAILED, | | |failed | | |TIMED_OUT, | | | | | |ABORTED | | | | +----------+------------+-------------------+-------------------+----------+ |Submission|startDate |createdDateTime |metadata.createTime|created | |time | | | | | +----------+------------+-------------------+-------------------+----------+ |Completion|stopDate |lastUpdatedDateTime|metadata.endTime |N/A | |time | | | | | +----------+------------+-------------------+-------------------+----------+ |Error |error, cause|error: {code, |google.rpc.Status |error: | |structure |(strings) |message} | |{type, | | | | | |message} | +----------+------------+-------------------+-------------------+----------+ |Retry |N/A |retryAfter |N/A |N/A | |guidance | | | | | +----------+------------+-------------------+-------------------+----------+ |Failure |N/A |N/A |N/A |N/A | |stage | | | | | +----------+------------+-------------------+-------------------+----------+ |Error |Custom JSON |Custom JSON |google.rpc.Status |Custom | |format | | | |JSON | +----------+------------+-------------------+-------------------+----------+ Table 2: Async Job Error Reporting — Industry Survey Key observations: * Every platform invents its own field names, status values, and error structures. * None builds on [RFC9457], despite it being the IETF standard for HTTP API errors. * Retry guidance is almost never structured (only Azure's LRO pattern includes "retryAfter", and only in the HTTP header). Ratnawat Expires 30 August 2026 [Page 9] Internet-Draft Async Job Problem Details February 2026 * Processing stage information -- which pipeline step failed -- is universally absent, despite being essential for debugging multi- step workflows (e.g., validate → render → convert → store). * Google's AIP-151 (Long-Running Operations) comes closest to a standard but is Google-specific and uses gRPC/Protobuf, not [RFC9457]. 3. Extension Members 3.1. Overview This section defines eight extension members for [RFC9457] problem details objects. They are organized into four groups: Identification: "jobId" (Section 3.2) and "correlationId" (Section 3.9). State and Timing: "jobStatus" (Section 3.3), "submittedAt" (Section 3.4), and "completedAt" (Section 3.5). Retry Guidance: "retryable" (Section 3.6) and "retryAfter" (Section 3.7). Diagnostics: "processingStage" (Section 3.8). All extension members are OPTIONAL in any given problem details object. A server MAY include any subset of them. A client MUST NOT assume any particular member is present and MUST gracefully handle its absence. When present, these members MUST appear at the top level of the problem details JSON object, alongside the standard [RFC9457] members ("type", "title", "status", "detail", "instance"). Although defined as [RFC9457] extension members, these JSON member names and semantics MAY also be used in non-problem-details JSON objects (e.g., successful job status responses). When used outside a problem details context, the [RFC9457] standard members ("type", "title", "status", "detail", "instance") are not required. However, when reporting failures, "application/problem+json" SHOULD be used per [RFC9457]. In all non-HTTP transport contexts (message brokers, webhooks, SSE), the "status" member represents the HTTP status code that WOULD have been used if the failure had been reported in a synchronous HTTP response. It is not the HTTP status code of the delivery mechanism. Ratnawat Expires 30 August 2026 [Page 10] Internet-Draft Async Job Problem Details February 2026 3.1.1. Conformance Levels To provide interoperability guidance while preserving flexibility, this document defines two conformance levels: Basic Conformance: A problem details object that includes at least one extension member from this specification. This is the minimum bar: any server that includes "jobId", "processingStage", or any other member defined herein is using this specification. Full Conformance: A problem details object that includes "jobId", "jobStatus", and "submittedAt". These three members together provide sufficient context to uniquely identify a job, know its state, and establish when it was accepted. Servers that aim for interoperability across monitoring tools and client libraries SHOULD target Full Conformance. Individual member definitions below use "RECOMMENDED" to indicate members that SHOULD be present for Full Conformance. This is not a contradiction with the OPTIONAL preamble above: all members remain OPTIONAL in the [RFC9457] sense, but Full Conformance requires a specific subset. 3.2. "jobId" A string that uniquely identifies the asynchronous job. * Type: string * Constraints: The server MUST assign this value when the job is accepted. The value SHOULD be a UUID [RFC9562] or another globally unique identifier. The client MUST treat it as an opaque string and MUST NOT parse or interpret its structure. * When to include: RECOMMENDED whenever reporting an async job outcome. * Relationship: If the job was submitted via HTTP and the server returned a "Location" header pointing to a job status resource, the "jobId" value SHOULD match the identifier embedded in that URI. Example: "jobId": "550e8400-e29b-41d4-a716-446655440000" Ratnawat Expires 30 August 2026 [Page 11] Internet-Draft Async Job Problem Details February 2026 3.3. "jobStatus" A string indicating the current state of the job. Registered values are defined in Section 4. * Type: string (case-sensitive, enumerated) * Constraints: Consumers MUST perform case-sensitive string comparison when matching "jobStatus" values against the registered values in Section 4. * When to include: RECOMMENDED whenever reporting an async job outcome. Example: "jobStatus": "FAILED" 3.4. "submittedAt" An [RFC3339] timestamp indicating when the server accepted the job for processing. * Type: string (date-time per [RFC3339]) * Constraints: MUST be in UTC (indicated by the "Z" suffix) to avoid timezone ambiguity in distributed systems. * When to include: RECOMMENDED whenever reporting an async job outcome. Example: "submittedAt": "2026-02-26T10:00:00Z" 3.5. "completedAt" An [RFC3339] timestamp indicating when the job reached a terminal state. * Type: string (date-time per [RFC3339]) Ratnawat Expires 30 August 2026 [Page 12] Internet-Draft Async Job Problem Details February 2026 * Constraints: MUST be in UTC. MUST NOT be present when the job is in a non-terminal state. When the problem details object is published to a message broker or delivered via webhook, the producer MUST ensure that "completedAt" is absent if "jobStatus" is a non-terminal value at the time of publication. Consumers that encounter this inconsistency SHOULD log a warning and ignore the "completedAt" value. * When to include: RECOMMENDED when the job has reached a terminal state. Example: "completedAt": "2026-02-26T10:00:05Z" 3.6. "retryable" A boolean indicating whether the client SHOULD retry the job submission with the same input. * Type: boolean * Default: If absent, the client SHOULD assume "false" (not retryable). * Semantics: - "true": The failure is likely transient (e.g., a downstream dependency was temporarily unavailable). The same request MAY succeed if resubmitted. - "false": The failure is deterministic (e.g., invalid input, missing template). Retrying with the same input will produce the same failure. * Relationship to "Retry-After" header: When the failure is reported in an HTTP response, the server SHOULD also include the "Retry- After" header field [RFC9110] with a value consistent with the "retryAfter" extension member. When the failure is reported over a non-HTTP transport (message broker, webhook, SSE), the "retryAfter" member is the only mechanism available. This is the primary motivation for expressing retry semantics as a JSON member rather than relying solely on the HTTP header. Example: "retryable": true Ratnawat Expires 30 August 2026 [Page 13] Internet-Draft Async Job Problem Details February 2026 3.7. "retryAfter" A non-negative integer indicating the number of seconds the client SHOULD wait before retrying the job submission. * Type: integer (>= 0) * Constraints: - SHOULD NOT be present when "retryable" is absent or "false". In exceptional cases, a server MAY include "retryAfter" without "retryable" to signal a general backoff interval (e.g., the server is under load and wants all clients to slow down), but clients SHOULD NOT interpret this as permission to retry a deterministically failed job. - SHOULD be present when "retryable" is "true". - A value of 0 is syntactically valid but clients MUST NOT interpret it as "retry immediately"; see Section 9.4 for minimum backoff requirements. * Semantics: Equivalent to the "Retry-After" header field ([RFC9110] Section 10.2.3) expressed as seconds, but embedded in the JSON body for transport independence. * Temporal reference: The "retryAfter" value represents the minimum number of seconds to wait before retrying, measured from the value of "completedAt" (if present). When "completedAt" is absent, the value is measured from the time the consumer receives the problem details object. For non-HTTP transports where delivery delay is possible (e.g., a lagging Kafka consumer), consumers SHOULD compute the effective wait time as: retryAfter - (now - completedAt). If the result is zero or negative, the consumer MAY retry immediately (subject to the minimum backoff floor in Section 9.4). * Client-side safety: Regardless of the value of "retryAfter", clients MUST enforce a minimum backoff floor (e.g., 1 second) to prevent tight retry loops in case of a misconfigured or malicious server. Clients SHOULD also enforce a maximum retry delay (e.g., 3600 seconds) beyond which the "retryAfter" value is treated as indicating that a retry is impractical. See Section 9.4. Ratnawat Expires 30 August 2026 [Page 14] Internet-Draft Async Job Problem Details February 2026 * Relationship to draft-ietf-httpapi-ratelimit-headers: The RateLimit header fields defined in that draft apply to the HTTP request rate for the polling endpoint. The "retryAfter" extension member defined here applies to the job resubmission interval. These are distinct concerns: one governs how often you ask "is it done?", the other governs how soon you should try the work again. Example: "retryAfter": 30 3.8. "processingStage" A string identifying the stage in the server's processing pipeline at which the job failed or completed. * Type: string * Constraints: The value is server-defined. Servers that use this member SHOULD document the possible values and their meanings in their API specification. * When to include: OPTIONAL. RECOMMENDED when the server has a multi-step processing pipeline and the failure can be attributed to a specific step. * Registry: This document does not create an IANA registry for processing stage values. Stage names are inherently API-specific because processing pipelines vary too widely across domains. The interoperability benefit of this member comes from its structured presence (enabling tools to extract and display the stage), not from standardized values. Servers SHOULD use the stage names from Table 3 when applicable, and MAY define additional names for domain-specific stages. Recommended stage names (servers MAY define others): Ratnawat Expires 30 August 2026 [Page 15] Internet-Draft Async Job Problem Details February 2026 +===============+=================================================+ | Stage | Description | +===============+=================================================+ | validation | Input validation against schema or business | | | rules failed. | +---------------+-------------------------------------------------+ | authorization | The job's security context was insufficient for | | | the requested operation. | +---------------+-------------------------------------------------+ | queuing | The job could not be placed onto the processing | | | queue (broker unavailable, queue full). | +---------------+-------------------------------------------------+ | processing | General processing failure (use more specific | | | stages when possible). | +---------------+-------------------------------------------------+ | rendering | Template or document rendering failed. | +---------------+-------------------------------------------------+ | conversion | Format conversion failed (e.g., HTML to PDF, | | | image transcoding). | +---------------+-------------------------------------------------+ | storage | The result could not be persisted (disk, object | | | store, database). | +---------------+-------------------------------------------------+ | delivery | The result could not be delivered to the | | | callback URL or output channel. | +---------------+-------------------------------------------------+ Table 3: Recommended Processing Stage Names Example: "processingStage": "rendering" 3.9. "correlationId" A string containing a client-supplied identifier from the originating request, used for distributed tracing. * Type: string * Constraints: The value is client-supplied and MUST be treated as untrusted (see Section 9.2). Servers SHOULD enforce a maximum length (e.g., 256 characters) and reject or truncate values that exceed it. See Section 9.2. Ratnawat Expires 30 August 2026 [Page 16] Internet-Draft Async Job Problem Details February 2026 * When to include: RECOMMENDED when the originating request included a correlation identifier (e.g., via the "X-Correlation-ID" header, the W3C "traceparent" header [W3C.TRACE-CONTEXT], or a field in the request body). * Relationship to W3C Trace Context: If the server supports W3C Trace Context [W3C.TRACE-CONTEXT], the "correlationId" SHOULD carry the trace-id portion of the "traceparent" header. This allows the problem details object to be correlated with distributed traces without requiring the consumer to have access to the originating HTTP headers. Example: "correlationId": "4bf92f3577b34da6a3ce929d0e0e4736" 4. Job Status Registry 4.1. Initial Values +=======================+=================================+ | Status Value | Description | +=======================+=================================+ | ACCEPTED | The job has been accepted but | | | processing has not yet started. | +-----------------------+---------------------------------+ | PROCESSING | The job is actively being | | | processed. | +-----------------------+---------------------------------+ | COMPLETED | The job finished successfully; | | | the result is available. | +-----------------------+---------------------------------+ | FAILED | The job encountered an error. | | | The "detail" member SHOULD | | | describe the failure. | +-----------------------+---------------------------------+ | CANCELLED | The job was cancelled before | | | completion. | +-----------------------+---------------------------------+ | TIMED_OUT | The job exceeded the server's | | | maximum allowed processing | | | duration. | +-----------------------+---------------------------------+ | COMPLETED_WITH_ERRORS | (Batch only) Some items | | | succeeded and some failed. See | | | Section 7. | +-----------------------+---------------------------------+ Ratnawat Expires 30 August 2026 [Page 17] Internet-Draft Async Job Problem Details February 2026 Table 4: Initial Job Status Values 4.2. Terminal vs. Non-Terminal States Terminal states: COMPLETED, FAILED, CANCELLED, TIMED_OUT, COMPLETED_WITH_ERRORS. Once a job reaches a terminal state, its "jobStatus" MUST NOT change. Non-terminal states: ACCEPTED, PROCESSING. 4.3. Extensibility Servers MAY define additional status values for their domain (e.g., "VALIDATING", "RENDERING", "AWAITING_APPROVAL"). To reduce the risk of interoperability issues from typos or inconsistent naming, server-defined status values SHOULD use UPPER_SNAKE_CASE to match the registered values, and SHOULD be documented in the API specification. Clients that encounter an unrecognized status value SHOULD treat it as non-terminal (equivalent to "PROCESSING") unless the value is documented by the API as terminal. 5. Transport Contexts A core design goal of this specification is that the extension members work identically regardless of how the problem details object reaches the consumer. This section provides guidance for four common transports. 5.1. HTTP Response Bodies This is the primary context anticipated by [RFC9457]. The problem details object appears in the body of an HTTP response with Content- Type "application/problem+json". When used in HTTP responses, servers SHOULD also set: * The "Retry-After" header field to match the "retryAfter" member (if present), for compatibility with clients that do not parse the body. * The "status" member to match the HTTP status code of the response, per [RFC9457] Section 3.1. Example context: A client polls a job status resource and receives a failure report. Ratnawat Expires 30 August 2026 [Page 18] Internet-Draft Async Job Problem Details February 2026 5.2. Message Broker Payloads When a job result is published to a message broker (e.g., Apache Kafka, RabbitMQ, Amazon SQS), the problem details object is serialized as the message value. In this context: * HTTP headers are unavailable. The "retryAfter" member is the only mechanism for conveying retry guidance. * The "status" member represents the HTTP status code that WOULD have been used if the failure had been reported synchronously. * The broker message key SHOULD be the "jobId" value, enabling consumers to partition and deduplicate by job. * The Content-Type of the message (if the broker supports content- type metadata) SHOULD be "application/problem+json". Example context: A PDF generation service publishes a failure to a Kafka result topic. The consumer reads the Problem Details object from the message value. 5.3. Webhook Callbacks When a server delivers a job result by POSTing to a client-supplied callback URL, the request body SHOULD be the problem details object with Content-Type "application/problem+json". In this context: * The HTTP status code of the webhook POST is about the delivery, not the job. The "status" member inside the problem details object conveys the job-level status. * Servers SHOULD sign webhook deliveries using HTTP Message Signatures [RFC9421] to allow recipients to verify authenticity. See Section 6.6. * The "retryAfter" member applies to job resubmission, NOT to webhook delivery retry. Webhook delivery retry is a concern of the delivery mechanism, not this specification. Ratnawat Expires 30 August 2026 [Page 19] Internet-Draft Async Job Problem Details February 2026 5.4. Server-Sent Events When a server pushes job status updates via Server-Sent Events (SSE) [W3C.SSE], the problem details object is serialized as the "data" field of an event. The "event" field of the SSE SHOULD be "job-failed" (or a similar descriptive type). Example: event: job-failed data: {"type":"https://api.example.com/problems/rendering-failed", data: "title":"Document Rendering Failed","status":500, data: "jobId":"550e8400","jobStatus":"FAILED", data: "processingStage":"rendering"} 6. Interaction with Existing Standards This section documents how the extension members interact with each referenced RFC, to help implementers compose them correctly. 6.1. RFC 9457 — Problem Details for HTTP APIs This document extends [RFC9457] by defining reusable extension members per Section 3.2 of that specification. All standard [RFC9457] members ("type", "title", "status", "detail", "instance") retain their original semantics. Notably: * "type" SHOULD be a URI identifying the specific failure type, NOT "about:blank", when async job extension members are present. This enables consumers to distinguish async job failures from generic HTTP errors. * "instance" SHOULD identify the specific occurrence of the problem, per [RFC9457] Section 3.1.5. For async jobs, this is typically the URI of the job status resource (e.g., "/api/v1/documents/ jobs/550e8400-e29b-41d4-a716-446655440000"), enabling consumers to dereference it for full details. * "status" represents the HTTP status code that WOULD have been returned if the job had been processed synchronously. Ratnawat Expires 30 August 2026 [Page 20] Internet-Draft Async Job Problem Details February 2026 6.2. RFC 9110 — HTTP Semantics (202 Accepted) [RFC9110] Section 15.3.3 defines 202 Accepted as indicating that the request has been accepted for processing but the processing has not been completed. It notes that there is no facility in HTTP for re- connecting to an asynchronous operation later. This document does not change the semantics of 202. The extension members defined here apply to the failure report, not to the acceptance response. 6.3. RFC 7240 — Prefer Header (respond-async) [RFC7240] Section 4.1 defines the "respond-async" preference token. This document does not change its semantics. When a server honors "respond-async" and later the job fails, the failure report (returned from the job status resource or delivered via another transport) SHOULD include the extension members defined in this document. Note: The extension members defined here are useful regardless of whether the originating request included "Prefer: respond-async". A server that always processes requests asynchronously (without requiring the Prefer header) SHOULD still use these extension members when reporting job outcomes. 6.4. RFC 9512 — YAML Media Type If an API documents its async job failure schemas using AsyncAPI [ASYNCAPI] specifications (commonly authored in YAML), the specification file SHOULD be served with Content-Type "application/ yaml" per [RFC9512] when exposed via HTTP. 6.5. RFC 8288 — Web Linking Servers that return 202 Accepted SHOULD include a "Link" header with a relation type that points to the job status resource. Servers MAY use the registered "status" relation type [RFC8631] or a URI-based extension relation type as described in Section 2.1.2 of [RFC8288]. When a completed or failed job's status response includes a link to the result resource, the server SHOULD use a URI-based extension relation type as described in Section 2.1.2 of [RFC8288] (e.g., "https://api.example.com/rels/job-result") or an existing registered relation type such as "related" [IANA.LINK-RELATIONS]. Servers MUST NOT use unregistered short-form relation types. Ratnawat Expires 30 August 2026 [Page 21] Internet-Draft Async Job Problem Details February 2026 6.6. RFC 9421 — HTTP Message Signatures When job failure reports are delivered via webhooks, servers MAY sign the webhook request using [RFC9421] to allow the recipient to verify the report's authenticity and integrity. When job results are delivered via message brokers, [RFC9421] does not apply (it is HTTP-specific). Broker-level security mechanisms (e.g., TLS, SASL) SHOULD be used instead. 6.7. RFC 6585 — Additional HTTP Status Codes (429) [RFC6585] defines 429 Too Many Requests. This applies to the job status polling endpoint, NOT to job failure reports. If a client polls too frequently, the server MAY return 429 with a "Retry-After" header. This is distinct from the "retryAfter" extension member, which governs job resubmission timing. Implementers MUST NOT confuse these two retry signals: * "Retry-After" header on a 429 response: "wait before polling again." * "retryAfter" extension member in a FAILED job report: "wait before resubmitting the job." 6.8. RFC 9562 — UUIDs The "jobId" member SHOULD be a UUID per [RFC9562] to ensure global uniqueness and unguessability. UUIDv7 is RECOMMENDED for new implementations because it embeds a timestamp, enabling natural chronological ordering of jobs. 6.9. RFC 6901 — JSON Pointer When a job fails at the "validation" stage, the "detail" member MAY reference specific input fields using JSON Pointer [RFC6901] syntax (e.g., "Field /templateData/customerName is required"). For richer validation error reporting, implementers MAY combine these extension members with validation-specific extensions (such as a "violations" array using JSON Pointer field references), but defining such extensions is outside the scope of this document. Ratnawat Expires 30 August 2026 [Page 22] Internet-Draft Async Job Problem Details February 2026 6.10. draft-ietf-httpapi-idempotency-key-header When a client resubmits a failed job (guided by "retryable": true), it SHOULD include an "Idempotency-Key" header per [I-D.ietf-httpapi-idempotency-key-header] to prevent duplicate processing if the resubmission is received more than once. The "jobId" from the failed job SHOULD NOT be reused as the idempotency key, because the retry is a new job submission. 6.11. draft-ietf-httpapi-ratelimit-headers The RateLimit header fields defined in [I-D.ietf-httpapi-ratelimit-headers] apply to HTTP request rates for any endpoint, including job status polling endpoints. The "retryAfter" extension member defined in this document applies to job resubmission intervals. These are orthogonal concerns. 7. Batch Operations 7.1. The "results" Extension Member For batch operations that produce multiple outputs, this document defines the "results" extension member: an array of per-item outcome objects. Each object in the array MUST contain: * "itemId" (string): An identifier for the item within the batch. The value is server-defined and SHOULD correspond to an identifier from the original batch request. If the original batch request does not assign item identifiers, the server SHOULD assign them (e.g., zero-based array indices as strings: "0", "1", "2") and document the assignment scheme in the API specification. * "status" (string): A terminal job status value from Section 4. Note: "COMPLETED_WITH_ERRORS" is not valid for individual batch items because a single item cannot itself be a partial batch; only the top-level batch "jobStatus" uses this value. Each object MAY additionally contain: * "detail" (string): A human-readable description of the outcome. * "retryable" (boolean): Whether this specific item can be retried. * "processingStage" (string): The stage at which this item failed. Ratnawat Expires 30 August 2026 [Page 23] Internet-Draft Async Job Problem Details February 2026 7.1.1. Payload Size Considerations For large batches (hundreds or thousands of items), including every item in the "results" array may produce an impractically large payload. Servers SHOULD apply the following strategies: * Include only failed items in "results" when the batch jobStatus is "COMPLETED_WITH_ERRORS", and note the omission in the "detail" member (e.g., "3 of 5000 items failed; only failed items are listed"). * Define a maximum number of items in "results" (documented in the API specification) and provide a link to a paginated resource for the complete result set. Example: "detail": "150 of 10000 items failed. First 50 shown.", "results": [ "... 50 items ..." ] With a "Link" header or body link to the full result set. * For very large batches, consider omitting "results" entirely and providing only a summary in "detail" with a link to the detailed results. This document does not define a pagination mechanism for "results"; pagination is left to the API specification. 7.2. Partial Failure Semantics When a batch contains both successes and failures: * The "jobStatus" SHOULD be "COMPLETED_WITH_ERRORS". * The "status" member (HTTP status equivalent) SHOULD be 207 (Multi- Status), as defined in [RFC4918] Section 13. Note that 207 was originally defined for WebDAV; its use in JSON batch APIs is a widely adopted industry convention. * The "detail" member SHOULD summarize the outcome (e.g., "3 of 5 items completed successfully"). 8. JSON Schema The following JSON Schema [JSON-SCHEMA] defines the extension members introduced by this document. It is designed to be composed (via "allOf" or "$ref") with the Problem Details schema from Appendix A of [RFC9457]. Ratnawat Expires 30 August 2026 [Page 24] Internet-Draft Async Job Problem Details February 2026 { "$schema": "https://json-schema.org/draft/2020-12/schema", "$id": "https://example.com/schemas/async-job-problem-details", "title": "Async Job Problem Details Extensions", "description": "Extension members for RFC 9457 Problem Details objects that describe asynchronous job outcomes.", "type": "object", "properties": { "jobId": { "type": "string", "description": "Unique identifier for the async job.", "examples": ["550e8400-e29b-41d4-a716-446655440000"] }, "jobStatus": { "type": "string", "description": "Current state of the async job. Registered values are ACCEPTED, PROCESSING, COMPLETED, FAILED, CANCELLED, TIMED_OUT, and COMPLETED_WITH_ERRORS. Servers MAY define additional values per Section 4.3.", "examples": ["FAILED", "COMPLETED", "TIMED_OUT"] }, "submittedAt": { "type": "string", "format": "date-time", "description": "RFC 3339 timestamp (UTC) when the job was accepted for processing.", "examples": ["2026-02-26T10:00:00Z"] }, "completedAt": { "type": "string", "format": "date-time", "description": "RFC 3339 timestamp (UTC) when the job reached a terminal state.", "examples": ["2026-02-26T10:00:05Z"] }, "retryable": { "type": "boolean", "default": false, "description": "Whether the client should retry the job with the same input." }, "retryAfter": { "type": "integer", "minimum": 0, "description": "Seconds to wait before retrying. Only meaningful when retryable is true." }, "processingStage": { "type": "string", "description": "The pipeline stage at which the failure occurred.", "examples": ["validation", "rendering", "conversion"] }, "correlationId": { "type": "string", "description": "Client-supplied correlation identifier from the originating request.", "examples": ["4bf92f3577b34da6a3ce929d0e0e4736"] Ratnawat Expires 30 August 2026 [Page 25] Internet-Draft Async Job Problem Details February 2026 }, "results": { "type": "array", "description": "Per-item outcomes for batch operations.", "items": { "type": "object", "required": ["itemId", "status"], "properties": { "itemId": { "type": "string", "description": "Identifier for the batch item." }, "status": { "type": "string", "enum": [ "COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT" ], "description": "Terminal outcome for this item." }, "detail": { "type": "string", "description": "Human-readable outcome description." }, "retryable": { "type": "boolean", "description": "Whether this item can be retried." }, "processingStage": { "type": "string", "description": "Stage at which this item failed." } }, "additionalProperties": true } } }, "additionalProperties": true } Note: The "jobStatus" property does not use a JSON Schema "enum" constraint because Section 4.3 allows servers to define additional status values. Validators that wish to restrict values to the registered set MAY add an "enum" constraint locally, but SHOULD accept unrecognized values gracefully in production. Ratnawat Expires 30 August 2026 [Page 26] Internet-Draft Async Job Problem Details February 2026 9. Security Considerations 9.1. Information Exposure The extension members defined here can reveal internal architecture details. Servers MUST consider what information is appropriate for their audience: * "processingStage" reveals pipeline structure. For external-facing APIs, servers SHOULD use generic stage names ("processing", "conversion") rather than implementation-specific names ("kafka- consumer-group-rebalance", "s3-multipart-upload"). * "jobId" values MUST NOT encode sensitive data (e.g., user IDs, database primary keys). * "detail" messages MUST NOT include stack traces, internal hostnames, database queries, or file system paths. 9.2. Correlation ID Injection The "correlationId" is client-supplied and MUST be treated as untrusted input. * Servers MUST sanitize the value before including it in any output context -- log files, HTTP headers, HTML pages, database queries, or template rendering -- to prevent injection attacks including log injection (newline injection, ANSI escape sequences), HTTP header injection (CRLF injection), and cross-site scripting (XSS). * Servers SHOULD enforce a maximum length (e.g., 256 characters) and restrict the character set (e.g., alphanumeric, hyphens, underscores, periods). * Servers MUST NOT use the correlation ID in SQL queries, template rendering, or any context where injection is possible, without proper escaping. 9.3. Job Identifier Enumeration If job identifiers are sequential or predictable, an attacker can enumerate job status resources belonging to other clients. * Servers MUST use unguessable identifiers. UUIDv4 or UUIDv7 [RFC9562] are RECOMMENDED. * Servers SHOULD enforce authorization on job status resources: a client MUST only be able to access jobs it submitted. Ratnawat Expires 30 August 2026 [Page 27] Internet-Draft Async Job Problem Details February 2026 * Servers SHOULD implement rate limiting on the job status endpoint to slow enumeration attempts. 9.4. Retry Amplification A malicious or compromised server could return "retryable": true with "retryAfter": 0 to induce clients to retry in a tight loop, creating a self-inflicted denial of service. Clients MUST enforce: * A maximum retry count (e.g., 5 attempts). * A minimum backoff floor (e.g., 1 second), regardless of the server's "retryAfter" value. A "retryAfter" value of 0 MUST NOT be interpreted as "retry immediately". * Exponential backoff with jitter for successive retries. 9.5. Timing Side Channels The "submittedAt" and "completedAt" timestamps reveal processing duration. In sensitive contexts, this could leak information about: * Server load (longer processing times indicate high load). * Input complexity (more complex inputs may take longer). * Conditional branching (different code paths take different amounts of time). Servers operating in high-security environments MAY omit these members or round them to reduce precision. 9.6. Batch Results Information Scoping In multi-tenant systems where a batch may contain items belonging to different authorization domains, the "results" array MUST only include items that the requesting client is authorized to view. Servers MUST NOT leak information about other tenants' items through the "results" array, the "detail" summary, or the item count. 9.7. Privacy Considerations The extension members defined here may constitute or contain personally identifiable information (PII): Ratnawat Expires 30 August 2026 [Page 28] Internet-Draft Async Job Problem Details February 2026 * "jobId" values, if correlated with external data, could identify individual users or transactions. * "correlationId" values are client-supplied and may contain user identifiers, session tokens, or other PII. * "submittedAt" and "completedAt" timestamps can reveal usage patterns that may be attributable to individuals. Servers SHOULD evaluate the privacy implications of including these members in responses that may be logged, cached, or forwarded through intermediaries. In jurisdictions with data protection regulations (e.g., GDPR, CCPA), operators SHOULD ensure that problem details objects containing PII are treated as personal data for retention and access control purposes. Servers SHOULD NOT include these extension members in responses that are cacheable or served to unauthenticated clients unless the information is already public. 10. IANA Considerations 10.1. Problem Details Extension Members This document defines extension members for [RFC9457] problem details objects per the extension mechanism in Section 3.2 of [RFC9457]. As [RFC9457] does not establish a registry for extension members, the members defined in this document ("jobId", "jobStatus", "submittedAt", "completedAt", "retryable", "retryAfter", "processingStage", "correlationId", and "results") are specified here and do not require IANA registration. Implementers that define additional extension members for async job problem details SHOULD choose names that do not conflict with the members defined in this document. 10.2. Job Status Value Registry This document requests that IANA create a "Problem Details Async Job Status Values" registry with the initial entries from Table 4 (Section 4.1). Registration policy: Specification Required [RFC8126]. Each registration MUST include: * Status Value: An UPPER_SNAKE_CASE string (e.g., "PAUSED"). Ratnawat Expires 30 August 2026 [Page 29] Internet-Draft Async Job Problem Details February 2026 * Terminal: Whether this status is terminal (yes/no). * Description: A brief description of the status meaning. * Reference: A reference to the specification defining the status value. The designated expert(s) SHOULD verify that proposed values use UPPER_SNAKE_CASE, do not duplicate the semantics of existing registered values, and are clearly documented as either terminal or non-terminal. 10.3. Problem Type URIs This document does not register any problem type URIs. The "type" member in the examples uses illustrative URIs under the "api.example.com" domain. Servers SHOULD define their own problem type URIs under a domain they control. When async job extension members are present, the "type" member SHOULD NOT be "about:blank"; a specific URI helps consumers distinguish async job failures from generic HTTP errors. See [RFC9457] Section 4.2.1 for guidance on using "about:blank". 11. Examples All examples are complete problem details objects that validate against the schema in Section 8. 11.1. HTTP Response: Document Rendering Failure A document generation API processed a PDF request asynchronously. The rendering stage failed. The client discovers this by polling the job status resource. HTTP context: GET /api/v1/documents/jobs/550e8400-e29b-41d4-a716-446655440000 HTTP/1.1 Host: api.example.com HTTP/1.1 200 OK Content-Type: application/problem+json Ratnawat Expires 30 August 2026 [Page 30] Internet-Draft Async Job Problem Details February 2026 { "type": "https://api.example.com/problems/rendering-failed", "title": "Document Rendering Failed", "status": 500, "detail": "Template 'invoice-v2' contains an unclosed element at line 87", "instance": "/api/v1/documents/jobs/550e8400-e29b-41d4-a716-446655440000", "jobId": "550e8400-e29b-41d4-a716-446655440000", "jobStatus": "FAILED", "submittedAt": "2026-02-26T10:00:00Z", "completedAt": "2026-02-26T10:00:03Z", "retryable": false, "processingStage": "rendering", "correlationId": "4bf92f3577b34da6a3ce929d0e0e4736" } Note: The HTTP status is 200 (the status check succeeded). The "status": 500 inside the object indicates the equivalent synchronous failure code. This distinction is intentional: the client's request to check the job status was successful (HTTP 200); the job itself failed (Problem Details status 500). Client libraries that interpret "application/problem+json" as an error signal SHOULD inspect the HTTP status code first; a 200 response with Problem Details content indicates a successfully retrieved failure report, not a request failure. 11.2. HTTP Response: Job Timeout with Retry A report generation job exceeded the 300-second limit. The server indicates the failure is transient and suggests retrying after 60 seconds. { "type": "https://api.example.com/problems/job-timed-out", "title": "Job Processing Timed Out", "status": 504, "detail": "Job exceeded maximum processing time of 300s", "jobId": "7c9e6679-7425-40de-944b-e07fc1f90ae7", "jobStatus": "TIMED_OUT", "submittedAt": "2026-02-26T09:00:00Z", "completedAt": "2026-02-26T09:05:00Z", "retryable": true, "retryAfter": 60, "processingStage": "processing" } Ratnawat Expires 30 August 2026 [Page 31] Internet-Draft Async Job Problem Details February 2026 11.3. Kafka Message: PDF Conversion Failure A PDF generation service publishes a failure to a Kafka result topic. No HTTP headers are available; the "retryAfter" member (absent here because retryable is false) would be the sole mechanism for conveying retry guidance if present. Kafka message: Topic: com.example.pdf-job.result.v1 Key: d4735e3a-265e-16d0-8f24-2de10e933e80 Headers: content-type=application/problem+json Value: { "type": "https://api.example.com/problems/conversion-failed", "title": "PDF Conversion Failed", "status": 502, "detail": "iText HTML-to-PDF conversion failed: malformed CSS at line 15", "jobId": "d4735e3a-265e-16d0-8f24-2de10e933e80", "jobStatus": "FAILED", "submittedAt": "2026-02-26T16:30:00Z", "completedAt": "2026-02-26T16:30:02Z", "retryable": false, "processingStage": "conversion", "correlationId": "monthly-report-2026-02" } Note: The "retryAfter" member is absent because the failure is not retryable. The "status": 502 indicates the equivalent HTTP status; it is not an HTTP response code in this context. 11.4. Webhook Callback: Export Completed with Errors A data export service delivers a batch result via webhook. POST /webhooks/export-results HTTP/1.1 Host: client.example.com Content-Type: application/problem+json Signature: sig1=:BASE64SIGNATURE: Signature-Input: sig1=("content-type" "content-digest");keyid="server-key-1";created=1740567600 Ratnawat Expires 30 August 2026 [Page 32] Internet-Draft Async Job Problem Details February 2026 { "type": "https://api.example.com/problems/batch-partial", "title": "Data Export Partially Failed", "status": 207, "detail": "8 of 10 records exported successfully", "jobId": "export-batch-20260226", "jobStatus": "COMPLETED_WITH_ERRORS", "submittedAt": "2026-02-26T12:00:00Z", "completedAt": "2026-02-26T12:15:00Z", "results": [ { "itemId": "rec-007", "status": "FAILED", "detail": "Record exceeds maximum size (10MB)", "retryable": false }, { "itemId": "rec-009", "status": "FAILED", "detail": "Downstream storage timeout", "retryable": true, "processingStage": "storage" } ] } Note: Only failed items are included in "results" to reduce payload size. The webhook request itself uses HTTP Message Signatures [RFC9421] for authenticity. 11.5. Server-Sent Event: Processing Stage Update A Server-Sent Event stream delivers a failure notification. event: job-failed id: 550e8400-e29b-41d4-a716-446655440000 data: {"type":"https://api.example.com/problems/rendering-failed", data: "title":"Document Rendering Failed","status":500, data: "jobId":"550e8400","jobStatus":"FAILED", data: "submittedAt":"2026-02-26T10:00:00Z", data: "completedAt":"2026-02-26T10:00:03Z", data: "retryable":false,"processingStage":"rendering"} 11.6. Batch: Partial Failure Three certificate generation requests; two succeed, one fails. Ratnawat Expires 30 August 2026 [Page 33] Internet-Draft Async Job Problem Details February 2026 { "type": "https://api.example.com/problems/batch-partial", "title": "Batch Processing Partially Failed", "status": 207, "detail": "2 of 3 certificates generated successfully", "jobId": "batch-a1b2c3d4", "jobStatus": "COMPLETED_WITH_ERRORS", "submittedAt": "2026-02-26T14:00:00Z", "completedAt": "2026-02-26T14:00:12Z", "results": [ { "itemId": "cert-001", "status": "COMPLETED" }, { "itemId": "cert-002", "status": "FAILED", "detail": "Required field 'recipientName' missing", "retryable": false, "processingStage": "validation" }, { "itemId": "cert-003", "status": "COMPLETED" } ] } 11.7. HTTP Response: Successful Job Completion The extension members are not limited to failure reporting. A server MAY include them in a successful job status response to provide timing and identification context. GET /api/v1/documents/jobs/a1b2c3d4-5678-90ab-cdef-1234567890ab HTTP/1.1 Host: api.example.com HTTP/1.1 200 OK Content-Type: application/json Link: ; rel="https://api.example.com/rels/job-result" Ratnawat Expires 30 August 2026 [Page 34] Internet-Draft Async Job Problem Details February 2026 { "jobId": "a1b2c3d4-5678-90ab-cdef-1234567890ab", "jobStatus": "COMPLETED", "submittedAt": "2026-02-26T10:00:00Z", "completedAt": "2026-02-26T10:00:04Z", "correlationId": "invoice-batch-2026-02-26" } Note: The Content-Type is "application/json", not "application/ problem+json", because this is not an error response. The extension members defined in this document are JSON members that MAY appear in any JSON object; they are not restricted to [RFC9457] problem details objects. However, when reporting failures, "application/ problem+json" SHOULD be used per [RFC9457]. 11.8. Kafka Message: Transient Failure with Retry Guidance A report generation service publishes a transient failure to a Kafka result topic, demonstrating the transport-independence value of the "retryAfter" member. Kafka message: Topic: com.example.report-job.result.v1 Key: e5f6a7b8-9012-3456-7890-abcdef123456 Headers: content-type=application/problem+json Value: { "type": "https://api.example.com/problems/downstream-unavailable", "title": "Downstream Service Temporarily Unavailable", "status": 503, "detail": "Data warehouse connection pool exhausted; service is expected to recover within 60 seconds", "jobId": "e5f6a7b8-9012-3456-7890-abcdef123456", "jobStatus": "FAILED", "submittedAt": "2026-02-26T18:00:00Z", "completedAt": "2026-02-26T18:00:01Z", "retryable": true, "retryAfter": 60, "processingStage": "processing", "correlationId": "weekly-report-2026-w09" } Note: In an HTTP context, the server would also include a "Retry- After: 60" header. In this Kafka context, the "retryAfter" member is the only way to convey the retry interval to the consumer. 12. References Ratnawat Expires 30 August 2026 [Page 35] Internet-Draft Async Job Problem Details February 2026 12.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . [RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, DOI 10.17487/RFC3339, July 2002, . [RFC7240] Snell, J., "Prefer Header for HTTP", RFC 7240, DOI 10.17487/RFC7240, June 2014, . [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . [RFC8259] Bray, T., Ed., "The JavaScript Object Notation (JSON) Data Interchange Format", STD 90, RFC 8259, DOI 10.17487/RFC8259, December 2017, . [RFC8288] Nottingham, M., "Web Linking", RFC 8288, DOI 10.17487/RFC8288, October 2017, . [RFC9110] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, Ed., "HTTP Semantics", STD 97, RFC 9110, DOI 10.17487/RFC9110, June 2022, . [RFC9457] Nottingham, M., Wilde, E., and S. Dalal, "Problem Details for HTTP APIs", RFC 9457, DOI 10.17487/RFC9457, July 2023, . [RFC9562] Davis, K., Peabody, B., and P. Leach, "Universally Unique IDentifiers (UUIDs)", RFC 9562, DOI 10.17487/RFC9562, May 2024, . 12.2. Informative References [RFC4918] Dusseault, L., Ed., "HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV)", RFC 4918, DOI 10.17487/RFC4918, June 2007, . Ratnawat Expires 30 August 2026 [Page 36] Internet-Draft Async Job Problem Details February 2026 [RFC6585] Nottingham, M. and R. Fielding, "Additional HTTP Status Codes", RFC 6585, DOI 10.17487/RFC6585, April 2012, . [RFC6901] Bryan, P., Ed., Zyp, K., and M. Nottingham, Ed., "JavaScript Object Notation (JSON) Pointer", RFC 6901, DOI 10.17487/RFC6901, April 2013, . [RFC8126] Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, . [RFC8631] Wilde, E., "Link Relation Types for Web Services", RFC 8631, DOI 10.17487/RFC8631, July 2019, . [RFC9421] Backman, A., Ed., Richer, J., Ed., and M. Sporny, "HTTP Message Signatures", RFC 9421, DOI 10.17487/RFC9421, February 2024, . [RFC9512] Wilde, E. and R. Amsüss, "YAML Media Type", RFC 9512, DOI 10.17487/RFC9512, February 2024, . [JSON-SCHEMA] Wright, A., Andrews, H., Hutton, B., and G. Dennis, "JSON Schema: A Media Type for Describing JSON Documents", Work in Progress, Internet-Draft, draft-bhutton-json-schema-01, June 2022, . [ASYNCAPI] AsyncAPI Initiative, "AsyncAPI Specification", Version 3.0.0, February 2024, . [W3C.TRACE-CONTEXT] Rodrigues, W., Kanzhelev, S., Mace, J., and B. Banerjee, "Trace Context", W3C Recommendation, November 2021, . [W3C.SSE] Hickson, I., "Server-Sent Events", W3C Recommendation, February 2015, . Ratnawat Expires 30 August 2026 [Page 37] Internet-Draft Async Job Problem Details February 2026 [I-D.ietf-httpapi-idempotency-key-header] Dalal, S. and J. Jena, "The Idempotency-Key HTTP Header Field", Work in Progress, Internet-Draft, draft-ietf- httpapi-idempotency-key-header, . [I-D.ietf-httpapi-ratelimit-headers] Polli, R. and A. Martinez, "RateLimit header fields for HTTP", Work in Progress, Internet-Draft, draft-ietf- httpapi-ratelimit-headers, . [GOOGLE-AIP-151] Google, "AIP-151: Long-running operations", . [CLOUDEVENTS] Cloud Native Computing Foundation, "CloudEvents Specification", Version 1.0.2, . [OPENAPI] OpenAPI Initiative, "OpenAPI Specification", Version 3.1.0, . [IANA.LINK-RELATIONS] IANA, "Link Relations", . Appendix A. Comparison with Industry Patterns This appendix provides an expanded comparison of async job error reporting across platforms, illustrating the fragmentation that this specification addresses. A.1. AWS Step Functions AWS Step Functions represent failures as: Ratnawat Expires 30 August 2026 [Page 38] Internet-Draft Async Job Problem Details February 2026 { "executionArn": "arn:aws:states:...", "status": "FAILED", "error": "States.TaskFailed", "cause": "Lambda function threw an exception", "startDate": "2026-02-26T10:00:00.000Z", "stopDate": "2026-02-26T10:00:05.000Z" } Mapping to this specification: "executionArn" → "jobId", "status" → "jobStatus", "error"+"cause" → "detail", "startDate" → "submittedAt", "stopDate" → "completedAt". No retry guidance. No processing stage. Not RFC 9457. A.2. Google AIP-151 (Long-Running Operations) Google uses a Protobuf-based "Operation" resource: { "name": "operations/abc123", "done": true, "error": { "code": 3, "message": "Invalid template", "details": [] } } Mapping: "name" → "jobId", "done" → terminal state check, "error" → google.rpc.Status (not RFC 9457). No "submittedAt"/"completedAt". No retry guidance. No processing stage. A.3. Azure Long-Running Operations Azure uses a polling pattern with: { "id": "job-xyz", "status": "Failed", "error": { "code": "RenderingFailed", "message": "Template syntax error at line 42" }, "startTime": "2026-02-26T10:00:00Z", "endTime": "2026-02-26T10:00:05Z" } Ratnawat Expires 30 August 2026 [Page 39] Internet-Draft Async Job Problem Details February 2026 Mapping: "id" → "jobId", "status" → "jobStatus", "error" → "detail", "startTime" → "submittedAt", "endTime" → "completedAt". "retryAfter" sometimes in HTTP header only. No processing stage. Not RFC 9457. A.4. Stripe Stripe uses: { "id": "pi_abc123", "status": "failed", "last_payment_error": { "type": "card_error", "message": "Your card was declined" }, "created": 1740567600 } Mapping: "id" → "jobId", "status" → "jobStatus", "last_payment_error" → "detail", "created" → "submittedAt". No completion time. No retry guidance. No processing stage. Unix timestamp, not RFC 3339. Not RFC 9457. Appendix B. Design Decisions B.1. Why Extension Members, Not a New Media Type? Defining a new media type (e.g., "application/async-job- problem+json") was considered and rejected because: * [RFC9457] already provides a well-known envelope with Content-Type negotiation ("application/problem+json"). * Extension members integrate seamlessly with existing [RFC9457] infrastructure (exception mappers, middleware, logging). * Clients that don't understand the extensions can still process the standard Problem Details members ("type", "title", "status", "detail"). B.2. Why JSON Members for Retry Instead of Only Headers? The "Retry-After" HTTP header field [RFC9110] already exists. The "retryAfter" JSON member was added because: * Message broker payloads have no HTTP headers. Ratnawat Expires 30 August 2026 [Page 40] Internet-Draft Async Job Problem Details February 2026 * Webhook callbacks use HTTP, but the "Retry-After" header applies to the webhook delivery, not the job retry. * The JSON member travels with the Problem Details object across all transports without transformation. B.3. Why an Explicit "retryable" Boolean? The presence of "retryAfter" alone could imply retryability. A separate boolean was added because: * Explicitness aids machine readability — no inference needed. * Some failures are retryable without a specific delay recommendation ("retryable": true without "retryAfter"). * Some APIs may want to convey "retryAfter" for general backoff purposes even on non-retryable errors (Section 3.7 allows this in exceptional cases); the explicit boolean prevents ambiguity about whether a retry is advised. B.4. Why "processingStage" Is a String, Not an Enum? Processing pipelines vary too widely across APIs to define a fixed enum. A free-form string with recommended values (Table 3) provides flexibility while encouraging consistency through convention. B.5. Why "correlationId" References W3C Trace Context? The W3C Trace Context specification [W3C.TRACE-CONTEXT] is the de facto standard for distributed tracing headers. Referencing it provides interoperability with OpenTelemetry, Jaeger, Zipkin, and other tracing systems without requiring this specification to define its own tracing format. Appendix C. AsyncAPI Integration APIs that use AsyncAPI [ASYNCAPI] to document message-driven interfaces can reference this specification in their schemas. Example AsyncAPI schema fragment: Ratnawat Expires 30 August 2026 [Page 41] Internet-Draft Async Job Problem Details February 2026 components: schemas: PdfJobFailedEvent: description: > PDF generation failure report per draft-ratnawat-httpapi-async-problem-details. allOf: - $ref: '#/components/schemas/ProblemDetails' - type: object required: [jobId, jobStatus, submittedAt] properties: jobId: type: string format: uuid jobStatus: type: string enum: [FAILED, TIMED_OUT, CANCELLED] submittedAt: type: string format: date-time completedAt: type: string format: date-time retryable: type: boolean retryAfter: type: integer minimum: 0 processingStage: type: string correlationId: type: string Appendix D. Relationship to CloudEvents CloudEvents [CLOUDEVENTS] is a CNCF specification for describing events in a standard way. It defines context attributes including "id", "source", "type", and "subject" that overlap conceptually with some members defined here: * CloudEvents "id" ≈ "jobId" (event/job identifier) * CloudEvents "source" ≈ "instance" (origin of the event) * CloudEvents "time" ≈ "completedAt" (timestamp) The specifications are complementary, not competing: Ratnawat Expires 30 August 2026 [Page 42] Internet-Draft Async Job Problem Details February 2026 * CloudEvents defines the event envelope (metadata about the event itself: what happened, where, when). * This specification defines the event payload content (structured failure context: what failed, at what stage, whether to retry). When using CloudEvents as the transport envelope for async job failure notifications, the problem details object (with the extension members defined here) SHOULD be the CloudEvents "data" payload. The CloudEvents "datacontenttype" SHOULD be "application/problem+json". Example CloudEvents + Problem Details: { "specversion": "1.0", "id": "evt-550e8400", "source": "/api/v1/documents/generate", "type": "com.example.job.failed", "datacontenttype": "application/problem+json", "data": { "type": "https://api.example.com/problems/rendering-failed", "title": "Document Rendering Failed", "status": 500, "jobId": "550e8400-e29b-41d4-a716-446655440000", "jobStatus": "FAILED", "processingStage": "rendering", "retryable": false } } Appendix E. OpenAPI Integration APIs documented with the OpenAPI Specification [OPENAPI] can reference the JSON Schema defined in Section 8 to describe async job failure responses. Example OpenAPI schema fragment (YAML): Ratnawat Expires 30 August 2026 [Page 43] Internet-Draft Async Job Problem Details February 2026 components: schemas: AsyncJobProblemDetails: description: > RFC 9457 Problem Details extended with async job context per draft-ratnawat-httpapi-async-problem-details. allOf: - $ref: '#/components/schemas/ProblemDetails' - $ref: 'https://example.com/schemas/async-job-problem-details' paths: /api/v1/jobs/{jobId}: get: summary: Check async job status responses: '200': description: > Job status. Content-Type is application/json for non-terminal and COMPLETED states, and application/problem+json for failed states. content: application/problem+json: schema: $ref: '#/components/schemas/AsyncJobProblemDetails' Appendix F. Interoperability Considerations F.1. Handling Partial Member Sets A client that receives a problem details object with some but not all of the extension members defined here SHOULD process the members that are present and ignore the absence of others. For example, a response with "jobId" and "jobStatus" but without "submittedAt" is valid and useful. F.2. Handling Unknown Extension Members Per [RFC9457] Section 3.2, consumers MUST ignore extension members they do not understand. This ensures forward compatibility: future revisions of this document may define additional members without breaking existing clients. F.3. Handling Unrecognized Job Status Values As specified in Section 4.3, clients that encounter an unrecognized "jobStatus" value SHOULD treat it as non-terminal (equivalent to "PROCESSING"). This provides graceful degradation when a server uses custom status values. Ratnawat Expires 30 August 2026 [Page 44] Internet-Draft Async Job Problem Details February 2026 F.4. XML Representation [RFC9457] defines both JSON and XML representations for problem details. This document defines extension members for the JSON representation only. Servers that need to produce XML problem details objects may map the extension members to XML elements using the conventions in [RFC9457] Section 4, but this document does not define a normative XML mapping. F.5. Internationalization of "detail" The "detail" member is human-readable text. Per [RFC9457] Section 3.1.4, the language of "detail" and "title" is determined by Content-Language negotiation. Servers that serve multilingual audiences SHOULD respect the "Accept- Language" header from the originating request when generating the "detail" text for asynchronous failure reports. When the failure report is delivered via a non-HTTP transport where language negotiation is not available, servers SHOULD use English (en) as the default language and MAY include a "Content-Language" metadata field if the transport supports it. Acknowledgements The extension members defined in this document were informed by real- world experience building production asynchronous document generation services that process requests via Apache Kafka. The challenge of reporting structured errors across synchronous HTTP and asynchronous messaging boundaries motivated this work. The author thanks the IETF HTTP API Working Group (httpapi) for their ongoing work on [RFC9457], [I-D.ietf-httpapi-idempotency-key-header], and [I-D.ietf-httpapi-ratelimit-headers] -- which collectively address the mechanics of async API patterns and provided the foundation for this complementary specification. Google's AIP-151 (Long-Running Operations) [GOOGLE-AIP-151] influenced the status value design, though this specification chose RFC 9457 as the envelope rather than gRPC/Protobuf. The CloudEvents specification [CLOUDEVENTS] and the OpenAPI Specification [OPENAPI] informed the transport-independence design principle and the integration guidance in the appendices. Author's Address Ratnawat Expires 30 August 2026 [Page 45] Internet-Draft Async Job Problem Details February 2026 Gaurav Ratnawat IMTF Email: gaurav.ratnawat@imtf.com Ratnawat Expires 30 August 2026 [Page 46]