Share via

Request for Continued Access to GPT-5.3 Chat for Production RAG Workloads

Kim Chan 0 Reputation points
2026-06-12T07:59:19.33+00:00

Dear Azure Team,

I am writing on behalf of DocPro Limited, which operates Ask.Legal and DocLegal.ai, legal technology platforms using retrieval-augmented generation against fixed legal datasets.

We understand that GPT-5.3 Chat / gpt-5.3-chat-latest is being deprecated, with migration to GPT-5.5 recommended. We would like to request continued access to GPT-5.3 Chat, or an equivalent stable model configuration, for our production RAG workloads.

Our concern is not simply model preference. For our use case, GPT-5.3 Chat appears to perform materially better than GPT-5.4 and GPT-5.5. Our platform relies on source-grounded legal analysis from curated datasets, where the priority is not maximum reasoning depth, but reliable extraction, disciplined application of retrieved sources, concise legal analysis, and predictable token usage.

In our testing, the newer models tend to “overthink” fixed-dataset RAG tasks. This creates several practical issues:

  1. They produce longer and more expensive outputs for substantially the same legal query.
  2. They are more likely to elaborate beyond the retrieved materials.
  3. They sometimes introduce additional reasoning where the better answer is a narrower source-grounded response.
  4. They increase token consumption without a corresponding increase in answer quality for our workflow.
  5. They make it harder for us to maintain stable cost, latency, and output behaviour for a legal product.

For legal RAG, a more powerful general reasoning model is not always the better model. In many cases, the optimal model is one that follows the retrieved context closely, avoids unnecessary speculation, answers within a defined scope, and maintains citation discipline. GPT-5.3 Chat currently appears to be a better fit for that profile.

We would therefore like to ask whether Azure / OpenAI can support one of the following options:

  1. Continued API access to GPT-5.3 Chat / gpt-5.3-chat-latest for our organisation beyond the scheduled shutdown date (end of June) at least until the OpenAI shutdown date (10 Aug 2026);
  2. A dedicated-capacity or legacy-access arrangement for this model;
  3. A stable snapshot or model alias optimised for non-agentic, low-reasoning, fixed-dataset RAG;
  4. A recommended GPT-5.4 or GPT-5.5 configuration that closely replicates GPT-5.3 Chat’s behaviour for legal RAG; or
  5. Pricing or migration support if OpenAI requires us to move to a more expensive model that materially increases cost without improving our production outcome.

Our request is for continuity and production suitability. We are not asking Azure / OpenAI to maintain older models indefinitely for general use. Rather, we are asking whether Azure / OpenAI can provide a stable option for customers whose products depend on deterministic, source-grounded RAG over fixed datasets, where newer reasoning-heavy models may not be the best fit.

We would appreciate the opportunity to discuss this with your product, solutions, or enterprise support team.

Kind regards,

Kim Chan Founder & CEO, DocPro Limited | Ask.Legal / DocLegal.ai 

URL: [Ask.Legal] [DocLegal.ai]

Azure OpenAI in Foundry Models
0 comments No comments

1 answer

Sort by: Most helpful
  1. AI answer

    2026-06-12T07:59:42.7433333+00:00

    Azure OpenAI retirement and availability policies are defined centrally and cannot be overridden per customer through forums or general support; continued access, legacy access, or dedicated-capacity exceptions for a specific retired model are not documented as supported options.

    For GPT‑5.x models, availability and access are governed by the Azure AI Foundry model catalog and the “models sold directly by Azure” policies. GPT‑5 series models (including gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, and gpt-5.1 family) are listed as generally available with no restricted access in the current reasoning models documentation, and new GPT‑5.x models (such as gpt-5.4, gpt-5.5) are exposed through the same mechanisms.

    Where a model is gated or limited-access, the only supported path is to submit an access request via the Azure OpenAI access workflow (for example, via the limited access model application referenced in the documentation). Once a model is fully GA (as with gpt-5, gpt-5.1, gpt-5-pro, gpt-5-codex, etc.), access is no longer restricted and no special request is needed.

    The documentation does not describe any mechanism to:

    • Extend the life of a specific retiring GPT‑5.x chat model for a single tenant,
    • Obtain a private “legacy” or “snapshot” deployment of a retired GPT‑5.x chat model,
    • Negotiate special pricing or migration credits for moving from one GPT‑5.x model to another.

    For RAG workloads that need tighter control over reasoning depth, verbosity, and cost, the supported approach with GPT‑5 reasoning models is to tune the exposed parameters rather than rely on an older model version. The GPT‑5 reasoning models expose:

    • reasoning_effort with options none, minimal, low, medium, high, xhigh (availability varies by model):
      • gpt-5.2, gpt-5.1, gpt-5.1-codex, gpt-5.1-codex-max, and gpt-5.1-codex-mini support none to disable additional reasoning and increase speed.
      • gpt-5-pro only supports high.
      • gpt-5.1-codex-max adds xhigh.
    • verbosity with options low, medium, high to control how concise responses are.

    All GPT‑5 reasoning models in the table support:

    • Structured outputs,
    • Tools/functions (including parallel tool calls for most models),
    • System/developer messages,
    • Reasoning summaries,
    • Streaming (for most models),
    • Large context windows (up to 1,050,000 tokens for gpt-5.4, gpt-5.4-pro, gpt-5.5).

    To approximate a “low‑reasoning, concise, source‑grounded RAG” profile on newer GPT‑5.x models, the documented levers are:

    1. Choose a GPT‑5 model that supports reasoning_effort='none' or low (for example, gpt-5.2 or gpt-5.1 family) rather than a Pro or Codex‑Max variant that enforces high reasoning.
    2. Set reasoning_effort to none or minimal/low (where supported) to reduce over‑elaboration and cost.
    3. Set verbosity='low' to keep outputs concise.
    4. Use system/developer messages to enforce strict grounding and citation discipline (for example, instructing the model to answer only from provided context and to avoid speculation).

    Any request for exception handling, extended access to a retiring model, or commercial terms (pricing/migration support) must be handled directly through Azure support or the account team; these options are not described or guaranteed in the public technical documentation.


    References:

    AI-generated content may be incorrect. Read our transparency notes for more information.

    Was this answer helpful?

Your answer

Answers can be marked as 'Accepted' by the question author and 'Recommended' by moderators, which helps users know the answer solved the author's problem.