An Azure service that provides access to OpenAI’s GPT-3 models with enterprise capabilities.
Azure OpenAI retirement and availability policies are defined centrally and cannot be overridden per customer through forums or general support; continued access, legacy access, or dedicated-capacity exceptions for a specific retired model are not documented as supported options.
For GPT‑5.x models, availability and access are governed by the Azure AI Foundry model catalog and the “models sold directly by Azure” policies. GPT‑5 series models (including gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-pro, and gpt-5.1 family) are listed as generally available with no restricted access in the current reasoning models documentation, and new GPT‑5.x models (such as gpt-5.4, gpt-5.5) are exposed through the same mechanisms.
Where a model is gated or limited-access, the only supported path is to submit an access request via the Azure OpenAI access workflow (for example, via the limited access model application referenced in the documentation). Once a model is fully GA (as with gpt-5, gpt-5.1, gpt-5-pro, gpt-5-codex, etc.), access is no longer restricted and no special request is needed.
The documentation does not describe any mechanism to:
- Extend the life of a specific retiring GPT‑5.x chat model for a single tenant,
- Obtain a private “legacy” or “snapshot” deployment of a retired GPT‑5.x chat model,
- Negotiate special pricing or migration credits for moving from one GPT‑5.x model to another.
For RAG workloads that need tighter control over reasoning depth, verbosity, and cost, the supported approach with GPT‑5 reasoning models is to tune the exposed parameters rather than rely on an older model version. The GPT‑5 reasoning models expose:
-
reasoning_effortwith optionsnone,minimal,low,medium,high,xhigh(availability varies by model):-
gpt-5.2,gpt-5.1,gpt-5.1-codex,gpt-5.1-codex-max, andgpt-5.1-codex-minisupportnoneto disable additional reasoning and increase speed. -
gpt-5-proonly supportshigh. -
gpt-5.1-codex-maxaddsxhigh.
-
-
verbositywith optionslow,medium,highto control how concise responses are.
All GPT‑5 reasoning models in the table support:
- Structured outputs,
- Tools/functions (including parallel tool calls for most models),
- System/developer messages,
- Reasoning summaries,
- Streaming (for most models),
- Large context windows (up to 1,050,000 tokens for
gpt-5.4,gpt-5.4-pro,gpt-5.5).
To approximate a “low‑reasoning, concise, source‑grounded RAG” profile on newer GPT‑5.x models, the documented levers are:
- Choose a GPT‑5 model that supports
reasoning_effort='none'orlow(for example,gpt-5.2orgpt-5.1family) rather than a Pro or Codex‑Max variant that enforces high reasoning. - Set
reasoning_efforttononeorminimal/low(where supported) to reduce over‑elaboration and cost. - Set
verbosity='low'to keep outputs concise. - Use system/developer messages to enforce strict grounding and citation discipline (for example, instructing the model to answer only from provided context and to avoid speculation).
Any request for exception handling, extended access to a retiring model, or commercial terms (pricing/migration support) must be handled directly through Azure support or the account team; these options are not described or guaranteed in the public technical documentation.
References: