Plan Copilot Studio agent deployments for throughput and rate limits

Production-ready Copilot Studio agents need more than licensing and total message-volume planning. They also need throughput planning. Throughput planning covers how quickly traffic arrives, which platform services the solution calls, and which limits apply across the full solution.

This article helps solution architects, makers, and Power Platform admins prepare high-volume Copilot Studio deployments for production traffic, user acceptance testing (UAT), load testing, business-to-customer (B2C) scenarios, and autonomous workloads.

Rate provisioning is separate from license provisioning

Production Copilot Studio planning has two related but separate workstreams:

License provisioning covers commercial entitlement and consumption, such as licenses, credits, prepaid capacity, message packs, and pay-as-you-go billing.
Rate provisioning covers how quickly traffic can be processed before throttling or service-protection controls apply.

Note

Microsoft uses the term quotas for Copilot Studio rate limits. In broader industry vocabulary, this planning activity is often called rate provisioning. Review the published limits, estimate peak request rates, and plan before production traffic arrives.

Pay-as-you-go can increase available limits compared with lower-capacity configurations, but throughput isn't infinite. Check the current Copilot Studio limits, Power Platform request allocations, Power Automate limits, Dataverse service protection limits, connector throttling rules, and downstream API limits.

What happens when throttling occurs?

Throttling is a service-protection behavior. It protects shared services from traffic patterns that exceed published limits, burst controls, or service capacity. The exact symptom depends on which service is throttled.

Reaching a limit has runtime consequences, not only planning consequences. Requests can be throttled, delayed, blocked, or rejected. In user-facing chats, this behavior can appear as a temporary service interruption. For example, the user might be unable to send the next message, might receive an agent-unavailable or usage-limit message, or might see a step fail because a flow, connector, Dataverse call, AI service, or downstream API reached its limit.

Learn about Copilot Studio-specific symptoms and error messages in Resolve usage limit errors in agents.
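
As one illustration of how custom client code might react to a throttled call, the following Python sketch retries with bounded exponential backoff. It assumes the throttled service signals throttling with HTTP status 429 and, optionally, a Retry-After value in seconds. That pattern is common for HTTP APIs, but verify it for each service in your runtime path. The `send` callable, its return shape, and the default delays are hypothetical and aren't part of any Copilot Studio API.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call `send` until it stops returning 429 or attempts run out.

    `send` is a hypothetical callable returning (status, retry_after, body),
    where retry_after is a number of seconds or None.
    """
    for attempt in range(max_attempts):
        status, retry_after, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # give up instead of adding more load
        # Prefer the server's hint; otherwise double the delay each attempt.
        delay = retry_after if retry_after is not None else base_delay * 2 ** attempt
        sleep(min(delay, max_delay))
    return 429, None
```

Capping the number of attempts matters because unbounded retries add request volume during the same window in which the service is already throttling.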

How rate limits are measured

Rate limits measure how much traffic a service can accept during a specific time window. Consider these windows at several granularities: per minute, per 5 minutes, per 10 minutes, per hour, per day, per week, and per month. Monthly or weekly volume helps estimate total demand, but shorter windows matter for rate provisioning because throttling often results from concentrated traffic.

For example, a B2C company might receive most of its agent traffic during one focused campaign hour. Its weekly average might look low, but that single hour can still create enough throughput pressure to cause throttling or service interruptions. A design that looks safe at the weekly or monthly level can still exceed limits during a one-hour peak.
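
The campaign-hour example can be sketched in Python. The following minimal example counts events in fixed time windows and compares the weekly hourly average with the busiest hour and the busiest minute. The traffic data, the dates, and the `peak_per_window` helper are all hypothetical and chosen only to illustrate the effect.

```python
from collections import Counter
from datetime import datetime, timedelta


def peak_per_window(timestamps, window):
    """Return the largest number of events that fall into any fixed window."""
    epoch = datetime(1970, 1, 1)
    buckets = Counter((ts - epoch) // window for ts in timestamps)
    return max(buckets.values(), default=0)


start = datetime(2025, 1, 6)  # hypothetical start of the week
# Baseline: one message per hour for a week.
baseline = [start + timedelta(hours=h) for h in range(7 * 24)]
# Campaign: 600 messages, one every 6 seconds, within a single hour.
campaign = [start + timedelta(days=2, hours=12, seconds=6 * i) for i in range(600)]
events = baseline + campaign

weekly_avg_per_hour = len(events) / (7 * 24)
peak_hour = peak_per_window(events, timedelta(hours=1))
peak_minute = peak_per_window(events, timedelta(minutes=1))

print(f"average per hour: {weekly_avg_per_hour:.2f}")
print(f"peak hour: {peak_hour}, peak minute: {peak_minute}")
```

This sketch uses fixed windows that start on clock boundaries. A sliding window can report a higher peak when a burst straddles two fixed windows, so treat fixed-window counts as a lower bound on the true peak.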

Understand the scope of limits

Limits don't only apply at the individual agent level. Depending on the service, they can apply at the environment level, tool level, API level, connector level, channel level, or downstream service level.

For example, Copilot Studio messages-to-agent limits are scoped per Dataverse environment. When you estimate traffic, include all sources that send messages to agents in that environment, including user-facing channels, integrations, autonomous workloads, and Azure Bot Framework skills. Check the current values and scope in Copilot Studio quotas and limits.
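
Because the limit is shared across an environment, one way to estimate the load is to add up every source per environment. The following Python sketch does this for a few sources. The environment names, source names, and per-minute figures are hypothetical. Summing individual peaks assumes that all sources peak at the same time, which is a deliberately conservative assumption.

```python
from collections import defaultdict

# (environment, traffic source, estimated peak messages per minute)
# All values are hypothetical placeholders.
sources = [
    ("env-prod", "web chat", 120),
    ("env-prod", "Teams channel", 45),
    ("env-prod", "autonomous triggers", 60),
    ("env-prod", "skills", 15),
    ("env-test", "load test", 300),
]


def peak_rpm_by_environment(entries):
    """Sum the peak messages per minute of every source in each environment."""
    totals = defaultdict(int)
    for environment, _source, peak_rpm in entries:
        totals[environment] += peak_rpm
    return dict(totals)


totals = peak_rpm_by_environment(sources)
for environment, rpm in totals.items():
    print(f"{environment}: {rpm} messages per minute at peak")
```

Compare each environment total, not each individual source, with the currently published environment-scoped limit.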

Decide whether rate provisioning applies to your agent

Not every agent needs detailed rate provisioning work. A simple internal FAQ agent with a small audience, predictable usage, and few or no downstream calls is unlikely to hit rate limits. Rate provisioning becomes important when an agent might exceed requests-per-minute or requests-per-hour limits, even if its monthly volume looks modest.

Think about expected traffic early in the project, alongside solution design. Before user acceptance testing (UAT) and load testing begin, the team should be confident that the agent design, environment, connected services, and downstream systems can support the expected throughput profile.

This guidance matters most for larger, more intensive enterprise-grade agents where traffic can arrive in bursts, many users or events can invoke the agent at the same time, or each interaction depends on multiple platform services. It can also apply to smaller agents with concentrated usage patterns, such as a short launch window, a department-wide event, a scheduled process, or a workflow that creates many requests in a few minutes.

B2C and autonomous agents require early rate provisioning

Customer-facing B2C agents can receive traffic from campaigns, public websites, customer portals, incident communications, product launches, or seasonal demand. Autonomous agents can generate high-frequency traffic from schedules, events, and background processes, or by calling multiple tools and workflows.

Tip

Treat B2C and autonomous use cases as first-class rate provisioning scenarios. They can generate burst traffic, multiple simultaneous requests, and high-frequency background activity faster than many employee-facing chat experiences.

Use peak windows, not only monthly totals

Ask whether the agent can create concentrated requests in a minute or hour. A smaller scenario can still need rate provisioning if a load test, campaign, outage response, or automated trigger pushes too many messages, generative AI calls, workflow actions, connector calls, or Dataverse requests through the environment in a short window.

Monthly volume is useful for estimating total demand, but it isn't enough for rate provisioning. Convert expected usage into smaller time windows so you can compare the design with current requests per minute (RPM), requests per hour (RPH), burst, and daily limits from the linked pages.

Build both an average traffic profile and a peak traffic profile. For example, if most traffic happens every day between 5 PM and 6 PM, the hourly peak should reflect that concentration. The daily estimate doesn't need to be 24 times the peak hour if the traffic is concentrated in one window.
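
The conversion from monthly volume to average and peak profiles can be sketched as follows. This is a minimal Python example under stated assumptions: the monthly volume, the share of daily traffic in the peak hour, and the share of that hour landing in its busiest minute are hypothetical inputs that you would replace with your own estimates or observed data.

```python
from dataclasses import dataclass


@dataclass
class TrafficProfile:
    daily: float
    average_rph: float
    average_rpm: float
    peak_rph: float
    peak_rpm: float


def build_profile(monthly_messages, days=30, peak_hour_share=0.6,
                  peak_minute_share=0.05):
    """Derive average and peak request rates from a monthly total.

    peak_hour_share: fraction of a day's traffic in the busiest hour (assumed).
    peak_minute_share: fraction of that hour in its busiest minute (assumed).
    """
    daily = monthly_messages / days
    peak_rph = daily * peak_hour_share
    return TrafficProfile(
        daily=daily,
        average_rph=daily / 24,
        average_rpm=daily / (24 * 60),
        peak_rph=peak_rph,
        peak_rpm=peak_rph * peak_minute_share,
    )


profile = build_profile(300_000)
print(f"average RPM: {profile.average_rpm:.1f}, peak RPM: {profile.peak_rpm:.0f}")
print(f"daily total: {profile.daily:.0f}, 24 x peak hour: {profile.peak_rph * 24:.0f}")
```

With these inputs, the peak requests per minute are far higher than the daily average per minute, and the daily total is far lower than 24 times the peak hour, which is why both profiles are worth building.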

When else can throttling happen?

Throttling can also happen when:

A large employee population uses the agent during a predictable peak window, such as a department-wide event or training.
A marketing campaign, outage, launch, or scheduled business event creates a short traffic spike.
Power Automate flows include loops, retries, pagination, or child flows that amplify request volume.
Reporting, auditing, telemetry export, or transcript capture runs synchronously in the user turn path.
Multiple agents or workloads share the same environment, identity, connector, or downstream API capacity.
Load tests ramp faster than the production architecture or support process was prepared to handle.
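
To see how loops, retries, and pagination amplify request volume, the following Python sketch estimates the requests generated by a single agent turn. The flow steps, call counts, and retry settings are hypothetical, and the model is intentionally simple: each step issues a fixed number of calls per iteration, and the worst case assumes that every call uses all of its allowed retries.

```python
from dataclasses import dataclass


@dataclass
class FlowStep:
    name: str
    calls: int            # requests issued each time the step runs
    iterations: int = 1   # loop passes or pages fetched
    max_retries: int = 0  # retries allowed per call


def requests_per_turn(steps, worst_case=False):
    """Estimate requests per agent turn, with or without retries."""
    total = 0
    for step in steps:
        attempts = 1 + step.max_retries if worst_case else 1
        total += step.calls * step.iterations * attempts
    return total


steps = [
    FlowStep("look up customer", calls=1, max_retries=2),
    FlowStep("update each open order", calls=2, iterations=20, max_retries=1),
    FlowStep("page through history", calls=1, iterations=5),
]

typical = requests_per_turn(steps)
worst = requests_per_turn(steps, worst_case=True)
turns_per_minute = 30  # hypothetical peak
print(f"per turn: {typical} typical, {worst} worst case")
print(f"per minute: {typical * turns_per_minute} to {worst * turns_per_minute}")
```

Even a modest number of turns per minute can translate into thousands of downstream requests when a loop runs inside the turn path.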

Where to look up relevant rate limits

Copilot Studio has its own limits, and the agent's runtime path might include other services with their own limits. Review all relevant limits for the services your agent uses.

Copilot Studio limits

Rate provisioning area	What to look up	Where to check current values	How to use it
Messages to an agent	Current RPM/RPH limit and scope for messages sent to the agent.	Copilot Studio quotas and limits	Compare expected messages per minute and per hour for the target Dataverse environment.
Generative AI messages	Current limit for generative orchestration, agent actions, AI tools, agent workflow actions, and generative answers.	Generative AI messages to an agent	Model AI-heavy and autonomous scenarios against the current published limits.
Autonomous trigger nodes	Current limits that apply when an autonomous agent is triggered by events, schedules, or background processes.	Copilot Studio quotas and limits	Model event-driven and scheduled workloads separately from interactive chat traffic.
Copilot Studio subscription request limits	Current Power Platform request limits that apply to Copilot Studio usage.	Copilot Studio subscription limits	Use these values alongside rate-limit planning for flows, Dataverse, and connected services.

Other platform limits to consider

The lowest limit in the runtime path determines the user experience. A Copilot Studio agent can be within its own limits while a flow, connector, Dataverse call, language service, or external API is throttled.

Note

Other platform limits might affect your agent if it uses other components in the agent request path. Take these limits into consideration as well, including Power Platform, Power Automate, Dataverse, connectors, language services, and downstream systems.

Runtime area	What to look at	Rate provisioning questions	Where to check current limits
Power Platform request plane	Requests across Power Automate, Copilot Studio workflow calls, Dataverse usage, Power Apps, and Dynamics 365.	Which user, connection, application user, or service principal generates the requests? Are request allocations sufficient for the expected daily and peak workload?	Requests limits and allocations
Power Automate flows	Triggers, actions, loops, child flows, HTTP actions, connector actions, retries, pagination, and concurrency.	How many actions are created per agent turn? Are burst, concurrency, trigger, and connector limits in scope?	Understand platform limits and avoid throttling; Limits of automated, scheduled, and instant flows
Dataverse	CRUD operations, plug-ins, workflows, assign/share operations, connector calls, and system operations required to complete transactions.	Which users, application users, or service principals generate Dataverse calls? Are service protection limits or retry behavior likely to apply?	Service protection API limits; Dataverse API limits overview
Connectors	Standard connectors, premium connectors, custom connectors, connector-specific throttling, and downstream APIs.	Which connector is the bottleneck? Does the downstream service enforce its own rate limit?	API throughput limits on connectors; Power Automate connector reference
Conversational language understanding (CLU) and AI services	CLU calls, AI prompts, search and summarize operations, model-backed tools, payload size, and service-specific limits.	Does each user turn call a language or AI service? Are those calls repeated during retries or orchestration?	Conversational language understanding limits; Copilot Studio quotas and limits
External APIs and line-of-business systems	Vendor APIs, internal APIs, databases, middleware, gateways, and custom services.	What limit does the downstream owner enforce? Is there a retry contract, queue, or backpressure strategy?	Use the downstream service owner's current limits, service level agreement (SLA), and support process.
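
To find which service in the runtime path is the most restrictive, one approach is to divide each service's per-minute limit by the number of calls a single agent turn makes to it. The following Python sketch shows that calculation. The service names, calls per turn, and limits are hypothetical placeholders, not published values. Replace them with the current limits from the pages listed in the table.

```python
def sustainable_turns_per_minute(calls_per_turn, limit_per_minute):
    """Return the bottleneck service and the turns per minute it allows."""
    ceilings = {
        service: limit_per_minute[service] / calls
        for service, calls in calls_per_turn.items()
        if calls > 0
    }
    bottleneck = min(ceilings, key=ceilings.get)
    return bottleneck, ceilings[bottleneck]


# Hypothetical runtime path for one agent turn.
calls_per_turn = {
    "agent messages": 1,
    "flow actions": 6,
    "Dataverse requests": 4,
    "external API": 2,
}
# Hypothetical per-minute limits; look up the real values.
limit_per_minute = {
    "agent messages": 600,
    "flow actions": 1200,
    "Dataverse requests": 1000,
    "external API": 100,
}

bottleneck, turns = sustainable_turns_per_minute(calls_per_turn, limit_per_minute)
print(f"bottleneck: {bottleneck}, about {turns:.0f} turns per minute")
```

In this example the agent's own message limit would allow 600 turns per minute, but the external API caps the solution at 50, which is the figure that users would experience.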

Design to reduce throughput pressure

Don't make rate increases your first design response. First, review the agent design and optimize efficiency. If the agent needs to look something up, keep external calls intentional, optimize API calls, and avoid unnecessary request volume across Copilot Studio, Power Automate, Dataverse, connectors, and downstream systems.

After the design is efficient, control throughput so traffic reaches the platform in a predictable way:

For environment-level limits, consider splitting agents across multiple environments if that approach matches your operational design. This approach can help keep high-volume agents, business units, regions, or autonomous workloads from competing with unrelated workloads for the same environment-scoped limits.
For autonomous agents, use queues, batching, trigger filters, scheduled processing, retry controls, and monitoring so background work doesn't arrive as an uncontrolled burst.
Move scheduled, reporting, audit export, and telemetry work outside the interactive chat path when possible.
Review load-test results and production telemetry to identify where requests concentrate, then tune the agent, flows, connectors, and downstream APIs before requesting higher limits.

Autonomous agents are well suited to this kind of control. Because their work starts from events and schedules instead of live user input, they can queue requests and cap their trigger rates, which makes their use of allocated capacity predictable and observable.
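
One common way to turn an uncontrolled burst into a steady stream is a queue drained at a capped rate, often implemented as a token bucket. The following Python sketch illustrates the idea with simulated time. The `PacedQueue` class, its parameters, and the event names are hypothetical and aren't a Copilot Studio feature. In practice, this role might be played by a queueing service or a scheduled flow.

```python
from collections import deque


class PacedQueue:
    """Release queued work at a capped rate with a small burst allowance."""

    def __init__(self, rate_per_minute, burst):
        self.rate_per_second = rate_per_minute / 60
        self.capacity = burst
        self.tokens = float(burst)
        self.pending = deque()
        self.last = 0.0

    def submit(self, item):
        self.pending.append(item)

    def release(self, now):
        """Return the items allowed to run at time `now` (in seconds)."""
        elapsed = now - self.last
        self.last = now
        # Refill tokens for the elapsed time, up to the burst capacity.
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate_per_second)
        released = []
        while self.pending and self.tokens >= 1:
            self.tokens -= 1
            released.append(self.pending.popleft())
        return released


queue = PacedQueue(rate_per_minute=60, burst=5)
for i in range(100):  # 100 events arrive at once
    queue.submit(f"event-{i}")

released_per_second = [len(queue.release(now=t)) for t in range(60)]
print(released_per_second[:5], sum(released_per_second), len(queue.pending))
```

The 100 simultaneous events leave the queue as an initial burst of five followed by one per second, so the downstream services see a predictable rate and the remaining backlog is visible for monitoring.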

What to do if default rate limits aren't enough

If the peak-traffic estimate shows that the agent or any connected service might exceed current published limits, start the rate provisioning support process before UAT, load testing, or production launch. Don't wait for the first production failure.

Note

Copilot Studio is a software as a service (SaaS) offering with rate limits in place to protect the service for all customers. With proper justification, engineering can enable custom limits for approved scenarios.

Open a support request

Administrators can request support from the Power Platform admin center.

Open the ticket early and include the best available estimates. The more detail you provide, the easier the review process will be. Update the request as the design is refined or load testing provides observed data.

Core information to include

Information	Description
Environment ID	The Dataverse environment where the agent runs.
Agent name or identifier	The agent affected by the request.
Business impact	The impact on the business if the default limits aren't enough, including any critical consequences.
Known information	What is known about the scenario, channel, launch context, business criticality, and whether it's B2C, autonomous, employee-facing, or internal-only.
Agent snapshot	A snapshot or export that helps reviewers understand the agent configuration, design, connected services, and relevant settings.
Agent design	High-level description of topics, generative AI usage, knowledge sources, actions, flows, connectors, Dataverse calls, and external APIs used by the agent.
Average traffic estimate	Expected average traffic by hour, day, week, or month.
Peak traffic estimate	Expected peak messages, sessions, generative AI calls, flow actions, connector calls, Dataverse requests, and external API calls where known.

More details that can help

Information	Description
Date range	Start and end date for the requested increase. Separate the load-test, UAT, and production date ranges if they differ.
Peak pattern	Peak windows, time zones, expected burst drivers, and whether traffic is concentrated in a short daily window.
Session profile	Concurrent sessions, average and peak session length, messages per session, and questions per session.
Typical session examples	Representative user paths, typical steps performed, tools used, and sample session IDs where available.
Runtime path	Flows, actions, AI prompts, knowledge calls, Dataverse requests, connectors, and APIs per interaction.
Feature-level peaks	Peak volume per agent, feature, user, environment, connector, minute, hour, and day where known.
Products needing review	Whether the request involves Copilot Studio, Power Platform request allocations, Power Automate, connectors, Dataverse, CLU/AI services, or external APIs.
Evidence	Sample session IDs, errors, correlation IDs, logs, load-test results, or production observations.
Mitigations	Summarize what you already tried to reduce throughput pressure. Reference the Design to reduce throughput pressure guidance, including design review, optimized external calls, environment segmentation, batching, queueing, trigger filtering, scheduling, workload distribution, and other optimizations already in place.
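
If you know the session profile but not the peak message rate, you can derive a first estimate from it. The following Python sketch applies Little's law (concurrency equals arrival rate multiplied by time in the system) to convert concurrent sessions into messages per minute. The function name and the sample numbers are hypothetical, and the calculation assumes a steady state during the peak window.

```python
def peak_messages_per_minute(concurrent_sessions, session_minutes,
                             messages_per_session):
    """Estimate message rate from a session profile using Little's law."""
    # Arrival rate = concurrency / average time each session stays open.
    sessions_started_per_minute = concurrent_sessions / session_minutes
    return sessions_started_per_minute * messages_per_session


# Hypothetical peak: 500 concurrent sessions, 5 minutes each, 8 messages each.
estimate = peak_messages_per_minute(
    concurrent_sessions=500, session_minutes=5, messages_per_session=8
)
print(f"about {estimate:.0f} messages per minute at peak")
```

Replace the estimate with observed data from load tests or production telemetry when it becomes available.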

Important

A throughput increase isn't guaranteed. Microsoft Support reviews requests based on the scenario, environment, requested date range, expected traffic, eligibility, current limits, and service capacity.

Last updated on 2026-06-11