
Can a single ADF pipeline trigger multiple Azure ML pipelines in parallel when their inputs and outputs are independent?

Inkey IT User 1 20 Reputation points
2026-06-04T19:08:41.69+00:00

I have an Azure Data Factory pipeline that orchestrates a multi-stage classification workflow. Right now every activity is chained one after another with success dependencies, so the whole pipeline runs strictly sequentially and takes a long time end-to-end. I'd like to run the parts that are independent of each other in parallel to cut the total runtime.

Current setup

Everything lives in one ADF pipeline and is fully chained left-to-right. Simplified:

Script (backup)  
→ [Segment A: OpCode]   Copy data → Data Flow (flag) → Azure ML pipeline (classify) → Data Flow (write back)    
→ [Segment B: Part]      Copy data → Data Flow (flag) → Azure ML pipeline (classify) → Data Flow (write back)   
→ [Segment C: Customer]  Copy data → Data Flow (flag) → Azure ML pipeline (classify) → Data Flow (write back) 

  • Each Azure ML pipeline is invoked from an Execute ML Pipeline activity.
  • The three segments (OpCode, Part, Customer) are functionally separate classification jobs.
  • The only ordering that genuinely matters is within a segment (load → flag → classify → write back). That internal order must stay sequential.

Why I believe the three segments are independent

  1. Different backend tables. Each segment reads from and writes to its own, separate SQL tables in the backend. There is no shared table between segments, so no branch reads or writes a table that another branch touches.
  2. Separate parquet outputs. Each Azure ML pipeline produces its own separate parquet file as output. No segment consumes another segment's parquet/output — there is no data hand-off between the three branches.

So across segments there is no data dependency at all; the chaining is just how the pipeline happens to be wired today.

What I want to achieve

Run the three segments concurrently instead of one after another. The pipeline should simply complete once all three segments have finished.

Questions

  1. Can a single ADF pipeline trigger and run multiple Azure ML pipelines concurrently? Are there limits I should be aware of?
  2. Is the correct approach simply to fan out — connect the initial backup step to the first activity of each segment and not chain the segments to each other — so all three run concurrently and the pipeline completes once all three finish?
  3. On the Azure ML side: if all three pipelines target the same compute cluster, will they actually run at the same time, or will they queue? How should I size the cluster (max nodes / quota) to get true parallelism? Would I need to increase my cluster's current capacity to accommodate this change?
  4. Any gotchas with the Execute ML Pipeline activity's "Wait on completion" setting, or with concurrent updates to a shared status/tracking table, that I should watch out for when running these branches in parallel?

Environment

  • Azure Data Factory
  • Azure Machine Learning (pipelines triggered from ADF via the Execute ML Pipeline activity)
  • Mapping Data Flows
  • Sources / sinks: Azure SQL + Parquet files on Blob Storage

Thanks in advance — any guidance on the recommended pattern (and the compute/concurrency considerations) would be much appreciated.

Azure Data Factory

An Azure service for ingesting, preparing, and transforming data at scale.


1 answer

  1. AI answer

    2026-06-04T19:09:00.9+00:00
    1. Yes. A single Azure Data Factory pipeline can run multiple activities in parallel, including multiple Execute ML Pipeline activities, as long as no dependency is configured between them.
    2. The correct pattern in ADF is to fan out after the backup step:
      • Keep each segment’s internal sequence (Copy → Data Flow → Execute ML Pipeline → Data Flow) chained with success dependencies.
      • Connect the backup activity’s success output to the first activity of each of the three segments.
      • Do not chain Segment A → Segment B → Segment C. With no cross‑segment dependencies, ADF will schedule the three branches in parallel and the pipeline run will complete only after all branches finish.
      This aligns with the documented behavior that when multiple activities exist in a pipeline and subsequent activities are not dependent on previous activities, they may run in parallel.
    3. On the Azure ML side, parallelism is governed by the compute configuration and job type:
      • Each Azure ML pipeline triggered from ADF is an independent job.
      • If all three ML pipelines target the same compute cluster, Azure ML will schedule their steps onto that cluster according to available nodes and cores.
      • To achieve true parallelism, the cluster must have sufficient capacity (node count and per‑node resources) so that the steps from different pipelines can be placed concurrently instead of queueing.
      • Azure ML also supports parallel jobs on multinode compute, which split work into mini‑batches and dispatch them across nodes. If the cluster is undersized relative to the total concurrent workload (three pipelines × their internal parallel steps), some steps will wait for resources.
      • Therefore, to move from sequential to concurrent execution, increase the cluster’s maximum nodes or quota so that the combined workload of the three pipelines fits without contention.
    4. Additional considerations:
      • In ADF, if Execute ML Pipeline activities are configured with Wait on completion, each branch will wait for its ML pipeline to finish before moving to its next activity, but branches remain independent and can still run in parallel.
      • When using Mapping Data Flows in parallel branches, be aware that each data flow activity spins up its own Spark cluster when executed in parallel, which increases concurrent compute usage and cost.
      • For any shared status or tracking tables, ensure that updates from the three branches are either to distinct rows or are coordinated to avoid conflicts. ADF itself does not serialize access; concurrency control must be handled in the database logic.
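
The fan-out described in point 2 can be pictured as a dependency graph. The sketch below is illustrative Python, not an ADF API: the activity names are hypothetical, and each dependency list only mimics the role that `dependsOn` plays in an ADF pipeline definition. It groups activities into waves that could start together, assuming every activity succeeds.

```python
# Hypothetical activity names; each value lists the activities it depends on,
# mirroring the role of "dependsOn" in an ADF pipeline definition.
SEGMENTS = ["OpCode", "Part", "Customer"]
STEPS = ["Copy", "Flag", "Classify", "WriteBack"]


def build_fan_out(segments, steps, root="Backup"):
    """Return {activity: [dependencies]} for a fan-out after `root`."""
    graph = {root: []}
    for seg in segments:
        previous = root  # the first step of every segment hangs off the backup
        for step in steps:
            name = f"{step}_{seg}"
            graph[name] = [previous]  # chain only within the segment
            previous = name
    return graph


def waves(graph):
    """Group activities into waves whose dependencies have all finished."""
    done, result = set(), []
    while len(done) < len(graph):
        ready = sorted(
            a for a, deps in graph.items()
            if a not in done and all(d in done for d in deps)
        )
        if not ready:
            raise ValueError("cycle in dependencies")
        result.append(ready)
        done.update(ready)
    return result


if __name__ == "__main__":
    for i, wave in enumerate(waves(build_fan_out(SEGMENTS, STEPS))):
        print(i, wave)
```

The waves are a simplification: in a real run each branch advances as soon as its own previous activity succeeds, so the branches do not move in lockstep. The point is only that no activity in one segment lists an activity from another segment as a dependency.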
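
The sizing advice in point 3 comes down to simple arithmetic. As a rough sketch with made-up numbers (the per-pipeline peak node counts below are assumptions, not measurements from any workspace), compare the sum of the nodes each pipeline needs at its peak with the cluster's maximum node count:

```python
def nodes_short(peak_nodes_per_pipeline, cluster_max_nodes):
    """Return how many extra nodes are needed for all pipelines to run at once.

    0 means the combined peak demand fits within the cluster's max nodes.
    """
    demand = sum(peak_nodes_per_pipeline.values())
    return max(0, demand - cluster_max_nodes)


# Assumed peak node usage per Azure ML pipeline (hypothetical values).
peaks = {"OpCode": 2, "Part": 1, "Customer": 2}

print(nodes_short(peaks, cluster_max_nodes=4))  # demand 5, capacity 4
print(nodes_short(peaks, cluster_max_nodes=6))  # demand 5, capacity 6
```

A non-zero result suggests that some steps would queue until nodes free up. Raising max nodes only helps if the subscription's quota for that VM family also covers the larger cluster.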
