A mapping data flow with schema drift disabled does not change a column’s data type at runtime. With drift disabled, the data flow uses the defined projection (early binding) and data types remain as designed. Type changes at runtime are associated with schema drift and late binding, where drifted columns can be auto‑typed or inferred.
Given the behavior described, the inconsistency is most likely introduced before or at the JSON staging layer (Salesforce → JSON), not by a published mapping data flow with schema drift disabled.
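One way to test this hypothesis is to scan the staged JSON for fields whose value type varies between records. The following is a minimal Python sketch, assuming the staging files are newline-delimited JSON with one flat record per line; the function name `find_mixed_type_fields` and the sample field names are hypothetical.

```python
import json
from collections import defaultdict


def find_mixed_type_fields(lines):
    """Return {field: sorted type names} for fields seen with more than one JSON type.

    Assumes one flat JSON object per line (an assumption about the staging format).
    Nulls are ignored because a missing value is not a type conflict.
    """
    seen = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        for field, value in record.items():
            if value is not None:
                seen[field].add(type(value).__name__)
    return {f: sorted(t) for f, t in seen.items() if len(t) > 1}


sample = [
    '{"AccountNumber": "000123", "PostalCode": "02134"}',
    '{"AccountNumber": "000124", "PostalCode": 94105}',
    '{"AccountNumber": "000125", "PostalCode": null}',
]
print(find_mixed_type_fields(sample))  # {'PostalCode': ['int', 'str']}
```

A field reported with both `int` and `str` was already inconsistent before the data flow read it. A later cast to string cannot restore leading zeros that were dropped when the value was written as a number.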
For the follow‑up points:
- Can a published Mapping Data Flow change a column’s datatype at runtime when schema drift is disabled?
No. When schema drift is disabled, the source projection is taken from the dataset schema and used as the fixed contract for the flow. The documentation notes that:
- Source projection is created from the dataset schema definition.
- Schema drift is explicitly defined as reading columns that are not in that projection and treating them as drifted columns.
Without schema drift, the data flow does not dynamically re‑infer types based on incoming values.
- If yes, how to prevent it?
Since type changes are tied to schema drift and inference:
- Ensure Allow schema drift is unchecked on both source and sink transformations.
- Ensure Infer drifted column types is not used (this only applies when drift is enabled).
- Keep a well‑defined projection and avoid late‑binding patterns for critical fields.
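In a framework with many data flows, these settings can be audited rather than checked by hand. The sketch below is a hedged Python example: it assumes the data flow script has been exported as text and that drift is expressed through `allowSchemaDrift` and `inferDriftedColumnTypes` properties set to `true`. The helper name and the sample script are illustrative only, and the property names should be confirmed against an actual exported script.

```python
import re

# Matches e.g. "allowSchemaDrift: true" or "inferDriftedColumnTypes:true".
_DRIFT_FLAG = re.compile(r"\b(allowSchemaDrift|inferDriftedColumnTypes)\s*:\s*true\b")
# Matches the stream name that closes a transformation, e.g. "~> StagedJson".
_STREAM_NAME = re.compile(r"~>\s*(\w+)")


def find_drift_enabled(script):
    """Return [(stream_name, flag)] for every drift-related flag set to true.

    A flag is attributed to the first stream name that follows it in the script,
    which relies on each transformation ending with "~> name".
    """
    findings = []
    for match in _DRIFT_FLAG.finditer(script):
        name = _STREAM_NAME.search(script, match.end())
        findings.append((name.group(1) if name else "<unknown>", match.group(1)))
    return findings


# Hypothetical script with drift disabled on the source but enabled on the sink.
sample_script = """
source(output(
        AccountNumber as string,
        PostalCode as string
    ),
    allowSchemaDrift: false,
    validateSchema: false) ~> StagedJson
StagedJson sink(allowSchemaDrift: true,
    validateSchema: false) ~> CuratedSink
"""
print(find_drift_enabled(sample_script))  # [('CuratedSink', 'allowSchemaDrift')]
```

An empty result means no transformation in the script has either flag switched on.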
- Where to enforce schema in a reusable/dynamic ingestion framework?
The documentation describes several enforcement points and their characteristics:
- Source projection / dataset schema
  - When a dataset is selected, the service “automatically take[s] the schema from the dataset and create[s] a projection from that dataset schema definition.”
  - This is the primary early‑binding contract for mapping data flows.
  - For JSON, the “Use projected schema” optimization explicitly does not work with schema drift, reinforcing that projection is the enforcement point when drift is off.
- Within mapping data flow transformations
  - The Cast transformation is designed to “modify the data types of individual columns in a data flow” and to check for casting errors.
  - The Stringify transformation converts complex types to string when needed.
Given this, the Microsoft‑aligned pattern for a reusable/dynamic framework is:
- Define and maintain a strong schema in the dataset/source projection for known fields.
- Disable schema drift for those sources/sinks where strict typing is required.
- Use a Cast transformation early in the flow to normalize types for critical columns if there is any ambiguity from upstream systems.
Enforcement at extraction (Salesforce → JSON) is outside the scope of the documentation this answer draws on. Within Azure Data Factory, enforcement is primarily at:
- Dataset/source projection, and
- Mapping data flow transformations (Cast/Stringify) over that projection.
- Recommended pattern for postal codes, identifiers, account numbers, business keys
For fields that must always be treated as strings (postal codes, identifiers, account numbers, business keys), the recommended pattern within ADF mapping data flows is:
- In the dataset and source projection:
  - Define these columns explicitly as string in the dataset schema so that the projection is string‑typed.
  - Keep Allow schema drift disabled for these sources to avoid late‑binding behavior.
- In the mapping data flow:
  - Add a Cast transformation near the start of the flow and explicitly cast these columns to string, even if they already appear as string in the projection. This provides a defensive normalization layer against any upstream inconsistencies.
  - If any of these fields are complex (arrays, maps, structs), use Stringify to convert them into a single string representation.
- In sinks:
  - Define sink schemas so that these columns are string (or equivalent text type) and keep Allow schema drift disabled for strict enforcement.
  - If schema drift must be enabled for other columns, use rule‑based mapping so that these key fields are explicitly mapped as strings and not left to auto‑mapping.
This combination of an early‑bound dataset schema, disabled schema drift for critical paths, and explicit Cast/Stringify transformations provides a reusable pattern for keeping business keys consistently typed as strings throughout the pipeline.
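As a rough illustration of the first part of that pattern, a deployment check can confirm that every business key is declared as a string before a dataset is published. This is a sketch only: the list of `{name, type}` entries is a simplified, hypothetical stand-in for a dataset schema (real dataset definitions vary by format), and `KEY_COLUMNS` and `check_key_columns` are illustrative names.

```python
# Hypothetical set of columns that must always be typed as string.
KEY_COLUMNS = {"PostalCode", "AccountNumber", "CustomerId"}


def check_key_columns(columns, key_columns=KEY_COLUMNS):
    """Return a list of problems found in a declared column list.

    `columns` is a simplified schema: a list of {"name": ..., "type": ...} dicts.
    Keys are checked in alphabetical order so the report is stable.
    """
    declared = {c["name"]: c["type"].lower() for c in columns}
    problems = []
    for key in sorted(key_columns):
        if key not in declared:
            problems.append(f"{key}: not declared in the schema")
        elif declared[key] != "string":
            problems.append(f"{key}: declared as {declared[key]}, expected string")
    return problems


# Assumed example schema with one mistyped key and one missing key.
schema = [
    {"name": "PostalCode", "type": "Integer"},
    {"name": "AccountNumber", "type": "String"},
    {"name": "Amount", "type": "Decimal"},
]
for problem in check_key_columns(schema):
    print(problem)
```

Running a check like this in a release pipeline catches a mistyped or missing key column before the projection is built from it, instead of relying on a downstream Cast to compensate.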