
NVIDIA GPU driver extension failed to provision

Dennis O 0 Reputation points
2026-06-09T04:41:49.3433333+00:00

West US 2, Standard_NC4as_T4_v3, Ubuntu Server 24.04

When I selected the NVIDIA GPU driver extension for installation during VM provisioning, the driver failed to provision.

I uninstalled and reinstalled the extension from the Azure portal, but it failed with the same error.

The error details follow.

Status Conflict

Error details

{
    "code": "VMExtensionProvisioningError",
    "message": "VM has reported a failure when processing extension 'NvidiaGpuDriverLinux' (publisher 'Microsoft.HpcCompute' and type 'NvidiaGpuDriverLinux'). Error message: 'Installation failed. Exit code 0'. More information on troubleshooting is available at https://aka.ms/VMExtensionNvidiaGpuDriverLinuxTroubleshoot"
}

Operation details

{
    "status": "Failed",
    "error": {
        "code": "VMExtensionProvisioningError",
        "message": "VM has reported a failure when processing extension 'NvidiaGpuDriverLinux' (publisher 'Microsoft.HpcCompute' and type 'NvidiaGpuDriverLinux'). Error message: 'Installation failed. Exit code 0'. More information on troubleshooting is available at https://aka.ms/VMExtensionNvidiaGpuDriverLinuxTroubleshoot"
    }
}

Deployment template

{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "vmName": {
            "type": "String"
        },
        "location": {
            "type": "String"
        }
    },
    "resources": [
        {
            "type": "Microsoft.Compute/virtualMachines/extensions",
            "apiVersion": "2015-06-15",
            "name": "[concat(parameters('vmName'),'/NvidiaGpuDriverLinux')]",
            "location": "[parameters('location')]",
            "properties": {
                "publisher": "Microsoft.HpcCompute",
                "type": "NvidiaGpuDriverLinux",
                "typeHandlerVersion": "1.9",
                "autoUpgradeMinorVersion": true,
                "settings": {}
            }
        }
    ]
}

Deployment parameters

{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "vmName": {
            "value": "<redacted - my vm name>"
        },
        "location": {
            "value": "westus2"
        }
    }
}

I have used this extension and VM size in this region successfully before, so I am not sure why the extension failed to provision this time. Please advise.

Azure Virtual Machines

An Azure service that is used to provision Windows and Linux virtual machines.


2 answers

  1. Himanshu Shekhar 6,610 Reputation points Microsoft External Staff Moderator
    2026-06-09T05:44:11.05+00:00

    Hi Dennis,

    Thanks for the detailed error output; that helps a lot.

    The "Installation failed. Exit code 0" text from the NvidiaGpuDriverLinux extension is generic; the actual cause is logged inside the VM. Based on your configuration (Standard_NC4as_T4_v3, Ubuntu Server 24.04, West US 2), here are the most likely causes and how to resolve them.

    1. Secure Boot / Trusted Launch (most common on Ubuntu 24.04): Ubuntu 24.04 Gen2 images are deployed as Trusted Launch VMs with Secure Boot enabled by default. The GPU driver extension does not support Secure Boot: all boot components must be signed by a trusted publisher, so the unsigned NVIDIA kernel module is rejected and the install fails (often silently). Disable Secure Boot and vTPM for the extension path, because the process can hang or fail when they are enabled.

    Disable Secure Boot on the VM (VM > Configuration / Security type), reboot, then re‑add the extension. This is the single most common difference between a deployment that "worked before" and one that now fails.
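A rough sketch of the same step from the Azure CLI, for anyone who prefers it to the portal. The helper name `disable_secure_boot` is made up for illustration, and deallocating the VM before changing its security settings is an assumption here, not something stated in this thread; check it against your own environment before relying on it.

```shell
# Hypothetical helper: turn off Secure Boot and vTPM on an existing VM.
# Assumes the Azure CLI is installed and logged in; the function name is illustrative.
disable_secure_boot() {
  rg="$1"
  vm="$2"
  # Assumption: security settings are changed while the VM is deallocated.
  az vm deallocate --resource-group "$rg" --name "$vm" || return 1
  az vm update --resource-group "$rg" --name "$vm" \
    --enable-secure-boot false --enable-vtpm false || return 1
  az vm start --resource-group "$rg" --name "$vm"
}

# Usage: disable_secure_boot <your-rg> <your-vm-name>
# Inside the guest, `mokutil --sb-state` (if installed) reports the resulting state.
```

Stopping at the first failed command avoids restarting a VM whose settings were not actually changed.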

    2. NVIDIA package signing key rotation (known issue on NC‑series Linux): There is a known issue where the public signing keys NVIDIA uses for its apt/CUDA repositories were rotated, and the extension fails to download or install the driver as a result. This affects all NC‑series sizes, including Standard_NC4as_T4_v3. If Secure Boot is already off, refresh the signing key inside the guest and re‑run the extension:

    [Image in the original post: commands for refreshing the signing key]

    3. OS / driver support matrix: On NCasT4_v3, the extension installs CUDA drivers by default. The extension's supported‑distro matrix lists Ubuntu 20.04 LTS for CUDA, while 22.04/24.04 are aligned to the GRID path. If you need the driver on 24.04 specifically, the reliable route is the manual install documented for N‑series Linux.

    Diagnostics to confirm root cause

    Confirm the GPU is visible to the VM

    [Image in the original post: diagnostic commands]

    A line like "nvidia: module verification failed: signature and/or required key missing" points to the Secure Boot cause.
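Since the original commands were posted as an image, here is a minimal sketch of one way to run these checks inside the guest. It is not a reconstruction of that image: the helper name `check_nvidia_module_log` is hypothetical, and it only looks for the message quoted above, so other wordings of a signature failure would be missed.

```shell
# Hypothetical helper: read kernel log text on stdin and report whether it
# contains the NVIDIA module signature message quoted in this answer.
check_nvidia_module_log() {
  if grep -qiE 'nvidia.*module verification failed'; then
    echo "signature-problem"
  else
    echo "no-signature-message"
  fi
}

# Usage inside the guest (assumes lspci and dmesg are available on the image):
#   lspci | grep -i nvidia                 # is the GPU visible on the PCI bus?
#   sudo dmesg | check_nvidia_module_log   # does the kernel log show the message?
```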

    Recommended fix path

    1. Disable Secure Boot (and vTPM) on the VM, then reboot.
    2. Remove and re‑add the NvidiaGpuDriverLinux extension.
    3. If it still fails, refresh the NVIDIA signing key (above) and retry, or install the driver manually per the N‑series Linux guide and validate with nvidia-smi.
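Step 2 of the list above could look roughly like this from the Azure CLI. The helper name `reinstall_gpu_extension` is an illustrative assumption; omitting `--version` leaves the choice of extension version to the CLI defaults, which may or may not be what you want.

```shell
# Hypothetical helper: remove the NvidiaGpuDriverLinux extension, then add it again.
# Assumes the Azure CLI is installed and logged in.
reinstall_gpu_extension() {
  rg="$1"
  vm="$2"
  az vm extension delete --resource-group "$rg" --vm-name "$vm" \
    --name NvidiaGpuDriverLinux || return 1
  az vm extension set --resource-group "$rg" --vm-name "$vm" \
    --name NvidiaGpuDriverLinux --publisher Microsoft.HpcCompute
}

# Usage: reinstall_gpu_extension <your-rg> <your-vm-name>
# Afterwards, run nvidia-smi inside the guest to validate the driver.
```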

    Reference: NVIDIA GPU Driver Extension for Linux and N‑series GPU driver setup for Linux



  2. AI answer

    2026-06-09T04:42:01.0733333+00:00

    The extension error text is misleading here. The important part is the exit code from the NVIDIA GPU Driver Linux extension and the extension logs inside the VM.

    For this VM size and OS, use the built‑in troubleshooting path for the NvidiaGpuDriverLinux extension:

    1. Check the extension provisioning state
      • From Azure CLI:
             az vm extension list \
               --resource-group <your-rg> \
               --vm-name <your-vm-name> -o table
        
      • Confirm that the extension name is NvidiaGpuDriverLinux and note its provisioning state and any status message.
    2. Inspect the extension log inside the VM
      • SSH into the VM.
      • Open the extension status log:
             sudo less /var/log/azure/nvidia-vmext-status
        
      • This file contains the detailed output of the driver installation and the real failure reason for the “Installation failed. Exit code 0” message.
    3. Interpret the exit code
      • The extension defines specific exit codes:
        • 0 – Operation successful
        • 1 – Incorrect usage of extension
        • 10 – Linux Integration Services for Hyper‑V and Azure not available or installed
        • 11 – NVIDIA GPU not found on this VM size
        • 12 – Image offer not supported
        • 13 – VM size not supported
        • 14 – Operation unsuccessful
      • The portal message is generic; use the value and message in /var/log/azure/nvidia-vmext-status to see which of these actually occurred and follow the suggested action (for example, use a supported VM size/OS if you see 11, 12, or 13).
    4. Re‑deploy the extension with a supported version and optional settings
      • The extension documentation lists version 1.6 for Linux, whereas the template in the question requests 1.9. To deploy the documented version via CLI:
             az vm extension set \
               --resource-group <your-rg> \
               --vm-name <your-vm-name> \
               --name NvidiaGpuDriverLinux \
               --publisher Microsoft.HpcCompute \
               --version 1.6
        
      • If the issue is related to GRID/CUDA driver version compatibility (for example, kernel/driver mismatch), pass a specific driver version in --settings, such as:
             az vm extension set \
               --resource-group <your-rg> \
               --vm-name <your-vm-name> \
               --name NvidiaGpuDriverLinux \
               --publisher Microsoft.HpcCompute \
               --settings "{'driverVersion':'535.161'}"
        
      • If the log indicates kernel/driver issues (for example, GRID 16.x/17.x on newer kernels), follow the guidance in the log and known‑issues section (for example, downgrade the kernel to 6.8, then reinstall the extension).
    5. If the extension still fails
      • Run VM assist for Linux (https://aka.ms/vmassistlinux) to validate VM Guest Agent and VM health.
      • If the log shows image/size not supported or other conditions that cannot be changed, use a supported N‑series image/size combination as documented in the N‑series driver setup article referenced from the extension docs.
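The exit-code table in step 3 can be kept at hand as a small lookup while reading the log. This is only a convenience sketch: the function name `describe_exit_code` is hypothetical, and the descriptions are copied from the list above, not from any tool output.

```shell
# Hypothetical helper: translate a NvidiaGpuDriverLinux exit code into the
# description listed in step 3 of this answer.
describe_exit_code() {
  case "$1" in
    0)  echo "Operation successful" ;;
    1)  echo "Incorrect usage of extension" ;;
    10) echo "Linux Integration Services for Hyper-V and Azure not available or installed" ;;
    11) echo "NVIDIA GPU not found on this VM size" ;;
    12) echo "Image offer not supported" ;;
    13) echo "VM size not supported" ;;
    14) echo "Operation unsuccessful" ;;
    *)  echo "Unknown exit code: $1" ;;
  esac
}

# Usage: describe_exit_code 12
```

Codes outside the documented list fall through to the default branch, so an unexpected value is reported and not silently mapped to a known meaning.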

    If, after checking /var/log/azure/nvidia-vmext-status and adjusting driver/kernel or VM size/OS as needed, the extension continues to fail, collect that log and open a support case so the extension team can review the specific failure.


