From patchwork Mon Jan 24 21:24:29 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Taihsiang Ho (tai271828)"
X-Patchwork-Id: 1583693
From: "Taihsiang Ho (tai271828)"
To: kernel-team@lists.ubuntu.com
Cc: dann.frazier@canonical.com
Subject: [autotest-client-tests][PATCH 1/1] UBUNTU: SAUCE: ubuntu_nvidia_server_driver: create nvidia-fs module test
Date: Mon, 24 Jan 2022 22:24:29 +0100
Message-Id: <20220124212429.8876-2-taihsiang.ho@canonical.com>
In-Reply-To: <20220124212429.8876-1-taihsiang.ho@canonical.com>
References: <20220124212429.8876-1-taihsiang.ho@canonical.com>

The goal of this test is to confirm that the nvidia-fs module continues
to build and work properly with new kernel updates.

The environment in which this test needs to run requires several 3rd
party pieces of software - including other 3rd party modules that
require a reboot after installation. To avoid having to handle reboots
of the test client, we instead do the test inside of a virtual machine
that the test client can spin up and reboot itself. The actual
nvidia-fs test runs in a docker container inside that virtual machine.

The test is kicked off by running 01-run-test.sh, which will run each
of the other scripts in turn to set up the virtual machine and the test
docker container within it.
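For review purposes, the VM lifecycle the commit message describes can be sketched as plain shell. This is a simplified illustration, not part of the patch: the real 01-run-test.sh also handles device passthrough, disk sizing, and error handling, and `lxc` is stubbed with `echo` here (with an example image alias in place of the generated codename) so the control flow can run anywhere.

```shell
#!/usr/bin/env bash
# Sketch of the orchestration flow: one VM, two reboots, then the test.
# The lxc CLI is stubbed so this sketch is runnable without LXD.
lxc() { echo "lxc $*"; }

LXD_INSTANCE="nvidia-fs-test"

lxc launch --vm ubuntu:focal "$LXD_INSTANCE"   # spin up the test VM
lxc exec "$LXD_INSTANCE" -- /root/02-inside-vm-update-kernel.sh
lxc stop "$LXD_INSTANCE"                       # reboot into the updated kernel
lxc start "$LXD_INSTANCE"
lxc exec "$LXD_INSTANCE" -- /root/03-inside-vm-install-drivers.sh
lxc stop "$LXD_INSTANCE"                       # reboot into the new drivers
lxc start "$LXD_INSTANCE"
lxc exec "$LXD_INSTANCE" -- /root/04-inside-vm-setup-docker-and-run-test.sh
```

The two stop/start pairs mirror the reboots the test client itself cannot afford to perform.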
Signed-off-by: Taihsiang Ho (tai271828)
Acked-by: Po-Hsu Lin
---
 ubuntu_nvidia_server_driver/control              |   1 +
 ubuntu_nvidia_server_driver/nvidia-fs/00-vars    |  11 ++
 .../nvidia-fs/01-run-test.sh                     | 156 ++++++++++++++++++
 .../nvidia-fs/02-inside-vm-update-kernel.sh      |  52 ++++++
 .../nvidia-fs/03-inside-vm-install-drivers.sh    |  39 +++++
 .../04-inside-vm-setup-docker-and-run-test.sh    |  41 +++++
 .../nvidia-fs/05-inside-docker-run-test.sh       |  38 +++++
 ubuntu_nvidia_server_driver/nvidia-fs/README     |  17 ++
 .../nvidia-fs/a-c-t-entry.sh                     |  10 ++
 .../ubuntu_nvidia_server_driver.py               |  10 ++
 10 files changed, 375 insertions(+)
 create mode 100644 ubuntu_nvidia_server_driver/nvidia-fs/00-vars
 create mode 100755 ubuntu_nvidia_server_driver/nvidia-fs/01-run-test.sh
 create mode 100755 ubuntu_nvidia_server_driver/nvidia-fs/02-inside-vm-update-kernel.sh
 create mode 100755 ubuntu_nvidia_server_driver/nvidia-fs/03-inside-vm-install-drivers.sh
 create mode 100755 ubuntu_nvidia_server_driver/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh
 create mode 100755 ubuntu_nvidia_server_driver/nvidia-fs/05-inside-docker-run-test.sh
 create mode 100644 ubuntu_nvidia_server_driver/nvidia-fs/README
 create mode 100755 ubuntu_nvidia_server_driver/nvidia-fs/a-c-t-entry.sh

diff --git a/ubuntu_nvidia_server_driver/control b/ubuntu_nvidia_server_driver/control
index 2c3f2510..3f6f2323 100644
--- a/ubuntu_nvidia_server_driver/control
+++ b/ubuntu_nvidia_server_driver/control
@@ -10,3 +10,4 @@ Perform testing of Nvidia server drivers
 """
 job.run_test_detail('ubuntu_nvidia_server_driver', test_name='load', tag='load', timeout=600)
+job.run_test_detail('ubuntu_nvidia_server_driver', test_name='nvidia-fs', tag='nvidia-fs', timeout=1500)

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/00-vars b/ubuntu_nvidia_server_driver/nvidia-fs/00-vars
new file mode 100644
index 00000000..ad86f46e
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/00-vars
@@ -0,0 +1,11 @@
+# shellcheck shell=bash
+# shellcheck disable=SC2034
+KERNEL_FLAVOR="generic"
+CUDA_CONTAINER_NAME="nvcr.io/nvidia/cuda"
+NVIDIA_BRANCH="470-server"
+LXD_INSTANCE="nvidia-fs-test"
+MLNX_REPO="https://linux.mellanox.com/public/repo/mlnx_ofed"
+MLNX_OFED_VER="5.4-1.0.3.0"
+if [ -f 00-vars.gen ]; then
+    source ./00-vars.gen
+fi

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/01-run-test.sh b/ubuntu_nvidia_server_driver/nvidia-fs/01-run-test.sh
new file mode 100755
index 00000000..1db631af
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/01-run-test.sh
@@ -0,0 +1,156 @@
+#!/usr/bin/env bash
+
+set -e
+set -x
+
+shopt -s nullglob
+
+rm -f 00-vars.gen # avoid stale configs from previous runs
+source 00-vars
+source ../nvidia-module-lib
+
+sudo apt install -y jq xmlstarlet
+
+driver_recommended_cuda_version() {
+    local xmlout
+    xmlout="$(mktemp)"
+
+    sudo nvidia-smi -q -u -x --dtd | tee "$xmlout" > /dev/null
+    xmlstarlet sel -t -v "/nvidia_smi_log/cuda_version" < "$xmlout"
+    rm -f "$xmlout"
+}
+
+find_latest_cuda_container_tag_by_branch() {
+    local branch="$1" # e.g. 11.4
+    source ./00-vars.gen # pick up LXD_OS_VER
+
+    # List all of the available nvidia cuda image tags, filter for
+    # devel/ubuntu images that match our cuda x.y, and sort numerically
+    # to find the newest minor (x.y.z) version.
+    #
+    # Output is paginated by default. To get all the items in one go,
+    # set a page_size greater than the likely number of items (1024)
+    curl -L -s \
+        'https://registry.hub.docker.com/v2/repositories/nvidia/cuda/tags?page_size=1024' | \
+        jq '."results"[]["name"]' | \
+        tr -d \" | \
+        grep -E "^${branch}(\.[0-9]+)*-devel-ubuntu${LXD_OS_VER}$" | \
+        sort -n | tail -1
+}
+
+gen_vars() {
+    local cuda_branch
+    local container_tag
+
+    # Match the host OS
+    echo "LXD_OS_CODENAME=$(lsb_release -cs)" > 00-vars.gen
+    echo "LXD_OS_VER=$(lsb_release -rs)" >> 00-vars.gen
+    cuda_branch="$(driver_recommended_cuda_version)"
+    container_tag="$(find_latest_cuda_container_tag_by_branch "$cuda_branch")"
+    echo "CUDA_BRANCH=${cuda_branch}" >> 00-vars.gen
+    echo "CUDA_CONTAINER_TAG=${container_tag}" >> 00-vars.gen
+}
+
+lxd_wait() {
+    local instance="$1"
+
+    for _ in $(seq 300); do
+        if lxc exec "${instance}" -- /bin/true; then
+            break
+        fi
+        sleep 1
+    done
+}
+
+is_whole_nvme_dev() {
+    local dev
+    dev="$(basename "$1")"
+    echo "$dev" | grep -Eq '^nvme[0-9]+n[0-9]+$'
+}
+
+find_free_nvme() {
+    local dev
+    local children
+    command -v jq > /dev/null || sudo apt install -y jq 1>&2
+    for dev in /dev/nvme*; do
+        is_whole_nvme_dev "$dev" || continue
+        # Is this device used by another kernel device (RAID/LVM/etc)?
+        children=$(lsblk -J "$dev" | jq '.["blockdevices"][0]."children"')
+        if [ "$children" = "null" ]; then
+            echo "$dev"
+            return 0
+        fi
+    done
+    return 1
+}
+
+nvme_dev_to_bdf() {
+    local dev="$1"
+    local bdf=""
+
+    while read -r comp; do
+        if echo "$comp" | grep -q -E '^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]$'; then
+            bdf="$comp"
+        fi
+    done <<<"$(readlink /sys/block/"$(basename "$dev")" | tr / '\n')"
+    if [ -z "$bdf" ]; then
+        echo "ERROR: nvme_dev_to_bdf: No PCI address found for $dev" 1>&2
+        return 1
+    fi
+    echo "$bdf"
+    return 0
+}
+
+gen_vars
+source ./00-vars.gen
+
+# 20.04 installs currently get LXD 4.0.7 by default, but we need at least
+# 4.11 for PCI passthrough support for VMs. latest/stable is new enough.
+sudo snap refresh lxd --channel=latest/stable
+sudo lxd init --auto
+lxc delete --force "$LXD_INSTANCE" || :
+
+# FIXME: Should probably dynamically adapt cpu/memory based on host system
+lxc launch --vm "ubuntu:${LXD_OS_CODENAME}" "$LXD_INSTANCE" \
+    -t c48-m16 \
+    -c security.secureboot=false # so we can load untrusted modules
+
+# Ran out of space pulling the docker image w/ the default 10GB. Double it.
+lxc config device override "${LXD_INSTANCE}" root size=20GB
+lxd_wait "${LXD_INSTANCE}"
+
+for file in 00-vars 00-vars.gen 02-inside-vm-update-kernel.sh 03-inside-vm-install-drivers.sh 04-inside-vm-setup-docker-and-run-test.sh 05-inside-docker-run-test.sh; do
+    lxc file push ${file} "${LXD_INSTANCE}"/root/${file}
+done
+lxc exec "${LXD_INSTANCE}" -- /root/02-inside-vm-update-kernel.sh
+
+# Reboot to switch to updated kernel, so new drivers will build for it
+lxc stop "${LXD_INSTANCE}"
+
+# Release GPU devices so we can assign them to a VM
+sudo service nvidia-fabricmanager stop || :
+recursive_remove_module nvidia
+
+## Pass in devices. Note: devices can be assigned only while VM is stopped
+
+# Any Nvidia GPU will do, just grab the first one we find
+gpuaddr="$(lspci | grep '3D controller: NVIDIA Corporation' | cut -d' ' -f1 | head -1)"
+lxc config device add "${LXD_INSTANCE}" gpu pci "address=${gpuaddr}"
+
+# Find an unused NVMe device to pass in
+nvmedev=$(find_free_nvme) || \
+    { echo "ERROR: No unused nvme device found" 1>&2; exit 1; }
+nvmeaddr="$(nvme_dev_to_bdf "$nvmedev")" || \
+    { echo "ERROR: No PCI device found for $nvmedev" 1>&2; exit 1; }
+lxc config device add "${LXD_INSTANCE}" nvme pci "address=${nvmeaddr}"
+
+lxc start "${LXD_INSTANCE}"
+lxd_wait "${LXD_INSTANCE}"
+lxc exec "${LXD_INSTANCE}" -- /root/03-inside-vm-install-drivers.sh
+
+# Reboot to switch to new overridden drivers
+lxc stop "${LXD_INSTANCE}"
+lxc start "${LXD_INSTANCE}"
+
+lxd_wait "${LXD_INSTANCE}"
+lxc exec "${LXD_INSTANCE}" -- /root/04-inside-vm-setup-docker-and-run-test.sh

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/02-inside-vm-update-kernel.sh b/ubuntu_nvidia_server_driver/nvidia-fs/02-inside-vm-update-kernel.sh
new file mode 100755
index 00000000..914cf795
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/02-inside-vm-update-kernel.sh
@@ -0,0 +1,52 @@
+#!/usr/bin/env bash
+
+set -e
+set -x
+
+source ./00-vars
+
+export DEBIAN_FRONTEND="noninteractive"
+export DEBIAN_PRIORITY="critical"
+
+enable_proposed() {
+    local arch
+    local release
+    local mirror
+    local pockets
+    arch="$(dpkg --print-architecture)"
+    release="$(lsb_release -cs)"
+    pockets="restricted main universe multiverse"
+
+    case $arch in
+        i386|amd64)
+            mirror="http://archive.ubuntu.com/ubuntu"
+            ;;
+        *)
+            mirror="http://ports.ubuntu.com/ubuntu-ports"
+            ;;
+    esac
+
+    echo "deb $mirror ${release}-proposed $pockets" | \
+        sudo tee "/etc/apt/sources.list.d/${release}-proposed.list" > /dev/null
+    echo "deb-src $mirror ${release}-proposed $pockets" | \
+        sudo tee -a "/etc/apt/sources.list.d/${release}-proposed.list" > /dev/null
+}
+
+enable_proposed
+apt update
+apt install -y linux-"${KERNEL_FLAVOR}" \
+    linux-modules-nvidia-"${NVIDIA_BRANCH}"-"${KERNEL_FLAVOR}" \
+    nvidia-kernel-source-"${NVIDIA_BRANCH}" \
+    nvidia-utils-"${NVIDIA_BRANCH}"
+
+# Find the latest kernel version that matches our flavor and create "-test"
+# symlinks to it since they will sort highest, making it the default
+kver=$(linux-version list | grep -- "-${KERNEL_FLAVOR}$" | \
+    linux-version sort --reverse | head -1)
+ln -s "vmlinuz-${kver}" /boot/vmlinuz-test
+ln -s "initrd.img-${kver}" /boot/initrd.img-test
+
+# Workaround LP: #1849563
+echo "GRUB_CMDLINE_LINUX_DEFAULT=\"\$GRUB_CMDLINE_LINUX_DEFAULT pci=nocrs pci=realloc\"" > /etc/default/grub.d/99-nvidia-fs-test.cfg
+
+update-grub

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/03-inside-vm-install-drivers.sh b/ubuntu_nvidia_server_driver/nvidia-fs/03-inside-vm-install-drivers.sh
new file mode 100755
index 00000000..9d12ddc6
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/03-inside-vm-install-drivers.sh
@@ -0,0 +1,39 @@
+#!/usr/bin/env bash
+
+set -e
+set -x
+
+source ./00-vars
+
+export DEBIAN_FRONTEND="noninteractive"
+export DEBIAN_PRIORITY="critical"
+
+# Remove headers for all kernels except the one running so DKMS does not
+# try to build modules against them. Other kernels may not be compatible
+# with our modules, and we don't want the install to fail because of that.
+# We need to do this twice because apt will avoid removing a metapackage
+# (e.g. linux-kvm) if it can instead upgrade it, which may pull in a new
+# headers package. If that happens, the 2nd time through we'll remove that
+# updated headers package as well as the metapackage(s) that brought it in.
+for _ in 1 2; do
+    for file in /lib/modules/*/build; do
+        if [ "$file" = "/lib/modules/$(uname -r)/build" ]; then
+            continue
+        fi
+        apt remove --purge "$(dpkg -S "$file" | cut -d":" -f1 | sed 's/, / /g')" -y
+    done
+done
+
+# Install MOFED stack
+wget -qO - https://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | \
+    apt-key add -
+wget -qO - "${MLNX_REPO}/${MLNX_OFED_VER}/ubuntu${LXD_OS_VER}/mellanox_mlnx_ofed.list" | \
+    tee /etc/apt/sources.list.d/mellanox_mlnx_ofed.list
+apt update
+apt install -y mlnx-ofed-all mlnx-nvme-dkms mlnx-nfsrdma-dkms
+
+# Install nvidia-fs module
+cuda_os="ubuntu$(echo "$LXD_OS_VER" | tr -d .)"
+apt-key adv --fetch-keys "https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/7fa2af80.pub"
+add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/ /"
+apt install -y nvidia-fs-dkms
+add-apt-repository -r "deb https://developer.download.nvidia.com/compute/cuda/repos/${cuda_os}/x86_64/ /"

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh b/ubuntu_nvidia_server_driver/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh
new file mode 100755
index 00000000..17cb5ddb
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/04-inside-vm-setup-docker-and-run-test.sh
@@ -0,0 +1,41 @@
+#!/usr/bin/env bash
+
+set -e
+set -x
+
+source ./00-vars
+
+install_nvidia_docker() {
+    local distribution
+    distribution="$(. /etc/os-release; echo "$ID$VERSION_ID")"
+    curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
+    curl -s -L "https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list" | \
+        sudo tee /etc/apt/sources.list.d/nvidia-docker.list > /dev/null
+    sudo apt update
+    sudo apt install -y nvidia-docker2
+    sudo systemctl restart docker
+}
+
+umount /mnt/nvme || true
+parted -s /dev/nvme0n1 -- mklabel gpt
+parted -s /dev/nvme0n1 -- mkpart primary ext4 0 100%
+udevadm settle
+mkfs.ext4 -F "/dev/nvme0n1p1"
+mkdir -p /mnt/nvme
+mount "/dev/nvme0n1p1" /mnt/nvme -o data=ordered
+
+modprobe nvidia-fs
+
+install_nvidia_docker
+
+container="${CUDA_CONTAINER_NAME}:${CUDA_CONTAINER_TAG}"
+
+docker pull "${container}"
+docker run --rm --ipc host --name test_gds --gpus device=all \
+    --volume /run/udev:/run/udev:ro \
+    --volume /sys/kernel/config:/sys/kernel/config/ \
+    --volume /dev:/dev:ro \
+    --volume /mnt/nvme:/data/:rw \
+    --volume /root:/root/:ro \
+    --privileged "${container}" \
+    bash -c 'cd /root && ./05-inside-docker-run-test.sh'

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/05-inside-docker-run-test.sh b/ubuntu_nvidia_server_driver/nvidia-fs/05-inside-docker-run-test.sh
new file mode 100755
index 00000000..652bb558
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/05-inside-docker-run-test.sh
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+set -e
+set -x
+
+source ./00-vars
+
+# We want e.g. gds-tools-11-4 if using CUDA 11.4
+gds_tools="gds-tools-$(echo "$CUDA_BRANCH" | tr "." "-")"
+
+apt update
+apt install -y "$gds_tools" libssl-dev
+cd /usr/local/cuda/gds/samples
+make -j "$(nproc)"
+dd status=none if=/dev/urandom of=/data/file1 iflag=fullblock bs=1M count=1024
+dd status=none if=/dev/urandom of=/data/file2 iflag=fullblock bs=1M count=1024
+
+# Edit cufile.json and set the "allow_compat_mode" property to "false".
+sed -i 's/"allow_compat_mode": true,/"allow_compat_mode": false,/' /etc/cufile.json
+
+echo "sample 1"
+./cufile_sample_001 /data/file1 0
+echo "sample 2"
+./cufile_sample_002 /data/file1 0
+echo "sample 3"
+./cufile_sample_003 /data/file1 /data/file2 0
+echo "sample 4"
+./cufile_sample_004 /data/file1 /data/file2 0
+echo "sample 5"
+./cufile_sample_005 /data/file1 /data/file2 0
+echo "sample 6"
+./cufile_sample_006 /data/file1 /data/file2 0
+echo "sample 7"
+./cufile_sample_007 0
+echo "sample 8"
+./cufile_sample_008 0
+echo "sample 14"
+./cufile_sample_014 /data/file1 /data/file2 0

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/README b/ubuntu_nvidia_server_driver/nvidia-fs/README
new file mode 100644
index 00000000..fb68ce75
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/README
@@ -0,0 +1,17 @@
+= nvidia-fs testing =
+The goal of this test is to confirm that the nvidia-fs module continues to
+build and work properly with new kernel updates.
+
+The environment in which this test needs to run requires several 3rd party
+pieces of software - including other 3rd party modules that require a reboot
+after installation. To avoid having to handle reboots of the test client,
+we instead do the test inside of a virtual machine that the test client
+can spin up and reboot itself. The actual nvidia-fs test runs in a docker
+container inside that virtual machine.
+
+The test is kicked off by running 01-run-test.sh, which will run each of
+the other scripts in turn to set up the virtual machine and the test
+docker container within it.

diff --git a/ubuntu_nvidia_server_driver/nvidia-fs/a-c-t-entry.sh b/ubuntu_nvidia_server_driver/nvidia-fs/a-c-t-entry.sh
new file mode 100755
index 00000000..9c535270
--- /dev/null
+++ b/ubuntu_nvidia_server_driver/nvidia-fs/a-c-t-entry.sh
@@ -0,0 +1,10 @@
+#!/usr/bin/env bash
+
+run_test() {
+    exe_dir=$(dirname "${BASH_SOURCE[0]}")
+    pushd "${exe_dir}"
+    ./01-run-test.sh
+    popd
+}
+
+run_test

diff --git a/ubuntu_nvidia_server_driver/ubuntu_nvidia_server_driver.py b/ubuntu_nvidia_server_driver/ubuntu_nvidia_server_driver.py
index d0c667ae..6a6f4c53 100644
--- a/ubuntu_nvidia_server_driver/ubuntu_nvidia_server_driver.py
+++ b/ubuntu_nvidia_server_driver/ubuntu_nvidia_server_driver.py
@@ -19,6 +19,10 @@ class ubuntu_nvidia_server_driver(test.test):
         cmd = "{} test".format(sh_executable)
         utils.system(cmd)
 
+    def run_nvidia_fs_in_lxc(self):
+        cmd = os.path.join(p_dir, "./nvidia-fs/a-c-t-entry.sh")
+        utils.system(cmd)
+
     def run_once(self, test_name):
         if test_name == "load":
             self.compare_kernel_modules()
@@ -26,6 +30,12 @@ class ubuntu_nvidia_server_driver(test.test):
             print("")
             print("{} has run.".format(test_name))
 
+        elif test_name == "nvidia-fs":
+            self.run_nvidia_fs_in_lxc()
+
+            print("")
+            print("{} has run.".format(test_name))
+
         print("")
 
     def postprocess_iteration(self):
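A few small shell idioms in these scripts are easy to misread, so they can be sanity-checked in isolation. The block below restates helpers from the patch verbatim, with example values standing in for the 00-vars.gen settings that are normally generated at run time; it is illustration only, not part of the patch.

```shell
#!/usr/bin/env bash
set -e

# From 01-run-test.sh: a whole NVMe namespace looks like nvmeXnY;
# partitions (nvmeXnYpZ) and bare controllers (nvmeX) must not match.
is_whole_nvme_dev() {
    local dev
    dev="$(basename "$1")"
    echo "$dev" | grep -Eq '^nvme[0-9]+n[0-9]+$'
}

is_whole_nvme_dev /dev/nvme0n1   && echo "nvme0n1: whole device"
is_whole_nvme_dev /dev/nvme0n1p1 || echo "nvme0n1p1: partition, skipped"

# From 05-inside-docker-run-test.sh: the gds-tools package name swaps
# dots for dashes in the CUDA branch (example value shown).
CUDA_BRANCH="11.4"
gds_tools="gds-tools-$(echo "$CUDA_BRANCH" | tr "." "-")"
echo "$gds_tools"   # gds-tools-11-4

# From 03-inside-vm-install-drivers.sh: the CUDA repo path drops the dot
# from the Ubuntu release number (example value shown).
LXD_OS_VER="20.04"
cuda_os="ubuntu$(echo "$LXD_OS_VER" | tr -d .)"
echo "$cuda_os"     # ubuntu2004
```

The regex deliberately anchors both ends so lsblk/jq probing in find_free_nvme only ever runs against whole devices.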