Virtual GPU Support#
Graphistry is designed to run on virtual GPUs using Nvidia’s vGPU capabilities. This may simplify administration, enable additional privacy modes, and improve isolation. It is tested on Nutanix, VMware, and the virtualization platforms used by the major cloud providers.
If you are installing on such a system, we are happy to help, so please reach out.
In general, running Graphistry under a hypervisor requires either no special configuration or, with a vGPU, a 1-line setting in data/config/custom.env. However, installing a virtual environment may take considerably more planning and effort if you are unfamiliar with it, particularly for virtualization planning, passthrough/vGPU provisioning, vGPU licensing, vGPU driver installation, and the surrounding testing and troubleshooting.
Architectural considerations: You may not want vGPUs#
We recommend checking whether you want baremetal OS, hypervisor with a passthrough GPU driver, or hypervisor with a vGPU driver.
A baremetal OS (no hypervisor) or passthrough driver (hypervisor with non-vGPU driver) configuration may be sufficient, meaning you can skip the complexity of using vGPU drivers and licensing:
Hypervisors: If you run only 1 dedicated GPU application, or multiple applications in the same guest OS, you can expose the GPU in passthrough mode
Scaling users through multiple GPUs:
Graphistry already automatically uses all GPUs exposed to it, primarily for scaling to more user sessions
New APIs are starting to use multi-GPUs for acceleration as well
Multiple Graphistry installs
You can launch concurrent instances of Graphistry using docker:
./graphistry -p my_unique_namespace_123 up
You can configure docker to use different GPUs or share the same ones
Isolate Graphistry from other GPU software
Docker allows picking which GPUs + CPUs are used
… For both sharing and isolation
Nvidia vGPUs do not yet support unified CPU/GPU memory, so RAPIDS processes crash on bigger-than-GPU memory use instead of transparently spilling
Longer-term, Graphistry is aiming to push most/all GPU use to Dask, which adds even more flexibility for resource sharing.
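The concurrent-instance launch described above can be sketched as a dry run that prints one launch command per namespace. The helper name graphistry_launch_cmds and the namespaces team_a and team_b are hypothetical, and the duplicate check assumes, as the example name my_unique_namespace_123 suggests, that each instance needs its own namespace.

```shell
# Dry run: print one launch command per namespace instead of executing it.
# The helper name and the namespaces passed in are hypothetical placeholders.
graphistry_launch_cmds() {
    # Refuse duplicate namespaces (assumption: each instance needs its own).
    dups=$(printf '%s\n' "$@" | sort | uniq -d)
    if [ -n "$dups" ]; then
        echo "duplicate namespace: $dups" >&2
        return 1
    fi
    for ns in "$@"; do
        printf './graphistry -p %s up\n' "$ns"
    done
}

graphistry_launch_cmds team_a team_b
```

Run the printed commands from the Graphistry release folder once they look right.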
GPU Virtualization configuration planning#
Plan licensing and driver versions first, as mistakes may require starting over.
Most likely, you’ll want vGPU profile 8Q with vGPU 11.0+ (11.5+ for AI) drivers:
GPU Driver
You will install a hypervisor GPU driver in the hypervisor and a guest OS GPU driver in the guest OS:
The hypervisor+guest GPU driver pair should be from the same vGPU family
The driver’s CUDA version must be RAPIDS-compatible: 11.0+ (11.5+ for AI) at time of writing
Use officially sanctioned drivers. Google Cloud hosts MD5-matched guest OS drivers.
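As a rough sketch of the CUDA version requirement above, the helpers below compare dotted version strings and pull the CUDA version out of the nvidia-smi banner. The names version_ge and cuda_version_from_smi are hypothetical, the comparison relies on sort -V, and the banner is assumed to contain a "CUDA Version: X.Y" field, as recent drivers print.

```shell
# version_ge A B: succeed when dotted version A >= B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Read nvidia-smi banner text on stdin and print the CUDA version, if any.
# Assumption: the banner contains a "CUDA Version: X.Y" field.
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Example with literal versions; 11.0 is the minimum stated above.
version_ge 11.5 11.0 && echo "CUDA 11.5 meets the 11.0 minimum"
```

On a GPU host, one possible use is: version_ge "$(nvidia-smi | cuda_version_from_smi)" 11.0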
Virtualization features: C (vCS)
Nvidia vGPUs have different labels (A, B, C, Q, …) that correspond with enabled features.
Only C (vCS) and Q (Quadro vDWS) support OpenCL/CUDA. The compute profile (C) is intended for compute workloads like Graphistry. See official NVIDIA CUDA Toolkit and OpenCL Support on NVIDIA vGPU Software.
Virtualization type: Logical (not MIG)
Only one kind of vGPU virtualization currently works with Nvidia RAPIDS:
Supported - Logical time and memory slicing: Nvidia RAPIDS-compatible vGPUs are 11.0+ (11.5+ for AI). Each vGPU gets a time slice and a maximum amount of memory.
Unsupported - MIG (physical virtualization): vGPU 11.1+ supports stronger isolation of GPU cores and memory, but Nvidia RAPIDS does not yet officially work on MIG.
Size: 2 or 8
Each GPU can be split into homogeneous sizes. Ex: 1 x 8Q = 8Q and 4 x 2Q = 8Q. You may be able to oversubscribe, e.g., 4 x 4Q = 16Q.
In the case of multi-GPU systems, you can use different partition sizes on different GPUs. Ex: (1 x 8Q) + (4 x 2Q).
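The sizing arithmetic above can be checked with a small helper. This is only a sketch: vgpu_plan is a hypothetical name, the number in a profile name is taken to be its memory size in GB, and the physical GPU is assumed to have 8 GB to match the examples.

```shell
# vgpu_plan COUNT SIZE PHYSICAL
# Prints the summed profile size and whether it fits the physical GPU.
# Assumption: SIZE and PHYSICAL are both in GB.
vgpu_plan() {
    count=$1
    size=$2
    physical=$3
    total=$((count * size))
    if [ "$total" -le "$physical" ]; then
        echo "${count} x ${size}Q = ${total}Q (fits ${physical} GB)"
    else
        echo "${count} x ${size}Q = ${total}Q (oversubscribes ${physical} GB)"
    fi
}

vgpu_plan 1 8 8
vgpu_plan 4 2 8
vgpu_plan 4 4 8
```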
vGPU Software setup#
Supporting vGPUs means setting up an Nvidia license manager server, a GPU-capable hypervisor, and a GPU-capable guest OS, and making them work together. Almost all other environment configuration is the same as regular self-hosted Graphistry setup.
There is also a 1 line change for making Graphistry work with vGPUs.
Licensing server and license#
Set up an Nvidia licensing server and use it to generate and download a license. The server must stay on while you use your vGPU, as the license gets checked dynamically.
Unlicensed GPUs will appear under nvidia-smi and be marked as unlicensed in nvidia-smi -q, but fail upon use with CUDA.
If you have a license server but cannot use it to generate a license, you may be able to receive a time/account-limited evaluation license at nvidia.com.
Hypervisor#
Install hypervisor GPU drivers from the vGPU software version
Guest OS#
Install guest OS GPU drivers from the vGPU software version
Configure a local license file at /etc/nvidia/gridd.conf. Take care to specify vGPU profile vCS (C) / Quadro vDWS (Q) and the right license manager server. Restart the local daemon via sudo service nvidia-gridd restart.
To confirm license activation, check the license status in nvidia-smi -q and debug logs (sudo grep gridd /var/log/messages)
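One way to script the license check is to filter the nvidia-smi -q output for its license status line. This is a sketch under assumptions: license_ok is a hypothetical helper, and the exact field label ("License Status") and values ("Licensed", "Unlicensed") may differ across vGPU software versions.

```shell
# Read `nvidia-smi -q` output on stdin and succeed only if the first
# license status line reports "Licensed".
# Assumption: the output has a "License Status : ..." line.
license_ok() {
    status=$(grep -i 'License Status' | head -n 1)
    case "$status" in
        *Unlicensed*) return 1 ;;
        *Licensed*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example on sample text rather than a live GPU:
printf 'License Status : Licensed\n' | license_ok && echo "license active"
```

On the guest OS, this could be run as: nvidia-smi -q | license_ok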
Graphistry#
Set RMM_ALLOCATOR=default in your data/config/custom.env to avoid relying on CUDA Unified Memory for handling bigger-than-memory workloads, which Nvidia vGPUs do not currently support.
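A minimal sketch of applying that setting without leaving duplicate entries: the hypothetical helper set_env_var replaces any existing KEY=... line in an env file, or appends one if none exists. It assumes the file uses plain KEY=VALUE lines.

```shell
# set_env_var FILE KEY VALUE
# Replace an existing KEY=... line in FILE, or append one if absent.
set_env_var() {
    file=$1
    key=$2
    value=$3
    touch "$file"
    tmp=$(mktemp)
    # Keep every line except an existing assignment for KEY.
    grep -v "^${key}=" "$file" > "$tmp" || true
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file to keep its permissions.
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

From the Graphistry release folder, the call would be: set_env_var data/config/custom.env RMM_ALLOCATOR default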
Hypervisor-specific options#
See official docs and support forums for guidance specific to your hypervisor.
Nutanix#
Follow the standard Graphistry + vGPU instructions
Helpful links and configurations:
Drivers: Download a hypervisor + guest OS driver pair from an Nvidia.com vGPU account that comes with your hardware purchase. Alternatively, if allowed, Nutanix provides hypervisor drivers and Google Cloud hosts guest OS drivers.
OS: Ubuntu 24.04/22.04 LTS (plain). Do not use the listed Snap-based add-on for Docker.
Docs: Check the main Nutanix.com Nvidia docs and the supporting Nvidia.com Nutanix docs
Testing#
Script#
In your configured VM, after loading the Graphistry containers, go to your Graphistry release folder and run ./test-gpu.sh. For more information on what this does, see quick testing and test-gpu.
Hypervisor#
nvidia-smi should report a GPU of the expected driver version
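The driver version check can be scripted along these lines. The helper check_driver_version is hypothetical, and the expected version is whatever your chosen vGPU driver pair specifies for that machine.

```shell
# check_driver_version EXPECTED REPORTED
# Succeeds when the reported driver version equals the expected one.
check_driver_version() {
    expected=$1
    reported=$2
    if [ "$expected" = "$reported" ]; then
        echo "ok: driver $reported"
    else
        echo "mismatch: expected $expected, got $reported"
        return 1
    fi
}
```

On a GPU host, the reported version can come from nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1, passed as the second argument.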
Guest OS#
GPU:
nvidia-smi should report a GPU of the expected driver version
Nvidia license server approval:
sudo grep gridd /var/log/messages
See test-gpu.sh above
Docker#
See test-gpu.sh above
Common errors#
No GPU driver in hypervisor / guest VM
Mismatching GPU driver between hypervisor / guest VM; use a supplied pair
RAPIDS-incompatible GPU driver version
Unlicensed GPU: No license, license server down, …
Inappropriate license: CUDA-incapable vGPU profile/license; need C, Q
In addition, the usual GPU setup surprises may apply:
No Docker GPU runtime
Docker GPU runtime not set as default (check docker info and /etc/docker/daemon.json)
See the test-gpu.sh discussions above
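To check the default Docker runtime mentioned above, one option is to read the "default-runtime" key from the daemon configuration. This is a sketch: daemon_default_runtime is a hypothetical helper that uses a simple text match rather than a JSON parser, and the runtime name "nvidia" is the usual one but is an assumption here.

```shell
# daemon_default_runtime FILE
# Print the value of "default-runtime" in a Docker daemon.json file,
# or nothing if the key is absent. Simple text match, not a JSON parser.
daemon_default_runtime() {
    sed -n 's/.*"default-runtime"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$1" | head -n 1
}
```

For example, daemon_default_runtime /etc/docker/daemon.json should print nvidia when the GPU runtime is the default. The running daemon can be cross-checked with docker info --format '{{.DefaultRuntime}}'.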