Graphistry System Debugging FAQ#

Issues sometimes occur during server start, especially in on-premises scenarios with environment configuration drift.

List of Issues#

Started before initialization completed
GPU driver misconfiguration
Wrong or mismatched containers installed

1. Issue: Started before initialization completed#

Primary symptom#

The visualization page never returns, or Nginx reports “504 Gateway Time-out”, because services are still initializing. A “502” may also appear.

Correlated symptoms#

GPU tests pass
Often with first-ever container launch
Likely within 60s of launch
Can happen even after static homepage loads
In docker compose up logs (or docker logs ubuntu_central_1):
- “Error: Server at maximum capacity…”
- “Error: Too many users…”
- “Error while assigning…”
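
The log check above can be scripted. The following is a minimal sketch, not part of Graphistry's tooling: find_startup_errors is a hypothetical helper name, and its patterns are only the message prefixes quoted in this FAQ, so the full log lines may differ.

```shell
# Print any log lines that match the startup errors quoted above.
# Reads log text on stdin; exits 0 if at least one line matched, 1 otherwise.
# (find_startup_errors is a hypothetical helper, not a Graphistry command.)
find_startup_errors() {
  grep -E 'Error: (Server at maximum capacity|Too many users)|Error while assigning'
}

# Example, using the container name from this FAQ (adjust to your deployment):
#   docker logs ubuntu_central_1 2>&1 | find_startup_errors
```

A match shortly after launch points toward this issue rather than a GPU problem, provided the GPU tests pass.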

Solution#

Stop and restart the containers
Wait 1-2 minutes after start, then try again
- The viz container should report multiple lines like INFO success: viz-worker-10006 entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
- The Mongo container should report multiple lines like I ACCESS [conn66] Successfully authenticated as principal graphistry on cluster
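
Rather than waiting a fixed time, one could poll the viz container logs for the RUNNING message shown above. This is a sketch under assumptions: count_running_workers and wait_for_workers are hypothetical helper names, the match pattern is inferred from the single sample log line in this FAQ, and the 24 attempts at 5-second intervals simply approximate the 2-minute upper bound.

```shell
# Count distinct viz workers reported as RUNNING.
# Reads log text on stdin and prints the count.
count_running_workers() {
  grep -oE 'viz-worker-[0-9]+ entered RUNNING state' | sort -u | wc -l | tr -d ' '
}

# Re-run a log-producing command until at least one worker is RUNNING.
# Usage: wait_for_workers ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Returns 0 once a worker is seen, 1 if all attempts are exhausted.
wait_for_workers() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$("$@" 2>&1 | count_running_workers)" -gt 0 ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example, using the container name from this FAQ:
#   wait_for_workers 24 5 docker logs ubuntu_viz_1 && echo "viz workers are up"
```

The same approach could be applied to the Mongo container by changing the pattern to the authentication message.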

2. Issue: GPU driver misconfiguration#

Primary symptoms#

Visualization page never returns or Nginx “504 Gateway Time-out” due to services failing to initialize GPU context. Potentially also “502”.
The visualization loads and positions appear, but clustering never starts and the browser console reports a WebSocket disconnect

Correlated symptoms#

node processes in the ubuntu_viz_1 container fail to run for more than 30s (check durations with docker exec -it ubuntu_viz_1 ps "-aux")
Manually starting a worker in ubuntu_viz_1 produces a GPU-related error message (Nvidia, OpenCL, drivers, context, …)
- docker exec -it ubuntu_viz_1 bash -c "VIZ_LISTEN_PORT=7000 node /opt/graphistry/apps/core/viz/index.js"
GPU tests fail
- host
  - nvidia-smi
    - Failure: host has no GPU drivers
  - Optional: See https://www.npmjs.com/package/@graphistry/cljs
    - note: Requires CL installed in host, which production use of Graphistry does not require
- container
  - ./graphistry-cli/graphistry/bootstrap/ubuntu-cuda9.2/test-20-docker.sh
  - ./graphistry-cli/graphistry/bootstrap/ubuntu-cuda9.2/test-30-CUDA.sh
  - ./graphistry-cli/graphistry/bootstrap/ubuntu-cuda9.2/test-40-nvidia-docker.sh
  - nvidia-docker run --rm docker.io/rapidsai/base:24.04-cuda11.8-py3.10 nvidia-smi
  - nvidia-docker exec -it ubuntu_viz_1 nvidia-smi
    - If the run --rm docker.io/rapidsai/base:24.04-cuda11.8-py3.10 command succeeds but the exec command fails, you likely need to add nvidia-container-runtime to /etc/docker/daemon.json, run sudo service docker restart, and potentially clean stale images so that they use the right runtime
  - See https://www.npmjs.com/package/@graphistry/cljs
  - In container ubuntu_viz_1, create test-nvidia.js in /opt/graphistry/apps/lib/cljs/test/cl and run it there with node test-nvidia.js:

const cl = require('node-opencl');
const { argv } = require('../util');
const { CLPlatform, CLDeviceTypes } = require('../../');
// Prints true when the first GPU device found is an Nvidia device
console.log(CLPlatform.devices('gpu')[0].isNvidiaDevice === true);
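
For the /etc/docker/daemon.json step mentioned in the container tests, a quick text check can tell whether the Nvidia runtime is referenced at all. This is a rough sketch: has_nvidia_runtime is a hypothetical helper, and it only searches for the string nvidia-container-runtime rather than validating the JSON structure or confirming that Docker has loaded the setting.

```shell
# Report whether a Docker daemon config mentions the Nvidia container runtime.
# Usage: has_nvidia_runtime [CONFIG_PATH]
# CONFIG_PATH defaults to /etc/docker/daemon.json, the file named above.
# Returns 0 if the file exists and mentions the runtime, non-zero otherwise.
has_nvidia_runtime() {
  local config="${1:-/etc/docker/daemon.json}"
  [ -f "$config" ] && grep -q 'nvidia-container-runtime' "$config"
}

# Example:
#   if has_nvidia_runtime; then
#     echo "daemon.json references nvidia-container-runtime"
#   else
#     echo "nvidia-container-runtime not found in daemon.json"
#   fi
```

If the entry is missing, add it and run sudo service docker restart as described above before repeating the exec test.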

Solution#

Use the tests above to identify which installation layer is failing (host, Docker, or container), then fix that layer
If problems persist, reimaging the full box or switching to a cloud instance may prevent heartache
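
To find the failing layer, the tests listed under the correlated symptoms can be run in order from host to container, stopping at the first failure. The sketch below assumes the same image and container names used in this FAQ; check_layer and diagnose_gpu_stack are hypothetical helper names, and the -it flags are dropped because the script is non-interactive.

```shell
# Run one named check quietly; print PASS or FAIL and return its status.
# Usage: check_layer LABEL COMMAND [ARGS...]
check_layer() {
  local layer="$1"
  shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $layer"
  else
    echo "FAIL: $layer"
    return 1
  fi
}

# Walk the layers from host to container; stop at the first failure.
diagnose_gpu_stack() {
  check_layer "host GPU driver" nvidia-smi &&
  check_layer "nvidia-docker runtime" \
    nvidia-docker run --rm docker.io/rapidsai/base:24.04-cuda11.8-py3.10 nvidia-smi &&
  check_layer "viz container" nvidia-docker exec ubuntu_viz_1 nvidia-smi
}

# Example:
#   diagnose_gpu_stack || echo "Fix the layer marked FAIL, then re-run"
```

The first FAIL line names the layer to repair; later layers are not tested because they depend on the earlier ones.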

3. Issue: Wrong or mismatched containers installed#

Primary symptom#

Especially when upgrading, only some images may have been updated. You can delete all of them and start from scratch.

Correlated symptoms#

docker images or docker ps shows surprising versions
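
One way to spot surprising versions is to list the distinct tags across the graphistry images. This sketch assumes, without confirmation from this FAQ, that a consistent install uses a single tag for all graphistry/* images; graphistry_tags is a hypothetical helper name.

```shell
# Print the distinct tags found on graphistry/* images, one per line.
# Reads "repository tag" pairs on stdin, one pair per line.
graphistry_tags() {
  awk '$1 ~ /^graphistry\// { print $2 }' | sort -u
}

# Example:
#   docker images --format '{{.Repository}} {{.Tag}}' | graphistry_tags
```

More than one line of output would suggest a partial upgrade, under the single-tag assumption stated above.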

Solution#

Delete graphistry images and reinstall

Identify installed images: docker images | grep graphistry and docker images | grep nvidia
Remove: docker rmi -f graphistry/nginx-proxy graphistry/graphistry-central ...
Reload: docker load -i containers.tar
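
The identify and remove steps above can be combined so that the image list does not have to be typed by hand. This is a sketch under assumptions: select_graphistry_images is a hypothetical helper, it matches only repositories that start with graphistry/ (nvidia images are left for manual review), and it skips untagged images because a repository:<none> reference cannot be passed to docker rmi.

```shell
# Select removable graphistry image references.
# Reads "repository:tag" lines on stdin; prints the graphistry/* ones,
# skipping untagged entries. Always returns 0, even with no matches.
select_graphistry_images() {
  grep -E '^graphistry/' | grep -v ':<none>$' || true
}

# Example (xargs -r, a GNU extension, skips docker rmi when nothing matched):
#   docker images --format '{{.Repository}}:{{.Tag}}' \
#     | select_graphistry_images \
#     | xargs -r docker rmi -f
#   docker load -i containers.tar
```

Reviewing the selected list before piping it to docker rmi -f is advisable, since the removal is forced.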
