Install WoolyAI Client Libraries
Prerequisites
- NVIDIA GPU with CUDA 12.9.x or 13.x installed.
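  A quick way to check the installed driver and the CUDA version it supports is `nvidia-smi`, which prints both in its header:

  ```bash
  nvidia-smi
  ```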
Setup

- Create a directory for the client libraries to be stored in:

  ```bash
  mkdir woolyai-libraries
  ```

- Download the latest version of the WoolyAI Client libraries from https://github.com/Wooly-AI/woolyai-client-libraries/releases into the `woolyai-libraries` directory:

  ```bash
  # Run the downloads from inside the woolyai-libraries directory so the files land there.
  cd woolyai-libraries
  curl -O -L https://github.com/Wooly-AI/woolyai-client-libraries/releases/download/nvidia-arm64-0.2.1/libpreload_dlopen.so
  curl -O -L https://github.com/Wooly-AI/woolyai-client-libraries/releases/download/nvidia-arm64-0.2.1/libwooly-12.9.1.so
  curl -O -L https://github.com/Wooly-AI/woolyai-client-libraries/releases/download/nvidia-arm64-0.2.1/libwooly-12.so
  curl -O -L https://github.com/Wooly-AI/woolyai-client-libraries/releases/download/nvidia-arm64-0.2.1/libwooly-13.1.1.so
  curl -O -L https://github.com/Wooly-AI/woolyai-client-libraries/releases/download/nvidia-arm64-0.2.1/libwooly-13.so
  curl -O -L https://github.com/Wooly-AI/woolyai-client-libraries/releases/download/nvidia-arm64-0.2.1/libwooly.so
  cd ..
  ```
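  After the downloads finish, a quick sanity check that all six shared objects are in place (run from the parent directory) could be:

  ```bash
  ls -l woolyai-libraries/*.so
  ```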
- Alongside the `woolyai-libraries` directory, create a `~/.config/wooly/config` file:

  :::info
  To use a config file at a different path, set the `WOOLYAI_CLIENT_CONFIG` environment variable to that path.
  :::
  ```
  # PRIO: The priority the task gets on the server (default: 0, which is the highest priority)
  ## Assign a priority from 0 to 4 for execution on a shared GPU pool. The WoolyAI server uses the PRIO value to determine priority when allocating GPU compute core and VRAM resources while concurrent jobs are running on the same GPU.
  # PRIO = 0

  # GPU_COUNT: (Multi-GPU mode) Count of GPUs to execute the client's task across (default: 1)
  # GPU_COUNT = 2

  ###################
  # Controller config
  ###################
  # Note: If using the controller, you need to comment out ADDRESS and PORT below
  # CONTROLLER_URL: The URL of the controller
  ## When CONTROLLER_URL is commented out, the client connects directly to the server and does not use a controller
  ## NOTE, CONTROLLER ONLY: Without REQUIRED_VRAM, you'll see "controller assignment failed: failed to parse response JSON: [json.exception.parse_error.101] parse error at line 1, column 5: syntax error while parsing value - invalid literal; last read: '400 B'; expected end of input"
  # CONTROLLER_URL=http://127.0.0.1:8080
  # CONTROLLER_NODE_GROUP_IDS: The IDs of the node groups to use for the client
  # CONTROLLER_NODE_GROUP_IDS = nvidia,fast-networking
  # REQUIRED_VRAM (required): Required VRAM for the client in MB
  # REQUIRED_VRAM = 50000
  # GPU_MODE: Whether this client request needs exclusive use of the GPU (default: Shared)
  ## Not to be confused with Multi-GPU mode (GPU_COUNT): this tells the controller whether other tasks may be assigned alongside this one on the same GPU.
  # GPU_MODE = Exclusive

  ########################
  # Controller-less config
  ########################
  # ADDRESS: The direct server address to use for the client
  ADDRESS = 127.0.0.1
  # PORT: The server port to use for the client
  PORT = 10000
  # SSL: The SSL mode to use for the client
  SSL = DISABLE
  # GPUS: Comma-separated GPU indices as seen by WoolyAI Server (not your job scheduler).
  ## Indices match the server's device list (often all GPUs on the node unless the server was started with a restricted CUDA_VISIBLE_DEVICES or Docker --gpus list).
  ## When GPUS is commented out, the client will use a single GPU (#0)
  # GPUS = 0,1,2

  # WOOLYAI_WEIGHTS_DEDUP: Whether to deduplicate weights between clients on the same GPU; all clients must set this to use deduplication.
  ## Default: false
  # WOOLYAI_WEIGHTS_DEDUP = true

  # WOOLYAI_SWAP_FROM_VRAM: Whether to swap weights from VRAM to disk when the GPU is full
  ## Default: 1
  # WOOLYAI_SWAP_FROM_VRAM = 0
  ```
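  For reference, a minimal controller-based variant of this file might look like the sketch below; the values reuse the commented examples above and are illustrative placeholders, not recommendations:

  ```
  # Hypothetical controller-mode config: ADDRESS and PORT are omitted
  # because the controller assigns a server for the client.
  CONTROLLER_URL=http://127.0.0.1:8080
  CONTROLLER_NODE_GROUP_IDS = nvidia
  # REQUIRED_VRAM is mandatory in controller mode (in MB).
  REQUIRED_VRAM = 50000
  PRIO = 1
  GPU_MODE = Shared
  ```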
- Update `ADDRESS` and `PORT` if necessary. Keep the defaults if you are running on the same machine as the WoolyAI Server.
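  If the server runs on another machine, you can optionally confirm the address and port are reachable before going further; a simple probe with `nc` (using the defaults above) might look like:

  ```bash
  nc -zv 127.0.0.1 10000
  ```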
- Set the environment variables for WoolyAI:

  ```bash
  export LD_PRELOAD="${PWD}/woolyai-libraries/libpreload_dlopen.so"
  export LIB_WOOLY_PATH="${PWD}/woolyai-libraries/"
  ```
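  Note that exporting `LD_PRELOAD` makes every dynamically linked program started from this shell load the WoolyAI preload library. If you prefer to limit its scope, one option is to set the variables per command instead, for example:

  ```bash
  # Scope the WoolyAI variables to a single process rather than the whole shell
  # (test.py is the script created in the next step).
  LD_PRELOAD="${PWD}/woolyai-libraries/libpreload_dlopen.so" \
  LIB_WOOLY_PATH="${PWD}/woolyai-libraries/" \
  python3 test.py
  ```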
- Create a venv, install torch, and create the test script (`test.py`):

  ```bash
  python3 -m venv woolyai-venv
  source woolyai-venv/bin/activate
  pip install --index-url https://download.pytorch.org/whl/cu129 'torch>=2.6'
  pip install numpy
  ```

  `test.py`:

  ```python
  import torch

  # Report the GPUs that the WoolyAI client exposes to PyTorch.
  print(f"Number of GPUs available: {torch.cuda.device_count()}")
  for i in range(torch.cuda.device_count()):
      print(f"GPU {i}: {torch.cuda.get_device_name(i)}")

  # Run a simple elementwise addition on the GPU in the given dtype.
  def add(dtype):
      device = torch.device("cuda:0")
      x = torch.ones(5, device=device, dtype=dtype)
      y = torch.ones(5, device=device, dtype=dtype)
      r = x + y
      print(x)
      print(torch.abs(r))

  add(torch.float)
  add(torch.bfloat16)
  add(torch.half)
  ```
- Run the test:

  ```bash
  python3 test.py
  ```

  Expected output:

  ```
  Number of GPUs available: 1
  GPU 0: NVIDIA GH200 480GB (WOOLY)
  tensor([1., 1., 1., 1., 1.], device='cuda:0')
  tensor([2., 2., 2., 2., 2.], device='cuda:0')
  tensor([1., 1., 1., 1., 1.], device='cuda:0', dtype=torch.bfloat16)
  tensor([2., 2., 2., 2., 2.], device='cuda:0', dtype=torch.bfloat16)
  tensor([1., 1., 1., 1., 1.], device='cuda:0', dtype=torch.float16)
  tensor([2., 2., 2., 2., 2.], device='cuda:0', dtype=torch.float16)
  ```
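  The `(WOOLY)` suffix on the device name in the output above appears to mark GPUs served through the WoolyAI client libraries rather than a local CUDA installation. To re-check just the device name later, a one-liner such as the following (with the same environment variables set) should suffice:

  ```bash
  python3 -c "import torch; print(torch.cuda.get_device_name(0))"
  ```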
Notes
- Set the PRIO flag in the config file to assign a priority from 0 (highest) to 4 (lowest) for execution on a shared GPU pool. The WoolyAI server uses the PRIO value to determine priority when allocating GPU compute core and VRAM resources while concurrent jobs are running on the same GPU, and the WoolyAI Controller uses it to schedule the client request across GPU nodes. For example, a latency-sensitive client could run at PRIO = 0 while a batch job sharing the pool runs at PRIO = 3.