
Set Up The Server

Prerequisites

  • A host machine with a compatible GPU (currently NVIDIA only)
  • A license.json file. You can get it from https://woolyai.com/signup/
  • Docker installed on the GPU host machine.
  • Choose the appropriate Docker image from the WoolyAI Server Docker Hub. We provide images for NVIDIA built against specific driver versions. Generally, pick the image whose driver version is as close as possible to the driver installed on your host; you can check your installed version as shown below.
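To see which driver version (and the highest CUDA version it supports) your host is running, you can use nvidia-smi, which ships with the NVIDIA driver. This is a generic NVIDIA command, not a WoolyAI tool:

nvidia-smi --query-gpu=driver_version --format=csv,noheader

Running nvidia-smi with no arguments also prints the driver and supported CUDA version in its header.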

Setup

  1. Optional (only needed for the Model Weight Dedup feature): Create a directory for the server VRAM cache.
mkdir woolyai-server-vram-cache
  2. Create the server config file, woolyai-server-config.toml:
[SERVER]

LISTEN_ADDR = tcp::443
# Optional SSL endpoint. Uncomment after placing certfile.pem in working dir (see the note after this config).
# LISTEN_ADDR = ssl::443
# SSL_CERT_FILE = certfile.pem
# SSL_KEY_FILE = certfile.pem

########################
# Controller integration (leave blank if not using a controller).
########################
## Note: You can comma separate multiple controller URLs
# CONTROLLER_NATS_URL = nats://localhost:4222
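## Example with multiple controllers (illustrative hostnames): CONTROLLER_NATS_URL = nats://nats-1:4222,nats://nats-2:4222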
# NODE_NAME must be unique across all nodes in the cluster
# NODE_NAME = my-node
# NODE_ID will be auto-generated from NODE_NAME if not set (must be a valid UUID)
# NODE_ID = 159e6f46-9398-11f0-bca3-6b6ea1493108
# NODE_ADDRESS is the address of the node the client will connect to
# NODE_ADDRESS = 127.0.0.1

# Global cache behaviour: OFF, RECORD, or REPLAY (default).
GLOBAL_CACHE_MODE = OFF
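If you enable the optional SSL endpoint in the config above, a certfile.pem must exist in the working directory. As an illustration only (not WoolyAI-specific guidance), a self-signed certificate for testing can be generated with openssl; since the sample config points both SSL_CERT_FILE and SSL_KEY_FILE at the same file, the key and certificate can be concatenated into one PEM:

openssl req -x509 -newkey rsa:4096 -keyout key.pem -out cert.pem -days 365 -nodes -subj "/CN=my-woolyai-server"
cat cert.pem key.pem > certfile.pem

For anything beyond testing, use a certificate issued by a CA rather than a self-signed one.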
  3. Make sure you have the woolyai-server-license.json file in the current directory. You can get it from WoolyAI support.

  4. Run the container

NVIDIA

We provide images for multiple CUDA versions. Be sure to pick the appropriate version for your GPU and driver by checking the WoolyAI Server Docker Hub. Currently, you can find 12.9.1 and 13.1.1.

docker run -d --name woolyai-server \
--gpus all \
--network=host \
--pull always \
--entrypoint /usr/local/bin/server-entrypoint.bash \
-v "./woolyai-server-vram-cache:/home/automation/.wooly/shared_mem:rw" \
-v "./woolyai-server-config.toml:/home/automation/.wooly/config:ro" \
-v "./woolyai-server-license.json:/home/automation/.wooly/license.json:ro" \
woolyai/server:cuda12.9.1-latest
  5. Check the logs with docker logs woolyai-server to make sure it started properly. You should see "server listening on" if it worked.
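As a quick check (generic Docker and Linux commands, not WoolyAI-specific), you can confirm the startup message and that the port is open on the host (the container runs with --network=host and the sample config listens on 443):

docker logs woolyai-server 2>&1 | grep "server listening on"
ss -ltn | grep ':443'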
info

The woolyai-server-vram-cache folder (optional) is where you can cache models in VRAM with the VRAM Model Cache Tool. This is done with the woolyai-vram-model-cache --root ./woolyai-server-vram-cache . . . command.

FAQ

  • There is no need to go into the container.
  • You can see logs with: docker logs -f woolyai-server
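  • To upgrade or restart the server (a standard Docker workflow, not a WoolyAI-specific step), stop and remove the container, then re-run the docker run command from step 4; the --pull always flag fetches the newest image for that tag:

docker stop woolyai-server
docker rm woolyai-server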