Set Up The Server
Prerequisites
- A host machine with a compatible GPU (currently NVIDIA)
- A WoolyAI license file (saved as woolyai-server-license.json in the steps below). You can get it from https://woolyai.com/signup/
- Docker installed on the GPU host machine.
- Choose the proper Docker image from the WoolyAI Server Docker Hub. We provide images for NVIDIA at specific driver versions. Generally, pick the image that is as close as possible to the driver version installed on your host (see the quick check after this list).
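Not sure which image matches your host? A quick check with nvidia-smi (which ships with the NVIDIA driver); the image tag mentioned below is only the example used later in this guide:

# Print the installed NVIDIA driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# The header of the default output also lists the highest CUDA version the
# driver supports; pick the closest WoolyAI tag, e.g. woolyai/server:cuda12.9.1-latest
nvidia-smi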
Setup
- Create a directory for the server VRAM cache:
mkdir woolyai-server-vram-cache
- Create the server config file:
woolyai-server-config.toml:
[SERVER]
LISTEN_ADDR = tcp::443
# Optional SSL endpoint. Uncomment after placing certfile.pem in working dir.
# LISTEN_ADDR = ssl::443
# SSL_CERT_FILE = certfile.pem
# SSL_KEY_FILE = certfile.pem
########################
# Controller integration (leave blank if not using a controller).
########################
## Note: You can comma separate multiple controller URLs
# CONTROLLER_NATS_URL = nats://localhost:4222
# NODE_NAME must be unique across all nodes in the cluster
# NODE_NAME = my-node
# NODE_ID will be auto-generated from NODE_NAME if not set (must be a valid UUID)
# NODE_ID = 159e6f46-9398-11f0-bca3-6b6ea1493108
# NODE_ADDRESS is the address of the node the client will connect to
# NODE_ADDRESS = 127.0.0.1
# Global cache behaviour: OFF, RECORD, or REPLAY (default).
GLOBAL_CACHE_MODE = OFF
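If you plan to enable the optional SSL endpoint, the config above expects a certfile.pem in the working directory (the commented example points both SSL_CERT_FILE and SSL_KEY_FILE at the same file, i.e. a combined certificate-plus-key PEM). A minimal sketch for producing one with a self-signed certificate, assuming openssl is available; the hostname is a placeholder:

# Generate a self-signed certificate and private key (replace the placeholder CN)
openssl req -x509 -newkey rsa:4096 -days 365 -nodes \
  -keyout key.pem -out cert.pem -subj "/CN=my-gpu-host.example.com"

# Combine them into the single certfile.pem referenced by the config
cat cert.pem key.pem > certfile.pem

Then uncomment the ssl LISTEN_ADDR, SSL_CERT_FILE, and SSL_KEY_FILE lines in woolyai-server-config.toml.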
- Make sure you have the woolyai-server-license.json file in the current directory. You can get it from WoolyAI support.
- Run the Container
NVIDIA
docker run -d --name woolyai-server \
--gpus all \
--network=host \
--pull always \
--entrypoint /usr/local/bin/server-entrypoint.bash \
-v "./woolyai-server-vram-cache:/home/automation/.wooly/shared_mem:rw" \
-v "./woolyai-server-config.toml:/home/automation/.wooly/config:ro" \
-v "./woolyai-server-license.json:/home/automation/.wooly/license.json:ro" \
woolyai/server:cuda12.9.1-latest
- Check the logs with docker logs woolyai-server to make sure it started properly. You should see "server listening on" if it worked; a quick check is shown below.
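A quick startup check from the host, assuming the log message matches the string quoted above (docker logs writes to stdout and stderr, hence the redirect):

# Confirm the container is up
docker ps --filter "name=woolyai-server"

# Look for the ready message in the server logs
docker logs woolyai-server 2>&1 | grep "server listening on"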
info
The woolyai-server-vram-cache folder (optional) is where you can cache models in VRAM with the VRAM Model Cache Tool. This is done with the woolyai-vram-model-cache --root ./woolyai-server-vram-cache ... command.
FAQ
- There is no need to go into the container.
- You can see logs with:
docker logs -f woolyai-server
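- If you edit woolyai-server-config.toml, restarting the container is typically enough to pick up the change, since the file is bind-mounted into the container:
docker restart woolyai-server
- To stop the server or remove the container entirely (the config, license, and VRAM cache directory remain on the host because they are bind mounts):
docker stop woolyai-server
docker rm woolyai-server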