Running Your First Project
Orientation
Wooly is currently provided as a Docker container that includes the Wooly runtime libraries and a CLI.
You can find all available images on Docker Hub.
Your Working Environment
Set up the CPU-backed Linux environment where you will run the Wooly Client container.
- Recommended: Linux CPU instance (US Virginia Region)
- Since we are in Beta, the GPU resources powering the WoolyAI Acceleration Service are limited and set up only in the US Virginia geographic region. For the best user experience, we recommend spinning up a CPU instance on a public cloud close to the US Virginia region.
- Quick Start: Local CPU Hardware
- Pull and run our Wooly Client Docker container on your laptop and work with PyTorch projects inside it.
Depending on the model size you wish to run, you will need to start a CPU instance with enough RAM (memory optimized). The model is first loaded into the RAM of the CPU instance or local CPU hardware and then moved to the WoolyAI Acceleration Service GPU.
For models with 7B parameters or more, configure Linux CPU instances with a minimum of 32 GB RAM and 4 vCPUs for the best experience.
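As a rough sizing aid, you can estimate the RAM needed for the weights from the parameter count. The snippet below is a back-of-the-envelope sketch; it assumes fp16 weights at 2 bytes per parameter and ignores tokenizer, activation, and loading overhead, which is why extra headroom (32 GB for 7B+ models) is recommended.
# Back-of-the-envelope RAM estimate for loading model weights.
# Assumption: fp16 weights, i.e. 2 bytes per parameter.
params = 7_000_000_000            # a 7B-parameter model
bytes_per_param = 2               # fp16
weights_gb = params * bytes_per_param / 1024**3
print(f"~{weights_gb:.0f} GB for the weights alone")  # ~13 GB, before any overhead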
If you are working on local CPU hardware (macOS) and running the container inside Docker, swap space can be increased through the Docker Desktop settings, but it is very limited. macOS also manages swap space automatically.
Contact us if you have any questions regarding this: support@woolyai.com
Pull the latest Docker Image
docker pull woolyai/client:latest
Run the Docker Container
docker run -itd --network=host --name wooly-container -h wooly-client woolyai/client:latest
--network=host is used to bind the container to the host network for best performance.
Exec into the Container
docker exec -it wooly-container bash
Log in to the WoolyAI Acceleration Service
# once attached to bash in the container with docker exec -it wooly-container bash
ubuntu@wooly-client:~$ wooly login
By proceeding, you agree to our Terms & Conditions: https://woolyai.com/terms-and-conditions (enter yes/no): yes
enter your token:
ping latency to the wooly server is 32.128700 ms
success
You'll be prompted to enter your Wooly token. You can obtain a Wooly token by signing up at https://woolyai.com/get-started/.
If this fails, please reach out to support@woolyai.com.
Run a PyTorch Project
"Which GPU should you use?", "Do you have enough resources?" Don't worry about it! WoolyAI Acceleration service takes away the hassles of GPU Resource management. Your token by default has enough Wooly credits attached which lets you run Pytorch projects and utilize GPU resources.
In the Beta, we are powering the service with limited GPU Infrastructure in the backend. This means that really large 70B or greater parameter models need to be quantized to run.
The bash files in the ~/examples directory install all required dependencies and then run the Python example code. Here is an example of running a DeepSeek PyTorch model:
- You'll need to log in to Hugging Face inside the container first. You can do this with:
pip install -U "huggingface_hub[cli]"
huggingface-cli login --token hf_IXXXXXXXXXXXXXXXXX
- Some of the examples are gated behind an approval process on Hugging Face. You'll need to go to the model's page and request access from the model owner.
~/examples/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B.bash
+++ dirname /home/ubuntu/examples/deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B.bash
++ cd /home/ubuntu/examples
++ pwd
+ SCRIPT_DIR=/home/ubuntu/examples
+ cd /home/ubuntu/examples/
+ . ../shared.bash
+ true
+ pip install transformers accelerate
Defaulting to user installation because normal site-packages is not writeable
Collecting transformers
Downloading transformers-4.49.0-py3-none-any.whl (10.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 10.0/10.0 MB 584.6 kB/s eta 0:00:00
Collecting accelerate
Downloading accelerate-1.4.0-py3-none-any.whl (342 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 342.1/342.1 KB 2.8 MB/s eta 0:00:00
Requirement already satisfied: huggingface-hub<1.0,>=0.26.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers) (0.29.1)
Collecting tokenizers<0.22,>=0.21
Downloading tokenizers-0.21.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (2.9 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.9/2.9 MB 3.2 MB/s eta 0:00:00
Requirement already satisfied: packaging>=20.0 in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers) (24.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.17.0)
Collecting regex!=2019.12.17
Using cached regex-2024.11.6-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (782 kB)
Requirement already satisfied: requests in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers) (2.32.3)
Collecting safetensors>=0.4.1
Downloading safetensors-0.5.3-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (459 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 459.5/459.5 KB 3.7 MB/s eta 0:00:00
Requirement already satisfied: tqdm>=4.27 in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers) (4.67.1)
Requirement already satisfied: pyyaml>=5.1 in /home/ubuntu/.local/lib/python3.10/site-packages (from transformers) (6.0.2)
Collecting numpy>=1.17
Using cached numpy-2.2.3-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.4 MB)
Requirement already satisfied: torch>=2.0.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.6.0)
Collecting psutil
Downloading psutil-7.0.0-cp36-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (279 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 279.5/279.5 KB 4.0 MB/s eta 0:00:00
Requirement already satisfied: typing-extensions>=3.7.4.3 in /home/ubuntu/.local/lib/python3.10/site-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (4.12.2)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub<1.0,>=0.26.0->transformers) (2025.2.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->accelerate) (3.1.5)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->accelerate) (3.4.2)
Requirement already satisfied: sympy==1.13.1 in /usr/local/lib/python3.10/dist-packages (from torch>=2.0.0->accelerate) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /usr/local/lib/python3.10/dist-packages (from sympy==1.13.1->torch>=2.0.0->accelerate) (1.3.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /home/ubuntu/.local/lib/python3.10/site-packages (from requests->transformers) (3.4.1)
Requirement already satisfied: urllib3<3,>=1.21.1 in /home/ubuntu/.local/lib/python3.10/site-packages (from requests->transformers) (2.3.0)
Requirement already satisfied: idna<4,>=2.5 in /home/ubuntu/.local/lib/python3.10/site-packages (from requests->transformers) (3.10)
Requirement already satisfied: certifi>=2017.4.17 in /home/ubuntu/.local/lib/python3.10/site-packages (from requests->transformers) (2025.1.31)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2->torch>=2.0.0->accelerate) (3.0.2)
Installing collected packages: safetensors, regex, psutil, numpy, tokenizers, accelerate, transformers
Successfully installed accelerate-1.4.0 numpy-2.2.3 psutil-7.0.0 regex-2024.11.6 safetensors-0.5.3 tokenizers-0.21.0 transformers-4.49.0
+ cat
+ python3 /tmp/script.py
config.json: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 679/679 [00:00<00:00, 1.24MB/s]
model.safetensors: 100%|██████████████████████████████████████████████████████████████████████████████████████████| 3.55G/3.55G [00:40<00:00, 88.1MB/s]
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
generation_config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████| 181/181 [00:00<00:00, 1.82MB/s]
tokenizer_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████| 3.07k/3.07k [00:00<00:00, 41.6MB/s]
tokenizer.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████| 7.03M/7.03M [00:00<00:00, 33.2MB/s]
Device set to use cuda
Generated Response:
[{'generated_text': [{'role': 'user', 'content': 'Who are you?'}, {'role': 'assistant', 'content': "Greetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have.\n</think>\n\nGreetings! I'm DeepSeek-R1, an artificial intelligence assistant created by DeepSeek. I'm at your service and would be delighted to assist you with any inquiries or tasks you may have."}]}]
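For reference, the script that the bash wrapper writes to /tmp/script.py and runs is, broadly, a standard Hugging Face transformers text-generation pipeline. The following is only a sketch reconstructed from the output above, not the exact script; the model name and prompt are taken from the example.
from transformers import pipeline

# Chat-style text-generation pipeline; inside the container, "cuda" is backed by
# the WoolyAI Acceleration Service GPUs.
pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
    torch_dtype="auto",
    device_map="auto",
)
messages = [{"role": "user", "content": "Who are you?"}]
print("Generated Response:")
print(pipe(messages, max_new_tokens=256))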
You can also create and run your own custom PyTorch projects inside the CPU container. No special configuration is needed.
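For example, ordinary PyTorch code that targets cuda runs unchanged. This is a minimal sketch; it assumes, as the example output above suggests, that the service's GPUs are exposed through the standard CUDA device inside the container.
import torch

# Plain PyTorch: allocate tensors on the GPU and multiply them.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(2048, 2048, device=device)
y = torch.randn(2048, 2048, device=device)
z = x @ y
print(z.shape, z.device)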
Caching
We have both global and private caching on the WoolyAI Acceleration Service, enabling you to run models faster and more efficiently.
- Global caching is available for all users and populated by our team based on the popularity of the models. You can see the list of globally cached models by running wooly cache global list inside the container.
- Private caching is available for your account. This is limited to 40 GB in size and can be managed by running wooly cache private usage and wooly cache private invalidate inside the container.
Final Notes
The container has a specific environment that ensures the Wooly libraries take priority over any others you install. Learn more about the container environment on the Understanding the Container Environment page.