Using the VRAM Model Cache Tool
The WoolyAI GPU VRAM Model Cache CLI (part of the GPU VRAM DeDup feature) caches models in the WoolyAI Server's GPU VRAM so that any WoolyAI Client kernel executions that load an identical model share the cached copy. This eliminates duplicate GPU VRAM consumption for identical models, freeing memory to load more models and execute more jobs.
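As the help text later on this page shows, the cache is keyed on per-model chunk SHA1 hashes. The sketch below is only a conceptual illustration of how chunk hashes let identical weights be detected; the chunk size, function name, and hashing details are assumptions for illustration and are not the WoolyAI implementation.
# Conceptual sketch only: identical weight files produce identical chunk hashes,
# so a server can keep a single copy of those chunks in GPU VRAM.
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # assumed chunk size, for illustration only

def chunk_sha1_hashes(weights_file: str) -> list[str]:
    """Return one SHA1 hex digest per fixed-size chunk of the weights file."""
    hashes = []
    with Path(weights_file).open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            hashes.append(hashlib.sha1(chunk).hexdigest())
    return hashes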
Prerequisites
- A WoolyAI Server
- Python installed on the WoolyAI Server host machine
Setup
- Download the latest version of the Model Cacher from the WoolyAI Downloads page.
- Install the wheel file:
pip install <path to the wheel file>
Usage
The model cacher may need to be modified to match the type of model you are trying to cache. This is done after installation by editing ~/.local/lib/python3.10/site-packages/woolyai_model_cacher/cli.py (the path may vary with your Python version and install location). The relevant function is shown below:
def compute_hashes_from_source(source: str, dtype_name: Optional[str]) -> list[str]:
    import torch

    torch_dtype = _resolve_torch_dtype(dtype_name)
    kwargs = {"torch_dtype": torch_dtype} if torch_dtype is not None else {}
    # The model-loading call below is the part that may need to change for your model type.
    model = AutoModelForCausalLM.from_pretrained(source, **kwargs)
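For example, if the model you want to cache is not a causal language model, the line to change is the AutoModelForCausalLM call. The snippet below is an illustrative sketch, not part of the shipped cli.py; AutoModelForSeq2SeqLM is an assumed substitute, and you should pick whichever transformers Auto class matches your model.
# Illustrative edit only (assumed class name): load a sequence-to-sequence model
# such as T5 instead of a causal LM.
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained(source, **kwargs)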
Run the CLI with --help to see the available subcommands:
$ woolyai-vram-model-cache --help
usage: woolyai-vram-model-cache [-h] [--root ROOT] {list,add,delete} ...

Manage per-model chunk SHA1 hashes for WoolyAI models.

positional arguments:
  {list,add,delete}
    list             list models and number of hashes
    add              compute & store hashes (overwrites)
    delete           delete whole model file or all models

options:
  -h, --help         show this help message and exit
  --root ROOT        override storage dir (e.g. ./.wooly/shared_mem or
                     /etc/wooly/shared_mem)
List cached models
woolyai-vram-model-cache list
Add a model (compute and store hashes)
woolyai-vram-model-cache add --source <huggingface-model-id-or-local-path>
woolyai-vram-model-cache add --source meta-llama/Llama-2-7b-hf --dtype float16
Delete a cached model
woolyai-vram-model-cache delete --model <model-name>
woolyai-vram-model-cache delete --all # delete all cached models
Specify a custom location for the model cache
woolyai-vram-model-cache --root <custom-root-directory> add --source <huggingface-model-id-or-local-path>