
Using the VRAM Model Cache Tool

The WoolyAI GPU VRAM Model Cache CLI (the GPU VRAM DeDup feature) is a tool that lets you cache models in the WoolyAI Server's GPU VRAM so that any WoolyAI Client kernel executions that load an identical model share the cached copy. This eliminates duplicate GPU VRAM consumption for identical models, freeing memory to load more models and execute more jobs.

Prerequisites

  • A WoolyAI Server
  • Python installed on the WoolyAI Server host machine

Setup

  1. Download the latest version of the Model Cacher from the WoolyAI Downloads page.
  2. Install the wheel file with pip install <path to the wheel file>, as shown below.
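
For example, assuming the downloaded wheel is in the current directory (the filename below is a placeholder; use the file you actually downloaded):

pip install ./woolyai_model_cacher-<version>-py3-none-any.whl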

Usage

The model cacher may need to be modified to match the model you are trying to cache. This is done after installation by editing ~/.local/lib/python3.10/site-packages/woolyai_model_cacher/cli.py (the exact path may vary with your Python version and install location). The relevant part is the compute_hashes_from_source function, which loads the model before hashing it:

def compute_hashes_from_source(source: str, dtype_name: Optional[str]) -> list[str]:
    import torch

    # Resolve the requested dtype (e.g. "float16") and load the model with it.
    torch_dtype = _resolve_torch_dtype(dtype_name)
    kwargs = {"torch_dtype": torch_dtype} if torch_dtype is not None else {}
    model = AutoModelForCausalLM.from_pretrained(source, **kwargs)
    # ... the remainder of the function computes and stores the per-model chunk SHA1 hashes.
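
For example (a sketch, not part of the shipped tool), if the model you want to cache is not a causal language model, you could swap the loader inside compute_hashes_from_source for the generic Hugging Face AutoModel class and leave the rest of the function unchanged:

from transformers import AutoModel

# Hypothetical edit: use the generic AutoModel loader so encoder-only or other
# architectures can be loaded and hashed; source and kwargs stay as before.
model = AutoModel.from_pretrained(source, **kwargs)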

Once the tool is installed, run it with --help to see the available subcommands and options:

$ woolyai-vram-model-cache --help
usage: woolyai-vram-model-cache [-h] [--root ROOT] {list,add,delete} ...

Manage per-model chunk SHA1 hashes for WoolyAI models.

positional arguments:
  {list,add,delete}
    list         list models and number of hashes
    add          compute & store hashes (overwrites)
    delete       delete whole model file or all models

options:
  -h, --help     show this help message and exit
  --root ROOT    override storage dir (e.g. ./.wooly/shared_mem or
                 /etc/wooly/shared_mem)

List cached models

woolyai-vram-model-cache list

Add a model (compute and store hashes)

woolyai-vram-model-cache add --source <huggingface-model-id-or-local-path>
woolyai-vram-model-cache add --source meta-llama/Llama-2-7b-hf --dtype float16
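
The --dtype flag is optional and controls the torch dtype the model is loaded with before hashing (presumably mapped to the dtype_name parameter of compute_hashes_from_source shown above). If the weights already live on the server, --source can also point at a local directory; the path below is only an illustration:

woolyai-vram-model-cache add --source /opt/models/Llama-2-7b-hf --dtype float16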

Delete a cached model

woolyai-vram-model-cache delete --model <model-name>
woolyai-vram-model-cache delete --all # delete all cached models
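
For example, to remove the Llama-2 model added earlier (this assumes the cached entry is stored under its Hugging Face ID; run list first to confirm the exact name):

woolyai-vram-model-cache delete --model meta-llama/Llama-2-7b-hf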

Specify a custom location for the model cache

woolyai-vram-model-cache --root <custom-root-directory> add --source <huggingface-model-id-or-local-path>
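
Note that --root is a top-level option, so it goes before the subcommand. For example, to store and then inspect hashes under the system-wide directory mentioned in the --help output:

woolyai-vram-model-cache --root /etc/wooly/shared_mem add --source meta-llama/Llama-2-7b-hf --dtype float16
woolyai-vram-model-cache --root /etc/wooly/shared_mem list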