Deployment Options

WoolyAI can be deployed and used as a service in your organization, supporting multiple teams and models. There are three main ways to deploy WoolyAI:

  1. WoolyAI Kubernetes GPU Operator (useful for small, medium, and large scale deployments with Kubernetes available)
  2. WoolyAI Controller (useful for small, medium, and large scale deployments without Kubernetes available)
  3. Direct to WoolyAI Server (useful for small scale deployments with one GPU node)

WoolyAI Kubernetes GPU Operator

The WoolyAI GPU Operator deploys WoolyAI Server pods on every GPU node in your cluster and injects the WoolyAI libraries into your Kubernetes ML pods.

info

The WoolyAI GPU Operator is packaged as a Kubernetes Helm chart.

  1. Setup Guide for WoolyAI GPU Operator
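As a rough illustration only: installing a Helm-chart-based operator generally follows the standard Helm workflow sketched below. The repository URL, chart name, release name, and namespace are placeholders, not the official WoolyAI values; the Setup Guide above has the authoritative commands.

```shell
# Typical Helm workflow for installing an operator chart.
# The repo URL and all names below are placeholders — substitute the
# values from the WoolyAI Setup Guide.
helm repo add woolyai https://charts.example.com/woolyai   # placeholder URL
helm repo update

# Install the operator into its own namespace.
helm install wooly-gpu-operator woolyai/gpu-operator \
  --namespace woolyai \
  --create-namespace

# Verify that WoolyAI Server pods were scheduled onto the GPU nodes.
kubectl get pods -n woolyai -o wide
```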

WoolyAI Controller for non-Kubernetes managed GPU nodes

The WoolyAI Controller is a router with a Web and REST API interface. It routes kernel execution requests from ML CUDA containers (running with the Wooly Client libraries) to the GPU node cluster running WoolyAI Server.

info

The WoolyAI Controller can be deployed as a Docker container, either standalone (built from its Dockerfile) or on Kubernetes.

  1. Setup Guide for WoolyAI Controller
  2. Setup Guide for WoolyAI Server
  3. Setup Guide to install WoolyAI libraries inside ML CUDA Containers
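For orientation, running the Controller as a standalone Docker container might look like the sketch below. The image name and port are assumptions, not WoolyAI's documented interface; follow the Setup Guides above for the real steps.

```shell
# Hypothetical single-host sketch — the image name and port are
# placeholders, not WoolyAI's documented values.
docker run -d \
  --name wooly-controller \
  -p 8080:8080 \
  woolyai/controller:latest

# The ML CUDA containers (with Wooly Client libraries) would then be
# configured to reach the Controller's Web/REST endpoint, e.g.
# http://<controller-host>:8080.
```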

Direct to WoolyAI Server

Direct to WoolyAI Server is the simplest way to deploy WoolyAI on a single GPU node: run the WoolyAI Server container and your ML CUDA container (with the Wooly Client libraries) on the same machine.

info

No Kubernetes or Controller required.

  1. Setup Guide for Direct to WoolyAI Server
  2. Setup Guide to install WoolyAI libraries inside ML CUDA Containers
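Under the same caveat (both image names and the flags below are placeholders, not WoolyAI's documented commands), a single-node setup reduces to two containers on one machine:

```shell
# Hypothetical single-node sketch — image names are placeholders; see
# the Setup Guides above for the actual commands.

# 1. Start the WoolyAI Server container with access to the node's GPUs.
docker run -d --name wooly-server --gpus all woolyai/server:latest

# 2. Start the ML CUDA container with the Wooly Client libraries installed.
#    Both containers share the host, so the client can reach the server
#    over the local network.
docker run -d --name ml-workload --network host my-ml-image:latest
```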