Introduction
What is Wooly?
WoolyAI is a CUDA abstraction layer that sits at the intersection of software and GPU hardware and changes how GPU resources are consumed. It decouples kernel (shader) execution from the applications that use CUDA: applications are compiled to a new binary, and their kernels are compiled into a Wooly Instruction Set. At runtime, each kernel launch event transfers the kernel over the network from a CPU host to a GPU host, where it is recompiled and handed to the target GPU's runtime and drivers. Execution on the GPU host is managed to maximize GPU resource utilization, isolate workloads from one another, and remain compatible across hardware vendors. In principle, the Wooly abstraction layer works like an operating system: it sits on top of the hardware and enables efficient, reliable execution of multiple workloads.
Instead of hard-partitioning GPUs or paying for reserved but idle GPU time, the Wooly abstraction layer manages the execution of multiple user workloads on a GPU much as an operating system manages the execution of multiple applications. This allows for:
- Efficient allocation of GPU memory and processing resources to every running workload
- Maximum GPU utilization at all times
- Flexible allocation of GPU memory and processing cycles at runtime to meet preconfigured SLAs
- Tracking of per-workload resource-usage metrics

We call this the Unbound GPU Execution Era.

Note: WoolyAI currently supports PyTorch applications only. Support for other CUDA applications, such as Ollama and more, is coming soon.
Get Started with WoolyAI Acceleration Service today!
WoolyAI Acceleration Service is our GPU cloud service built on top of WoolyStack, our CUDA abstraction layer.
- Sign up for WoolyAI, and we'll send you a token.
- Deploy the Wooly Client Container.
- Log in to the WoolyAI GPU Acceleration Service from inside the container using your token.
- Configure your custom PyTorch training/fine-tuning environment inside the container and run it. It will automatically use the WoolyAI GPU Acceleration Service.
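In practice, the steps above might look something like the sketch below. The container image name, the `wooly` CLI command, and its flags are placeholders assumed for illustration, not the actual WoolyAI interface; refer to the service documentation for the real commands.

```shell
# Hypothetical walkthrough of the setup steps; the image name and the
# "wooly" command below are illustrative placeholders, not the real CLI.

# 1. Deploy the Wooly client container (image name is a placeholder).
docker run -it --name wooly-client woolyai/client:latest /bin/bash

# --- inside the container ---

# 2. Log in with the token received at sign-up (command is a placeholder).
wooly login --token "$WOOLY_TOKEN"

# 3. Set up a PyTorch environment and run training as usual. CUDA calls
#    are intercepted by the abstraction layer, so the training script
#    needs no code changes and this host needs no local GPU.
pip install torch
python train.py
```

Because the abstraction layer sits below PyTorch, existing training scripts run unmodified; only the surrounding container setup and login differ from a local-GPU workflow.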