# Launching Workloads with GPU Fractions
This tutorial shows you how to use GPU fractions to optimize resource utilization in Run:AI. GPU fractions allow multiple workloads to share a single GPU, maximizing efficiency and reducing costs.
## What are GPU Fractions?

GPU fractions allow you to:

- **Share GPUs** among multiple workloads
- **Optimize resource usage** by allocating only what you need
- **Reduce costs** by avoiding over-provisioning
- **Increase throughput** by running more workloads simultaneously
## When to Use GPU Fractions

**Ideal use cases:**

- Development and experimentation
- Small model training
- Inference workloads
- Data preprocessing tasks
- Jupyter notebooks for exploration

**Avoid for:**

- Large model training requiring full GPU memory
- High-performance computing workloads
- Production inference with strict latency requirements
## Prerequisites

Before starting, ensure that:

- Your cluster has GPU fractions enabled
- You have a project with GPU quota
- Your workload can benefit from fractional GPU resources
## Method 1: Using the UI

### 1. Create a New Workload
- Navigate to Workload manager → Workloads
- Click "+NEW WORKLOAD"
- Choose your workload type (Training, Workspace, etc.)
### 2. Configure Compute Resources

- **Resource Type:** Select "GPU Portion" instead of "Whole GPU"
- **Fraction Size:** Choose from the fraction sizes available in your cluster (for example, 0.1, 0.25, or 0.5)
- **Memory Allocation:** Automatically calculated based on the fraction size
### 3. Example Configuration

**Small development workload:**

```text
Compute Resource: GPU Portion
GPU Fraction:     0.25 (25%)
GPU Memory:       ~6 GB (on a 24 GB GPU)
CPU:              2 cores
Memory:           4Gi
```

**Medium training job:**

```text
Compute Resource: GPU Portion
GPU Fraction:     0.5 (50%)
GPU Memory:       ~12 GB (on a 24 GB GPU)
CPU:              4 cores
Memory:           8Gi
```
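The GPU memory column above is simply the fraction applied to the card's total memory. A minimal sketch of that arithmetic (the 24 GB total is an assumption matching the examples above; actual enforcement depends on your cluster):

```python
def fractional_gpu_memory(fraction: float, total_gb: float = 24.0) -> float:
    """Approximate GPU memory granted for a given fraction.

    Assumes memory is split proportionally to the fraction, as in the
    example configurations above.
    """
    return fraction * total_gb

print(fractional_gpu_memory(0.25))  # ~6.0 GB
print(fractional_gpu_memory(0.5))   # ~12.0 GB
```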
## Method 2: Using the CLI

### 1. Basic GPU Fraction Submission

```bash
# Submit a workload with 25% of a GPU
runai submit "fractional-training" \
  --image pytorch/pytorch:latest \
  --gpu-request-type portion \
  --gpu-portion-request 0.25 \
  --cpu-request 2 \
  --memory-request 4Gi
```
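Once the workload is running, you can sanity-check the allocation from inside the container. A quick PyTorch snippet (note: depending on how your cluster enforces fractions, the reported total may be the capped share or the full physical device):

```python
import torch

# Report what the container actually sees on GPU 0
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"Device: {props.name}")
    print(f"Visible GPU memory: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No GPU visible to this workload")
```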
### 2. Advanced Fractional Configurations

**Jupyter notebook with a small fraction:**

```bash
runai submit "jupyter-dev" \
  --image jupyter/tensorflow-notebook \
  --gpu-request-type portion \
  --gpu-portion-request 0.1 \
  --cpu-request 1 \
  --memory-request 2Gi \
  --port 8888:8888 \
  --interactive
```
**Model training with a medium fraction:**

```bash
runai submit "bert-fine-tune" \
  --image huggingface/transformers-pytorch-gpu \
  --gpu-request-type portion \
  --gpu-portion-request 0.5 \
  --cpu-request 4 \
  --memory-request 8Gi \
  --volume /data/models:/workspace/models \
  --command "python fine_tune.py --batch_size 16"
```
**Inference service with dynamic scaling:**

```bash
runai submit "inference-api" \
  --image tensorflow/serving:latest-gpu \
  --gpu-request-type portion \
  --gpu-portion-request 0.25 \
  --cpu-request 2 \
  --memory-request 4Gi \
  --port 8501:8501 \
  --service-type LoadBalancer
```
## Method 3: Dynamic GPU Fractions

Dynamic fractions let a workload reserve a guaranteed minimum share of a GPU while opportunistically using more, up to a defined limit, when spare capacity is available.
### 1. Configure Dynamic Fractions

```bash
runai submit "dynamic-training" \
  --image pytorch/pytorch:latest \
  --gpu-request-type portion \
  --gpu-portion-request 0.25 \
  --gpu-portion-limit 0.75 \
  --cpu-request 2 \
  --memory-request 4Gi
```
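Application code can be written to exploit the opportunistic headroom while tolerating its loss. A hedged sketch of one such pattern in PyTorch (requires PyTorch ≥ 1.13 for `torch.cuda.OutOfMemoryError`; `model` and `batch` are placeholders for your own training objects):

```python
import torch

def forward_with_fallback(model, batch):
    """Try a full-size forward pass; if the opportunistic share of the GPU
    is currently unavailable, retry on half-size chunks."""
    try:
        return model(batch)
    except torch.cuda.OutOfMemoryError:
        torch.cuda.empty_cache()  # return cached blocks before retrying
        half = batch.shape[0] // 2
        return torch.cat([model(batch[:half]), model(batch[half:])])
```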
### 2. Dynamic Fraction Policies

**Elastic training:**

```yaml
# workload-config.yaml
apiVersion: run.ai/v1
kind: Workload
metadata:
  name: elastic-training
spec:
  resources:
    gpu:
      type: portion
      request: 0.25     # Minimum GPU needed
      limit: 1.0        # Maximum GPU that can be used
      scaling: elastic  # Allow dynamic scaling
    cpu:
      request: 2
      limit: 8
    memory:
      request: 4Gi
      limit: 16Gi
```
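As the comments indicate, the request is the share the scheduler guarantees, while the limit caps what the workload may consume opportunistically; the gap between the two determines how elastic the workload is.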
## Monitoring GPU Fraction Usage

### 1. Check Utilization

**Via the CLI:**

```bash
# Monitor workload resource usage
runai top <workload-name>

# Get detailed resource information
runai describe <workload-name>

# List all workloads with resource usage
runai list --show-resources
```

**Via the UI:**

- Navigate to your workload in the dashboard
- View real-time GPU utilization metrics
- Monitor memory usage and CPU consumption
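For programmatic monitoring from inside the workload, NVML can report live utilization. A minimal sketch using the `nvidia-ml-py` bindings (an assumption: NVML must be reachable from the container, and it reports numbers for the physical device, not just your fraction's share):

```python
# pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

# Query device-level utilization and memory
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"GPU utilization: {util.gpu}%")
print(f"Memory used: {mem.used / 1024**3:.2f} / {mem.total / 1024**3:.2f} GB")

pynvml.nvmlShutdown()
```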
### 2. GPU Sharing Visibility

```bash
# See which workloads share the same GPU
runai get node <node-name> --show-workloads

# Check GPU allocation across the cluster
runai cluster-info --show-gpu-allocation
```
## Optimizing Code for GPU Fractions

### 1. Memory Management

**Limit TensorFlow GPU memory growth:**

```python
import tensorflow as tf

# Allocate GPU memory incrementally instead of claiming it all at startup
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
```
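If you prefer a hard cap that matches your fraction rather than on-demand growth, TensorFlow can also pin a logical device to a fixed memory limit. A sketch, assuming a 0.25 fraction of a 24 GB GPU (hence the 6144 MB figure):

```python
import tensorflow as tf

# Cap TensorFlow at ~25% of a 24 GB GPU (6144 MB); adjust to your fraction
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    tf.config.set_logical_device_configuration(
        gpus[0],
        [tf.config.LogicalDeviceConfiguration(memory_limit=6144)]
    )
```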
**PyTorch memory management:**

```python
import torch

# Cap this process at 25% of the GPU's memory (match your fraction)
torch.cuda.set_per_process_memory_fraction(0.25)

# Release cached blocks back to the driver
torch.cuda.empty_cache()

# Monitor memory usage
def check_gpu_memory():
    print(f"Allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
    print(f"Reserved: {torch.cuda.memory_reserved() / 1024**3:.2f} GB")
```
### 2. Batch Size Optimization

```python
import os

def get_optimal_batch_size():
    """Scale the batch size with the GPU fraction.

    Assumes the fraction is exposed to the container as a GPU_FRACTION
    environment variable (e.g. set it yourself at submit time).
    """
    gpu_fraction = float(os.environ.get('GPU_FRACTION', '1.0'))
    base_batch_size = 32
    optimal_batch_size = int(base_batch_size * gpu_fraction)
    return max(optimal_batch_size, 1)  # Minimum batch size of 1

batch_size = get_optimal_batch_size()
print(f"Using batch size: {batch_size}")
```
### 3. Model Size Considerations

```python
def check_model_size(model, gpu_fraction=0.25):
    """Estimate whether a PyTorch model fits in fractional GPU memory."""
    param_size = sum(p.numel() * p.element_size() for p in model.parameters())
    buffer_size = sum(b.numel() * b.element_size() for b in model.buffers())
    total_size_gb = (param_size + buffer_size) / 1024**3
    available_memory = 24 * gpu_fraction  # Assuming a 24 GB GPU
    print(f"Model size: {total_size_gb:.2f} GB")
    print(f"Available memory: {available_memory:.2f} GB")
    return total_size_gb < available_memory * 0.8  # 80% utilization threshold
```
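For example, checking whether BERT-base fits in a quarter of a GPU (this usage assumes the `transformers` package is installed):

```python
from transformers import AutoModel

# Load BERT-base and test it against a 25% fraction
model = AutoModel.from_pretrained("bert-base-uncased")
if not check_model_size(model, gpu_fraction=0.25):
    print("Consider a larger fraction or a smaller model")
```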
## Best Practices

### 1. Choosing the Right Fraction

**Development work:**

```bash
# Use small fractions for experimentation
--gpu-portion-request 0.1   # 10% for quick tests
--gpu-portion-request 0.25  # 25% for development
```

**Production inference:**

```bash
# Size to your serving footprint; the inference example above uses 25%
--gpu-portion-request 0.25  # 25% per inference replica
```

**Training jobs:**

```bash
# Use larger fractions or whole GPUs
--gpu-portion-request 0.75  # 75% for medium models
--gpu-request 1             # Whole GPU for large models
```
### 2. Resource Planning

**Calculate GPU requirements:**

```python
def calculate_gpu_needs():
    """Calculate the GPU fraction needed based on model requirements."""
    # Model parameters
    model_params = 110_000_000  # 110M parameters (BERT-base)
    bytes_per_param = 4         # float32

    # Training overhead (gradients, optimizer states, activations)
    training_overhead = 4  # 4x model size

    # Total memory needed (in GB)
    memory_needed = (model_params * bytes_per_param * training_overhead) / 1024**3

    # GPU memory available (24 GB GPU)
    gpu_memory = 24

    # Required fraction, with a 20% safety margin
    fraction_needed = memory_needed / gpu_memory
    return min(fraction_needed * 1.2, 1.0)
```
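For BERT-base this works out to roughly 1.6 GB of training memory, or a recommended fraction of about 0.08 on a 24 GB card. Treat the 4x overhead as a rough rule of thumb: activation memory grows with batch size and sequence length, so validate the estimate against a real run.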
### 3. Monitoring and Alerts

```bash
# Set up resource monitoring
runai submit "monitored-workload" \
  --gpu-portion-request 0.5 \
  --alert-on-gpu-utilization-low 50 \
  --alert-on-memory-usage-high 90
```
## Troubleshooting

### Common Issues

**Out-of-memory errors:**

```python
import torch
from torch.cuda.amp import GradScaler
from torch.utils.checkpoint import checkpoint_sequential

# Reduce the batch size
batch_size = batch_size // 2

# Trade compute for memory with gradient checkpointing
# (applied during the forward pass; assumes `model` is an nn.Sequential
# and `batch` is the input tensor)
output = checkpoint_sequential(model, segments=2, input=batch)

# Use mixed precision training
scaler = GradScaler()
```
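The scaler is then used in the training step. A minimal mixed-precision step, assuming `model`, `optimizer`, `loss_fn`, `inputs`, and `targets` come from your own loop:

```python
from torch.cuda.amp import autocast

# Run the forward pass in reduced precision
with autocast():
    loss = loss_fn(model(inputs), targets)

# Scale the loss to avoid underflow, then step and update the scaler
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```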
**Poor performance:**

```bash
# Check whether you need more GPU resources
runai describe <workload-name> | grep "GPU Utilization"

# Consider moving to a larger fraction
runai update <workload-name> --gpu-portion-request 0.5
```
**GPU not allocated:**

```bash
# Check cluster availability
runai cluster-info

# Verify the project quota
runai describe project <project-name>
```
### Debugging Commands

```bash
# Check GPU sharing on a node
kubectl describe node <node-name> | grep nvidia.com/gpu

# View detailed resource allocation
runai get workload <workload-name> -o yaml | grep resources -A 10

# Monitor real-time GPU usage
watch -n 1 'runai top <workload-name>'
```
## Next Steps
Now that you understand GPU fractions:
- **Experiment with different sizes:** Test various fractions for your workloads
- **Implement dynamic scaling:** Use dynamic fractions for elastic workloads
- **Optimize your code:** Adapt applications for fractional GPU resources
- **Monitor resource usage:** Set up comprehensive monitoring
## Related Guides
- Jupyter Notebook Quickstart - Perfect for fractional GPU usage
- Standard Training Quickstart - When to use whole GPUs
- Distributed Training Quickstart - Scaling beyond single GPUs