Environment Assets
Environment assets define container images and runtime configurations that your workloads will use. They standardize the software stack and ensure consistent environments across teams.
What are Environment Assets?
Environment assets specify:
- Container Image: The base Docker image with your tools and frameworks
- Environment Variables: Runtime configuration and references to secrets
- Working Directory: Default execution path
- Commands: Startup commands and entry points
Creating Environment Assets
Method 1: Using the UI
- Navigate to Assets → Environments
- Click "+ NEW ENVIRONMENT"
- Configure the environment:
Basic Configuration:
Container Settings:
Environment Variables:
Method 2: Using YAML
Create pytorch-env.yaml:
apiVersion: run.ai/v1
kind: Environment
metadata:
  name: pytorch-gpu-training
  namespace: runai-<project-name>
spec:
  image: pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
  workingDir: /workspace
  env:
    - name: PYTHONPATH
      value: /workspace
    - name: OMP_NUM_THREADS
      value: "4"
  command: ["/bin/bash"]
  args: ["-c", "jupyter lab --allow-root --ip=0.0.0.0"]
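If environment definitions are kept in version control, the manifest above can also be generated from code. The sketch below builds the same structure as a Python dictionary and prints it as JSON, a format kubectl accepts alongside YAML. The function name `build_environment_manifest` and the project name `team-a` are illustrative assumptions; the field names simply mirror the YAML above and are not checked against any published schema.

```python
import json


def build_environment_manifest(name, project, image, env=None,
                               working_dir="/workspace", command=None, args=None):
    """Return a dict mirroring the Environment manifest shown above.

    Hypothetical helper: field names are copied from the YAML example,
    not validated against a schema.
    """
    spec = {
        "image": image,
        "workingDir": working_dir,
        # Environment variable values must be strings, so coerce them.
        "env": [{"name": k, "value": str(v)} for k, v in (env or {}).items()],
    }
    if command:
        spec["command"] = list(command)
    if args:
        spec["args"] = list(args)
    return {
        "apiVersion": "run.ai/v1",
        "kind": "Environment",
        "metadata": {"name": name, "namespace": f"runai-{project}"},
        "spec": spec,
    }


manifest = build_environment_manifest(
    name="pytorch-gpu-training",
    project="team-a",  # assumed project name, for illustration only
    image="pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel",
    env={"PYTHONPATH": "/workspace", "OMP_NUM_THREADS": 4},
    command=["/bin/bash"],
    args=["-c", "jupyter lab --allow-root --ip=0.0.0.0"],
)
print(json.dumps(manifest, indent=2))
```

Writing the output to a file gives a JSON manifest that can stand in for the hand-written YAML one.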
Apply with:
kubectl apply -f pytorch-env.yaml
Common Environment Examples
1. Data Science Environment
Jupyter with TensorFlow:
Name: tensorflow-jupyter
Image: tensorflow/tensorflow:2.13.0-gpu-jupyter
Working Directory: /workspace
Environment Variables:
  JUPYTER_ENABLE_LAB: yes
  JUPYTER_TOKEN: ""
Command: jupyter lab --allow-root --ip=0.0.0.0 --port=8888
2. PyTorch Training Environment
Research and Development:
Name: pytorch-research
Image: pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
Working Directory: /workspace
Environment Variables:
  # "all" is valid for NVIDIA_VISIBLE_DEVICES; CUDA_VISIBLE_DEVICES expects device indices or UUIDs
  NVIDIA_VISIBLE_DEVICES: all
  PYTHONPATH: /workspace:/workspace/src
  WANDB_PROJECT: research-experiments
Command: python -m pytest --version && python --version
3. Custom Environment with Dependencies
MLflow Tracking:
Name: mlflow-training
Image: python:3.9-slim
Working Directory: /workspace
Environment Variables:
  MLFLOW_TRACKING_URI: http://mlflow-server:5000
  EXPERIMENT_NAME: model-training
Setup Commands:
  - pip install torch torchvision mlflow scikit-learn
  - pip install transformers datasets
Using Environment Assets
In Workload Creation
Via UI:
1. Create new workload
2. Environment section → Select your environment asset
3. Optionally override environment variables
Via CLI:
runai submit "training-job" \
  --environment pytorch-gpu-training \
  --gpu 1 \
  --volume /data:/workspace/data
Override Environment Settings
Add Extra Environment Variables:
runai submit "custom-training" \
  --environment pytorch-gpu-training \
  --env BATCH_SIZE=32 \
  --env LEARNING_RATE=0.001 \
  --gpu 1
Override Working Directory:
runai submit "notebook-session" \
  --environment tensorflow-jupyter \
  --working-dir /workspace/notebooks \
  --interactive
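Overrides such as `BATCH_SIZE` and `LEARNING_RATE` reach the workload as ordinary environment variables, so the training code has to read and convert them itself. A minimal sketch, assuming the variable names from the example above; the function name and the default values are illustrative, not part of any tool.

```python
import os


def read_hyperparams(environ=os.environ):
    """Read hyperparameters from environment variables, with fallbacks.

    The defaults (16 and 0.01) are arbitrary placeholders for illustration.
    """
    # Environment values are always strings, so convert explicitly.
    batch_size = int(environ.get("BATCH_SIZE", "16"))
    learning_rate = float(environ.get("LEARNING_RATE", "0.01"))
    if batch_size <= 0:
        raise ValueError(f"BATCH_SIZE must be positive, got {batch_size}")
    return {"batch_size": batch_size, "learning_rate": learning_rate}


# Simulate the overrides from the command above without touching os.environ.
params = read_hyperparams({"BATCH_SIZE": "32", "LEARNING_RATE": "0.001"})
print(params)
```

Passing the mapping as a parameter keeps the function easy to test; in a real workload it would be called with no arguments so that it reads the process environment.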
Environment Variable Management
1. Common Variables
GPU Configuration:
Python Environment:
Framework Specific:
# TensorFlow
TF_CPP_MIN_LOG_LEVEL=2
TF_GPU_ALLOCATOR=cuda_malloc_async
# PyTorch
TORCH_HOME=/workspace/.torch
OMP_NUM_THREADS=4
2. Experiment Tracking
Weights & Biases:
MLflow:
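Inside a workload, tracking settings arrive as plain environment variables. The sketch below collects the ones used elsewhere on this page (`WANDB_PROJECT`, `MLFLOW_TRACKING_URI`, `EXPERIMENT_NAME`) and reports which trackers are configured. The function name `tracking_config` and the `"default"` fallback are assumptions made for illustration; no tracking library is imported, and passing the values on to Weights & Biases or MLflow is left to the training code.

```python
import os


def tracking_config(environ=os.environ):
    """Return settings for each tracker that has its variables set."""
    config = {}
    if environ.get("WANDB_PROJECT"):
        config["wandb"] = {"project": environ["WANDB_PROJECT"]}
    if environ.get("MLFLOW_TRACKING_URI"):
        config["mlflow"] = {
            "tracking_uri": environ["MLFLOW_TRACKING_URI"],
            # EXPERIMENT_NAME is the variable used in the MLflow example
            # on this page; "default" is an assumed fallback.
            "experiment": environ.get("EXPERIMENT_NAME", "default"),
        }
    return config


example = tracking_config({
    "MLFLOW_TRACKING_URI": "http://mlflow-server:5000",
    "EXPERIMENT_NAME": "model-training",
})
print(example)
```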
Best Practices
1. Image Selection
Use Official Images:
# Good choices
pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
tensorflow/tensorflow:2.13.0-gpu
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
Version Pinning:
# Pin specific versions for reproducibility
Image: pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
# Avoid: pytorch/pytorch:latest
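Pinning can be checked automatically before an environment is shared. The following is a rough sketch of such a check: it treats a reference as pinned when it carries a digest or an explicit tag other than `latest`. The function name `is_pinned` is hypothetical, and the parsing is deliberately simple rather than a full implementation of the image reference grammar.

```python
def is_pinned(image_ref):
    """Return True if the image reference has a digest or a non-latest tag."""
    if "@" in image_ref:
        # A digest (image@sha256:...) identifies exact content.
        return True
    # Only the last path component can carry a tag; a colon earlier in the
    # reference belongs to a registry port (e.g. localhost:5000/app).
    last_component = image_ref.rsplit("/", 1)[-1]
    _name, sep, tag = last_component.partition(":")
    return bool(sep) and tag not in ("", "latest")


for ref in ("pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel",
            "pytorch/pytorch:latest",
            "pytorch/pytorch"):
    print(ref, "pinned" if is_pinned(ref) else "NOT pinned")
```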
2. Environment Organization
Naming Convention:
# Framework-purpose-version pattern
pytorch-training-v2.0
tensorflow-inference-v2.13
jupyter-datascience-v3.9
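A naming convention is easier to keep when it is checked mechanically. The sketch below encodes the framework-purpose-version pattern as a regular expression; the exact rules (lowercase letters and digits, a `v` followed by a dotted numeric version) are an assumed reading of the three examples above, so adjust the pattern to match your own policy.

```python
import re

# Assumed interpretation of the pattern: <framework>-<purpose>-v<major[.minor...]>
NAME_PATTERN = re.compile(r"[a-z][a-z0-9]*-[a-z][a-z0-9]*-v\d+(\.\d+)*")


def follows_convention(name):
    """Return True if the name matches the framework-purpose-version pattern."""
    return NAME_PATTERN.fullmatch(name) is not None


for name in ("pytorch-training-v2.0", "tensorflow-inference-v2.13", "my_env"):
    print(name, follows_convention(name))
```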
Scope Management:
- Project scope: Team-specific configurations
- Cluster scope: Organization-wide base images
3. Security Considerations
Avoid Hardcoded Secrets:
# Don't do this
Environment Variables:
  API_KEY: sk-1234567890abcdef # Never hardcode secrets
# Do this instead
Environment Variables:
  API_KEY_FILE: /workspace/secrets/api-key
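With the file-based approach, the application reads the secret at startup from the path named in the variable. A minimal sketch of that lookup follows; the helper name `read_secret` is hypothetical, and the demonstration writes a throwaway value to a temporary file rather than to `/workspace/secrets/api-key`.

```python
import os
import tempfile
from pathlib import Path


def read_secret(var_name, environ=os.environ):
    """Read a secret from the file whose path is stored in `var_name`."""
    path = environ.get(var_name)
    if not path:
        raise KeyError(f"{var_name} is not set")
    # Strip the trailing newline that editors and `echo` usually add.
    return Path(path).read_text(encoding="utf-8").strip()


# Demonstration with a temporary file standing in for the mounted secret.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "api-key"
    secret_file.write_text("example-value\n", encoding="utf-8")
    api_key = read_secret("API_KEY_FILE", {"API_KEY_FILE": str(secret_file)})
    # Report only that the secret loaded; never print the value itself.
    print("secret loaded:", bool(api_key))
```

Keeping the value out of the environment means it does not show up when the environment is listed for debugging.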
Use Minimal Base Images:
Troubleshooting
Common Issues
Image Pull Failures:
# Check image exists
docker pull pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel
# Verify registry access
kubectl get secrets -n runai-<project>
Environment Variable Issues:
# Debug inside running workload
runai exec <workload-name> -- env | grep CUDA
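# Single quotes stop the local shell from expanding $PYTHONPATH before the command is sent
runai exec <workload-name> -- sh -c 'echo "$PYTHONPATH"'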
Permission Problems:
# Check user context in container
runai exec <workload-name> -- whoami
runai exec <workload-name> -- ls -la /workspace
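The individual checks above can be bundled into one script that runs inside the container. This is a sketch under assumptions: the function name `collect_diagnostics`, the default `/workspace` path, and the variable prefixes are illustrative choices, and only variables with those prefixes are printed so that unrelated secrets are not dumped to the logs.

```python
import os
import sys


def collect_diagnostics(workdir="/workspace", prefixes=("CUDA", "NVIDIA", "PYTHON")):
    """Gather interpreter, user, directory, and environment facts in one dict."""
    return {
        "python": sys.version.split()[0],
        # os.getuid is unavailable on Windows; containers are assumed to be Linux.
        "uid": os.getuid() if hasattr(os, "getuid") else None,
        "workdir_exists": os.path.isdir(workdir),
        "workdir_writable": os.access(workdir, os.W_OK),
        # Keep only variables whose names start with one of the prefixes.
        "env": {k: v for k, v in sorted(os.environ.items())
                if k.startswith(prefixes)},
    }


for key, value in collect_diagnostics().items():
    print(f"{key}: {value}")
```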
Debugging Commands
Test Environment:
# Submit test workload
runai submit "env-test" \
  --environment pytorch-gpu-training \
  --command "python --version && pip list | head -10" \
  --gpu 0.1
# Check the test output
runai logs env-test
Validate GPU Access:
runai submit "gpu-test" \
  --environment pytorch-gpu-training \
  --command "python -c 'import torch; print(torch.cuda.is_available())'" \
  --gpu 0.1
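For a more informative check than a single `True` or `False`, the one-liner can be expanded into a small script. The sketch below uses only `torch.cuda.is_available()` and `torch.cuda.device_count()`, and it degrades gracefully when PyTorch is not installed in the image; the function name `gpu_report` and the shape of the returned dictionary are illustrative.

```python
import importlib.util


def gpu_report():
    """Report GPU visibility via PyTorch, without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False, "cuda_available": False, "device_count": 0}
    import torch  # imported lazily so the script also runs in non-PyTorch images

    available = torch.cuda.is_available()
    return {
        "torch_installed": True,
        "cuda_available": available,
        "device_count": torch.cuda.device_count() if available else 0,
    }


print(gpu_report())
```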
Next Steps
- Create Base Environments: Start with framework-specific environments
- Customize for Teams: Add team-specific tools and configurations
- Version Control: Maintain multiple versions for different use cases
- Test Thoroughly: Validate environments before team deployment
Related Assets
- Compute Resources: Pair environments with appropriate hardware
- Data Sources: Mount data into your environments
- Credentials: Secure access to external services