Modal Labs
High-performance AI infrastructure that developers love, with sub-second cold starts and instant autoscaling.
Cold Start Overhead
Sub-10ms
GPU Scaling
0 to 1000+ GPUs instantly
Free Tier
$30/month free compute
Compute Billing
Pay-per-second
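As a rough illustration of what per-second billing combined with a monthly free allowance means in practice, here is a minimal sketch. The hourly rate is a hypothetical placeholder and not a published Modal price, and treating the $30 allowance as a credit subtracted from monthly usage is an assumption made for this example.

```python
from decimal import Decimal

# Hypothetical rate in USD per GPU-hour; a placeholder, not a real Modal price.
HYPOTHETICAL_HOURLY_RATE = Decimal("3.60")

# Assumption: the $30/month free compute acts as a credit against usage.
FREE_MONTHLY_CREDIT = Decimal("30")


def per_second_cost(seconds: int, hourly_rate_usd: Decimal) -> Decimal:
    """Cost of a run billed by the second at the given hourly rate."""
    cost = Decimal(seconds) * hourly_rate_usd / Decimal(3600)
    return cost.quantize(Decimal("0.0001"))


def amount_due(monthly_usage_usd: Decimal,
               free_credit_usd: Decimal = FREE_MONTHLY_CREDIT) -> Decimal:
    """Monthly usage minus the free credit, never below zero."""
    return max(monthly_usage_usd - free_credit_usd, Decimal("0"))


# A 90-second job is charged for 90 seconds, not for a full hour.
job_cost = per_second_cost(90, HYPOTHETICAL_HOURLY_RATE)
print(job_cost)
print(amount_due(Decimal("42.50")))
```

Under these assumed numbers, short or bursty jobs cost only a fraction of the hourly rate, and usage below the allowance results in no charge.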
About Modal Labs
Modal is a serverless cloud platform built for high-performance AI and ML workloads. Developers use the Modal Python SDK to define and run containerized functions in the cloud, so they do not have to manage the underlying infrastructure themselves. The platform is optimized for speed, with sub-second cold starts and autoscaling that can provision thousands of GPUs on demand. Key use cases include deploying and scaling LLM inference, fine-tuning open-source models on multi-node clusters, and programmatically scaling secure, ephemeral sandboxes for running untrusted code, such as AI agents. With integrated observability, pay-per-second pricing, and globally distributed GPU infrastructure, Modal aims to make cloud development as seamless as local development for teams of all sizes.
Core Workloads
Inference
Deploy and scale inference for LLMs, audio, and image/video generation with low latency.
Model Training
Fine-tune open-source models on single or multi-node GPU clusters.
Sandboxes
Programmatically scale secure, ephemeral environments for running untrusted code and AI agents.
Batch & Async
Run large-scale parallel jobs for evals, embeddings, and dataset generation.
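The batch workload above follows a fan-out pattern: one function is applied to many independent inputs in parallel and the results are collected in order. The sketch below shows that pattern locally with Python's standard library only. It does not use the Modal SDK, and `score_example` is a hypothetical stand-in for a real eval or embedding call.

```python
from concurrent.futures import ThreadPoolExecutor


def score_example(text: str) -> int:
    # Hypothetical per-item job: counts whitespace-separated tokens as a
    # stand-in for an eval or embedding computation.
    return len(text.split())


def run_batch(items, max_workers: int = 4):
    # Fan the items out across worker threads; pool.map returns results
    # in the same order as the inputs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_example, items))


results = run_batch(["a b c", "hello world", ""])
print(results)
```

On a serverless platform the workers would be remote containers rather than local threads, but the shape of the job (independent items, ordered results) stays the same.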
Platform Features
Python SDK
Define all infrastructure, from logic to hardware, directly in Python code.
Serverless GPUs
Access a wide range of GPUs (H100s, A100s, A10Gs) on demand across multiple cloud providers.
Fast Runtimes
AI-native runtime with instant container boot and super-fast autoscaling.
Observability
Integrated logging and full visibility into every function, sandbox, and container.
Security & Governance
SOC 2 & HIPAA
Compliance with industry standards for data security.
Battle-Tested Isolation
Secure, isolated environments for all code execution.
Data Residency
Controls for specifying the geographic region for data processing.