Compute Resources
Selecting the right compute tier keeps latency low while controlling costs. This guide explains the available options, auto-scaling parameters, and strategies for different workloads.
Available Tiers
Refer to the targon.Compute reference for the full list of constants. They fall into:
- CPU tiers (CPU_SMALL → CPU_XL) for lightweight services, background jobs, and request routing.
- GPU tiers by accelerator family: H200, H100, B200 (each SMALL → XL), plus RTXP6000B (SMALL) and RTX4090 (SMALL → LARGE).
Use the named constants in your functions for clarity:
@app.function(resource=Compute.CPU_MEDIUM)
def preprocess():
    ...
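To make the tier ranges above concrete, the following sketch expands each family into the constant names it would have. It assumes the size ladder is SMALL, MEDIUM, LARGE, XL and that every constant is named FAMILY_SIZE, which matches the names used in this guide but is not confirmed for every family. The helper tier_names and both tables are hypothetical and are not part of the SDK; the targon.Compute reference remains the authoritative list.

```python
# Assumed size ladder, inferred from names such as CPU_MEDIUM, CPU_LARGE and *_XL.
SIZES = ["SMALL", "MEDIUM", "LARGE", "XL"]

# Family -> largest size listed in this guide.
FAMILY_MAX_SIZE = {
    "CPU": "XL",
    "H200": "XL",
    "H100": "XL",
    "B200": "XL",
    "RTXP6000B": "SMALL",
    "RTX4090": "LARGE",
}


def tier_names(family: str) -> list[str]:
    """Return the assumed constant names for a family, smallest first."""
    top = SIZES.index(FAMILY_MAX_SIZE[family])
    return [f"{family}_{size}" for size in SIZES[: top + 1]]


if __name__ == "__main__":
    for family in FAMILY_MAX_SIZE:
        print(family, tier_names(family))
```

Because the names are only strings here, check each one against the reference before using it as Compute.<NAME> in a decorator.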
Auto-Scaling Controls
@app.function() accepts min_replicas and max_replicas to configure multi-replica scaling:
@app.function(resource=Compute.H200_MEDIUM, min_replicas=1, max_replicas=4)
def serve():
    ...
- min_replicas keeps warm instances ready for low-latency response.
- max_replicas caps the number of concurrent replicas to manage cost.
- Targon ensures min_replicas ≤ max_replicas and validates the values at decoration time.
Combine these settings with observability dashboards to tune concurrency for your workload.
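As an illustration of what validating at decoration time means, here is a minimal stand-alone sketch of a decorator factory that checks the replica bounds before the wrapped function is ever called. It is not Targon's implementation. The decorator named function, its default values, the extra lower-bound checks, and the scaling attribute are all assumptions made for the example.

```python
import functools


def function(min_replicas: int = 0, max_replicas: int = 1):
    """Hypothetical stand-in for a decorator that checks replica bounds eagerly."""
    # These checks run when the decorator is applied, not when the function is called.
    if min_replicas < 0:
        raise ValueError("min_replicas must be >= 0")
    if max_replicas < 1:
        raise ValueError("max_replicas must be >= 1")
    if min_replicas > max_replicas:
        raise ValueError(
            f"min_replicas ({min_replicas}) must be <= max_replicas ({max_replicas})"
        )

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            return fn(*args, **kwargs)

        # Record the validated settings so they can be inspected later.
        wrapper.scaling = {"min_replicas": min_replicas, "max_replicas": max_replicas}
        return wrapper

    return decorate


@function(min_replicas=1, max_replicas=4)
def serve():
    return "ok"
```

With eager checks like these, a misconfiguration such as min_replicas=5, max_replicas=2 fails when the module is imported, rather than at the first request.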
Matching Workloads to Tiers
- API backends / webhooks: Start with CPU_SMALL or CPU_MEDIUM. Increase tiers only if you see sustained CPU saturation.
- Batch jobs / ETL: Use CPU_LARGE or CPU_XL for parallel processing. Keep min_replicas=0 to avoid idle charges.
- LLM inference: Pick a tier that matches model size and throughput. Start smaller within your accelerator family (for example H200_SMALL) and scale up (*_LARGE / *_XL) as needed.
- Hybrid workloads: Split your app into multiple functions with different resources (for example, a CPU preprocessing function plus a GPU inference function).
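The recommendations above can be captured as a small lookup table that a deployment script consults for a starting point. This is only a sketch: the Plan type, the workload keys, and the starting_plans helper are hypothetical, and tier names are kept as plain strings. Only min_replicas=0 for batch jobs comes from this guide. The other replica counts, and the choice of CPU_MEDIUM for the hybrid preprocessing step, are assumptions to tune for your workload.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    resource: str      # tier constant name, kept as a plain string
    min_replicas: int  # warm replicas to keep running


# Starting points only; scale up after observing real load.
STARTING_PLANS = {
    "api": [Plan("CPU_SMALL", 1)],       # assumed: one warm replica for low latency
    "batch": [Plan("CPU_LARGE", 0)],     # from the guide: no idle replicas
    "llm": [Plan("H200_SMALL", 1)],      # assumed: one warm replica
    "hybrid": [Plan("CPU_MEDIUM", 0), Plan("H200_SMALL", 1)],  # one plan per function
}


def starting_plans(workload: str) -> list[Plan]:
    """Return the suggested starting plans for a workload category."""
    try:
        return STARTING_PLANS[workload]
    except KeyError:
        raise ValueError(
            f"unknown workload {workload!r}; expected one of {sorted(STARTING_PLANS)}"
        ) from None
```

A hybrid workload returns two plans because each function in the app gets its own resource setting.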
Cost Optimization Tips
- Lower min_replicas during off-hours to reduce idle consumption.
- Use per-request batching in your application logic to make better use of GPU time.
- Cache model weights inside the image to shorten cold starts and avoid repeated downloads.
- Monitor timeouts and adjust the timeout parameter to catch runaway jobs.
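One way to apply the batching tip is to group the items of a request into fixed-size chunks and pass each chunk to the model in a single call. The sketch below shows that pattern in plain Python. The names batched, run_model, and handle are hypothetical, run_model is a placeholder that stands in for a real GPU forward pass, and the default batch size of 8 is an arbitrary assumption.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], max_batch_size: int) -> Iterator[list[T]]:
    """Yield consecutive lists of at most max_batch_size items."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    batch: list[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == max_batch_size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def run_model(prompts: list[str]) -> list[str]:
    """Placeholder for one model call over a whole batch."""
    return [prompt.upper() for prompt in prompts]


def handle(prompts: list[str], max_batch_size: int = 8) -> list[str]:
    """Process all prompts in batches, preserving input order."""
    results: list[str] = []
    for batch in batched(prompts, max_batch_size):
        results.extend(run_model(batch))
    return results
```

Larger batches usually improve GPU utilization at the cost of memory and per-item latency, so the right size depends on the model and tier.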
Related Reading
- Compute API reference
- LLM deployment guide
- Serverless guide for dashboard-based configuration