# Compute Resources
Selecting the right compute tier keeps latency low while controlling costs. This guide explains the available options, auto-scaling parameters, and strategies for different workloads.
## Available Tiers
Refer to the `targon.Compute` reference for the full list of constants. They fall into two families:
- CPU tiers (`CPU_SMALL` → `CPU_XL`) for lightweight services, background jobs, and request routing.
- GPU tiers (`H200_SMALL` → `H200_XL`) for accelerated workloads such as LLM inference and model fine-tuning.
Use the named constants in your functions for clarity:
```python
@app.function(resource=Compute.CPU_MEDIUM)
def preprocess():
    ...
```
## Auto-Scaling Controls
`@app.function()` accepts `min_replicas` and `max_replicas` to configure multi-replica scaling:
```python
@app.function(resource=Compute.H200_MEDIUM, min_replicas=1, max_replicas=4)
def serve():
    ...
```
- `min_replicas` keeps warm instances ready for low-latency response.
- `max_replicas` caps the number of concurrent replicas to manage cost.
- Targon ensures `min_replicas ≤ max_replicas` and validates the values at decoration time, as illustrated in the sketch below.
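To show what the decoration-time check means in practice, here is a minimal sketch. It reuses the `app` object and `Compute` constants from the snippets above, assumes `Compute` can be imported from `targon` (per the `targon.Compute` reference), and assumes the rejected configuration surfaces as a `ValueError`; the exact exception type is not documented in this guide.

```python
from targon import Compute  # import path assumed from the targon.Compute reference

try:
    # min_replicas > max_replicas violates min_replicas <= max_replicas,
    # so Targon rejects the configuration when the decorator runs,
    # not when the function is first invoked.
    @app.function(resource=Compute.H200_MEDIUM, min_replicas=4, max_replicas=2)
    def misconfigured():
        ...
except ValueError as exc:  # exception type assumed for illustration
    print(f"Rejected at decoration time: {exc}")
```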
Combine these settings with observability dashboards to tune concurrency for your workload.
## Matching Workloads to Tiers
- API backends / webhooks: Start with `CPU_SMALL` or `CPU_MEDIUM`. Increase tiers only if you see sustained CPU saturation.
- Batch jobs / ETL: Use `CPU_LARGE` or `CPU_XL` for parallel processing. Keep `min_replicas=0` to avoid idle charges.
- LLM inference: Select `H200_SMALL` for compact models; scale up to `H200_LARGE` or `H200_XL` when running multi-billion-parameter models or when you need high throughput.
- Hybrid workloads: Split your app into multiple functions with different resources (for example, a CPU preprocessing function plus a GPU inference function), as in the sketch below.
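A minimal sketch of that hybrid split, reusing the `app` and `Compute` names from the snippets above. The function bodies are placeholders, and how the two functions are chained at runtime depends on Targon's invocation API, which this section does not cover.

```python
# CPU tier for cheap, CPU-bound preparation work.
@app.function(resource=Compute.CPU_MEDIUM, min_replicas=0, max_replicas=8)
def preprocess(raw_text: str) -> list[str]:
    # Normalize and tokenize the input before it reaches the GPU function.
    return raw_text.lower().split()

# GPU tier reserved for the expensive model call only.
@app.function(resource=Compute.H200_SMALL, min_replicas=1, max_replicas=4)
def infer(tokens: list[str]) -> str:
    # Placeholder for the real model invocation.
    return " ".join(tokens)
```

Keeping the GPU function small and warm (`min_replicas=1`) while letting the CPU stage scale to zero tends to give a better cost profile than running the whole pipeline on a GPU tier.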
## Cost Optimization Tips
- Lower `min_replicas` during off-hours to reduce idle consumption.
- Use request batching in your application logic to make better use of GPU time.
- Cache model weights inside the image to shorten cold starts and avoid repeated downloads.
- Monitor execution times and adjust the `timeout` parameter to catch runaway jobs (see the sketch below).
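Putting the scale-to-zero and timeout tips together, a batch-style function might look like the following sketch. It assumes `timeout` is passed as a keyword argument of `@app.function()` and measured in seconds; check the Compute API reference for the exact signature and units.

```python
@app.function(
    resource=Compute.CPU_XL,  # large CPU tier for parallel ETL work
    min_replicas=0,           # scale to zero off-hours, so no idle charges
    max_replicas=8,           # cap concurrency to bound spend
    timeout=1800,             # cut off runaway jobs (seconds assumed)
)
def nightly_etl(shard: str):
    ...
```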
## Related Reading
- Compute API reference
- LLM deployment guide
- Serverless guide for dashboard-based configuration