Compute Resources

Selecting the right compute tier keeps latency low while controlling costs. This guide explains the available options, auto-scaling parameters, and strategies for different workloads.

Available Tiers

Refer to the targon.Compute reference for the full list of constants. They fall into two families:

CPU tiers (CPU_SMALL → CPU_XL) for lightweight services, background jobs, and request routing.
GPU tiers (H200_SMALL → H200_XL) for accelerated workloads such as LLM inference and model fine-tuning.

Use the named constants in your functions for clarity:

@app.function(resource=Compute.CPU_MEDIUM)
def preprocess():
    ...

Auto-Scaling Controls

@app.function() accepts min_replicas and max_replicas to configure multi-replica scaling:

@app.function(resource=Compute.H200_MEDIUM, min_replicas=1, max_replicas=4)
def serve():
    ...

min_replicas keeps warm instances ready for low-latency response.
max_replicas caps the number of concurrent replicas to manage cost.
Targon ensures min_replicas ≤ max_replicas and validates the values at decoration time.
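
The constraint described above can be illustrated with a small stand-alone check. This is only a sketch of the kind of validation a decorator might run, not Targon's actual implementation: the function name validate_replicas and the lower bounds (min_replicas of at least 0, max_replicas of at least 1) are assumptions made for illustration.

```python
def validate_replicas(min_replicas: int, max_replicas: int) -> tuple[int, int]:
    """Hypothetical stand-in for a decoration-time replica check."""
    for name, value in (("min_replicas", min_replicas), ("max_replicas", max_replicas)):
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer, got {type(value).__name__}")
    # Assumed bounds: zero warm replicas is allowed, but at least one replica must be possible.
    if min_replicas < 0:
        raise ValueError("min_replicas must be >= 0")
    if max_replicas < 1:
        raise ValueError("max_replicas must be >= 1")
    # The ordering constraint stated in this guide.
    if min_replicas > max_replicas:
        raise ValueError(
            f"min_replicas ({min_replicas}) must not exceed max_replicas ({max_replicas})"
        )
    return min_replicas, max_replicas
```

Running a check like this in your own configuration code before deployment can surface a swapped pair of values early, for example validate_replicas(4, 1) raises ValueError.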

Combine these settings with observability dashboards to tune concurrency for your workload.

Matching Workloads to Tiers

API backends / webhooks: Start with CPU_SMALL or CPU_MEDIUM. Move to a larger tier only if you see sustained CPU saturation.
Batch jobs / ETL: Use CPU_LARGE or CPU_XL for parallel processing. Keep min_replicas=0 to avoid idle charges.
LLM inference: Select H200_SMALL for compact models. Scale up to H200_LARGE or H200_XL when running multi-billion-parameter models or when you need high throughput.
Hybrid workloads: Split your app into multiple functions with different resources (for example, a CPU preprocessing function plus a GPU inference function).
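
The starting points above can be captured in a small lookup so that tier choices live in one place. This is a minimal sketch: the workload keys ("api", "batch", "llm_inference") and the helper starting_tier are hypothetical, and tier names are kept as plain strings so the snippet runs without the SDK installed. In a real app you would pass the matching Compute constant to @app.function().

```python
# Suggested starting tiers from this guide; workload keys are illustrative only.
STARTING_TIER = {
    "api": "CPU_SMALL",            # API backends and webhooks
    "batch": "CPU_LARGE",          # batch jobs and ETL
    "llm_inference": "H200_SMALL", # compact models
}


def starting_tier(workload: str) -> str:
    """Return the suggested starting tier name for a workload category."""
    try:
        return STARTING_TIER[workload]
    except KeyError:
        known = ", ".join(sorted(STARTING_TIER))
        raise ValueError(f"unknown workload {workload!r}; expected one of: {known}") from None
```

Treat the result as a starting point and move up a tier only after observing sustained saturation.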

Cost Optimization Tips

Lower min_replicas during off-hours to reduce idle consumption.
Use per-request batching in your application logic to make better use of GPU time.
Cache model weights inside the image to shorten cold starts and avoid repeated downloads.
Monitor execution times and adjust the timeout parameter to catch runaway jobs.
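
One way to read the batching tip is to group pending inputs into fixed-size batches before handing them to the model, so each GPU call does more work. The sketch below is plain Python and independent of Targon. The helper name batched and the batch size are assumptions, and the right size depends on your model and memory budget.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of up to batch_size items."""
    # Because this is a generator, the check runs when iteration starts.
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, possibly smaller, batch.
    if batch:
        yield batch
```

For example, list(batched(range(5), 2)) returns [[0, 1], [2, 3], [4]]. Inside a GPU function you would loop over the batches and run one forward pass per batch instead of one per item.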

Related Reading

Compute API reference
LLM deployment guide
Serverless guide for dashboard-based configuration
