Compute Resources

Selecting the right compute tier keeps latency low while controlling costs. This guide explains the available options, auto-scaling parameters, and strategies for different workloads.

Available Tiers

Refer to the targon.Compute reference for the full list of constants. They fall into two families:

  • CPU tiers (CPU_SMALL through CPU_XL) for lightweight services, background jobs, and request routing.
  • GPU tiers (H200_SMALL through H200_XL) for accelerated workloads such as LLM inference and model fine-tuning.
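The two families can be pictured as a simple enumeration. The sketch below is illustrative only: the constant names come from this page, but the values and the helper are our own invention; the real definitions live in the targon.Compute reference.

```python
from enum import Enum

class Compute(Enum):
    # CPU tiers for lightweight services (values are illustrative, not Targon's)
    CPU_SMALL = "cpu-small"
    CPU_MEDIUM = "cpu-medium"
    CPU_LARGE = "cpu-large"
    CPU_XL = "cpu-xl"
    # GPU tiers for accelerated workloads
    H200_SMALL = "h200-small"
    H200_MEDIUM = "h200-medium"
    H200_LARGE = "h200-large"
    H200_XL = "h200-xl"

def is_gpu(tier: Compute) -> bool:
    """In this naming scheme, GPU tiers are the ones prefixed H200."""
    return tier.name.startswith("H200")
```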

Use the named constants in your functions for clarity:

@app.function(resource=Compute.CPU_MEDIUM)
def preprocess():
    ...

Auto-Scaling Controls

@app.function() accepts min_replicas and max_replicas to configure multi-replica scaling:

@app.function(resource=Compute.H200_MEDIUM, min_replicas=1, max_replicas=4)
def serve():
    ...

  • min_replicas keeps warm instances ready for low-latency response.
  • max_replicas caps the number of concurrent replicas to manage cost.
  • Targon validates at decoration time that min_replicas ≤ max_replicas, so misconfigured bounds fail fast rather than at deploy time.
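Decoration-time validation of the scaling bounds can be sketched as follows. This is our own illustration of the checks described above, not Targon's source:

```python
def validate_replicas(min_replicas: int, max_replicas: int) -> None:
    """Reject inconsistent scaling bounds before the function is registered."""
    if min_replicas < 0:
        raise ValueError("min_replicas must be >= 0")
    if max_replicas < 1:
        raise ValueError("max_replicas must be >= 1")
    if min_replicas > max_replicas:
        raise ValueError(
            f"min_replicas ({min_replicas}) must not exceed "
            f"max_replicas ({max_replicas})"
        )
```

Failing at decoration time means a typo such as min_replicas=4, max_replicas=1 surfaces the moment the module is imported.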

Combine these settings with observability dashboards to tune concurrency for your workload.

Matching Workloads to Tiers

  • API backends / web hooks: Start with CPU_SMALL or CPU_MEDIUM. Increase tiers only if you see sustained CPU saturation.
  • Batch jobs / ETL: Use CPU_LARGE or CPU_XL for parallel processing. Keep min_replicas=0 to avoid idle charges.
  • LLM inference: Select H200_SMALL for compact models; scale up to H200_LARGE or H200_XL when running multi-billion-parameter models or when you need high throughput.
  • Hybrid workloads: Split your app into multiple functions with different resources (for example, CPU preprocessing function + GPU inference function).
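The hybrid pattern looks like the snippet below. So that the sketch runs anywhere, a minimal stand-in App class and string constants replace the real targon imports; in a deployment you would use targon's own App and Compute instead.

```python
# Minimal stand-in for targon's App so this pattern is runnable standalone;
# a real app would use targon's App and Compute directly.
class App:
    def function(self, **config):
        def wrap(fn):
            fn.config = config  # record the requested resources on the function
            return fn
        return wrap

CPU_MEDIUM, H200_MEDIUM = "CPU_MEDIUM", "H200_MEDIUM"  # stand-in constants
app = App()

@app.function(resource=CPU_MEDIUM)  # cheap CPU tier for text cleanup
def preprocess(text: str) -> str:
    return text.strip().lower()

@app.function(resource=H200_MEDIUM, min_replicas=1, max_replicas=4)
def infer(text: str) -> str:  # GPU tier reserved for the heavy model call
    return f"model output for: {text}"
```

Splitting the pipeline this way means the GPU replicas spend their time on inference only, while the cheaper CPU tier absorbs the preprocessing load.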

Cost Optimization Tips

  • Lower min_replicas during off-hours to reduce idle consumption.
  • Use per-request batching in your application logic to make better use of GPU time.
  • Cache model weights inside the image to shorten cold starts and avoid repeated downloads.
  • Monitor execution times and tune the timeout parameter to catch runaway jobs before they accrue charges.
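The batching tip above can be sketched as a small request accumulator. This is our own application-level illustration, independent of any Targon API; the batch_fn stands in for whatever model call your service makes:

```python
from typing import Callable, List

class MicroBatcher:
    """Collect individual prompts and flush them to the model in one call,
    so a GPU replica processes several requests per forward pass."""

    def __init__(self, batch_fn: Callable[[List[str]], List[str]],
                 max_batch: int = 8):
        self.batch_fn = batch_fn    # e.g. a batched model-inference call
        self.max_batch = max_batch  # cap on prompts per flush
        self.pending: List[str] = []

    def submit(self, prompt: str) -> None:
        self.pending.append(prompt)

    def flush(self) -> List[str]:
        # Take up to max_batch pending prompts and run them together.
        batch = self.pending[:self.max_batch]
        self.pending = self.pending[self.max_batch:]
        return self.batch_fn(batch)
```

In practice you would trigger flush() on a short timer or when max_batch is reached, trading a few milliseconds of latency for much higher GPU utilization.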