Pricing

Pay for what you use pricing.

You pay for the compute you use, and only for what you use.

Pricing Calculator

CPU: 4 cores (adjustable from 4 to 32 cores)

Memory: 16 GB (adjustable from 16 to 32 GB)

Hourly Rate

$1.99/hour

Per Second

$0.0006/second

Let's compute the cost of your AI workloads.

Frequently asked questions

Your questions answered.
How does the pricing work?
Our pricing is usage-based: you only pay for compute resources while your code is actively running. There are no charges when your service is idle, and we automatically scale up or down based on your traffic. Billing is metered per second, with each run's duration rounded up to the nearest second.
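As a rough sketch of how that metering works, the estimate below uses the $0.0006/second rate from the calculator above and rounds the active duration up to the nearest whole second. The function name and structure are illustrative only, not part of any real billing API.

```python
# Illustrative cost estimate using the published per-second rate.
# Assumption: billing rounds active time up to the nearest second.
import math

PER_SECOND_RATE = 0.0006  # USD, from the pricing calculator above

def estimate_cost(active_seconds: float) -> float:
    """Estimate the charge for one run: round the active duration
    up to a whole second, then multiply by the per-second rate."""
    billable_seconds = math.ceil(active_seconds)
    return round(billable_seconds * PER_SECOND_RATE, 4)

# A 90.2-second inference burst bills as 91 seconds:
print(estimate_cost(90.2))  # 0.0546
```

Because idle time is free, a service that handles ten such bursts per hour would be billed for roughly 15 minutes of compute, not the full hour.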
Does the service automatically scale?
Yes, your inference services automatically scale based on your traffic needs. You can configure upper and lower bounds for your service, and we'll handle the scaling within those limits. This ensures you have enough capacity during peak times while optimizing costs during lower traffic periods.
What types of GPUs are available?
Currently, we support Nvidia L4 GPUs, which offer 24 GB of memory, suitable for most AI workloads. We are actively working to expand our GPU offerings to include more options, ensuring compatibility with a wider range of applications, including large language models.
What are the startup times?
Our startup times can be as low as 5 seconds. However, if your container loads a large model, there may be additional overhead. We are continuously optimizing our infrastructure to minimize startup delays and ensure efficient scaling.