Cloud Run Adds GPUs

August 21, 2024 Wietse Venema

Cloud Run, the scale-to-zero, fully-managed container platform on Google Cloud, adds GPUs as a public preview.



Here's the summary:

Specs

  • One NVIDIA L4 GPU (24 GB VRAM) per Cloud Run instance; a Cloud Run service can scale out to many instances.
  • Drivers are pre-installed.
  • Minimum instance size to enable GPU is 4 vCPU and 16 GiB memory.
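To make the sizing rules concrete, here is a minimal Python sketch that checks a planned instance configuration against the limits listed above: at most one GPU per instance, and at least 4 vCPU and 16 GiB of memory when a GPU is attached. `InstanceConfig` and `gpu_config_problems` are hypothetical names for illustration. This is not part of any Cloud Run API, and the limits may change after the preview.

```python
from dataclasses import dataclass

# Limits as stated in the Specs list of this post (public preview).
MAX_GPUS_PER_INSTANCE = 1
MIN_VCPU_WITH_GPU = 4
MIN_MEMORY_GIB_WITH_GPU = 16


@dataclass
class InstanceConfig:
    """Hypothetical description of one Cloud Run instance's resources."""
    vcpu: int
    memory_gib: int
    gpus: int


def gpu_config_problems(cfg: InstanceConfig) -> list[str]:
    """Hypothetical helper: return the reasons cfg violates the GPU limits.

    An empty list means the configuration satisfies the stated limits.
    """
    problems = []
    if cfg.gpus < 0:
        problems.append(f"GPU count cannot be negative, got {cfg.gpus}")
    if cfg.gpus > MAX_GPUS_PER_INSTANCE:
        problems.append(
            f"at most {MAX_GPUS_PER_INSTANCE} GPU per instance, got {cfg.gpus}"
        )
    # The CPU and memory minimums only apply when a GPU is attached.
    if cfg.gpus >= 1:
        if cfg.vcpu < MIN_VCPU_WITH_GPU:
            problems.append(
                f"GPU needs at least {MIN_VCPU_WITH_GPU} vCPU, got {cfg.vcpu}"
            )
        if cfg.memory_gib < MIN_MEMORY_GIB_WITH_GPU:
            problems.append(
                f"GPU needs at least {MIN_MEMORY_GIB_WITH_GPU} GiB memory, "
                f"got {cfg.memory_gib}"
            )
    return problems


print(gpu_config_problems(InstanceConfig(vcpu=4, memory_gib=16, gpus=1)))  # []
print(gpu_config_problems(InstanceConfig(vcpu=2, memory_gib=16, gpus=1)))
# ['GPU needs at least 4 vCPU, got 2']
```

A check like this could run in a deploy script before the actual deployment step, so a misconfigured service fails early with a readable message.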

Use Cases

Autoscaling

  • Scale to zero: When there are no incoming requests, Cloud Run stops all remaining instances, and you’re not charged for them.
  • Fast cold start: When scaling from zero, processes in the container can start using the GPU in about 5 seconds. In the best case, Gemma 2 (2B, Q4_0 quantization) starts returning tokens 11 seconds after a cold start.
  • Maximum instances: The default is 7; you can request a quota increase.
  • Scale-out speed: During the launch event, Frank showed a service that generated images with Stable Diffusion and scaled it out to 100 GPU instances in under 4 minutes (watch the demo).
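To compare your own cold starts with the numbers above, you can time how long the first byte of a response takes to arrive after sending a request to a service that has scaled to zero. The following is a minimal sketch using only the Python standard library. `time_to_first_byte` is a hypothetical helper, and it assumes the service allows unauthenticated requests; an authenticated service would also need an identity token in the `Authorization` header, which is not shown. For a model server that streams its output, the first body byte is only a rough proxy for the first token.

```python
import time
import urllib.request


def time_to_first_byte(url: str, timeout: float = 120.0) -> float:
    """Hypothetical helper: seconds from sending a GET to url until the
    first byte of the response body arrives.

    Sent to a service that has scaled to zero, this includes the cold start.
    Assumes the endpoint accepts unauthenticated GET requests.
    """
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # urlopen returns once the status line and headers have arrived;
        # read(1) blocks until the first byte of the body is available.
        resp.read(1)
    return time.monotonic() - start
```

For example, call `time_to_first_byte("https://<your-service-url>/")` after the service has been idle long enough to scale to zero, then again while it is warm; the difference approximates the cold-start cost.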

Allow List During Public Preview

During the public preview, access is gated to ensure good quality of service. Request access at g.co/cloudrun/gpu.

Regions

  • Today: us-central1
  • Later: europe-west4 (Netherlands) and asia-southeast1 (Singapore)

* * *