Open models on Cloud Run
This is a list of links from my recent talk on open models at DevFest Berlin.
- My slide deck
- Cloud Run GPU
- Can you run it? - Checks whether a model fits in the VRAM of a given GPU (see the back-of-the-envelope sketch after this list)
- Hugging Face TGI
- Google Gemma
- Getting started with Gradio (a minimal app sketch follows this list)
- Ollama - Great LLM inference server for your desktop
- Hugging Face Hub
- 140 GPU instances in four minutes
- Video: Deploy TGI on Cloud Run
- Tutorial: Deploy TGI on Cloud Run (a client-side query sketch follows this list)
- Hugging Face Deep Learning Containers on Google Cloud - Containers with PyTorch for serving and training, plus TGI and TEI, managed by Google Cloud and Hugging Face
- Neural Magic ran over half a million evaluations on quantized LLMs and found they maintain accuracy
- Understanding Cloud Run request concurrency
- Benchmark of LLM inference server startup times on Cloud Run - Look for the Performance heading in that post
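
A back-of-the-envelope version of the "Can you run it?" check (my own rough sketch, not the linked calculator): weight memory is roughly parameter count times bytes per parameter, plus some headroom for the KV cache and runtime overhead.

```python
def estimate_vram_gib(params_billions: float, bytes_per_param: float,
                      overhead: float = 1.2) -> float:
    """Rough inference VRAM estimate: weights only, padded ~20% for
    KV cache, activations, and runtime overhead (my assumption)."""
    weight_bytes = params_billions * 1e9 * bytes_per_param
    return weight_bytes * overhead / 1024**3

# Example: Gemma 2 9B at different precisions.
print(estimate_vram_gib(9, 2.0))  # bf16: ~20 GiB -> fits the 24 GiB NVIDIA L4 on Cloud Run
print(estimate_vram_gib(9, 1.0))  # int8: ~10 GiB
print(estimate_vram_gib(9, 0.5))  # 4-bit: ~5 GiB
```

The 20% headroom is a guess; long contexts grow the KV cache well past it.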
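
If you're new to Gradio, a minimal app is only a few lines. This is a sketch; the `echo` function is a placeholder for a real model call:

```python
import gradio as gr

def echo(message: str) -> str:
    # Placeholder: call your model or inference endpoint here.
    return f"You said: {message}"

demo = gr.Interface(fn=echo, inputs="text", outputs="text",
                    title="Minimal Gradio demo")

if __name__ == "__main__":
    demo.launch()  # serves a local web UI, by default at http://127.0.0.1:7860
```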
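
Once a TGI service like the one in the tutorial is running on Cloud Run, you can query its REST API directly. A minimal sketch using `requests`, with a hypothetical service URL, assuming the service accepts unauthenticated requests (otherwise attach an identity token in the `Authorization` header):

```python
import requests

# Hypothetical Cloud Run URL for a TGI deployment -- replace with your own.
TGI_URL = "https://tgi-gemma-xxxxx-ew.a.run.app"

resp = requests.post(
    f"{TGI_URL}/generate",
    json={
        "inputs": "Why run open models on Cloud Run?",
        "parameters": {"max_new_tokens": 128},
    },
    timeout=300,  # the first request may hit a cold start on a GPU instance
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```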