Cloud Run at Cloud Next 2025
Get ready for Cloud Next 2025, where Cloud Run takes center stage with sessions covering everything from new features and serverless GPUs to AI agents and high-availability services.
Cloud Run is a container platform on Google Cloud, and I'm part of the Cloud Run team at Google. Before joining Google, I wrote the O'Reilly book on it.
Google Cloud Run now offers NVIDIA L4 GPUs in public preview, enabling fast, flexible, and cost-effective deployment of AI workloads like large language models with scale-to-zero capabilities.
At TEQNation 2024, I demonstrated how to build a complex LLM app using RAG and ReAct, grounded in over 20 million Hacker News comments, and explained how I scraped and indexed the data and implemented the app with LangChain and Cloud Run.
Learn to build a Retrieval Augmented Generation (RAG) app with LangChain and Gemini on Cloud Run, using a vector database to answer questions from Cloud Run release notes.
At Cloud Next 2024, I joined LangChain's founder and others to show how to build generative AI applications on Google Cloud using LangChain. The post includes a demo of L'Oréal's internal GPT and a hands-on guide to building a LangChain app from scratch.
This sample project deploys Google's Gemma 2B open model on Cloud Run, using Ollama for CPU-based inference. It offers an accessible way to run smaller LLMs in a serverless environment.
My O'Reilly book on Google Cloud Run teaches you how to deploy containerized applications on a highly scalable serverless platform, with practical examples for developers, sysadmins, and cloud engineers.