Making Thousands of Open LLMs Bloom in the Vertex AI Model Garden: A Collaborative Journey

A Collaboration to Empower AI Builders

Hugging Face, a leading open-source AI company, has announced a deeper integration with Google Cloud that makes it easier for developers to deploy open models securely and reliably. The new "Deploy on Google Cloud" experience lets developers run open LLMs (Large Language Models) within their own Google Cloud environment, simplifying the path to production-ready Generative AI applications by removing the burden of managing infrastructure and servers. With just a few clicks, developers can deploy thousands of open models as API endpoints on Vertex AI or Google Kubernetes Engine (GKE), backed by dedicated deployment configurations and assets provided by Hugging Face.

How it Works: A Seamless Deployment Journey

The deployment process is straightforward and user-friendly, whether starting from the Hugging Face Hub or directly within the Google Cloud Console.

From the Hub:

  • Open the "Deploy" menu and select "Google Cloud."
  • You will be directed to the Google Cloud Console, where you can choose between Vertex AI or GKE as your deployment environment.
  • With Vertex AI, deploying the model takes a single click. For GKE, follow the provided instructions and manifest templates to deploy on a new or existing Kubernetes cluster.
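For the GKE path, the manifest templates generally describe a Kubernetes Deployment running Hugging Face's Text Generation Inference (TGI) server. The sketch below illustrates the shape of such a manifest; the image tag, model ID, and accelerator selector are illustrative assumptions, not the exact values the templates generate:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tgi-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tgi-server
  template:
    metadata:
      labels:
        app: tgi-server
    spec:
      containers:
        - name: tgi
          # Hugging Face's public TGI image; the tag here is an example.
          image: ghcr.io/huggingface/text-generation-inference:2.0
          env:
            # The Hub model to serve (example model ID).
            - name: MODEL_ID
              value: meta-llama/Meta-Llama-3-8B-Instruct
          ports:
            - containerPort: 80
          resources:
            limits:
              nvidia.com/gpu: "1"
      nodeSelector:
        # Schedule onto a GKE node pool with an NVIDIA L4 GPU (assumed here).
        cloud.google.com/gke-accelerator: nvidia-l4
```

Applying a manifest like this with `kubectl apply -f` creates a GPU-backed pod serving the model over HTTP inside the cluster; exposing it externally is a separate Service/Ingress step.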

From Vertex Model Garden:

  • Access the Vertex Model Garden, Google Cloud's hub for discovering and deploying models.
  • Select the "Deploy From Hugging Face" option to search and deploy Hugging Face models within your Google Cloud console.
  • Use the provided form to search for specific model IDs.
  • Choose your desired model, and Vertex AI will automatically fill in the required configurations for deployment to Vertex AI or GKE.
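The configurations that Vertex AI auto-fills correspond to the arguments of the `google-cloud-aiplatform` SDK's `Model.upload()` and `deploy()` calls (serving container image, environment variables, machine type). Here is a minimal, local sketch of assembling such a spec for a Hugging Face model; the function name, image URI, and machine type are illustrative assumptions, not the exact values the console generates:

```python
def build_tgi_serving_config(model_id: str, machine_type: str = "g2-standard-8") -> dict:
    """Assemble a serving spec for a Hugging Face model, in the shape expected by
    google-cloud-aiplatform's Model.upload() / Model.deploy() keyword arguments.

    The container image URI below is a placeholder for a Hugging Face
    Text Generation Inference (TGI) serving image.
    """
    return {
        # Vertex AI display names cannot contain "/", so flatten the Hub ID.
        "display_name": model_id.replace("/", "--"),
        # Placeholder TGI serving image (illustrative, not an exact registry path).
        "serving_container_image_uri": "example-registry/huggingface-tgi:latest",
        # TGI reads the Hub model to serve from the MODEL_ID environment variable.
        "serving_container_environment_variables": {"MODEL_ID": model_id},
        "serving_container_ports": [8080],
        # GPU-backed machine type for the deployed endpoint (assumed default).
        "machine_type": machine_type,
    }


config = build_tgi_serving_config("meta-llama/Meta-Llama-3-8B-Instruct")
print(config["display_name"])  # meta-llama--Meta-Llama-3-8B-Instruct
```

In a real deployment, these values would be passed to `aiplatform.Model.upload(...)` followed by `model.deploy(...)`; the console performs the equivalent steps for you.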

A Partnership with Exciting Possibilities

This collaboration between Hugging Face and Google Cloud marks a significant step toward making AI more open and accessible. With the new deployment options, developers can focus on building AI applications instead of managing infrastructure. The companies plan to continue the partnership, bringing more experiences and tools for building AI with open models to Google Cloud. To learn more, visit the Hugging Face blog, where you can also explore other recent announcements, including serverless GPU inference and chatbot advancements.

About Hugging Face: Hugging Face is an AI company that provides tools and models to help developers build the next generation of AI applications. With a focus on open and accessible AI, it offers a comprehensive platform centered on the Hugging Face Hub, a collaborative hosting platform for ML models, datasets, and applications.

About Google Cloud: Google Cloud offers a suite of cloud computing services, including infrastructure, data analytics, AI, and machine learning. With a focus on security and reliability, Google Cloud provides scalable and flexible solutions for businesses and developers. Vertex AI, part of Google Cloud, is a managed machine learning platform that simplifies the process of building, deploying, and managing ML models.

Original Article: Making thousands of open LLMs bloom in the Vertex AI Model Garden
