# How to import and deploy a pre-trained text-to-image model on Google Cloud for a high-traffic project?


hoary cave

Hello, I am working on an e-commerce project and I need a text-to-image model. I want to deploy this model on Google Cloud Platform (GCP), but this process is quite new and complicated to me. Since I have limited time, I would like to know which of the following approaches is more suitable:

Using ready-made GitHub models: For example, pre-trained models like Stable Diffusion. Can I import and use these models on GCP? If possible, can you share the recommended steps for this?

Google Cloud Marketplace: Would it be easier to buy a ready-made solution from GCP Marketplace? If so, what are the recommended APIs or services?

My goal:
- Take user input (e.g. a string array) in the backend and return the output via a text-to-image API.
- Since this is an e-commerce project, I need a solution that scales to high traffic.

Information:
- Backend: requests will come in via a REST API.
- My project lets users create customized visuals (e.g. product designs).
- Instead of training a model from scratch, I prefer ready-made solutions that save time.

My questions:
- Which way is more practical and faster: a ready-made model from GitHub, or a solution from Google Cloud Marketplace?
- If I choose a model from GitHub, what steps should I follow to import it into GCP?
- How can I optimize a scalable text-to-image solution on GCP for a high-traffic application?

What I'm asking for:
- If you have experience with Stable Diffusion or similar models, can you share it?
- I would like to get suggestions from people who have started a similar project on Google Cloud.

brittle moth

realistically you'll probably need to filter and rewrite the user inputs before passing them to generate an image

use something like Llama Guard or whatever your preferred provider offers for filtering
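A rough sketch of acting on a moderation model's verdict, assuming the model answers in the plain-text format described on the Llama Guard 3 model card: `safe`, or `unsafe` followed by a line of comma-separated category codes such as `S1` (check the card for the version you actually use). How you call the model depends on your provider, so only the parsing is shown; `ModerationVerdict` and `parseGuardVerdict` are hypothetical names.

```typescript
// Hypothetical result type for a moderation check.
interface ModerationVerdict {
  safe: boolean;
  categories: string[]; // e.g. ["S1"]; empty when safe or when no codes were given
}

// Parses a reply assumed to look like "safe" or "unsafe\nS1,S10".
// Anything unexpected is treated as unsafe, so the filter fails closed.
function parseGuardVerdict(reply: string): ModerationVerdict {
  const lines = reply
    .trim()
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const first = (lines[0] ?? "").toLowerCase();
  if (first === "safe") {
    return { safe: true, categories: [] };
  }
  const categories =
    first === "unsafe" && lines.length > 1
      ? lines[1].split(",").map((c) => c.trim()).filter((c) => c.length > 0)
      : [];
  return { safe: false, categories };
}
```

Failing closed is a deliberate choice here: a wrongly rejected prompt costs the user a retry, while an unfiltered one can end up on a product page.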

use any LLM to rewrite it so it works better with the image generation model you choose, and instruct it to follow whatever style guidelines you want to standardize for the generated images
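A minimal sketch of that rewrite step, assuming an OpenAI-style chat format (a list of `{ role, content }` messages), which many providers accept. The `StyleGuide` fields and `buildRewriteMessages` are hypothetical; the idea is that the guidelines live in your code as the system message and the user's text is passed separately.

```typescript
// Hypothetical style guideline every generated image should follow.
interface StyleGuide {
  style: string;      // e.g. "flat vector illustration"
  background: string; // e.g. "plain white background"
  avoid: string[];    // things the image prompt should never include
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Builds the messages for the LLM that turns a user request into an image prompt.
function buildRewriteMessages(guide: StyleGuide, userRequest: string): ChatMessage[] {
  const system = [
    "Rewrite the user's request into a single prompt for a text-to-image model.",
    `Style: ${guide.style}.`,
    `Background: ${guide.background}.`,
    `Never include: ${guide.avoid.join(", ")}.`,
    "Reply with the prompt only.",
  ].join("\n");
  return [
    { role: "system", content: system },
    // The user's text is kept in its own message rather than spliced into the guidelines.
    { role: "user", content: userRequest.trim() },
  ];
}
```

Changing the house style then means editing one `StyleGuide` object, not every prompt.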

Google bundles most of its AI offerings under Vertex AI; read its documentation, or use another provider like fal.ai

Models are usually not hosted on GitHub: the code for running them may be, but the model weights themselves are typically hosted on Hugging Face.
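For example, a single file in a Hugging Face model repository can be downloaded from a URL of the form `https://huggingface.co/<repo_id>/resolve/<revision>/<filename>`. The helper below only builds that URL; the repository and file names in the usage are examples (check the model card for real file names), and gated models additionally require an access token.

```typescript
// Builds the direct-download URL for one file in a Hugging Face model repo.
// revision can be a branch name (usually "main"), a tag, or a commit hash.
function huggingFaceFileUrl(repoId: string, filename: string, revision = "main"): string {
  // Encode each path segment but keep the "/" separators in repoId and nested file paths.
  const encodePath = (p: string) => p.split("/").map(encodeURIComponent).join("/");
  return `https://huggingface.co/${encodePath(repoId)}/resolve/${encodeURIComponent(revision)}/${encodePath(filename)}`;
}
```

Usage: `huggingFaceFileUrl("stabilityai/stable-diffusion-xl-base-1.0", "sd_xl_base_1.0.safetensors")`.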


The fastest option in terms of development time is just using a serverless inference API.
As far as performance goes, there shouldn't be much of a difference between self-hosting and using a third-party provider, but a serverless API means you don't have to handle scaling yourself.
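Even with a serverless API, at high traffic you should expect the occasional rate limit (HTTP 429) or transient server error (5xx). A rough sketch of a backend call with exponential backoff; `callWithRetry`, the URL, and the body are hypothetical, since each provider documents its own endpoint, auth header, and payload (and many SDKs already retry for you). The HTTP function is injected so the logic runs without a network, and the same idea carries over to a .NET backend.

```typescript
// Minimal shape of an HTTP response; the standard fetch Response satisfies it.
interface HttpResponse {
  status: number;
  json(): Promise<unknown>;
}
type HttpPost = (url: string, body: unknown) => Promise<HttpResponse>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Retries on 429 and 5xx with exponential backoff; other errors fail immediately.
async function callWithRetry(
  post: HttpPost,
  url: string,
  body: unknown,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<unknown> {
  for (let attempt = 1; ; attempt++) {
    const res = await post(url, body);
    if (res.status >= 200 && res.status < 300) {
      return res.json();
    }
    const retryable = res.status === 429 || res.status >= 500;
    if (!retryable || attempt >= maxAttempts) {
      throw new Error(`Inference request failed with HTTP ${res.status} after ${attempt} attempt(s)`);
    }
    // Waits baseDelayMs, then 2x, 4x, ... between attempts.
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}
```

In a real backend, `post` would wrap `fetch` (or the provider's SDK) with the JSON body and API key header your provider requires.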


as far as "importing to GCP" goes, read the documentation of each specific model / inference server; it may vary, but many of them provide Docker images

hoary cave

Thanks for the detailed explanation! Since I have limited time, instead of dealing with LLM-based solutions, I am looking for a direct inference API or an easy-to-integrate solution, as you suggested. I especially want to work with pre-trained models like DALL-E or Stable Diffusion.

My limitations:
- Google Cloud is not mandatory, but if I can find a solution there, I might consider using it.
- Instead of hosting the model myself, I would prefer a serverless inference API so I don't spend time on scaling and infrastructure management, e.g. a solution like Hugging Face or the OpenAI API.

My questions:
- Apart from Hugging Face or OpenAI, do you recommend any other inference API providers? Which providers are more affordable and scalable?
- How can I create a "standardized style guideline" for DALL-E, Stable Diffusion, or similar models? I want to keep this guideline simple and user-friendly.
- For a quick start, would you recommend Vertex AI or one of the alternative providers you mentioned? If you recommend a service like Fal.ai, can you share your experience with it?

Why an API?
In my e-commerce platform, I aim to let users create customized product images. Since these requests will happen frequently, speed, stability, and easy integration are among my priorities.
While I am open to Docker images or self-hosting, an API-based solution makes more sense to me in terms of scalability.

brittle moth

> How can I create a "standardized style guideline" for DALL-E, Stable Diffusion, or similar models? I want to keep this guideline simple and user-friendly.

really, just pass the user prompt to an LLM and have it rewrite the prompt according to those guidelines
nearly any provider that offers image generation also offers LLMs, and the cost and latency of reformatting the prompt are tiny compared to generating the image itself


as far as comparing different providers goes, compare them yourself
it's more or less the same thing everywhere: one request (or function call using their SDK) to reformat the prompt, then one to generate the actual image
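The flow described in this thread (moderate the input, rewrite the prompt, then generate the image) can be sketched as one backend function. Each step is injected as a plain async function so you can plug in whichever provider or SDK you pick; `Steps`, `PipelineResult`, and `generateProductImage` are hypothetical names, not any provider's API.

```typescript
// Each step is supplied by you, e.g. thin wrappers around your provider's SDK calls.
interface Steps {
  isSafe(userText: string): Promise<boolean>;       // moderation, e.g. a Llama Guard call
  rewritePrompt(userText: string): Promise<string>; // LLM applies your style guidelines
  generateImage(prompt: string): Promise<string>;   // returns an image URL
}

type PipelineResult =
  | { ok: true; imageUrl: string; prompt: string }
  | { ok: false; reason: string };

async function generateProductImage(steps: Steps, userText: string): Promise<PipelineResult> {
  const text = userText.trim();
  if (text.length === 0) {
    return { ok: false, reason: "empty prompt" };
  }
  // Cheap checks first: a rejected request never reaches the expensive image model.
  if (!(await steps.isSafe(text))) {
    return { ok: false, reason: "rejected by moderation" };
  }
  const prompt = await steps.rewritePrompt(text);     // request 1: reformat the prompt
  const imageUrl = await steps.generateImage(prompt); // request 2: generate the image
  return { ok: true, imageUrl, prompt };
}
```

Returning the rewritten `prompt` alongside the image makes it easier to log and debug why a given image looks the way it does.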


enforcing a consistent style without having control of the prompt fed into the model is not really viable, unless you train the model to generate in some specific style

hoary cave

Which parts should I do on the backend and which on the frontend? I think the main part will be in the backend, and the frontend will call its APIs. What should I do here? I use .NET for the backend and Next.js for the frontend.

brittle moth

this is probably too much spoonfeeding, but I wanted something to keep me occupied for a few minutes, so I made this as an example

the "connect to an AI" part really is that simple; you still need to understand how network requests work in general, though
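On the backend/frontend split asked about above, a common arrangement is that the Next.js frontend only talks to your own .NET backend, and only the backend holds provider API keys and calls moderation, the LLM, and the image model. A minimal sketch of the frontend side, assuming a hypothetical backend route `POST /api/images` that accepts `{ prompts: string[] }` and returns `{ imageUrls: string[] }`; the route and field names are placeholders for whatever your backend exposes.

```typescript
// Hypothetical request/response contract with the .NET backend.
interface GenerateRequest {
  prompts: string[];
}
interface GenerateResponse {
  imageUrls: string[];
}

// Minimal fetch-like signature; the browser's fetch satisfies it.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Cleans up input on the client, then posts it to the backend.
// No API keys appear here: the backend talks to the AI provider.
async function requestImages(fetchFn: FetchLike, prompts: string[]): Promise<string[]> {
  const cleaned = prompts.map((p) => p.trim()).filter((p) => p.length > 0);
  if (cleaned.length === 0) {
    throw new Error("Enter at least one description.");
  }
  const body: GenerateRequest = { prompts: cleaned };
  const res = await fetchFn("/api/images", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
  if (!res.ok) {
    throw new Error(`Backend returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as GenerateResponse;
  return data.imageUrls;
}
```

In a client component you would call `requestImages(fetch, prompts)`. If the .NET backend runs on a different origin, replace the relative path with its full URL (and enable CORS there).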


(code in the spoiler file)

cerulean swift

not really unsloth either