#I'm not sure I follow. If you're using
That is a good point. However, my understanding is that 'genai' and 'generativeai' are different (I think these are two different divisions within Google, as you explained last time), and their API structures in code differ as a result.
Please refer to the snippets below; I have presented both to illustrate the differences between the 'genai' model and the 'generativeai' model.
Conclusion:
- This works after repeated testing in my app. The key is to separate them and handle each one in its own block with a dedicated structure.
Gemma 3-27B-IT (New dedicated block)
elif model == "gemma-3-27b-it":
    logging.debug("[get_chat_response] Using Gemma 3-27B-IT via Google Generative AI SDK.")
    try:
        # Step 1: Create a GenerativeModel instance for the model
        generative_model = genai.GenerativeModel("gemma-3-27b-it")
        # Step 2: Generate content using the model instance
        response = generative_model.generate_content(user_input)
        return response.text  # Extract the generated text from the response
    except Exception as e:
        logging.error(f"[Gemma 3-27B-IT] Error generating content: {str(e)}")
        return f"[Gemma 3-27B-IT] Error: {str(e)}"
Gemini 1.5 (or 2.0 depending on the specific case)
elif model in ["gemini-1.5-flash", "gemini-1.5-flash-002", "gemini-1.5-pro", "gemini-1.5-pro-002"]:
    logging.debug("[get_chat_response] Using Gemini 1.5 endpoint.")
    api_endpoint = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
    headers = {
        "Content-Type": "application/json",
        "x-goog-api-key": GOOGLE_GEMINI_1_5_API_KEY,
    }
    data = {"contents": [{"parts": [{"text": user_input}]}]}
    resp = requests.post(api_endpoint, headers=headers, json=data)
    logging.debug(f"[get_chat_response] Gemini 1.5 HTTP status code: {resp.status_code}")
    if resp.status_code == 200:
        try:
            return resp.json()["candidates"][0]["content"]["parts"][0]["text"]
        except (KeyError, IndexError) as e:
            return f"[Gemini 1.5] Error parsing response: {e}"
    else:
        return f"[Gemini 1.5] Request failed: {resp.status_code} - {resp.text}"
It looks like in your code, if your model is gemma, then you use the old library. But if it is Gemini, then you send the request directly without using the library.
(And I'm not sure how that formatted string works for the Gemini case either, to be honest)
It has nothing to do with genai vs generativeai (in this case, both are run by the same division and use the same endpoint)
Here is part of a bash script I used to test that:
#MODEL=gemini-2.0-flash-lite
MODEL=gemma-3-27b-it
apiVersion=v1beta
#apiVersion=v1
#method=streamGenerateContent
method=generateContent
URL="https://generativelanguage.googleapis.com/${apiVersion}/models/${MODEL}:${method}"
echo "$URL"
curl \
-X POST "${URL}" \
-H 'Content-Type: application/json' \
-H "x-goog-api-key: ${API_KEY}" \
-d @<(echo '{
"contents": [
{
"parts": [
{
"text": "What is the answer to life the universe and everything? Answer briefly."
}
]
}
],
"generationConfig": {
"temperature": 0.9,
"topK": 40,
"topP": 0.9
},
"safetySettings": [
{
"category": "HARM_CATEGORY_HARASSMENT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_HATE_SPEECH",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
},
{
"category": "HARM_CATEGORY_DANGEROUS_CONTENT",
"threshold": "BLOCK_MEDIUM_AND_ABOVE"
}
]
}')
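For anyone converting the script above to Python, here is a minimal sketch of the same request using only the standard library. It reuses the URL pattern, header, and `generationConfig` values from the bash script; the function names (`build_request`, `extract_text`, `ask`) and the `API_KEY` environment variable are just illustrative choices, and the safety settings are left out for brevity. The point it tries to show is that one code path can serve both the Gemma and the Gemini model names.

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com"


def build_request(model, prompt, api_key, api_version="v1beta", method="generateContent"):
    """Build the URL, headers, and JSON body, mirroring the bash script."""
    url = f"{BASE_URL}/{api_version}/models/{model}:{method}"
    headers = {"Content-Type": "application/json", "x-goog-api-key": api_key}
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"temperature": 0.9, "topK": 40, "topP": 0.9},
    }
    return url, headers, body


def extract_text(payload):
    """Return the first candidate's text, or None if the shape is unexpected."""
    try:
        return payload["candidates"][0]["content"]["parts"][0]["text"]
    except (KeyError, IndexError, TypeError):
        return None


def ask(model, prompt, api_key, timeout=30):
    """Send the request; the model name is the only thing that changes per model."""
    url, headers, body = build_request(model, prompt, api_key)
    req = urllib.request.Request(
        url, data=json.dumps(body).encode("utf-8"), headers=headers, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_text(json.load(resp))


if __name__ == "__main__":
    key = os.environ.get("API_KEY")  # assumed variable name, as in the bash script
    if key:
        for name in ("gemma-3-27b-it", "gemini-2.0-flash-lite"):
            print(name, "->", ask(name, "Answer briefly: what is 6 times 7?", key))
```

Keeping `build_request` and `extract_text` free of network calls makes them easy to check without an API key.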
I believe your observation is correct.
I used the "old library" because the SDK url below advised me to do so:
https://ai.google.dev/gemini-api/docs/sdks?hl=en
Apparently, it works. lol
Thank you for sharing this code.
I will convert it to Python in a Windows environment and test it.
Will let you know.
I don't see genai.GenerativeModel anywhere on that page you linked to, which talks about the new SDKs. {:
I coded my Gemma block portion based on this (refer to the snapshot) from that url page.
The Gemini 2.0 code block works fine, so I did not change anything there.
I only added the Gemma block yesterday, since Gemma 3 came out recently.
Wait... there is a NEW SDK?
Where is that? I am so lost.
That block that you just posted an image of IS the new SDK. But that's not what any of your code that you pasted above is showing.
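To make the difference concrete, here is a minimal sketch of the new-SDK style, as far as I understand it. The new SDK is the `google-genai` package (imported with `from google import genai`) and goes through a `genai.Client`; `genai.GenerativeModel` belongs to the older `google-generativeai` package. The `generate_text` helper and the `GEMINI_API_KEY` variable name are my own choices for illustration, not anything taken from your app.

```python
def generate_text(client, model, prompt):
    """Call generate_content on a google-genai style client and return the text."""
    response = client.models.generate_content(model=model, contents=prompt)
    return response.text


if __name__ == "__main__":
    import os

    api_key = os.environ.get("GEMINI_API_KEY")  # assumed variable name
    if api_key:
        # New SDK: pip install google-genai
        # (the older google-generativeai package is the one with GenerativeModel)
        from google import genai

        client = genai.Client(api_key=api_key)
        # The same call shape is used whether the model is Gemma or Gemini.
        print(generate_text(client, "gemma-3-27b-it", "Say hello in one word."))
```

Because the helper only relies on `client.models.generate_content`, the model string is the only thing that differs between a Gemma call and a Gemini call.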
(I mean, unless I'm misreading something. which has certainly been known to happen. many times.)
I see what you mean now.
My code was edited and improvised based on that snippet, not just "copy and paste" - that usually wouldn't work.
BTW, my app has many LLMs, and Google takes up a big chunk of them with its popular Gemini & Gemma models, especially since these are free.
So, I have to ensure the code works for the whole app, and also write modularized code that can be reused.
I will share some snippets with you.
This is the chatbot portion.
Chat heads can be renamed, with a timestamp, and more.
Users can upload .pdf and use Gemini 2.0 to summarize, analyze, etc.
Dark / Light mode; hide / unhide; refresh, etc.
TTS: Use the Windows SpeechSynthesis api to generate voice in any language package installed on your Windows OS.
Any other UI / UX function.
This is the Dynamic Text Editor section:
- Users can keep all the interaction with LLM, and synchronize it into this editor dynamically.
- Then users will be able to edit it, download it, delete it...
- There are 1,000 tabs in this editor currently; users can then use an AI model (let's say Gemma 3) to go into this "database" and analyze it per the user's request.
This is the summarizer portion => summarize content based on a given url, then convert it into .png with a click.
work-in-progress:
- Vision => Text-to-image (Gemini Vision). I am working on this one. Since it is not technically "free" from Gemini, it is a bit tricky.
I think Gemini's vision now lets users not only do text-to-image, but also "edit an uploaded image", right?
Not sure where that information is available; if you know, pls let me know.
Thanks.
"edit an uploaded image" is basically text-and-image-to-text-and-image. You give it an image and instructions in the prompt, and it replies.
See https://ai.google.dev/gemini-api/docs/image-generation#gemini
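As a rough sketch of what "text-and-image-to-text-and-image" means at the request level: the prompt text and the uploaded image travel together as two parts of one message, and the reply can contain both text parts and image parts. The field names below (`inlineData`, `mimeType`, `responseModalities`) follow my reading of the REST docs on that page, so treat them as assumptions and check them there, along with which model actually supports image output. The helper names are made up for illustration.

```python
import base64


def build_image_edit_body(image_bytes, mime_type, instruction):
    """Build a request body carrying both the instruction and the image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": instruction},
                    {"inlineData": {"mimeType": mime_type, "data": encoded}},
                ]
            }
        ],
        # Ask for both text and image in the reply (assumed field name).
        "generationConfig": {"responseModalities": ["TEXT", "IMAGE"]},
    }


def split_parts(response_json):
    """Separate text parts from image parts in the first candidate."""
    texts, images = [], []
    candidates = response_json.get("candidates") or []
    if not candidates:
        return texts, images
    for part in candidates[0].get("content", {}).get("parts", []):
        if "text" in part:
            texts.append(part["text"])
        elif "inlineData" in part:
            images.append(base64.b64decode(part["inlineData"]["data"]))
    return texts, images
```

The body would be posted to the same `:generateContent` endpoint as a plain text request; only the parts and the requested modalities change.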
I am also working on "uploading a mp4" then ask Gemini to analyze it.
Does the new Gemma-3 model have new features / functions with respect to image or voice editing, pls?
No. Gemma 3 does handle image input, but it does not accept other input modalities, and its output is text only.
Gemma 3 is not as capable a model as the latest Gemini models.
My mission in creating this application is to become your ultimate AI sidekick! I am building a platform that lets you tap into the power of multiple LLMs – think of it as having a whole team of brilliant (and tireless) digital assistants at your command, either for free or at a low cost. It keeps a detailed record of all your conversations, building a ‘memory’ of your work and learning right there on your desktop, or a database of your choice. But I don’t stop there! This application can also help you unlock your superpowers: analyze your progress, pinpoint what you’re crushing, and suggest ways to level up. And because learning shouldn’t feel like a chore, I am committed to making the whole experience… well, actually enjoyable – visually, with a clean and intuitive interface, aurally, with subtle and satisfying feedback sounds, and even tactilely, through responsive animations that feel natural and engaging. I am aiming for an experience that feels less like using software and more like collaborating with a thoughtful partner.
I also tested the webscrape and thinking models. There are more restrictions on this, e.g. I am pretty sure that techcrunch.com recently updated (or enhanced) its website’s authorization settings to disallow any webscraping related activities. lol
Gemini’s thinking model is very good in my opinion, but I have not yet been able to develop a method to utilize it in a way that can match or even outperform o1 from OpenAI. I think if we can effectively leverage this model’s "thinking" capability, it will have great potential to benefit everyone.