While we are talking about quotas, can you maybe add an easy way of checking how much you've used/got left?
Apparently now you need to use google-cloud-quotas to get the limit and google-cloud-monitoring to get the usage.
VS Code Copilot (using gemini-2.5-pro) managed to implement the first part, getting the limit, but when I asked Gemini to implement the second part it said integrating cloud-monitoring is too complex and it can't do it!
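For anyone else attempting this, here is a rough sketch of the two-API dance, assuming the google-cloud-quotas and google-cloud-monitoring Python clients; the serviceruntime quota metric and the field/label names are my best guess from the docs, so treat them as placeholders to verify.

```python
import time
from google.cloud import cloudquotas_v1, monitoring_v3

PROJECT = "my-project"  # placeholder project ID

# 1) Limits, via the Cloud Quotas API
quota_client = cloudquotas_v1.CloudQuotasClient()
parent = f"projects/{PROJECT}/locations/global/services/aiplatform.googleapis.com"
for info in quota_client.list_quota_infos(parent=parent):
    for dim in info.dimensions_infos:
        # quota_id plus the configured limit value for each dimension
        print(info.quota_id, dim.details.value)

# 2) Usage, via the Cloud Monitoring API (metric type assumed, please verify)
metric_client = monitoring_v3.MetricServiceClient()
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 86400}, "end_time": {"seconds": now}}
)
series = metric_client.list_time_series(
    request={
        "name": f"projects/{PROJECT}",
        "filter": (
            'metric.type = "serviceruntime.googleapis.com/quota/rate/net_usage" '
            'AND resource.labels.service = "aiplatform.googleapis.com"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for ts in series:
    # quota metric name plus the most recent usage point
    print(ts.metric.labels.get("quota_metric"), ts.points[0].value.int64_value)
```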
The thing is that the entry level for Provisioned Throughput is so high! I just want a reliable model experience for my small dev team using models through Vertex, but I don't think there's anything I can buy there to ensure it.
lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve in 4 languages (GA: Python, Go; Preview: Node.js, Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.
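Roughly, the same client surface targets either backend; a minimal sketch (API key, project, and model name are placeholders):

```python
from google import genai

# Gemini API via AI Studio (API key auth)
studio_client = genai.Client(api_key="YOUR_API_KEY")

# Same SDK surface, backed by Vertex AI (project + region auth)
vertex_client = genai.Client(vertexai=True, project="my-project", location="us-central1")

for client in (studio_client, vertex_client):
    response = client.models.generate_content(
        model="gemini-2.5-pro", contents="Say hello in one sentence."
    )
    print(response.text)
```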
The way dependency resolution works in Java, with the special, Google-only, giant dynamic BOM resolver, is hell on earth.
We have to write code that round-robins every region on retries to get past how overloaded/poorly managed Vertex is (we're not hitting our quotas), and yes, that's even with the retry settings on the SDK.
Read timeouts aren't configurable on the Vertex SDK.
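A minimal sketch of that kind of region rotation, using the newer google-genai SDK; the region list, model name, and the set of retryable status codes are placeholders, not a definitive implementation.

```python
import itertools
from google import genai
from google.genai import errors

REGIONS = ["us-central1", "us-east5", "europe-west1"]  # example regions

def generate_with_region_rotation(project_id: str, prompt: str, max_attempts: int = 6):
    """Retry generate_content, switching Vertex region on overload errors."""
    region_cycle = itertools.cycle(REGIONS)
    last_err = None
    for _ in range(max_attempts):
        region = next(region_cycle)
        client = genai.Client(vertexai=True, project=project_id, location=region)
        try:
            return client.models.generate_content(
                model="gemini-2.5-pro", contents=prompt
            )
        except errors.APIError as e:
            # Retry on rate-limit/overload style errors, re-raise everything else
            if e.code in (429, 500, 503):
                last_err = e
                continue
            raise
    raise last_err
```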
Ramoz, good to hear that native Structured Outputs are working! But if the docs are 'confusing and partially incomplete', that's not good DevEx. Good docs are non-negotiable. We are in the process of revamping the whole documentation site. Stay tuned; you will see something better than what we have today.
Product idea for structured outputs: a dynamic JSON field. Imagine wanting a custom schema generated on the fly (e.g. for new, on-the-fly structured outputs).
That’s correct! You can send images either by uploading them through the Files API on the Gemini API or by passing a Google Cloud Storage (GCS) bucket reference. What we DON’T have a sample for is sending images as raw bytes. Here is a screenshot of the code sample from the “Get Code” function in Vertex AI Studio.
https://drive.google.com/file/d/1rQRyS4ztJmVgL2ZW35NXY0TW-S0...
Let me create a feature request to get these samples into our docs, because I could not find a sample either. Fixing it.
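In the meantime, a minimal sketch of the raw-bytes path with the genai SDK; the model name, project, and file path are placeholders:

```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

# Read the image from disk and pass it inline as bytes instead of uploading it
with open("cat.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Describe this image.",
    ],
)
print(response.text)
```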
simonw, 'Google's service auth SO hard to figure out' – absolutely hear you. We're taking this feedback on auth complexity seriously. We have a new Vertex express mode in Preview (https://cloud.google.com/vertex-ai/generative-ai/docs/start/..., not ready for primetime yet!) that lets you sign up for a free tier and get an API key right away.
We are improving the experience. Again, if you would like to give feedback, please DM me at @chrischo_pm on X.
I couldn’t have said it better. My billing friends are working with the Vertex team to address some of these concerns, and we are planning to address this issue. Please stay tuned; we will come back to this thread with an announcement when we can.
In fact, if you can DM me (@chrischo_pm on X), I would love to learn more if you are interested.
100% this. We use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can actually control spend via spend limits on keys (A++ feature) and prepaid credits.
simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.
For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.
However, if you anticipate your use case growing and scaling 10-1000x in production, Vertex would be a worthwhile investment.
I think you are talking about the generativeai vs. vertexai vs. genai SDKs.
And you are watching us evolve over time to do better.
A couple of clarifications:
1. Going forward, we only recommend using the genai SDK.
2. Subtle API differences - this is a bit harder to articulate, but we are working to improve it. Please DM me at @chrischo_pm if you would like to discuss further :)
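For readers trying to keep the names straight, here is my rough decoder ring for the three Python packages in question (pip names on the left; only the recommended one is imported below):

```python
# google-generativeai     -> import google.generativeai   # older Gemini API SDK
# google-cloud-aiplatform -> import vertexai              # older Vertex AI SDK
# google-genai            -> from google import genai     # unified SDK recommended going forward

from google import genai  # the package the recommendation above refers to
```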
No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to the "recommend only using <a random name that is indistinguishable from other names>".
Oh, and some of these SDKs (and docs) do show a way to use this functionality without the SDKs, but others don't. Because there are only 4 languages in the world, and everyone should be happy using them.
I think you can strongly influence which SDK your customers use by keeping the Python, TypeScript, and curl examples in the documentation up to date and by uniformly using what you consider the ‘best’ SDK in those examples.
Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.
One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.
EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
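For context, here is roughly what the current search-grounding path looks like in the genai SDK as I understand it; the tool and config names reflect my reading of the docs and the model name is a placeholder, so double-check before relying on it.

```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="What changed in the latest Gemini release?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())]  # enable search grounding
    ),
)
print(response.text)
# Grounding sources, when present, are attached to the candidate's grounding_metadata.
```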
Hey there, I’m Chris Cho (X: chrischo_pm, Vertex PM focusing on DevEx), joined by Ivan Nardini (X: ivnardini, DevRel). We heard you, and we'll answer your questions as directly as possible.
First of all, thank you for the positive sentiment about our latest Gemini 2.5 models. We are so glad that you find the models useful! We really appreciate this thread and everyone's feedback on Gemini/Vertex.
We read through all your comments, and YES, clearly we've got some friction in the DevEx. This feedback is super valuable and helps me prioritize. Our goal is to listen, gather your insights, offer clarity, and point to potential solutions or workarounds.
I’m going to respond to some of the comments here directly in the thread.
Can we avoid weekend changes to the API? I know it's all non-GA, but having `includeThoughts` suddenly start working at ~10AM UTC on a Sunday, with the raw thoughts being returned again after they were removed, is nice but disruptive.
Can you tell me the exact instance when this happened, please? I will take this feedback back to my colleagues, but in order to change how we behave I need a baseline and data.
Thoughts used to be available in the Gemini/Vertex APIs when Gemini 2.0 Flash Thinking Experimental was initially introduced [1][2], and subsequently disabled to the public (I assume hidden behind a visibility flag) shortly after DeepSeek R1's release [3] regardless of the `include_thoughts` setting.
At ~10:15AM UTC 04 May, a change was rolled out to the Vertex API (but not the Gemini API) that caused the API to respect the `include_thoughts` setting and return the thoughts. For consumers that don't handle the thoughts correctly and had specified `include_thoughts = true`, the thinking traces then leaked into responses.
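A defensive pattern that avoids this kind of leak on the consumer side: when requesting thoughts, filter response parts on the `thought` flag instead of concatenating everything. This is a sketch with the google-genai SDK, with field names as I understand the current API and a placeholder model name.

```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Explain quota vs. rate limits in one paragraph.",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(include_thoughts=True)
    ),
)

answer_parts, thought_parts = [], []
for part in response.candidates[0].content.parts:
    # Parts flagged as thoughts are the thinking trace, not the answer
    (thought_parts if getattr(part, "thought", False) else answer_parts).append(part.text)

print("".join(answer_parts))  # user-facing answer only
# thought_parts holds the thinking trace, if the API returned one
```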
I love that you're responding on HN, thanks for that! While you're here I don't suppose you can tell me when Gemini 2.5 Pro is hitting European regions on Vertex? My org forbids me from using it until then.
Thanks for replying, and I can safely say that most of us just want first-class conformity with OpenAI's API without JSON schema weirdness (not using refs, for instance) baked in.
Hi, one thing I am really struggling with in the AI Studio API is stop_sequences. I know how to request them, but I cannot see how to determine which stop_sequence was triggered; they don't show up in the stop_reason like in most other APIs. Is that something the Vertex API can do? I've built some automation tools around stop_sequences, using them for control logic, but I can't use Gemini as the controller without a lot of brittle parsing logic.
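To illustrate the gap concretely, here is a sketch with the google-genai SDK; the sequences, key, and model are placeholders.

```python
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_AI_STUDIO_KEY")

response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Continue the dialogue...",
    config=types.GenerateContentConfig(stop_sequences=["USER:", "TOOL:"]),
)

print(response.candidates[0].finish_reason)  # e.g. FinishReason.STOP
# Nothing in the response says whether "USER:" or "TOOL:" triggered the stop,
# which is what forces the brittle parsing mentioned above.
```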
Is there an undocumented, hardcoded timeout for Gemini responses, even in streaming mode? JSON output conforming to a schema can get quite lengthy, and I can't seem to get all of it for some inputs because Gemini seemingly terminates the request.
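One way I try to narrow this down before blaming a timeout: stream the response and inspect the final chunk's finish_reason to separate token-limit truncation from a dropped connection. Sketch with the google-genai SDK; the model, prompt, and token limit are placeholders.

```python
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-project", location="us-central1")

chunks, last_finish = [], None
for chunk in client.models.generate_content_stream(
    model="gemini-2.5-pro",
    contents="Emit the full catalogue as JSON.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        max_output_tokens=65535,  # example limit
    ),
):
    if chunk.candidates:
        last_finish = chunk.candidates[0].finish_reason
    chunks.append(chunk.text or "")

print(last_finish)            # MAX_TOKENS would explain truncation; STOP would not
print(len("".join(chunks)))   # how much actually arrived before the stream ended
```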
Google usually doesn't care what users say at all. This is why they so often have product-crippling bugs and missing features. At least this guy is making a show of trying before he transfers to another project.
It is incredibly lame for a gargantuan company like Google, with their thousands of developers and PMs and this and that, to come to a remote corner of the web to pretend they are doing what they should have done 10 years ago.
>Chat, briefly, what does a PM at a company like Google do?
"A Product Manager (PM) at Google is responsible for guiding the development of products from conception to launch. They identify user needs, define product vision and strategy, prioritize features, work with cross-functional teams (engineering, design, marketing), and ensure the product aligns with business goals. They act as the bridge between technical teams and stakeholders to deliver successful, user-focused solutions."
Some might have ignored your question, but in the spirit of good conversation, I figured I’d share a quick explanation of what a PM does, just in case it helps!