
We have moved our quota system to Dynamic Shared Quota (https://cloud.google.com/vertex-ai/generative-ai/docs/quotas) for 2.0+ models. There are no quotas in DSQ. If you need guaranteed throughput, there is an option to purchase Provisioned Throughput (https://cloud.google.com/vertex-ai/generative-ai/docs/provis...).


While we are talking about quotas, can you maybe add an easy way of checking how much you've used/got left?

Apparently now you need to use google-cloud-quotas to get the limit and google-cloud-monitoring to get the usage.

VS Code Copilot managed to implement the first part (getting the limit) using gemini-2.5-pro, but when I asked Gemini to implement the second part, it said that integrating cloud-monitoring is too complex and it can't do it!!!!
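For what it's worth, this is roughly what that two-client dance looks like in Python; the parent path and especially the metric filter below are unverified placeholders, not official names:

    import time

    from google.cloud import cloudquotas_v1, monitoring_v3

    project = "my-project"  # placeholder

    # 1. Limits via the Cloud Quotas API (google-cloud-quotas).
    quotas = cloudquotas_v1.CloudQuotasClient()
    parent = f"projects/{project}/locations/global/services/aiplatform.googleapis.com"
    for info in quotas.list_quota_infos(parent=parent):
        print(info.name)  # limit details live on the QuotaInfo fields

    # 2. Usage via Cloud Monitoring time series (google-cloud-monitoring).
    # The metric type is a placeholder; real quota usage metrics are listed
    # in the Cloud Monitoring metrics reference.
    monitoring = monitoring_v3.MetricServiceClient()
    now = int(time.time())
    interval = monitoring_v3.TimeInterval(
        {"end_time": {"seconds": now}, "start_time": {"seconds": now - 3600}}
    )
    results = monitoring.list_time_series(
        request={
            "name": f"projects/{project}",
            "filter": 'metric.type = "aiplatform.googleapis.com/quota/PLACEHOLDER/usage"',
            "interval": interval,
            "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
        }
    )
    for series in results:
        print(series.metric.type, [p.value.int64_value for p in series.points])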


The thing is that the entry level of provisioned throughput is so high! I just want a reliable model experience for my small Dev team using models through Vertex but I don't think there's anything I can buy there to ensure it.


lemming, this is super helpful, thank you. We provide the genai SDK (https://github.com/googleapis/python-genai) to reduce the learning curve; it's available in 4 languages (GA: Python, Go; Preview: Node.js, Java). The SDK works for all Gemini APIs provided by Google AI Studio (https://ai.google.dev/) and Vertex AI.
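For anyone who hasn't tried it yet, a minimal sketch of the Python SDK; the API key, project, and model values below are placeholders:

    # pip install google-genai
    from google import genai

    # Gemini API (AI Studio): just an API key.
    client = genai.Client(api_key="YOUR_API_KEY")

    # Vertex AI: point the same client at a GCP project instead.
    # client = genai.Client(vertexai=True, project="my-project", location="us-central1")

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Explain what Vertex AI is in one sentence.",
    )
    print(response.text)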


The way dependency resolution works in Java with the special, Google-only, giant dynamic BOM resolver is hell on earth.

We have to write code that round-robins every region on retries to get past how overloaded/poorly managed Vertex is (we're not hitting our quotas), and yes, that's even with retry settings on the SDK.

Read timeouts aren't configurable on the Vertex SDK.


Ramoz, good to hear that native Structured Outputs are working! But if the docs are 'confusing and partially incomplete,' that’s not a good DevEx. Good docs are non-negotiable. We are in the process of revamping the whole documentation site. Stay tuned, you will see something better than what we have today.
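For reference, native structured output through the genai SDK looks roughly like this today; the schema and model name are illustrative:

    from pydantic import BaseModel

    from google import genai

    class Recipe(BaseModel):
        name: str
        ingredients: list[str]

    client = genai.Client(api_key="YOUR_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Give me a simple cookie recipe.",
        config={
            "response_mime_type": "application/json",
            "response_schema": Recipe,
        },
    )
    print(response.parsed)  # parsed into a Recipe instance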


Product idea for structured outputs: Dynamic Json field... like imagine if I want a custom schema generated (e.g. for new on-the-fly structured outputs).


ooh i like!


That’s correct! You can send images by uploading them either through the Files API from the Gemini API or via a Google Cloud Storage (GCS) bucket reference. What we DON’T have a sample on is sending images as raw bytes. Here is a screenshot of the code sample from the “Get Code” function in Vertex AI Studio. https://drive.google.com/file/d/1rQRyS4ztJmVgL2ZW35NXY0TW-S0... Let me create a feature request to get these samples into our docs, because I could not find one either. Fixing it.
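In the meantime, sending image bytes through the genai SDK looks roughly like this; the file name and model are placeholders:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    with open("photo.jpg", "rb") as f:
        image_bytes = f.read()

    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=[
            types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
            "What is in this picture?",
        ],
    )
    print(response.text)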


simonw, 'Google's service auth SO hard to figure out' – absolutely hear you. We're taking this feedback on auth complexity seriously. We have a new Vertex express mode in Preview (https://cloud.google.com/vertex-ai/generative-ai/docs/start/... , not ready for primetime yet!) where you can sign up for a free tier and get an API key right away. We are improving the experience; again, if you would like to give feedback, please DM me at @chrischo_pm on X.


We built the OpenAI-compatible API (https://cloud.google.com/vertex-ai/generative-ai/docs/multim...) layer to help customers that are already using the OAI library test out Gemini easily with basic inference, but not as a replacement library for the genai SDK (https://github.com/googleapis/python-genai). We recommend using the genai SDK for working with Gemini.
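As a rough sketch, the compatibility layer can be exercised with the openai library; this example uses the Gemini API's OpenAI-compatible base URL, while the Vertex variant uses a project/location-specific URL described in the doc linked above:

    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_GEMINI_API_KEY",
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    completion = client.chat.completions.create(
        model="gemini-2.0-flash",
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(completion.choices[0].message.content)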


So, to be clear, Google only supports Python as a language for accessing your models? Nothing else?


We have Python/Go in GA.

Java/JS are in preview (not ready for production) and will be GA soon!


What about providing an actual API people can call without needing to rely on Google SDKs?


You can do so with the AI SDK from Vercel, OpenRouter, etc., or by just sending raw HTTP requests.
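A raw HTTP call against the Gemini API endpoint is a few lines with requests; the Vertex REST endpoint differs, and this sketch assumes an API-key setup:

    import requests

    url = (
        "https://generativelanguage.googleapis.com/v1beta/"
        "models/gemini-2.0-flash:generateContent"
    )
    payload = {"contents": [{"parts": [{"text": "Say hello in one word."}]}]}
    resp = requests.post(url, params={"key": "YOUR_API_KEY"}, json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])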


I couldn’t have said it better. My billing friends are working to address some of these concerns along with the Vertex team. We are planning to address this issue. Please stay tuned; we will come back to this thread to announce when we can. In fact, if you DM me (@chrischo_pm on X), I would love to learn more if you are interested.


Can you allow prepaid credits as well please?


100% this. We actually use OpenRouter (and pay their surcharge) with Gemini 2.5 Pro just because we can actually control spend via spend limits on keys (A++ feature) and prepaid credits.


one step ahead of you ;)


simonw, good points. The Vertex vs. non-Vertex Gemini API (via AI Studio at aistudio.google.com) could use more clarity.

For folks just wanting to get started quickly with Gemini models without the broader platform capabilities of Google Cloud, AI Studio and its associated APIs are recommended as you noted.

However, if you anticipate your use case to grow and scale 10-1000x in production, Vertex would be a worthwhile investment.


Why create two different APIs that are the same, but only subtly different, and have several different SDKs?


I think you are talking about generativeai vs. vertexai vs. the genai SDK.

And you are watching us evolve over time to do better.

A couple of clarifications:

1. Going forward, we only recommend using the genai SDK.

2. Subtle API differences: this is a bit harder to articulate, but we are working to improve this.

Please DM me at @chrischo_pm if you would like to discuss further :)


So. Three different SDKs.

No idea what any of those SDK names mean. But sure enough, searching will bring up all three of them for different combinations of search terms, and none of them will point to the "recommend only using <a random name that is indistinguishable from other names>".

Oh, and some of these SDKs (and docs) do have a way to use this functionality without the SDKs, but not others. Because there are only 4 languages in the world, and everyone should be happy using them.


I think you can strongly influence which SDK your customers use by keeping the Python, Typescript, and Curl examples in the documentation up to date and uniformly use what you consider the ‘best’ SDK in the examples.

Overall, I think that Google has done a great job recently in productizing access to your models. For a few years I wrote my own utilities to get stuff done, now I do much less coding using Gemini (and less often ChatGPT) because the product offerings do mostly what I want.

One thing I would like to see Google offer is easier integrated search with LLM generation. The ‘grounding’ examples are OK, but for use in Python I buy a few Perplexity API credits and use that for now. That is the single thing I would most like to see you roll out.

EDIT: just looked at your latest doc pages, I like the express mode setup with a unified access to regular APIs vs. Vertex.
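Regarding the grounding point above, here is a sketch of what search grounding looks like through the genai SDK, assuming the GoogleSearch tool from the grounding docs (API key and model are placeholders):

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Who won the most recent Formula 1 race?",
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )
    print(response.text)
    # Source attributions, when present, come back in grounding_metadata.
    print(response.candidates[0].grounding_metadata)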


Thanks! - I like it too :)


Hey there, I’m Chris Cho (x: chrischo_pm, Vertex PM focusing on DevEx) and Ivan Nardini (x: ivnardini, DevRel). We heard you, and we'll answer your questions as directly as possible.

First of all, thank you for the kind words about our latest Gemini 2.5 model. We are so glad that you find the models useful! We really appreciate this thread and everyone's feedback on Gemini/Vertex.

We read through all your comments. And YES, clearly we've got some friction in the DevEx. This feedback is super valuable and helps me prioritize. Our goal is to listen, gather your insights, offer clarity, and point to potential solutions or workarounds.

I’m going to respond to some of the comments directly in the thread.


Had to move away from Gemini because the SDK just didn't work.

Regardless of whether I passed a role or not, the function would say something to the effect of "invalid role, accepted are user and model".

Tried switching to the OpenAI-compatible SDK, but it threw errors for tool calls and I just gave up.

Could you confirm if it was a known bug that was fixed?



You don't have to specify a role when you call through Python (https://cloud.google.com/vertex-ai/generative-ai/docs/start/...)

(which I think is what you are using, but maybe I'm wrong).

Feel free to DM me on @chrischo_pm on X. The stuff you are describing shouldn't happen.
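A quick sketch of both forms with the genai SDK (API key and model name are placeholders): plain strings need no role, and if you build the history yourself, the accepted roles are "user" and "model":

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")

    # Plain strings: the SDK wraps them as user turns, no role needed.
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="Tell me a joke.",
    )
    print(response.text)

    # Explicit history: only "user" and "model" are accepted roles.
    history = [
        types.Content(role="user", parts=[types.Part(text="Hi there")]),
        types.Content(role="model", parts=[types.Part(text="Hello! How can I help?")]),
        types.Content(role="user", parts=[types.Part(text="Tell me a joke.")]),
    ]
    response = client.models.generate_content(model="gemini-2.0-flash", contents=history)
    print(response.text)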


Can we avoid weekend changes to the API? I know it's all non-GA, but having `includeThoughts` suddenly work at ~10AM UTC on a Sunday and the raw thoughts being returned after they were removed is nice, but disruptive.


Can you tell me the exact instance when this happened, please? I will take this feedback back to my colleagues, but in order to change how we behave I need a baseline and data.


Thoughts used to be available in the Gemini/Vertex APIs when Gemini 2.0 Flash Thinking Experimental was initially introduced [1][2], and subsequently disabled to the public (I assume hidden behind a visibility flag) shortly after DeepSeek R1's release [3] regardless of the `include_thoughts` setting.

At ~10:15AM UTC 04 May, a change was rolled out to the Vertex API (but not the Gemini API) that caused the API to respect the `include_thoughts` setting and return the thoughts. For consumers that don't handle the thoughts correctly and had specified `include_thoughts = true`, the thinking traces then leaked into responses.

[1]: https://googleapis.github.io/python-genai/genai.html#genai.t...

[2]: https://ai.google.dev/api/generate-content#ThinkingConfig

[3]: https://github.com/googleapis/python-genai/blob/157b16b8df40...
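For context, this is roughly how `include_thoughts` is requested through the genai SDK, and how a consumer would need to separate thought parts from answer parts; the model name is a placeholder:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-flash-preview-04-17",  # placeholder thinking-capable model
        contents="Briefly, why is the sky blue?",
        config=types.GenerateContentConfig(
            thinking_config=types.ThinkingConfig(include_thoughts=True),
        ),
    )
    for part in response.candidates[0].content.parts:
        # Consumers that ignore the thought flag are the ones that saw
        # thinking traces leak into their responses.
        prefix = "[thought] " if getattr(part, "thought", False) else ""
        print(prefix + (part.text or ""))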


Can you ask whoever owns dashboards to make it so I can troubleshoot quota exceeded errors like this? https://x.com/spyced/status/1917635135840858157


We are working on fixing this and showing the critical ones in AIS. I agree it is crazy that there are 700+ items here. A real pain in the neck to deal with.


I love that you're responding on HN, thanks for that! While you're here I don't suppose you can tell me when Gemini 2.5 Pro is hitting European regions on Vertex? My org forbids me from using it until then.


Yeah, not having clear timelines for new releases on the one hand, while being quick to deprecate older models on the other, isn't a very good experience.


Thanks for replying, and I can safely say that most of us just want first-class conformity with OpenAI's API without JSON schema weirdness (not using refs, for instance) baked in.


Or returning null for null values, not some "undefined" string.

Or not failing when passing `additionalProperties: false`

Or..


Hi, one thing I am really struggling with in AI studio API is stop_sequences. I know how to request them, but cannot see how to determine which stop_sequence was triggered. They don't show up in the stop_reason like most other APIs. Is that something which vertex API can do? I've built some automation tools around stop_sequences, using them for control logic, but I can't use Gemini as the controller without a lot of brittle parsing logic.


Thank you, feedback noted.


Is there an undocumented hardcoded timeout for Gemini responses, even in streaming mode? JSON output according to a schema can get quite lengthy, and I can't seem to get all of it for some inputs because Gemini seemingly terminates the request early.


This is probably just you hitting the model's internal output length maximum. It's 65,536 tokens for 2.5 Pro and Flash.

For other models, see this link and open up the collapsed section for your specific model: https://ai.google.dev/gemini-api/docs/models
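If that's the cause, raising `max_output_tokens` (up to the model's cap) and checking the finish reason should confirm it. A sketch with the genai SDK, model name and prompt as placeholders:

    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")
    response = client.models.generate_content(
        model="gemini-2.5-pro-preview-05-06",  # placeholder
        contents="Produce a long JSON document following my schema.",
        config=types.GenerateContentConfig(
            response_mime_type="application/json",
            max_output_tokens=65536,
        ),
    )
    # A finish reason of MAX_TOKENS means the output was cut off at the limit.
    print(response.candidates[0].finish_reason)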


Thanks! It might just be that!


This is so cringe.

I hope it doesn't become a trend on this site.


A team taking the opportunity to engage directly with their users to understand their feedback so they can improve the product? So cringe.


Google usually doesn't care what users say at all. This is why they so often have product-crippling bugs and missing features. At least this guy is making a show of trying before he transfers to another project.


It’s the US style, which has made its way across the pond too: you have to make upbeat noises to remove any suspicion you’re criticizing.


Unlike others ... you got it.

It is incredibly lame for a gargantuan company like Google, with their thousands of developers and PMs and this and that, to come to a remote corner of the web to pretend they are doing what they should have done 10 years ago.


Google should have cleaned up its Gemini API 10 years ago?


>Chat, briefly, what does a PM at a company like Google do?

"A Product Manager (PM) at Google is responsible for guiding the development of products from conception to launch. They identify user needs, define product vision and strategy, prioritize features, work with cross-functional teams (engineering, design, marketing), and ensure the product aligns with business goals. They act as the bridge between technical teams and stakeholders to deliver successful, user-focused solutions."

Some might have ignored your question, but in the spirit of good conversation, I figured I’d share a quick explanation of what a PM does, just in case it helps!


This sounds accurate. I see myself as a Pain Manager more than a Product manager. Product just solves the pain that users have ;)

Sometimes we get it right the first time we launch; I think most of the time we get it right over a period of time.

Trying to do a little bit better every day and ship as fast as possible!

