Generative AI support on Vertex AI is now generally available (cloud.google.com)
81 points by blitz on June 10, 2023 | hide | past | favorite | 30 comments



Extremely curious that PaLM-E, PaLI, and GPT-4 were trained to be multimodal (accepting non-text inputs such as images), but the released APIs are text-only. In GCP's case here, they've released PaLM 2, which is not multimodal like PaLM-E and PaLI. This prevents using it for visual reasoning[0].

I'm just wondering why multiple parties seem reluctant to allow the public to use this.

0: https://visualqa.org


The image compression/decompression in their special-token system wouldn't be free; it would be as expensive as any other per-pixel transformation on an image file, and it would be entirely custom software that they'd have to run on their servers. Image upload and download is a very significant increase in network traffic compared to text alone, and could make the whole venture cost a lot more. And finally, even a downsized image decomposes into a lot of tokens, so just running inference on it carries a lot of computational cost. If they haven't implemented statefulness (which many haven't right now, despite the simplicity of the technique; the field is still very new), that computational cost must be repeated with every fresh API call.

Basically, multi-modal functionality should be an OOM increase in compute, traffic, and storage requirements for anyone providing it compared to a text-only model (or an only-text-allowed model).
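To make the token math concrete, here's a rough sketch. The patch size and tokens-per-word figures are my own assumptions, loosely modeled on ViT-style encoders and common English tokenizer averages, not any provider's actual numbers:

```python
def image_tokens(width: int, height: int, patch: int = 14) -> int:
    # A ViT-style encoder cuts the image into patch x patch squares,
    # each becoming one token.
    return (width // patch) * (height // patch)

def text_tokens(words: int, tokens_per_word: float = 1.3) -> int:
    # Rough heuristic: English averages around 1.3 tokens per word.
    return round(words * tokens_per_word)

# Even a small 224x224 image costs about as many tokens as a
# ~200-word chunk of text, before any text prompt is added.
print(image_tokens(224, 224))  # 256
print(text_tokens(200))        # 260
```

So every image in context eats hundreds of tokens per call, which compounds badly if state isn't cached server-side.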


The voice of the people is sometimes a bit raucous


Plus, there is a frenzy to exploit these models as much and as fast as possible, from all angles and by all parties.

Anyone who acts all casual, as if there is not a constellation of vultures circling AI right now should consider themselves 'off-grid'


I wish they would just open the floodgates. The vultures will realize that their extractive problems won't be solved by a generative model, no matter how "multimodal" its inputs are. Of course, that won't happen, because that would require certain charlatans admitting that their models won't hold up in half the places the even more greedy vultures are vying for.


Presumably they're harder to censor or to enforce ideological constraints on. I can't see any reason other than them being worried about bad press because someone made the model do something that could be played up as bad.


I can think of two very important reasons just off the top of my head.

1. --- It will kill captchas for good. Half of the internet is protected by Cloudflare or Google captchas at this point. Spam, fraud, and other trouble has a maximum possible volume because you can only pay a human in India so little to solve them for you. If you have an algorithm that can complete it, the game is up. Sites may as well not have a captcha at all. Prevention then becomes much more Orwellian, with hardware TPM attestation solutions, and the internet as we know it is forever changed.

2. --- It will show corporations and governments just how all-seeing video surveillance could be. Human or (by some reports, above-human) level computer vision is a Pandora's box all by itself.

OpenAI may simply want to avoid opening any more family-size cans of worms than there already are.


> If you have an algorithm that can complete it, the game is up.

This is very much already a thing, I'm sad to say.


We were waiting for Vertex AI text generation to reach GA before launching a new iOS app, so we will be going live soon.

We started with the GPT API but switched to Vertex AI due to speed. We will still keep the GPT API as a backup, though.


How does Vertex compare to GPT in terms of quality of output? Also, do you use it as-is, or do any fine-tuning?


Quality of output is at the same level as GPT. The biggest issues for us:

1. text-bison is limited to 1024 output tokens.

2. We ask for JSON output, but it is often not valid JSON (trailing comma after the last element, missing } after an element, etc.). We had to write our own parsing code in the end to work around these format issues.
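A minimal sketch of that kind of workaround parsing, assuming only the two failure modes named above (trailing commas and closers the model left off); it's a best-effort repair, not a general JSON fixer:

```python
import json
import re

def repair_json(text: str):
    """Best-effort repair for trailing commas and unclosed braces/brackets."""
    # Drop trailing commas that appear right before a closing bracket/brace.
    text = re.sub(r",\s*([\]}])", r"\1", text)
    # Close anything left open, innermost first, ignoring string contents.
    stack, in_string, escaped = [], False, False
    for ch in text:
        if escaped:
            escaped = False
        elif ch == "\\":
            escaped = True
        elif ch == '"':
            in_string = not in_string
        elif not in_string:
            if ch in "{[":
                stack.append("}" if ch == "{" else "]")
            elif ch in "}]" and stack:
                stack.pop()
    text += "".join(reversed(stack))
    return json.loads(text)

print(repair_json('{"a": [1, 2,], "b": {"c": 3}'))  # {'a': [1, 2], 'b': {'c': 3}}
```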


I've been demoing it and have found it struggles to reliably output structured JSON at the moment. I'm curious if folks have had different experiences and if so what their prompts were.


We fine-tuned bison with input set to doc content, and output as JSON, but the generation keeps getting prefixed with some of the input. Waiting to hear back from Google about what we might be doing wrongly. The JSON itself looks great, though.

Edit: sorry, that was a different experiment. The one that worked well was an address splitter, trained off Google Address Validator output, funnily enough. Still, the output JSON got prefixed with some of the address input.
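A possible workaround sketch for that echoed-prefix problem; `strip_echoed_prefix` is a hypothetical helper of my own, not part of any SDK, and it assumes the payload is a JSON object preceded by a verbatim chunk of the prompt:

```python
def strip_echoed_prefix(prompt: str, output: str) -> str:
    # If everything before the first '{' is text copied from the prompt,
    # drop it and keep only the JSON payload.
    brace = output.find("{")
    if brace > 0 and output[:brace].strip() in prompt:
        return output[brace:]
    return output

cleaned = strip_echoed_prefix(
    "Split this address: 1 Main St",
    'Split this address: {"street": "1 Main St"}',
)
print(cleaned)  # {"street": "1 Main St"}
```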


Guidance from Microsoft or Jsonformer might help. I haven't tried them with Vertex, but it's a common problem across LLMs that both projects address.


Interesting announcement. I'd be keen to see whether businesses will trust Google to try out these capabilities, or prefer other, smaller recent services, given their flexibility of integration with existing cloud choices.

It seems we may find companies spread across all major cloud providers in the near future, to guarantee access to the unique proprietary services that cloud providers are starting to differentiate themselves with.


Sure, IaaS is commodified so the next opportunity for differentiation & value add is in services.

For GCP specifically, the Anthos/Omni stuff seems like a way to sell those services even if the infrastructure isn't actually in GCP.


I really, really wonder how the price of Vertex AI compares, in practice, to the OpenAI API for a startup with unpredictable and non-sustained usage. The multitenancy assumptions built into the OpenAI API's cost structure might make it much cheaper. Has anybody modeled this? I realize the LLMs aren't equivalent today, but long term they could be.


How does the pricing compare with OpenAI's GPT-3.5 Turbo? Seems double the price?


They seem to be discounting it heavily right now. I haven't seen much of a charge on my bill yet, even though I have used it quite a lot. But things might change, so I'm not really sure what the bill will look like at the end of the month.


What’s the competing Azure/OpenAI service?

Is it Azure OpenAI service (that seems too simple!)?


Anyone experiment with the embeddings api? How does gecko compare to embeddings-ada?


Where can you find gecko? Has it finally been published?


textembedding-gecko reached GA on June 7th, together with text-bison.

You can view all the models available to you on Model Garden.
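A minimal sketch of calling it from Python. The SDK import path and model name are assumptions based on the June 2023 preview SDK (`vertexai.preview.language_models`; later versions dropped `preview`) and may differ in yours; the Vertex call is wrapped in a function so nothing here hits the network on import:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity, for comparing embedding backends.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def gecko_embed(texts):
    # Assumed Vertex AI SDK usage -- verify names against your SDK version.
    from vertexai.preview.language_models import TextEmbeddingModel
    model = TextEmbeddingModel.from_pretrained("textembedding-gecko@001")
    return [e.values for e in model.get_embeddings(texts)]
```

Comparing against ada would then just be calling the equivalent OpenAI endpoint and checking similarity rankings on the same text pairs.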


It doesn't seem to be available to me. How do I get access? If I remember correctly gecko is small enough to run on device.


Are you referring to the text-gecko model? I don't think it's released to the public yet.

> If I remember correctly gecko is small enough to run on device.

Yes, I also remember reading that it's designed to be lightweight enough to run on a mobile device.


I didn't try it, as this requires you to give payment information for a free trial, and I got sidetracked.

What I did learn is that somehow Google has all of my credit cards, despite me never sharing them on the account I was using.


How did you learn that Google has all your credit cards?


It brought me to the payment page for a free trial.


So this means they're no longer free, I suppose :(


[flagged]


AI moves faster than anyone could have expected.



