We saw these same fears with the release of Gmail. Why would you trust your email to Google?!! Aren't they going to train their spam filters on all your data? Aren't they going to sell it, or use it to sell you ads?

Corporations constantly put their most sensitive data in 3rd party tools. The executive in the article was probably copying his company strategy from Google Docs.

Yes, there are good reasons for concern, but the power of the tool is simply too great to ignore.

Banning these tools will go the same way as prohibition did in the US, people will simply ignore it until it becomes too absurd to maintain and too profitable to not participate in.

Companies which are able to operate without these fears will move faster, grow more quickly, and ultimately outcompete companies that restrict themselves from using these tools.

Now, I think the article should be a wake-up call for OpenAI. Messaging around what is and what is not used for training could be improved. Corporate accounts for ChatGPT with clearer privacy policies would be great, as would warnings that, yes, LLMs do memorize data, and you should treat anything you put into a free product on the web as fair game for someone's training algorithm.




I think this is different in that ChatGPT is expressly using your data as training data for a probabilistic model. This means:

* Their contractors can (and do!) see your chat data to tune the model

* If the model is trained on your confidential data, it may start returning this data to other users (as we've seen with GitHub Copilot regurgitating licensed software)

* The site even _tells you_ not to put confidential data in for these reasons.

Until OpenAI makes a version that you can stick on a server in your own datacenter, I wouldn't trust it with anything confidential.


> I think this is different in that ChatGPT is expressly using your data as training data for a probabilistic model.

Google tries hard to sell you on its auto-answers for emails ('Smart Reply'); I wonder how those got trained...


Google had all the same problems, until it found a balance of functionality, security, and privacy.

OpenAI just hasn't started to try adding privacy and security yet.


A language model inherently has a privacy problem. How would you guarantee no leaks?


You simply don't train on the user inputs. There are enough unread books, public repos, and news articles.


Not that I don't expect them to do this, but where is it expressly said to be so?

https://help.openai.com/en/articles/5722486-how-your-data-is...

> OpenAI does not use data submitted by customers via our API to train OpenAI models or improve OpenAI’s service offering. In order to support the continuous improvement of our models, you can fill out this form to opt-in to share your data with us. Sharing your data with us not only helps our models become more accurate and better at solving your specific problem, it also helps improve their general capabilities and safety.


Did you read the next paragraph?

> When you use our non-API consumer services ChatGPT or DALL-E, we may use the data you provide us to improve our models.


I definitely did not correctly read that. Thanks for the clarification. Totally misread the 'our API' bit!

It's also in the FAQ: https://help.openai.com/en/articles/6783457-chatgpt-general-...

> Will you use my conversations for training?

> Yes. Your conversations may be reviewed by our AI trainers to improve our systems.


As the saying goes, if the product is free, then you are the product.


The product is $20!


The product is pay-as-you-go: just sign up for API keys and use the Playground, or an alternative client, instead of their ChatGPT webapp.
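
As a minimal sketch of going through the API instead (assuming the pre-1.0 `openai` Python package; the model name and prompt are illustrative, not prescriptive):

    # Hypothetical example: pay-as-you-go API call instead of the ChatGPT webapp.
    # Assumes the pre-1.0 `openai` package and an API key in the environment.
    import os
    import openai

    openai.api_key = os.environ["OPENAI_API_KEY"]

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[{"role": "user", "content": "Explain big-O notation briefly."}],
    )
    print(resp["choices"][0]["message"]["content"])

Per the data-usage page quoted above, API traffic is not used for training unless you opt in, which is the whole point of preferring it over the consumer webapp.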


ChatGPT is (still) free.


Hehe, the old ToS trick. Here it doesn't say "will never use", it says "does not use", and I wager somewhere below it will say that they can change the ToS unilaterally at any time in the future.


Sticking it in your own datacenter doesn't really prevent any of these problems (except maybe #2); now your leaks are just internal, and because of the false sense of security you might wind up leaking far more confidential and specific information (i.e. an executive leaking to the rest of the team, in advance, that they are planning layoffs and why, whereas that executive might have used vaguer terms when speaking to the public ChatGPT).


Sticking it in your own private datacenter would imply that you can opt in or out of using your data to train the next generation. ChatGPT does not dynamically train itself in realtime.


The implication is that the only reason you'd bother running ChatGPT yourself is to train it on the relevant local data, which is the key value of ChatGPT beyond general public use.


It prevents all of those problems as it puts all the data / data movement under your control.


How so?


Because it's your own data center, which means you own the data and can set up firewall rules to prevent any software running there from leaking data to outside the data center.
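
As a minimal sketch of what that control can look like in practice (assuming a Linux host with iptables and root access; the subnet is an illustrative placeholder for "inside the datacenter"):

    # Hypothetical egress lockdown for a self-hosted model server.
    # Requires root; 10.0.0.0/8 stands in for the internal network.
    import subprocess

    RULES = [
        # Allow traffic that stays within the internal network.
        ["iptables", "-A", "OUTPUT", "-d", "10.0.0.0/8", "-j", "ACCEPT"],
        # Allow replies on connections initiated from outside (e.g. SSH).
        ["iptables", "-A", "OUTPUT", "-m", "state",
         "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"],
        # Default-deny everything else leaving the box.
        ["iptables", "-P", "OUTPUT", "DROP"],
    ]

    for rule in RULES:
        subprocess.run(rule, check=True)

With a default-deny egress policy like this, nothing running on the host can ship data out of the datacenter, whatever the software itself tries to do.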


Well, you can stick it on Azure.
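
Azure OpenAI Service exposes the same models behind a resource in your own Azure subscription. A minimal sketch of calling a deployment over its REST API (the resource name, deployment name, and prompt are hypothetical placeholders; the api-version shown was a GA version at the time):

    # Hypothetical call to an Azure OpenAI chat deployment via the REST API.
    # Assumes the `requests` package and an api-key in the environment.
    import os
    import requests

    resource = "my-company-openai"    # placeholder Azure resource name
    deployment = "my-gpt-deployment"  # placeholder model deployment name
    url = (
        f"https://{resource}.openai.azure.com/openai/deployments/"
        f"{deployment}/chat/completions?api-version=2023-05-15"
    )

    resp = requests.post(
        url,
        headers={"api-key": os.environ["AZURE_OPENAI_KEY"]},
        json={"messages": [{"role": "user", "content": "Hello"}]},
    )
    print(resp.json()["choices"][0]["message"]["content"])

The appeal is that the deployment runs inside your own Azure subscription under Microsoft's enterprise terms rather than the consumer ChatGPT terms.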


Trusting Gmail with corporate communication was a terrible idea (and explicitly illegal in a lot of industries), and companies didn't start to adopt it until Google released an enterprise version with table-stakes security features like no training on the data, no ad targeting, auditing, compliance holds, and more.

There's a huge difference between trusting a third party service with strict security and data privacy agreements in place vs one that can (legally) do whatever they want with your corporate data.


This is vital for professional adoption. We cannot live in a world where basically all commercial information, all secrets are being submitted to one company.


Was?


Well it was, until Google Workspace (G Suite) came along and provided essentially an enterprise version of Gmail.


I still question the wisdom of giving data to the world's largest spyware company, which makes its money by converting mass surveillance into dollars.


Hosting your own servers for email and business files is far more costly from a performance, uptime, and personnel standpoint, and a self-hosted office suite with network shares is not suitable for most businesses' multi-user collaboration needs (sure, you can use the Office / M365 desktop apps, which do support collaboration, but then you're forced to use the desktop apps).

Google Workspace addresses the data-privacy issues with strict user-data and datacenter access controls [0,1] and a robust terms document that details how data is collected and used [2]; enterprise customers can also access an audit report detailing what Google employees accessed and when [3].

0: https://storage.googleapis.com/gfw-touched-accounts-pdfs/goo...

1: https://workspace.google.com/security/

2: https://workspace.google.com/terms/premier_terms.html

3: https://support.google.com/a/answer/9230474?hl=en


> Companies which are able to operate without these fears will move faster

Or the fears are real and companies that operate without them will be exploited, or extinguished for annoying their customers.


The company I work for uses Gmail, but we have a business relationship with Google: we pay for Business licenses, and business data handling is covered in the agreement.

If an employee sets up a random Gmail account that is not covered by the agreement, that is a personal account. Sending company data to a personal email account might be grounds for firing that person.

Setting up a random account with OpenAI and putting in company details like customer names is a data breach.

Companies will let people use the tools, but it is not like one can start setting up random accounts without approval from management. Of course, there are different types of companies with more or less red tape.


If your company's code is all in repositories on GitHub (or Bitbucket, or any similar service), worrying about ChatGPT is quite silly.

And on the other hand, if your company doesn't use GitHub etc. due to security concerns, that's a very good sign you need to ban ChatGPT too.


No, it's not silly to worry about it. Many companies store data in third-party systems over which they retain access control. Once you put data into ChatGPT, what control do you have over it?


If a company's or government's risk model allows for giving Google all its most sensitive information, then that says something about trust. It has not gone unnoticed how those risk models differ when something like TikTok arrives, or earlier with Huawei 5G modems.

The enterprise version of Gmail was just an additional step to instill trust. In practice it is still a decision based on trust rather than physics. An "enterprise version of privacy guarantees" for Huawei 5G modems or the TikTok app would not suddenly make governments happy with a risk model where sensitive data has even a minor risk of ending up in China.


> We saw these same fears with the release of Gmail. Why would you trust your email to Google?!!

The original Gmail TOS explicitly stated that they scanned the content. They only stopped for the rollout of G Suite.


IIRC (it's been a while, so maybe I'm wrong) Google was also the reason Amazon changed its email format for purchases. Amazon realized it was giving a ton of data to Google through receipts about the products purchased, so now it just sends you vague order emails.


I recommend you take all your proprietary code and copypaste it to ChatGPT. You can help improve our collective generator.

If you’re an artist, just send all your work to DALL-E. Why have money or fame?


Google wasn't yet evil when most people adopted Gmail.

And corporations have strict agreements with their providers. They are even required to in many cases, due to GDPR and the like. Users connecting to ChatGPT on their own accounts bypass this.


> too profitable to not participate in

Sorry, but I really struggle to see how a non-AI company will actually become more profitable simply by getting its employees to use ChatGPT. In fact, the more companies that use it, the more demand there will be for "human only" services.



