I think there's more fear of OpenAI leaking data than of, say, Airtable or Notion or GitHub or AWS/S3 or Cloudflare or Vercel or some other company that has gobs of a company's data. Microsoft also has gobs of data: anything in Office and Outlook is your company data, but the fear that they'll leak it (intentionally or accidentally) is somehow more contained.
If we want to be intellectually honest with ourselves, we should either be fearful of ALL of these companies and have a plan to contain data from each of them, or treat the risk of data leaks through bugs as an equal threat across the board. OpenAI runs on Azure behind the scenes, so it'll be about as solid (or not) as most other cloud-based tools, IMO.
As for your data training their models: OpenAI is mostly a Microsoft company now, and most companies already use Microsoft for documentation, code, communications, etc. If Microsoft wanted to train on your data, they have all the corporate data in the world; they would (or already could!) train on it.
If there's a fear that OpenAI will train their model on data submitted through their silly textbox toy, but not on the troves of private corporate data they already hold, then that fear is unwarranted too.
This is where OpenAI should just offer a "corporate" tier, charge more for it, make it HIPAA/SOC 2/whatever compliant, and use that to assuage the fears of corporate customers.
> I think there's more fear of OpenAI leaking data than of, say, Airtable or Notion or GitHub or AWS/S3 or Cloudflare or Vercel or some other company that has gobs of a company's data.
There is zero fear. OpenAI openly writes that they are going to use ChatGPT chats for training; it's right there in the popup modal they show you when you load the page. That is not a fear, that is a promise that they will leak whatever you tell them.
If I tell you, "please give me a dollar, but I warn you that you will never see it again," would you describe your feeling about the transaction as "fear that the loan won't be repaid"?
1. Azure has the worst security record of the major cloud providers; it has had multiple insanely terrible RCEs and openly readable database exposures.
2. Azure's infrastructure still likely has far better security/privacy, by virtue of all its compliance work (HIPAA, FedRAMP, ISO certifications, etc.), than whatever startup-move-fast-ignore-compliance crap OpenAI layers on top of it in their application layer.