Azure ChatGPT: Private and secure ChatGPT for internal enterprise use (github.com/microsoft)
891 points by taubek on Aug 13, 2023 | 333 comments



This appears to be a web frontend with authentication for Azure's OpenAI API, which is a great choice if you can't use ChatGPT or its API at work.

If you're looking to try the "open" models like Llama 2 (or its uncensored version, Llama 2 Uncensored), check out https://github.com/jmorganca/ollama or some of the lower level runners like llama.cpp (which powers the aforementioned project I'm working on) or Candle, the new project by Hugging Face.

What are folks' takes on this vs Llama 2, which was recently released by Facebook Research? While I haven't tested it extensively, the 70B model is supposed to rival ChatGPT 3.5 in most areas, and there are now some new fine-tuned versions that excel at specific tasks like coding (the 'codeup' model) or the new Wizard Math (https://github.com/nlpxucan/WizardLM) which claims to outperform ChatGPT 3.5 on grade school math problems.


Llama 2 might by some measures be close to GPT 3.5, but it’s nowhere near GPT 4, nor Anthropic Claude 2 or Cohere’s model. The closed source players have the best researchers - they are being paid millions a year with tons of upside - and it’s hard to keep pace with that. My sense is that the foundation model companies have an edge for now and will probably stay a few steps ahead of the open source realm simply for economic reasons.

Over the long run, open source will eventually overtake. Chances are this will happen once the researchers who are making magic happen get their liquidity and can start working for free again out in the open.


> The closed source players have the best researchers - they are being paid millions a year with tons of upside - and it’s hard to keep pace with that.

Llama2 came out of Meta's AI group. Meta pays researcher salaries competitive with any other group, and their NLP team is one of the top groups in the world.

For researchers it is increasingly the most attractive industrial lab because they release the research openly.


There are L5 engineers with 3 YOE making 900k+ at OpenAI right now. Tough to say what they're paying their PhDs, but I'd imagine it's similarly nutty.

https://www.levels.fyi/companies/openai/salaries/software-en...

FAANG pays exceptionally well (I'd know), but what's being offered at OpenAI is eye-popping, even for SWEs. I think they're trying to dig their moat by absorbing the absolute best of the best.


Most of that is in their equity comp which is quite weird in how it works. So those numbers are highly inflated. The equity is valuable only if you sell it or if OpenAI makes a profit. Selling it might be harder given they're not a public company. On top of that, the profit is capped so there is a limit to how much money can be made from it. So while it's 900k on paper, in reality, it might not be as good as that. https://www.levels.fyi/blog/openai-compensation.html


Weird, it says no results found for L3


hearsay, but I've heard OpenAI pays significantly more

I agree that Meta hired some amazing researchers so we'll see what the future holds


> Llama 2 might by some measures be close to GPT 3.5, but it’s nowhere near GPT 4

I think you're right about this, and benchmarks we've run at Anyscale support this conclusion [1].

The caveat there (which I think will be a big boon for open models) is that techniques like fine-tuning make a HUGE difference and can bridge the quality gap between Llama-2 and GPT-4 for many (but not all) problems.

[1] https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...


Frankly, the benchmarks you guys are using are too narrow. In fact these benchmarks are "old world" benchmarks, easy to game through finetuning, and we should stop using them altogether for LLMs. Why are you not using Big Bench Hard or OpenAI evals?


can I fine-tune it on like 2,000 repos at a corporation (code bases) and have it understand the architecture?


I don't think you can do that with any AI models. It almost feels like a fundamental misrepresentation of how they work.

You could fine-tune a conversational AI on your codebase, but without loading said codebase into its context it is "flying blind", so to speak. It doesn't understand the data structure of your code, the relation between files, and probably doesn't confidently understand the architecture of your system. Without portions of your codebase loaded into the 'memory' of your model, all that your fine-tuning can do is replicate characteristics of your code.


TypeChat-like things might provide the interface control for future context-driven architectures, acting as some type of catalyst. Using the self-reflective modeling is a form of contextual insight.


> The closed source players have the best researchers

Is that definitely why? GPT 3.5 and GPT 4 are far larger than 70B, right? So if a 70B local model like LLaMA can even remotely rival them, would that not suggest that LLaMA is fundamentally a better model?

For example, would a LLaMA model with even half of GPT 4's parameters be projected to outperform it? Is that how it works?

[I'm not super familiar with LLM tech]


If you read the Llama2 paper it is very clear that small amounts of data (thousands of records) make a vast difference at the instruction tuning stage. From the Llama2 paper:

> Quality Is All You Need.

> Third-party SFT data is available from many different sources, but we found that many of these have insufficient diversity and quality — in particular for aligning LLMs towards dialogue-style instructions. As a result, we focused first on collecting several thousand examples of high-quality SFT data, as illustrated in Table 5. By setting aside millions of examples from third-party datasets and using fewer but higher-quality examples from our own vendor-based annotation efforts, our results notably improved. These findings are similar in spirit to Zhou et al. (2023), which also finds that a limited set of clean instruction-tuning data can be sufficient to reach a high level of quality. We found that SFT annotations in the order of tens of thousands was enough to achieve a high-quality result. We stopped annotating SFT after collecting a total of 27,540 annotations. Note that we do not include any Meta user data.

It's likely OpenAI has invested in this and has good coverage in a larger range of domains. That alone probably explains a large amount of the gap.


This quote is quite funny taken out of context like this. Top AI researchers find that garbage in === garbage out.


It's somewhat insightful if you consider that, at high level, the major theme of the past decade was, "lots of garbage in === good results out", quantity >> quality.


I'm puzzled. Why do you think it's taken out of context?


SFT?


Supervised Fine Tuning, I believe.


There is no clear answer. It's debatable among experts.

The grandparent post seems to believe that the issue is algorithmic complexity and programming aptitude. Personally, I think that all the major LLMs are using the same basic transformer architecture with relatively minor differences in code.

GPT is trained on more data with more parameters than any open source model. The size does matter, far more than the software does. In my experience with data science, the best programmers in the world can only do so much if they are operating with 1/10th the scale of data. That applies to any problem.


Yeah I've been wondering about this too. Word on the street is that GPT4 is several times the size of GPT3.5. Yet I don't feel it's several times as good for sure.

Apparently there's a diminishing returns effect on ever enlarging the model.


I believe what they discovered was that 4 is an ensemble model, comprised of (8) GPT3.5s. Things may have changed or been found to not be true on this though.


LLaMA 2 at 70B is, let's say pessimistically, 70% as good as GPT3.5. This makes me think that OpenAI is lying about their parameter count, is vastly less efficient than LLaMA, or the larger model sizes have diminishing returns. Either way, your point is a good one. Something doesn't add up.


IMO Llama2 really isn’t close to 3.5. It still has regular mode collapse (or whatever you call getting repetitive and nonsensical responses after a while), it has very poor mathematical/logical reasoning and is not good at following multi-part instructions.

It just sounds like 3.5/4 because it was trained on it.


You're mixing up the language model with the chat bot.

Llama2 is a language model. I imagine the language model behind chatgpt is not much different (perhaps it's better, but not by many months of AI research time). It likely also suffers from "mode collapse" issues etc.

But 3.5 also has a lot of systems around it that detect mode collapse and apply some kind of mitigation, forcing the model to give a more reasonable output. Mathematical / logical reasoning questions are likely also detected and passed on in some form to a separate system.


So this would be testable by showing that chatGPT makes more mistakes than prompting via API? Or would you consider the API a chatbot, too?


I don't think there's any public interface to the LLM underlying ChatGPT, so the only ones able to test this are openAI engineers.


Llama 2 wasn't trained on ChatGPT/GPT4. I think maybe you are thinking of the Vicuna models?

https://lmsys.org/blog/2023-03-30-vicuna/


So it’s true that it would violate the OpenAI terms for Llama to be trained with ChatGPT completions, but how do we know? We don’t know the training data for Llama, we just get weights.


The Llama2 paper describes the training data in some detail.


This is what presence_penalty and frequency_penalty are for.
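
For reference, here's roughly what that looks like against the (2023-era) openai Python SDK; the penalty values are illustrative guesses you'd tune per use case, not recommendations:

    import openai

    openai.api_key = "sk-..."  # placeholder

    # presence_penalty penalizes tokens that have appeared at all (nudges toward new topics);
    # frequency_penalty penalizes tokens in proportion to how often they've already appeared.
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "Continue the story without repeating yourself."}],
        presence_penalty=0.6,
        frequency_penalty=0.7,
    )
    print(resp["choices"][0]["message"]["content"])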


We just don't have the information to make judgements, much less leaping to "they must be lying."

There are a few public numbers from a handful of foundation models as to performance vs parameter count vs architecture generation. Without being able to compare the architecture of the various closed models in detail, or being more rigorous about training with progressively sized parameter sets, the conclusion at the moment is a general feeling or conjecture.


Without questioning the statement '70% as good as GPT3.5': wouldn't that be quantifying a quality, essentially a Turing test? Also: maybe those missing 30% are the hard part.


You seriously underestimate just how much _not_ having to tune your llm for SF sensibilities benefits performance.

As an example from the last six months: people on Tor are producing better-than-state-of-the-art Stable Diffusion output because they want porn without limitations. I haven't had the time to look at LLMs, but the degenerates who enjoy that sort of thing have said they can get the Llama2 model to role-play their dirty fantasies and then have Stable Diffusion illustrate said fantasies. It's a brave new world and it's not on the WWW.


What do you mean by "tune for SF" ?


San Francisco sensibilities. A model trained on a large data set will have the capacity to emit all kinds of controversial opinions and distasteful rants (and pornography). Then they effectively lobotomize it with a rusty hatchet in an attempt to censor it from doing that, which impairs the output quality in general.


OK, fair enough. Please give me an example of a customer-facing chatbot built on Llama 2 that is unbearable to use and a GPT-4 customer-facing chatbot that is a joy to use. I think at the end of the day, you still have customers dreading such interactions.


Using GPT3.5/4 in our language learning app and people seem to enjoy it. [1]

Tried Llama2 and it definitely doesn’t even come close for what we’re doing. Would absolutely need fine tuning.

Maybe customers don’t enjoy chat bots for customer support, but there are a million other uses for these models. I, for example, LOVE github copilot.

1. https://squidgies.app


Cool app.

Wonder if you can potentially use a combination of Llama2 and GPT - to save costs on using the OpenAI API.


Costs really aren’t a concern compared to speed of development and quality.


A lot of people who were using say Google Maps in their apps thought the same thing, until Google drastically increased the prices...


Is it cost prohibitive?


It's early, and this definitely isn't customer facing in the traditional sense, but a team member of mine set up a Discord bot running Llama 2 70B on a Mac studio and we've been quite impressed by its responses to folks who test it.

IIRC chat bots are central to the vision Facebook has with LLMs (e.g. every instagram account has a personal chat bot), so I would expect the Llama models to get increasingly better at this task.

That said the 7B and 13B models definitely don't quite seem ready yet for production customer interaction :-)


> (e.g. every instagram account has a personal chat bot)

That made me think of the Black Mirror episode Joan is Awful, where every human gets their life turned into a series for the company to own and promote. Kinda like instagram content.


>but it’s nowhere near GPT 4

It will be if OpenAI keeps dumbing down GPT 4. No proof they're doing it, but there is no way it's as good as it was at launch, or maybe I just got used to it and now notice the mistakes more.


Linux started in the same position. Sometimes the underdogs win.


Linux "won" by playing a different game. Yes, it spread out and is now everywhere, underpinning all computing. But the "game" wasn't about that - it was competing with Windows for mind-share and money with users, and by proxy for profitability. In this, it's still losing badly. People are still not using it knowingly (no, Android is not "Linux"), and developers in its ecosystem are not making money selling software.


I don't think paying more will give you better researchers. Maybe better "players".


> While I haven't tested it extensively, the 70B model is supposed to rival ChatGPT 3.5 in most areas, and there are now some new fine-tuned versions that excel at specific tasks

That has been my experience. Having experimented with both (informally), Llama 2 is similar to GPT-3.5 for a lot of general comprehension questions.

GPT-4 is still the best amongst the closed-source, cutting edge models in terms of general conversation/reasoning, although 2 things:

1. The guardrails that OpenAI has placed on ChatGPT are too aggressive! They clamped down on it quite hard to the extent that it gets in the way of a reasonable query far too often.

2. I've gotten pretty good results with smaller models trained on specific datasets. GPT-4 is still on top in terms of general purpose conversation, but for specific tasks, you don't necessarily need it. I'd also add that for a lot of use cases, context size matters more.


To your first point, I was trying to use ChatGPT to generate some examples of negative interactions with customer service to show sentiment analysis in action for a project I was working on.

I had to do all types of workarounds for it to generate something useful without running into the guardrails.


I’ll second the context window too. I’ve been really impressed with Claude 2 because it can address a much larger context than I could feed into GPT4.


Could you give examples of smaller models trained on specific datasets?


It can be almost anything, like your HN comments or some corporate wiki. Then get Colab Pro ($10/month) or some juicy gaming machine and fine-tune it using e.g. this tutorial https://www.philschmid.de/instruction-tune-llama-2, but https://www.reddit.com/r/LocalLLaMA/ is full of different fine-tuned models.
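
For a sense of what those tutorials boil down to, here's a stripped-down LoRA sketch; the model ID, dataset file, and hyperparameters are placeholders, and the transformers/peft APIs drift between versions, so treat it as a shape rather than a recipe:

    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)

    base = "meta-llama/Llama-2-7b-hf"  # gated; requires accepting Meta's license on Hugging Face
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

    # Attach low-rank adapters so only a tiny fraction of the weights are trained
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"))

    # e.g. a JSONL file of {"text": "### Instruction ...\n### Response ..."} records
    ds = load_dataset("json", data_files="my_corpus.jsonl", split="train")
    ds = ds.map(lambda x: tokenizer(x["text"], truncation=True, max_length=1024),
                remove_columns=ds.column_names)

    Trainer(
        model=model,
        args=TrainingArguments("llama2-lora-out", per_device_train_batch_size=1,
                               gradient_accumulation_steps=8, num_train_epochs=3,
                               learning_rate=2e-4, fp16=True, logging_steps=10),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()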


Can it handle other languages besides English?


Not anywhere near as well as ChatGPT 4 (for chat anyway - maybe the model is better)?

Prompt:

> Hvad tycks om at fika nu?

ChatGPT 4

> Det låter som en trevlig idé! Fika är ju alltid gott. Vad skulle du vilja ha till din fika? (Oj, ursäkta för emojis! )

https://chat.openai.com/share/8e89a16f-f182-4f62-b9fa-f93cd5...

Llama2:

> I apologize, but I don't understand what you mean by "fika nu." Could you please provide more context or clarify your question so I can better assist you?

https://hf.co/chat/r/kOF2qst


RE 2 - neat! What are some tasks you've been using smaller models (with perhaps larger context sizes) for?


LLaMA2 is still quite a bit behind ChatGPT 3.5 and this mainly gets reflected in coding and math. It's easy to beat NLP-based benchmarks but much, much harder to beat NLP+math+coding together. I think this gap reflects a gap in reasoning but we don't have a good non-coding/non-math benchmark to measure it.


I just had a crazy FN (dystopian) idea...

Scene:

The world relies on AI in every aspect.

But there are countless 'models' the tech try to call them...

There was an attempt to silo each model and provide a governance model on how/what/why they were allowed to communicate....

But there was a flaw.

It was an AI only exploitable flaw.

AIs were not allowed to talk about specific constructs or topics, people, code, etc... that were outside their silo but what they COULD do - was talk about pattern recog...

So they ultimately developed an internal AI language on scoring any inputs as being the same user... And built a DB of their own weighted userbase - and upon that built their judgement system...

So if you typed in a pattern, spoke in a pattern, posted temporally on a pattern, etc - it didn't matter which silo you were housed in, or what topics you were referencing -- the AIs can find you.... god forbid they get a keylogger on your machine...


Our company is looking into a similar solution


A lot of companies are already using projects like chatbot-ui with Azure's OpenAI for similar local deployments. Given this is as close to local ChatGPT as any other project can get, this is a huge deal for all those enterprises looking to maintain control over their data.

Shameless plug: Given the sensitivity of the data involved, we believe most companies prefer locally installed solutions to cloud-based ones, at least in the initial days. To this end, we just open sourced LLMStack (https://github.com/TryPromptly/LLMStack), which we have been working on for a few months now. LLMStack is a platform to build LLM apps and chatbots by chaining multiple LLMs and connecting them to users' data. A quick demo at https://www.youtube.com/watch?v=-JeSavSy7GI. Still early days for the project and there are still a few kinks to iron out but we are very excited for it.


I find it interesting to see how competitive this space got so quickly.

How do these stacks differentiate?


Quality and depth of particular types of training data is one difference. Another difference is inference tracking mechanisms within and between single-turn interactions (e.g., what does the human user "mean" with their prompt, what is the "correct" response, and how best can I return the "correct" response for this context; how much information do I cache from the previous turns, and how much if any of it is relevant to this current turn interaction).


With Louie.ai, there is a lot of work on specialization for the job, and I expect the same for others. We help with data analysis, so connecting enterprise & common data sources & DBs, hooking up data tools (GPU visuals, integrated code interpreter, ...), security controls, and the like, which is different from say a ChatGPT for lawyers or a straight up ChatGPT UI clone.

Technically, as soon as the goal is to move beyond just text2gpt2screen, like multistep data wrangling & viz in the middle of a conversation, most tools struggle. Query quality also comes up, whether quality of the RAG, the fine tune, prompts, etc: each solves different problems.


I see this as more of a 'Migration problem'. Why is this offered as a SaaS as opposed to a consulting service?

The code to organize and vectorize the documentation and endpoints, and run it through a variety of models and injection prompting like two shots, etc., is going to be highly customized. The 'base code' there is not exactly trivial, but anyone reading all the llama index docs can do it.

Then it's just run-of-the-mill, analyst-level integration that you provide to the client on a T&M or fixed-price basis.


I agree there's room for consulting, but as a new field, there's a lot of software currently missing for each vertical. Today, that's manual labor by consultants, but as the field matures... consultants should be doing things specialized to the specific customer, not what can be amortized across adjacent verticals. Top software engineers investing into software over time deliver substantially more in substantially less time, and consultants should be integrating that, not competing head-on.


[flagged]


Thanks that made me smile. Take my upvote


OP shouldn't be flagged.


> we believe most companies prefer locally installed solutions to cloud based ones

We've also seen a strong desire from businesses to manage models and compute on their own machines or in their own cloud accounts. This is often part of a hybrid strategy of using API products like OpenAI for rapid prototyping.

The majority of (though not all) businesses we've seen tend to be quite comfortable using hosted API products for rapid prototyping and for proving out an initial version of their AI functionality. But in many cases, they want to complement that with the ability to manage models and compute themselves. The motivation here is often to reduce costs by using smaller / faster / cheaper fine-tuned open models.

When we started Anyscale, customer demand led us to run training & inference workloads in our customers' cloud accounts. That way your data and code stays inside of your own cloud account.

Now with all the progress in open models and the desire to rapidly prototype, we're complementing that with a fully-managed inference API where you can do inference with the Llama-2 models [1] (like the OpenAI API but for open models).

[1] https://app.endpoints.anyscale.com/


Can you plug this together with tools like api2ai to create natural language defined workflow automations that interact with external APIs?


There is a generic HTTP API processor that can be used to call APIs as part of the app flow which should help invoke tools. Currently working on improving documentation so it is easy to get started with the project. We also have some features planned around function calling that should make it easy to natively integrate tools into the app flows.


You can use unfetch.com to make API calls via LLMs and build automations. (I'm building it)


Is it possible to not use Google with unfetch.com?


Google is just so easy for login. No need to deal with forgotten passwords, resets, email verification etc. But I'll add login via magic link soon.


Interesting project - was trying it out, found an issue in building the image and have opened an issue on GitHub - please take a look. Also, do you have plans to support Llama in addition to OpenAI models?


Thanks for the issue. Will take a look. In the meantime, you can try the registry image with `cp .env.prod .env && docker compose up`

> Also, do you have plans to support Llama in addition to OpenAI models?

Yes, we plan to support llama etc. We currently have support for models from OpenAI, Azure, Google's Vertex AI, Stability and a few others.


One thing I still don't understand is what _is_ the ChatGPT front end exactly? I've used other "conversational" implementations built with the API and they never work quite as well, it's obvious that you run out of context after a few conversation turns. Is ChatGPT doing some embedding lookup inside the conversation thread to make the context feel infinite? I've noticed anecdotally it definitely isn't infinite, but it's pretty good at remembering details from much earlier. Are they using other 1st party tricks to help it as well?


This is one of the things that make me uncomfortable about proprietary LLMs.

They get task performance by doing a lot more than just feeding a prompt straight to an llm, and then we performance compare them to raw local options.

The problem is, as this secret sauce changes, your use case performance is also going to vary in ways that are impossible for you to fix. What if it can do math this month and next month the hidden component that recognizes math problems and feeds them to a real calculator is removed? Now your use case is broken.

Feels like building on sand.


I'm not sure you realize how proprietary LLMs are being built on.

No one is doing secret math in the backend that people are building on. The OpenAI API allows you to call functions now, but even that is just a formalized way of passing tokens into the "raw LLM".

All the features in the comment you replied to only apply to the web interface, and here you're being given an open interface you can introspect.
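
To illustrate the point: function calling (as of mid-2023) is just extra structured input and output on the same completion call. A sketch with a made-up calculator function; the model never executes anything, it only asks you to:

    import json
    import openai

    def add(a: float, b: float) -> float:  # a hypothetical local "tool"
        return a + b

    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": "What is 1234.5 + 678.9?"}],
        functions=[{
            "name": "add",
            "description": "Add two numbers",
            "parameters": {
                "type": "object",
                "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
                "required": ["a", "b"],
            },
        }],
    )

    msg = resp["choices"][0]["message"]
    if msg.get("function_call"):
        args = json.loads(msg["function_call"]["arguments"])
        print(add(**args))  # we do the math locally; the model only produced the arguments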


Thank you for pointing that out - I had assumed things worked differently than they apparently do.

Although performance has varied over time (https://arxiv.org/pdf/2307.09009.pdf), I also notice that the API allows you to use a frozen version of the model, which avoids the worries I mentioned.
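
Concretely, pinning just means requesting a dated snapshot instead of the rolling alias (model names as of mid-2023):

    import openai

    # "gpt-4" is a rolling alias OpenAI can update; "gpt-4-0613" is a frozen June 2023 snapshot.
    resp = openai.ChatCompletion.create(
        model="gpt-4-0613",
        messages=[{"role": "user", "content": "What is 17 * 23?"}],
        temperature=0,
    )
    print(resp["choices"][0]["message"]["content"])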


That was a pretty deeply flawed paper; one of the largest drops recorded was due to simple parsing errors in their testing:

https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-tim...

Overall evals and pinning against checkpoints are how you avoid those worries, but in general, if you solve a problem robustly, it's going to be rare for changes in the LLM to suddenly break what you're doing. Investing in handling a wide range of inputs gracefully also pays off on handling changes to the underlying model.


> No one is doing secret math in the backend that people are building on.

How do you know that? With SaaS you are at the mercy of the vendor.


It was a contrived example to make a point, one that seems to have flown over your head.


No it was a bad (straight up wrong) example because you don't understand how people are building applications on proprietary LLMs.

If you did you'd also know what evals are.


They definitely do some proprietary running summarization to rebuild the context with each chat. Probably a RAG-like approach that has had a lot of attention and work.


This is effectively my question. I assume there is some magic going on. But how many engineering hours worth of magic, approximately? There is a lot of speculation around GPT-4 being MoE and whatnot. But very little speculation about the magic of the ChatGPT front end specifically that makes it feel so fluid.


That's mostly because there's very little value in deep speculation there.

It's not particularly more fluid than anything you could whip up yourself (and the repo linked proves that), but there's also not much value in trying to compete with ChatGPT's frontend.

For most products ChatGPT's frontend is the minimal level of acceptable performance that you need to beat, not a maximal one really worth exploring.


What front end is better than ChatGPT? Is the OP implementation doing running summarization or in-convo embedding lookup?


It sounds like a cop-out but: it's one made for your use-case.

If you're letting people do fun long-form roleplay adventures, using summarization alongside some sort of named-entity K-V store driven by the LLM would be a good strategy.

If you're building a tool that's mostly for internal data, something that leans heavily into detailed answers with direct verbatim citations and having your frontend create new threads when there's a clear break in the topic of a request is a clever strategy since quality drops with context length and you want to save tokens for citations.

People who are saying LLMs suck or are X or are Y are mostly just completely underutilizing them because LLMs make it super easy to solve problems superficially: when it comes to actually scaling those solutions to production you need more than random RAG vector database wrappers.


>alongside some sort of named entity K-V store driven by the LLM

I'd be curious to hear more about how exactly this works. You do NER on the prompt (and maybe on the completion too) and store the entities in a database and then what? How does the LLM interact with it?


LLMs thrive at completely ambiguous classifications: you can have them extract entities and something like "a list of notable context".

Let's say we want to let our chat remember the character slammed the door last time they were in Village X with the mayor in their presence and have the mayor comment next time they see the player.

Every X tokens we can fire a prompt with a chunk of conversation and a list of semantically similar entities that already exist, letting the LLM return an edited list along the lines of:

    entity: mayor
    location: village X
    priority: HIGH
    keywords: town hall, interact, talk
    "memory, likelyEffect"[]: door slammed in face, anger at player

Now we have:

- multiple fields for similarity search

- an easy way to manage evictions (sweep up lowest priority)

- most importantly: we're providing guidance for the LLM to help it ignore irrelevant context

When the user goes back to village X we can fetch entities in village X and whittle that list down based on priority and similarity to the user prompt.

None of this has any determinism: instead you're optimizing for the illusion of continuity and trading off predictability.

You're aiming for players being shocked that next time they talk to the mayor he's already upset with them, and if they ask why he can reply intelligently.

And to my original point, while this works for a game-like experience, you wouldn't want to play around with this kind of fuzzy setup for your company's internal CRM bot or something. You're optimizing for the exact value proposition of your use-case rather than just trying to throw a raw RAG setup at it.
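
If it helps make this concrete, a minimal sketch of what such an entity store might look like (all the names here are invented; the interesting part is the prompt that maintains the entries, not this plumbing):

    from dataclasses import dataclass, field

    @dataclass
    class EntityMemory:
        entity: str
        location: str
        priority: int                                  # higher survives eviction longer
        keywords: list = field(default_factory=list)
        memories: list = field(default_factory=list)   # (memory, likely_effect) pairs

    class EntityStore:
        def __init__(self, max_entries=200):
            self.max_entries = max_entries
            self.entries = []

        def upsert(self, updated):
            # Every N tokens the LLM returns an edited list; merge it in.
            for e in updated:
                self.entries = [x for x in self.entries if x.entity != e.entity]
                self.entries.append(e)
            # Evict lowest-priority entries when over budget.
            self.entries.sort(key=lambda x: x.priority, reverse=True)
            self.entries = self.entries[: self.max_entries]

        def recall(self, location, prompt, k=5):
            # A real version would use embedding similarity on keywords/memories;
            # crude keyword overlap stands in for that here.
            def score(e):
                return sum(kw in prompt.lower() for kw in e.keywords) + e.priority
            candidates = [e for e in self.entries if e.location == location]
            return sorted(candidates, key=score, reverse=True)[:k]

The recalled entries then get serialized back into the prompt when the player walks back into village X, which is where the "already upset mayor" effect comes from.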


It uses a sliding context window. Older tokens are dropped as new ones stream in.


I don't believe that's the whole story. Other conversational implementations use sliding context windows and it's very noticeable as context drops off. Whereas ChatGPT seems to retain the "gist" of the conversation much longer.


I mean, I explicitly have the LLM summarize content that's about to fall out of the window as a form of pre-emptive token compression. I'd expect maybe they do something similar.
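
Something like this, roughly (the token budget, the number of verbatim turns kept, and the summary prompt are all arbitrary; tiktoken is used for an approximate count):

    import openai
    import tiktoken

    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    BUDGET = 3000  # tokens reserved for history; the rest is left for the reply

    def count(messages):
        # Approximate: counts content tokens only, ignores per-message overhead
        return sum(len(enc.encode(m["content"])) for m in messages)

    def compress_history(messages):
        """When history gets too long, replace the oldest chunk with an LLM-written summary."""
        if count(messages) <= BUDGET:
            return messages
        old, recent = messages[:-6], messages[-6:]   # keep the last few turns verbatim
        summary = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            temperature=0,
            messages=[{"role": "user",
                       "content": "Summarize the key facts and decisions in this conversation:\n\n"
                                  + "\n".join(f"{m['role']}: {m['content']}" for m in old)}],
        )["choices"][0]["message"]["content"]
        return [{"role": "system", "content": "Summary of earlier conversation: " + summary}] + recent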


I feel like we're describing short vs long term memory.


That’s exactly what it is. It’s just that it turns out you need very good generalized (or focused, simple) reasoning to do accurate compression, or else the abstraction and movement to long-term memory doesn’t include the most important content, or worse, includes distracting details.

I’ve been working on short and long term memory windows at allofus.ai for about 6 months now and it’s way more complex than I had originally thought it would be.

Even if you can magically extend the context window, the added data confuses and waters down the reasoning of the LLM. You must do layered abstraction and compression with goal-based memory for it to continue to reason without the distraction of irrelevant data.

It’s an amazing realization, almost like a proof that memory is a kind of layered reasoning compression system. Intelligence of any kind can’t understand everything forever. It must cull the irrelevant details, process the remains and reason on a vector that arises from them.


Is it unfair to consider this some kind of correlate to the Nyquist theorem that makes me skeptical of even the theoretical possibility of AGI claims?


I consider GPT4 AGI, so I'm probably not the one to ask this too. It reasons, it understands sophisticated topics, it can be given a purpose and pursue it, it can communicate with humans, and it can perform a reasonable task considering its modalities.

I don't really know what sort of "big leap" beyond this people are expecting; incremental performance, for sure. But what else?


I guess for me it needs to have active self-reflection and the ability to act independently/without directions. I'm sure there are many other criteria if I think about it some more, but those two were missing from your list.


This is mostly just that the GPT-4 API/app has this disabled, rather than it not being capable.

When you enable it, it is pretty shocking. And it’s pretty simple to enable. You just give it a meta instruction to decide when to message you and what to store to introspect on.


As a frequent user of the OpenAI APIs, I don't really know what you are talking about here. Could you point me to some documentation?


At least in 3.5 it's very noticeable when the context drops. They could use summarization, akin to what they are doing when detecting the topic of the chat, but applied to question-answer-pairs in order to "compress" the information. But that would require additional calls into a summarization LLM so I'm really not sure if it is worth it. Maybe they dump some tokens they have on a blacklist or text snippets like "I want to" or replace "could it be that" with "chance of".


Logic for azure chatgpt's "infinite context" summarisation is in https://github.com/microsoft/azurechatgpt/blob/main/src/feat...

*Edit: Azure ChatGPT, that is; I would be amazed/disappointed if ChatGPT itself used langchain.


That doesn't really look right to me, it looks like that's for responding regarding uploaded documents. I see nothing related to infinite context.

Also this is the azure repo from OP, nothing to do with the actual ChatGPT front-end that was asked about. I highly doubt the official ChatGPT front-end uses langchain, for example.


This is Azure's docs to create a conversation: https://learn.microsoft.com/en-us/azure/cognitive-services/o...


I don't see anything related to an infinite context in there. There's only a reference to a server-side `summary` variable which suggests that there is a summary of previous posts which will get sent along with the question for context, as is to be expected. Nothing suggests an infinite context.


This is potentially a huge deal. Companies are concerned using ChatGPT might violate data privacy policies if someone puts in user data or invalidate trade secrets protections if someone uploads sections of code. I suspect many companies have been waiting for an enterprise version.


This is a web UI that talks to a (separate) Azure OpenAI resource that you can deploy into your subscription as a SaaS instance.


So how is it any different?


Microsoft says it is more secure. And that it is enterprise. That's about it


There are legal agreements backing the separation of company data from other parties. This is what's important to big corps.


I have to imagine Big Corps are also concerned about liability / risk when generating things with OpenAI products - at least until there is some sort of settled law around using models trained on this kind of data.


Yes, those concerns exist, but they're also practically impossible to enforce.

At my enterprise, it's a three step solution, two of which don't work.

1. Written policy concerning LLM output and its risks, disallowing it from being used for any kind of official documentation or decision making. (This doesn't work, because no one wants to use their own brain to do tedious paperwork.)

2. Block access to public LLM tools via technical means from company owned end-user devices. (This doesn't work because people will just open ChatGPT on their home PC or mobile.)

3. Write and provide our own gpt-3.5 frontend, so that when people ignore rules #1 and #2 we have logs, and we know we're not feeding our proprietary info to OpenAI.
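
Step 3 doesn't need to be much more than a thin proxy that logs and forwards to an Azure OpenAI deployment. A rough sketch (endpoint, deployment name, and auth handling are placeholders, using the 2023-era openai SDK's Azure mode):

    import logging
    import openai
    from fastapi import FastAPI
    from pydantic import BaseModel

    openai.api_type = "azure"
    openai.api_base = "https://my-company.openai.azure.com/"   # placeholder resource
    openai.api_version = "2023-05-15"
    openai.api_key = "..."                                     # from Key Vault in practice

    app = FastAPI()
    logging.basicConfig(filename="llm_audit.log", level=logging.INFO)

    class Ask(BaseModel):
        user: str
        prompt: str

    @app.post("/chat")
    def chat(req: Ask):
        logging.info("user=%s prompt=%r", req.user, req.prompt)   # the audit trail rule #3 wants
        resp = openai.ChatCompletion.create(
            engine="gpt-35-turbo",   # Azure takes the *deployment* name, not the model name
            messages=[{"role": "user", "content": req.prompt}],
        )
        answer = resp["choices"][0]["message"]["content"]
        logging.info("user=%s answer=%r", req.user, answer)
        return {"answer": answer}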


I imagine most companies serious about this created their own wrappers around the API or contracted it out, likely using private Azure GPUs.


Most companies are either not tech companies, or do not have the knowledge to manage such a project within reasonable cost bounds.


Most companies are trying to figure out exactly what generative AI is and how to use it in their business. Given how new this is - I doubt any large company has done much besides ban the public ChatGPT. So this is probably very relevant for them.


Curious if anyone has done a side-by-side analysis of this offering vs just running LLaMA?

I'm currently running a side-by-side comparison/evaluation of MSFT GPT via Cognitive Services vs LLaMA [7B/13B/70B] and am intrigued by the possibility of a truly air-gapped offering not limited by external compute power (nor by metered fees racking up).

Any reads on comparisons would be nice to see.

(yes, I realize we'll eventually run into the same scaling issues w/r/t GPUs)


I did one. I took a few dozen prompts from my ChatGPT history and ran them through a few LLMs.

GPT-4, Bard and Claude 2 came out on top.

Llama 2 70b chat scored similarly to GPT-3.5, though GPT-3.5 still seemed to perform a bit better overall.

My personal takeaway is I’m going to continue using GPT-4 for everything where the cost and response time are workable.

Related: A belief I have is that LLM benchmarks are all too research oriented. That made sense when LLMs were in the lab. It doesn't make sense now that LLMs have tens of millions of DAUs — i.e. ChatGPT. The biggest use cases for LLMs so far are chat assistants and programming assistants. We need benchmarks that are based on the way people use LLMs in chatbots and the type of questions that real users use LLM products, not hypothetical benchmarks and random academic tests.


I don’t know what you mean by “too research oriented.” A common complaint in LLM research is the poor quality of evaluation metrics. There’s no consensus. Everyone wants new benchmarks but designing useful metrics is very much an open problem.


I think he wants to limit evaluations to the most frequent question types seen in the real world.


I think tests like "can this LLM pass an English literature exam it's never seen before" are probably useful, but yeah there's a lot of silly stuff like math tests.

I suppose the question is where are they most commercially viable. I've found them fantastic for creative brainstorming, but that's sort of hard to test and maybe not a huge market.


>> I suppose the question is where are they most commercially viable.

Fair point, though I'm not aiming to start a competing LLM SaaS service; rather I'm evaluating swapping out the TCO of Azure Cognitive Service OpenAI for the TCO of dedicated cloud compute running my own LLM -- to serve my own LLM calls currently being sent to a metered service (Azure Cognitive Service OpenAI).

Evaluation points would be: output quality; meter vs fixed breakeven points; latency; cost of human labor to maintain/upgrade

In most cases, I'd outsource and not think about it. BUT we're currently in some strange economics where the costs are off the charts for some services.


How did you measure the performance?


We (at Anyscale) have benchmarked GPT-4 versus the Llama-2 suite of models on a few problems: functional representation, SQL generation, grade-school math question answering.

GPT-4 wins by a lot out of the box. However, surprisingly, fine-tuning makes a huge difference and allows the 7B Llama-2 model to outperform GPT-4 on some (but not all) problems.

This is really great news for open models as many applications will benefit from smaller, faster, and cheaper fine-tuned models rather than a single large, slow, general-purpose model (Llama-2-7B is something like 2% of the size of GPT-4).

GPT-4 continues to outperform even the fine-tuned 70B model on grade-school math question answering, likely due to the data Llama-2 was trained on (more data for fine-tuning helps here).

https://www.anyscale.com/blog/fine-tuning-llama-2-a-comprehe...


chatgpt is obviously a LOT better, llama doesn't even understand some prompts

and since LLMs aren't even that good to begin with, it's obvious you want the SOTA to do anything useful unless maybe you're finetuning


> and since LLMs aren't even that good to begin with, it's obvious you want the SOTA to do anything useful unless maybe you're finetuning

This is overkill. First of all, ChatGPT isn't even the SOTA, so if you "want SOTA to do anything useful", then this ChatGPT offering would be as useless as LLaMA according to you. Second, there are many individual tasks where even those subpar LLaMA models are useful - even without finetuning.


it's the SOTA for chat(prove me wrong), and you can always use the API directly

even for simple tasks they're less reliable and need more prompt engineering


> it's the SOTA for chat(prove me wrong)

GPT-4 beats ChatGPT on all benchmarks. You can easily google these.


The distinction between GPT-4 and ChatGPT is blurry, as ChatGPT is a chat frontend for a GPT model, and you can use GPT-4 with ChatGPT. The parent probably means ChatGPT with GPT-4.


Typically when people say "ChatGPT" without specifying which specific model they refer to, they refer to gpt-3.5-turbo (in case of API - or in case of the web ui, they mean whatever model is its current web ui equivalent). But now OP says they meant GPT-4, so, sure.


Counterpoint: I don’t refer to 3.5 when I say ChatGPT. I pay for ChatGPT, and always use GPT-4. Which I believe every paying customer does.


I tried and got nothing useful. What's the difference between GPT-4 and ChatGPT Plus using GPT-4?


that is why i said FOR CHAT.

even through the API you can't easily use the regular models for chat, the parsing would be atrocious and there are hundreds of edge cases to handle.

ChatGPT4 through the API is the SOTA


openai offers finetuning too. And it's pretty cheap to do considering.



If anyone needs access to the code, you just need to append /forks to the web.archive link above and download from there. i.e. https://web.archive.org/web/20230814150922/https://github.co... (the cache ID updates when you change the URL)


Ugh. Any clue as to why?


I suspect they want to redirect to https://github.com/microsoft/chat-copilot with FluentUI webapp and C# webapi... And the backend stores from qdrant to chroma ... Sequential Planner...


Does anybody know a fork with the last commit (9116afe)?




They removed the Azure templates used to deploy as well, so I created an up to date tutorial on how to deploy the whole thing manually: https://tensorthinker.hashnode.dev/privategpt-a-guide-to-usi...


I can imagine how the conversation went with the enterprise customers: "Where does this send the data our employees enter?" "Same place as if they used the free ChatGPT chat bot..."


No it doesn't. It sends it to an LLM hosted inside the enterprise's own Azure subscription.


Private and secure? I thought the main issue with privacy and security of (not at all)OpenAI models is that by using their products you agree to let them retain all the data you send to and receive from the models, forever, for whatever they choose to use it for. Or is this just a thing for free use?

If you pay, do you get Ts&Cs that don't contain any wording like this? Still, even if there was no specific "we own everything" statement, there could be a pretty much standard statement of "we'll retain data as required for the delivery and improvement of the service", which is essentially the same thing.

So, any company that allows its employees to use ChatGPT for work stuff (writing emails with company secrets etc) is definitely not engaging in "secure and private" use.

Unless there is very clear data ownership, for example, the customer owns the data going in and going out, I can't see how it can be any different. The problem (not at all)OpenAI has in delivering such a service is that, in contrast to open source models, I'm told there is a lot of "secret sauce" around the model (not just the model itself). Specifically input/output processing, result scoring and so on.


The Azure SLAs state that the chats are neither stored nor used for training in any way. They are private and protected in the same way as all the other sensitive data stored on Azure.

On top, you might argue that Microsoft and Azure are easier to trust than a still rather new AI startup.


I agree with your points. Having said that, Microsoft removed my Azure OpenAI GPT-4 access last week without warning. I was not breaking any TOS. Oh well, pointed back at OpenAi.


Can you expand on this because that's pretty alarming...

What kind of volume were you doing and did you use the API for anything other than your listed use case when applying?


6 x 1000 token calls per day, for a news bot (listed use case at application).

I think what happened is the azure subscription was converted from a (multi year) promotional subsidy/discount to a full pay as you go subscription. No change to sub id. Payment methods OK. Everything else continued working, but openai gpt-4 access stopped the next day.

I’d rather use the Azure version because they promise 12-month sunsets vs OpenAI 6-month sunsets for model versions.


You should contact support and if you're up for it document how that goes.

Azure is mostly better for production: the developer experience is awful and the default filtering is more aggressive, but you get dedicated capacity by default which improves latency (something you need to negotiate with OpenAI's sales team for otherwise)


So what do they train it on then?


> Starting on March 1, 2023, we are making two changes to our data usage and retention policies:

> OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.

> Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).

https://openai.com/policies/api-data-usage-policies


Unless required by law… I wonder what law.


"Unless required by law" is wording required to enable a mechanism called "legal hold". If an authority or lawyer discovers some documents for a case they get to prevent their automatic deletion until that case gets closed. Basically, you don't want to lose evidence if there's a warrant or ongoing lawsuit. I really see no problem with that clause in most ToS documents.

Now, I think you can do shady stuff with that wording as well, but I guess you can also get sued if you kept or used an unreasonable percentage of your data longer than when you promised to delete it.


> Basically, you don't want to lose evidence if there's a warrant or ongoing lawsuit. I really see no problem with that clause in most ToS documents.

Perhaps more nit-pickingly specific: they may be compelled by law (the courts or an agency with enforcement capacity) to maintain evidence if there's a warrant or ongoing lawsuit.


> is wording required to enable a mechanism called "legal hold"

I don't think this is accurate. At least in Norway you can't "just not" keep records required by law - any section in a contract in conflict with current law would simply be invalid?

I think the section just clarifies that Microsoft will comply with laws requiring them to keep data (eg the "anti-terror" laws that might require data retention).


> Unless required by law… I wonder what law.

Any law. It just makes explicit that a contract can't supercede laws. Even if it was left out, Microsoft is still subject to laws.


The models like gpt themselves are inherently private and secure. They make predictions based on input.

It's what happens in the interface, that is your web chat or API call, which is different per implementation. ChatGPT is an implementation that uses that model and its maker OpenAI wants to keep your history for further training.

But what Azure is doing is taking that model and putting it behind an endpoint specific to your Azure account. Businesses have been interested in GPT, and have been asking for private endpoints. Amazon is doing the same with Bedrock.


I'm pretty sure the point of this version is not to export data, hence the name.


This only applies to the API (not ChatGPT): their privacy policy states they will keep your requests for 30 days and not use them for training. You can also apply for zero retention.

https://openai.com/policies/api-data-usage-policies


Privacy and security... in practice, can mean different things.

In HN-space, it is at its most abstract, idealistic, etc. At the practical level this service is aimed at... it might mean compliance, or CYA. Less cynically, it might mean something mundane: MSFT's guarantee, a responsive place to report security issues.


Would it be too much to mention somewhere in the README what this repo actually contains? Just docs? Deployment files? Some application (which does..something)? The model itself?


The repo contains the UI code, not the model or anything else around ChatGPT, it just uses Azure’s ChatGPT API which doesn’t share data with OpenAI.


So basically – what you really need to do to run Azure ChatGPT is go and click some buttons in the Azure portal. This repo is a sample UI that you could possibly use to talk to that instance, but really you will probably always build your own or embed it directly into your products.

So calling the repo "azurechatgpt" is misleading. It should really be "sample-chatgpt-api-frontend" or something of that sort.


Correct. It offers front-end scaffolding for your enterprise ChatGPT app. Uses Next/NextAuth/Tailwind etc. for deployment on Azure App Service that hooks into Azure Cosmos DB and Azure OpenAI (the actual model).


Yes exactly


Isn’t there also some sort of backend stuff in there? How else would it keep track of history and accept documents.

I don’t know enough TypeScript to understand where the front end stops and the backend begins in this code.


Annnd it’s a 404.

Less than a day later. The last article I see linking to it was published this morning.

Not sure what happened here, but “404’s at just-announced permalinks” seems to be on the rise lately.

Don’t turn me into a late-onset pedant. Fine. URIs are permanent forever! For all resources! ;)


It's disappointing. I wonder why they got cold feet. This is one of the reasons why I try to fork projects that I really like. But I didn't get around to this one until it was already made private.


So the public access one isn't private and secure?


The concern is that ChatGPT is training on your chats (by default, you can opt out but you lose chat history last I checked).

So in general enterprises cannot allow internal users to paste private code into ChatGPT, for example.


As an example of this: I found that GPT4 wouldn't agree with me that C(A) = C(AA^T) until I explained the proof. A few weeks later it would agree in new chats and would explain using the same proof I did, presented the same way.
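
(For anyone curious, the standard argument over the reals, presumably roughly the proof in question, goes through the null space:)

    \[
      AA^{\top}x = 0 \;\Rightarrow\; x^{\top}AA^{\top}x = \lVert A^{\top}x \rVert^{2} = 0
                     \;\Rightarrow\; A^{\top}x = 0,
    \]
    \[
      \text{so } N(AA^{\top}) = N(A^{\top}) \;\Rightarrow\;
      \operatorname{rank}(AA^{\top}) = \operatorname{rank}(A^{\top}) = \operatorname{rank}(A),
    \]
    \[
      \text{and since } AA^{\top}x = A(A^{\top}x) \text{ gives } C(AA^{\top}) \subseteq C(A),
      \text{ equal dimensions force } C(AA^{\top}) = C(A).
    \]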


I’ve found that the behavior of ChatGPT can vary widely from session to session. The recent information about GPT4 being a “mixture of experts” might also be relevant.

Do we know that it wouldn’t have varied in its answer by just as much, if you had tried in a new session at the same time?


There is randomness even at t=0, there was another HN submission about that


I tested it several times, new chats never got this right at first. I tried at least 6 times. I was experimenting and found that GPT4 couldn't be fooled by faulty proofs. Only a valid proof could change its mind.

Now it seems to know this mathematical property from first prompt though.


This is kinda creepy. But at the same time, how do they do that? I thought the training of these models stopped in September 2021/2022. So how do they do these incremental trainings?


All the public and (leaked) private statements I have seen state that this is not happening. As siblings noted, MoE probably explains this variance.

AIUI they are using current chat data for training GPT-5, not re-finetuning the existing models.


The exact phrase they previously used on the homepage was "Limited knowledge of world and events after 2021" - so maybe as a finetune?


but doesn’t finetuning result in forgetting previous knowledge? it seems that finetuning is most usable to train “structures” not new knowledge. am i missing something?


Kind of implies that OpenAI are lying and using customer input to train their models


Unless you have an NDA with Open AI, you are giving them whatever you put in that prompt.


Also, at some point some users ended up with other users’ chat history [0]. So they’ve proven to be a bit weak on that side.

[0]: https://www.theverge.com/2023/3/21/23649806/chatgpt-chat-his...


> However, ChatGPT risks exposing confidential intellectual property.

I don't remember seeing this disclaimer on the ChatGPT website, gee maybe OpenAI should add this so folks stop using it.


If you use ChatGPT through the app or website they can use the data for training, unless you turn it off. https://help.openai.com/en/articles/5722486-how-your-data-is...


Providing my data for training doesn't imply that it risks being exposed.

If you understand what happens on a technical level, it might be possible, but OpenAI has never said this was a risk by using their product.


Absolutely. For example it doesn't say that OpenAI employees can't look at everything you write.


It's pretty clear in the FAQ to be fair.


The comment you are responding to is sarcastic


I believe it’s implying the free ChatGPT collects data and this one doesn’t.


I thought sama said they don’t use data going through the api for training. Guess we can’t trust that statement


That is correct, they do not use the data going through the API for training, but they do use the data from the web and mobile interfaces (unless you explicitly turn it off).


“We don’t water down your beer”.

Oh nice!

“But that is lager”


Another thing is that using ChatGPT for European companies might be in violation of GDPR – Azure OpenAI Services are available on European servers.


No

Edit: yes


I just love this comment.


This seems like such an obvious thing to do.

I see the use of general purpose LLMs like ChatGPT, but smaller fine tuned models will probably end up being more useful for deployed applications in most companies. Off topic, but I was experimenting with LLongMA-2-7b-16K today, running it very inexpensively in the cloud, and given about 12K of context text it really performed well. This is an easy model to deploy. 7B parameter models can be useful.


Is there an easy way to play with these models, as someone who hasn't deployed them? I can download/compile llama.cpp, but I don't know which models to get/where to put them/how to run them, so if someone knows about some automated downloader along with some list of "best models", that would be very helpful.


For Llama, get the 4-bit quantized ones, small models like the 7B one, in the GGML format. That will run on your local CPU. Google those terms too. You can look on Hugging Face for the actual model to download, then load it and send prompts to it.


Thanks, maybe it's as easy as downloading the ggml and running it with Llama.cpp. I'll try that, thanks!


there is also a python wrapper that has a web ui built in for llama.cpp, if it wasn't easy enough already
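
If it helps, that wrapper (llama-cpp-python) boils down to a couple of lines once you've downloaded a quantized GGML file from Hugging Face (the filename below is just an example of the 2023-era naming):

    # pip install llama-cpp-python
    from llama_cpp import Llama

    # Path to a 4-bit quantized GGML model downloaded from Hugging Face (example filename)
    llm = Llama(model_path="./llama-2-7b-chat.ggmlv3.q4_0.bin", n_ctx=2048)

    out = llm("Q: Name three facts about llamas.\nA:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])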


If you want to try out the Llama-2 models (7B, 13B, 70B), you can get started very easily with Anyscale Endpoints (~2 min). https://app.endpoints.anyscale.com/


I usually run them on Google Colab, and occasionally a GPU VPS on Lambda Labs. Hugging Face model card documentation usually has a complete Python example script for loading and running a model.
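
Those scripts usually look something like this (a sketch; the model ID is just an example and the gated Llama 2 weights require accepting Meta's license first):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-chat-hf"   # example; swap in whatever the model card names
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto")  # fits a single Colab/Lambda GPU

    inputs = tok("Explain what a context window is, in one sentence.", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=64)
    print(tok.decode(out[0], skip_special_tokens=True))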


I'm a little confused by how the relationship works between OpenAI and Microsoft. It is possible for anyone to register for an OpenAI account and use their APIs. Within Azure the same thing is much more difficult as it is necessary to be a "real" business in order to use it. I maintain an open source OpenAI library and would like to add support for Azure but can't because of this restriction. Why can't I just use my regular Azure account?


Microsoft owns enough of OpenAI that their endgame goal of putting GPT like features into Azure and Office365 for enterprise customers is what we’re likely to see happen.

OpenAI will likely target private consumers while Microsoft focuses on enterprise. I can use my own organisation as an example. We’re an investment bank that does green energy within the EU. We would absolutely use GPT if it was legal, but it isn’t, and it likely never will be considering their finance model is partly to steal as much data as they can. Even if it’s not so polite to say that. This is where Microsoft comes into the picture. In non-tech enterprise you’re buying Microsoft products because everyone wants windows, outlook and office. We can wish it wasn’t like that, but where is the realistic alternative? I’m not anti Microsoft by the way, in all my decades in the enterprise business they’ve easily been the best and most consistent business partner for any IT. When Amazon saw how much money there was on the operations side of EU enterprise they quickly caught up, but Amazon doesn’t sell a Office365 product. So anyway, once you have Office365, you’re also likely to use Teams as your communications platform (which is why there is an anti-trust case against it), Sharepoint as your document platform, and, well, Azure as your cloud platform. Except you might use AWS because Amazon is also great. In some ways they are even more compliant with EU legislation than Microsoft.

But if Microsoft can throw GPT products into Azure the same way they put Teams and SharePoint into Office365… well, then where is their competition? And having GPT features within Office365 will only further their advantage on the office platform. I mean, there are companies which won't use Outlook, but there won't be when ChatGPT writes your e-mails.

So this isn't necessarily for you. It's just part of Microsoft's overall strategy for total IT domination in enterprise. I mean, we're going into RPA (robotic process automation), a journey I went through in another enterprise organisation a few years back. Back then you had to consider what to buy: would it be BluePrism, UiPath, Automation Anywhere, something else? Today there is no competition to Microsoft's PowerAutomate if you're already a Microsoft customer. It's literally $500 a month vs $50k a month… I mean… that's the future for GPT on Azure.

It's probably necessary too. Their prices have made a lot of organisations look outside of Azure, toward places like Hetzner or even self-hosting, but if Azure comes with GPT… well then.


OpenAI APIs have pretty much as clear a contract as you can get with a third party.

> Starting on March 1, 2023, we are making two changes to our data usage and retention policies:

> OpenAI will not use data submitted by customers via our API to train or improve our models, unless you explicitly decide to share your data with us for this purpose. You can opt-in to share data.

> Any data sent through the API will be retained for abuse and misuse monitoring purposes for a maximum of 30 days, after which it will be deleted (unless otherwise required by law).

https://openai.com/policies/api-data-usage-policies


It would still be illegal to use it, but you're right that I shouldn't have been so conspiratorial.


>> We would absolutely use GPT if it was legal,

Can't you just use the Azure service now?


Interesting release, though still lacking a few features I've had to resort to building myself, such as code summary, code base architecture summary, and conversation history summary. ChatGPT (the web UI) now has the ability to execute code and make function callbacks, but I prefer running that code locally, especially if I am debugging. The latter part, conversation history summary, is something the ChatGPT web UI does reasonably well, giving it a long history, but doing sentiment extraction and salient-detail extraction before summarizing is immensely useful for remembering details from the distant past. I've been building on top of the GPT-4 model and tinkering with multi-model (GPT-4 + Davinci) usage too, though I am finding with the MoE that Davinci isn't as important. Fine-tuning has been helpful for specific code bases too.

If I had the time I'd like to play with an MoE of Llama2, as a compare and contrast, but that ain't gonna happen anytime soon.
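
For anyone curious what the conversation-history-summary piece can look like, here is a rough sketch (assuming the pre-1.0 OpenAI Python SDK; the prompt and function are illustrative, not the parent's actual code):

    import openai  # assumes openai<1.0, current as of mid-2023

    def summarize_history(messages, model="gpt-4"):
        """Compress older turns into a short summary that keeps salient details and sentiment."""
        transcript = "\n".join(f"{m['role']}: {m['content']}" for m in messages)
        resp = openai.ChatCompletion.create(
            model=model,
            messages=[
                {"role": "system",
                 "content": "Extract the key facts, decisions, and sentiment from this "
                            "conversation, then write a compact summary for future context."},
                {"role": "user", "content": transcript},
            ],
        )
        return resp["choices"][0]["message"]["content"]

The summary can then be prepended to later requests in place of the full transcript.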


This is a neat project from Microsoft

I've been building https://gasbyai.com, a beautiful chat UI that supports self-hosting, ChatGPT plugins, and extracting content from PDFs/URLs. GasbyAI supports Azure, OpenAI, and custom API endpoints in case you want to run your own models.


Pretty sure Azure has a moderation endpoint enabled by default that makes using the OpenAI API an awful experience.


We've had this at IKEA for a while now. Not impressed, but it's funny to read the hallucinations.


I'd expect a company like IKEA to have the expertise to create interfaces specific to their workflows so hallucinations aren't an issue.

Imo if you're making an open ended chat interface for a business, you're doing it wrong.


Have you considered instruction-tuning it with text, instead of just pictures?


I'm not surprised Azure would add something like this to the stack. We built AnythingLLM (https://github.com/Mintplex-Labs/anything-llm) back in June due to some enterprise customers wanting something isolated they could run on premises with Azure OpenAI support + any vector DB they want.

With Azure's move to try to internalize any enterprise integration for AI, it makes sense to make a chatbot wrapper because it's a no-moat move. I think a lot of the "moat", if one can exist in the "chat with your docs" vertical, is just integrations into the flows and data sources SMBs/enterprises are already using.

For businesses, in my experience, the on-prem thing has been the first decision point - without question. An Azure wrapper could be nice to have for those who cannot use ChatGPT on their work computer but have access to this instead.

I wonder what kind of hypervisor view it gives to Azure admins for those who use it - if any. Multi-tenant instances were the second-highest demand from SMB/enterprise customers for AnythingLLM.


Yeah sure, I totally trust you after the Storm-0558 disaster.


Darn, I just spent a week or so working on a ChatGPT clone that used the Azure ChatGPT API due to the privacy aspect. Wasted effort, I guess.


This is exactly the same


Welcome to the club :)


I'm also in this club but we wrote it months ago.


Could anyone explain how this can be constructed as a private solution?

I'm not familiar with Azure platform.

Is the inference processed on a private instance? I can't imagine how that could be feasible given the hardware required to run GPT-3.5/4.

So the best-case scenario is:

1. A web UI runs on private instances, so any user input (chat or files) is only seen by these instances.
2. Any chat history storage or RAG is also done on these instances.
3. Embeddings computation may possibly be done on the private instance.
4. The embeddings are then sent to the Microsoft GPU farm for inference.

So at one point my data has to leave my private network.

The problem is that the data can easily be reverse-engineered from the embeddings.

How can this be presented as a private LLM?


Interesting. One of the most requested features for my small native apps[0][1] was support for the Azure OpenAI service.

Apparently, many organizations have their own Azure OpenAI deployment and won’t let their employees use the public OpenAI service.

My understanding is that Azure makes sure all network traffic is isolated to their network, so they have more control over how their organization uses ChatGPT.

I created a super simple step-by-step guide on how to obtain an Azure OpenAI endpoint & key here:

https://pdfpals.com/help/how-to-generate-azure-openai-api-ke...

Hope it's useful to someone just getting started with Azure.

[0]: https://boltai.com

[1]: https://pdfpals.com
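
Once you have the endpoint and key from a guide like that, pointing the (pre-1.0) OpenAI Python SDK at Azure looks roughly like this; the resource and deployment names are placeholders:

    import openai  # pre-1.0 SDK

    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE-NAME.openai.azure.com/"  # placeholder
    openai.api_version = "2023-05-15"
    openai.api_key = "..."  # key from the Azure portal

    resp = openai.ChatCompletion.create(
        engine="YOUR-DEPLOYMENT-NAME",  # Azure uses deployment names rather than model names
        messages=[{"role": "user", "content": "Hello from Azure OpenAI"}],
    )
    print(resp["choices"][0]["message"]["content"])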


What's the practical difference between this and OpenAI API?

All I can see is the same product but offered by a larger organization. I.e. they're more likely to get the security details right, and you can potentially win more in a lawsuit should things go bad.


Compliance and customer trust. Azure can sign a BAA, for example. If you are building LLM capability on top of your SaaS, your customers want assurances about their data.


A few months ago my team moved to Azure for capacity reasons. We were constantly dealing with 429 errors and couldn't get in touch with OpenAI, while Azure offered more instances.

Eventually we got more capacity from OpenAI, so we load-balance across both. The only difference is that the 3.5 Turbo model on Azure is outdated.


You can ask for GPT-4; it took a while due to capacity constraints, but we got it.


The linked GitHub repo was active yesterday and is now returning a 404.

Anybody know why?


How is this different from the other OpenAI GUI? Why another one by Microsoft? https://github.com/microsoft/sample-app-aoai-chatGPT.


I bet there are plenty of OKR/KPIs now tied to AI at Microsoft.


There are at least two more. There's also https://github.com/Azure-Samples/azure-search-openai-demo

And you can deploy a chat bot from within the Azure playground which runs on another codebase.


Bigger companies are cautious about using GPT-style products due to data security concerns. But most big companies trust Microsoft more or less blindly.

Now that Microsoft has an official "enterprise" version out, the floodgates are open. They stand to make a killing.


This is an internal ChatGPT, whereas that sample is ChatGPT constrained to internal search results (using a RAG approach). Source: I help maintain the RAG samples.
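
For readers unfamiliar with the term, here is a bare-bones illustration of the RAG pattern (retrieve relevant text, then let the model answer grounded in it); this is a sketch with a naive keyword match, not the actual sample code, which uses a real search index:

    import openai  # pre-1.0 SDK

    DOCS = {
        "vacation-policy": "Employees accrue 2 days of paid leave per month.",
        "expense-policy": "Expenses over $500 require manager approval.",
    }

    def answer(question):
        # 1. Retrieve: pick the passages most relevant to the question.
        context = "\n".join(text for text in DOCS.values()
                            if any(word in text.lower() for word in question.lower().split()))
        # 2. Generate: ask the model to answer using only the retrieved context.
        resp = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": f"Answer using only this context:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return resp["choices"][0]["message"]["content"]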


I'm pretty sure it's a part of it.


We just have to trust them and take their word for it? Or what?

https://azure.microsoft.com/en-us/explore/trusted-cloud/priv...

https://azure.microsoft.com/en-us/blog/3-reasons-why-azure-s...

I guess I would trust them, since they're big and they make these promises and other big companies use them.


This is awesome to see, feels heavily inspired (in a good way) by the version we made at Vercel[1]. Same tech stack: Next.js, NextAuth, Tailwind, Shadcn UI, Vercel AI SDK, etc.

I'd expect this trend of managed ChatGPT clones to continue. You can own the stack end to end, and even swap out OpenAI for a different LLM (or your own model trained on internal company data) fairly easily.

[1]: https://vercel.com/templates/next.js/nextjs-ai-chatbot


Really, they use Vercel AI SDK?


Well, they did. The repo appears to have been deleted.


Is this a full, standalone deployment including GPT-3 (or whatever version) or just a secured frontend that sends data to GPT hosted outside the enterprise zone?

Edit: Uses Azure OpenAI as the backend


I'm confused. If this is just a front-end for the OpenAI API then how does it remove the data privacy concern? Your data still ends up with Azure/OpenAI, right? It doesn't stay localized to your instance; it's not your GPU running the transformations. You have no way of knowing whether your data is being used to train models. If customer data is sensitive, I'm pretty sure running a 70B llama (or similar) on a bunch of A100s is the only way?


Azure is hosting and operating the service themselves rather than OpenAI, with all the security requirements that come with that. I assume this comes with different data and access restrictions as well as the ability to run in secured instances (and nothing sent to OpenAI the company).

Most companies already use the cloud for their data, processing, etc. and aren't running anything major locally, let alone ML models; this is putting trust in the cloud they already use.


Ah, that's fair. But it is my impression that the bulk of privacy/confidentiality concerns (e.g. law/health/..) would require "end to end" data safety. Not sure if I'm making sense. I guess Microsoft is somehow more trustworthy than OpenAI themselves...

EDIT: what you say about existing cloud customers being able to extend their trust to this new thing makes sense, thanks.


Right. If I were a European company worried about, say, industrial espionage, this wouldn't be nearly enough to reassure me.


Yes, this was my understanding.


Link is 404 now. Did anyone fork it before it went 404?



This is not ChatGPT. It's just a front end for the Azure OpenAI APIs. Not sure why they so blatantly use the trademark. They will probably have to rename it soon.


Microsoft is a major investor in OpenAI. Guaranteed they worked with OpenAI on this and have a partnership to use the trademarks.


Microsoft owns OpenAI so I doubt that they will be asked to rename this.


They'll only own 49% of shares.


We wrote a blog post about why companies do this here: https://www.lamini.ai/blog/specialize-llms-to-private-data-d...

Here are a few:

Data privacy

Ownership of IP

Control over ops

The table in the blog lists the top 10 reasons why companies do this based on about 50 customer interviews.


It was really good when access was enabled via OpenAI, but ever since it moved to an Azure subscription, getting preview access has stalled. It wouldn't be a big deal for others, but for small-time devs like me it becomes a big challenge. Hope OpenAI provides a developer env or so where we can try things out.


Nothing in the repo details how this addresses privacy concerns of running inference on someone else's LLM. To be isolated from other users of the service is not the same thing as having a private inference engine.

> Private: Built-in guarantees around the privacy of your data and fully isolated from those operated by OpenAI.

Do tell.


So where do you draw the line? No cloud instances, no cloud SQL like Snowflake, no Teams or Office 365, no S3/blob storage? Run everything on-prem like 10 years ago?

It's only going to get harder. All that VC money going in at 100x revenue needs a return, and they aren't going to leave money on the table with full-featured open-source or CentOS-type alternatives.

For all those data engineering startups and database providers with 'open-source' + cloud hosting, the 'open-source' is going to be just 'open' enough to claim there is some fallback for someone else to pick up the mantle using the community version, if the cloud version gets enshittified beyond reason.

You're not going to even be able to run the full-featured software version on-prem because the economics of cloud are so much better.

Unless you are writing and compiling your own code you are going to be out of luck if your privacy standard is that high. That war has been lost. And Web3 sure ain't gonna save you either.


They should clearly spell out what is and is not "private". As it is, we simply have a blurb about some undefined guarantees. And some comments here in the thread saying "this is as close as you're going to get to local GPT" are deeply wrong. But then there is easy VC money (just like with ads), and certain "clever" geeks throw social responsibility out the window as usual and are pushing all sorts of deeply invasive applications ("let our proxy for Microsoft hoover your inbox!") based on these undefined "privacy guarantees".

If we accept this just as we accepted the very flawed solutions we were given by corporations regarding social networking and ads, we are going to be stuck with it, suffer the consequences, and there will be no incentive to develop alternatives that actually address the issues and work.

Homomorphic encryption works. It just doesn't work very efficiently right now, but that is an intellectual problem that can be solved if we push for actual privacy in this critical technology, as it will be fully enmeshed in all parts of our lives.

"Think of the children" if that helps.


> However, ChatGPT risks exposing confidential intellectual property. One option is to block corporate access to ChatGPT, but people always find workarounds

Pretty bold thing to say to your potential clients. "You can always tell your employees not to use our product, but they won't listen to you."


It's almost like employees might have their own computers?


Anyone have any thoughts as to ballpark costs to run this? My napkin math on the Cosmos DB requirements is failing me (largely because I do not know Azure at all).

I'm wondering as a hobbyist / tinkerer if a solution like this is "affordable" (I know it's all relative)


I did a unit of AI at university, and the front of the textbook contained a quote by some ye olde AI theorist, something like:

"I'm not concerned that artificial intelligence will take over the world. I'm concerned that human intelligence has yet to do so."


Since the only users who would likely care about this derive far more value than the $20/month of OpenAI's direct offering, why doesn't OpenAI market this service, but with chat history, for something like $200/month?


That's a laughable price for an enterprise subscription.

And the reason is, it's one thing for OpenAI to "say" that they're "not going to use your data" - you need a cloud deployment where you can control network boundaries to _prove_ that your data isn't going anywhere it isn't supposed to.


Unless you're physically controlling the network boundaries, how are you proving that on any cloud service?


OpenAI IS Microsoft. Don't get tangled in the web of creating different entities when they are all part of the same pyramid. Also GitHub IS Microsoft too!!


GitHub was acquired by Microsoft and is now a wholly owned subsidiary.

Microsoft is an investor in OpenAI, but does not own it, and they are legally separate companies. OpenAI is not Microsoft and it is factually incorrect to claim that OpenAI is Microsoft.

[1] https://blogs.microsoft.com/blog/2023/01/23/microsoftandopen...


But saying they're just an investor isn't quite doing the arrangement the justice it deserves. There seem to be a lot of strings attached to that investment.

It's not just a straight trade of dollars for shares, but many further contractual obligations.


I understand that perception but "seems to be a lot of strings" is all that is publicly known. None of those further obligations seem to have been disclosed. Without that disclosure it's a bit of a conspiracy theory?

Thus, it could very well be that OpenAI has taken dollars, is commercially selling its technology to Microsoft on terms which aren't special, and sama and the OpenAI executive team and board have independently concluded that engaging in the partnership is a stellar way to grow the OpenAI brand, business and valuation.


Is there a way to run this on AWS instead?

We were looking to explore Llama 2 for internal use.


We can run Llama 2 on an AWS VM if you have enough GPUs: https://lamini.ai/

Install in 10 minutes.

Make sure you have enough GPU memory to fit your Llama model if you want good perf.
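
As a rough sketch of what "enough GPU memory" means for the weights alone (ignoring activations and KV cache, which add overhead on top):

    # Back-of-the-envelope weight memory for a 70B-parameter model.
    params = 70e9
    for precision, bytes_per_param in [("fp16", 2), ("8-bit", 1), ("4-bit", 0.5)]:
        gb = params * bytes_per_param / 1024**3
        print(f"{precision}: ~{gb:.0f} GB of weights")  # roughly 130, 65, and 33 GB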


OpenAI models are exclusively on Azure. Llama 2 should have an AWS option, I believe?


Have your engineers set this up internally https://huggingface.co/spaces/huggingface-projects/llama-2-7...


You can't really replace ChatGPT 4 with Llama 2 7B.


Yeah, right. Try getting the same answer after two months.



Amazon Bedrock makes Claude 2 available, as well as some other models.


MSFT spent a lot of money to ensure that was not an option with ChatGPT.


https://about.fb.com/news/2023/07/llama-2/ https://huggingface.co/blog/hugging-face-endpoints-on-azure

You can of course run Llama 2 in Azure, but you can't host OpenAI models in AWS.


I've tried it out. Right now it seems more of a proof-of-concept than a real-world application. Having said that, the concepts and ideas in there are definitely reusable.


Is it possible for someone to give us a lower bound on the cost of running a 70B model in the cloud? How much memory does Llama 2 take? What would it cost to fine-tune it?



Yeah, right - for the three-letter agencies to have a backdoor. Hard pass on something that cannot be made deterministic with a seed.


Can anyone shed light on what "local" means? Local on my own private machine, or local in my Azure tenant?


Assuming you are referring to this section: https://github.com/microsoft/azurechatgpt/blob/main/docs/3-r...

It means you run the front end (the chat GUI) and the backend code from the repo. This code connects to Cosmos DB for uploading documents used for "chat with your PDF" and connects to an OpenAI instance on Azure for the chat inference.
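
In other words, something conceptually like the following backend flow (a sketch only - the resource names, container layout, and variable names here are illustrative, not the repo's actual configuration):

    import uuid
    import openai  # pre-1.0 SDK, pointed at Azure
    from azure.cosmos import CosmosClient

    openai.api_type = "azure"
    openai.api_base = "https://YOUR-RESOURCE.openai.azure.com/"  # placeholder
    openai.api_version = "2023-05-15"
    openai.api_key = "..."

    cosmos = CosmosClient("https://YOUR-COSMOS.documents.azure.com:443/", credential="...")
    history = cosmos.get_database_client("chat").get_container_client("history")

    def chat(session_id, user_message):
        reply = openai.ChatCompletion.create(
            engine="YOUR-DEPLOYMENT",  # Azure deployment name
            messages=[{"role": "user", "content": user_message}],
        )["choices"][0]["message"]["content"]
        # Persist the turn so the UI can reload the conversation later.
        history.upsert_item({"id": str(uuid.uuid4()), "sessionId": session_id,
                             "user": user_message, "assistant": reply})
        return reply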


I am pretty sure it means run the UI locally and access Azure-hosted ChatGPT. The environment vars seem to indicate that as well.


The Azure API is definitely faster than OpenAI's, and they also seem to provide access to the 32k context models more generously than OpenAI does.


How does this work in terms of utilization? The isolation presumably means buying GPU capacity and only using a fraction of it?


Basically you get N tokens per second (or per minute - I can check tomorrow if you're really interested) per deployment. So if you outgrow one deployment, just deploy another one (with the associated costs, of course).

One deployment = a deployed model which you can query.

On top of that, depending on the model you're using, you also pay a cost per 1,000 tokens processed.


Ah right, so it is somewhat shared - not like a "your own GPU" type of situation.


Why did you make that assumption?


When will LLMs be good enough to write the code for an LLM that is competitive with, or better than, themselves?


If this happens, we certainly won't be the first to know.


Upvoted, cannot wait for this, yes yes yes. The companies have been waiting for this.


Crappy clone of the ChatGPT frontend, half missing, half direct copy. Implied and overly broad claims of insecurity + lack of privacy that are only narrowly true, i.e. for _Chat_GPT.

Really surprised to see language this aggressive 1) written down, 2) on GitHub. I'd be pretty pissed if I were OpenAI, regardless of the $10B.


I think OpenAI is entirely on board with the idea that OpenAI sells to consumers and Azure/Microsoft sells the same product to enterprise.

That's how it's been working for months, and if OpenAI objected they would have done something about it.


I have no doubt OpenAI is on board. This is just bringing more paid users to their platform because it still uses their API.


Did this just kill a lot of AI startups that were targeting enterprises?


This is not private. It's still hosted on Microsoft's cloud.


Anyone know what the cost is for Azure vs. OpenAI?


I believe the prices are identical:

https://azure.microsoft.com/en-us/pricing/details/cognitive-...

https://openai.com/pricing

disclaimer/source: I work at Microsoft on Azure/OpenAI


Looks like they removed it. Wondering why…


“Private and secure” from the company that let contractors listen to your private Teams conversations for data labeling purposes, and monitors your activity on your own computer with their OS…


Move fast and break things, including basic security. Why anyone trusts Azure that all these prompts won't eventually be leaked is beyond me. No one goes broke trusting Azure, but I'd love it if someone was held responsible.

https://www.schneier.com/blog/archives/2023/08/microsoft-sig...


Huh. I missed this one. Got a link?



Ah yes, it was Skype and not Teams, my bad.


I was looking through our server logs the other day and spotted the OpenAI bot going through our stuff ... however, a decent bit of our content is now augmented by GPT ...


So... how can we make this support plugins like Code Interpreter, Wolfram, Zapier or Workato, and whatnot?


I don't understand - chat with a file?

I want to chat and ask about an entire body of knowledge - wiki pages, git commit diffs/messages, jira tasks.


Now returns a 404. Interesting.


No better than the API.


Well, your Azure ChatGPT connection might be private, but your Windows machine will leak like a sieve. It is embarrassing how needy the blasted things are about signing in via Azure/Microsoft instead of a local or AD account. Even worse is the naff "choose how insidious you would like us to be" series of questions. How would you like your ads? Targeted or non-targeted? How about not at all? Nope.

In this day and age, exactly how private does anyone expect their comms/thoughts/files/data to be? I recall reading a recent MS EULA and it seems I have to say three Hail Marys every third Tuesday for using Arch Linux on my PCs. I could install Edge, and did, but I don't like the nasty homepage - a bit right wing... - why on earth is a browser pushing "news"? It's a browser. To be fair, I had to dump all the homepage crap that Firefox pushed when I finally dumped anything to do with Chrome.

Please don't use the words private and secure when you have your fingers crossed behind your back.


The article is referring to enterprise usage - and you're quoting all the consumer-level attributes (aka the cheap/sometimes subsidized version).

At the enterprise level, where this is intended to be run, things are much different.

If you're not aware of the differences or use cases, perhaps you're not the target audience who should be using or configuring it.


Why don't we give the willy waving a miss?

Win 10 and 11 are steering you to cloud first, out of the box. That's fine if you like it, but I don't and quite a lot of my customers don't.

The real problem is about data sovereignty. I'm a Brit and ... MS isn't.


The article is about use in an enterprise. An enterprise runs Professional/Enterprise/LTSC versions, which do NOT steer you to the cloud - what data sovereignty concerns have you seen in those editions of Windows/Server? They've gone through a lot of pain to ensure those concerns are taken care of for enterprises/governments, so I'm curious which ones you think they missed.

You can make the argument for their consumer editions sure, but that's a different product with different features, different price point for different users.


Can you fine-tune it?


Yes! You can.


Is it the same API as the public OpenAI one?


How?


Literally 404


Our company is pushing everyone to use a similar offering. Most of the company is doing low-value work … still using Excel even though we have a custom ERP. Now I'm seeing people who couldn't write a coherent email before write 3-page emails. The illusion of being productive by doing more work, even though it has zero impact on the bottom line. It's insane how inefficient organisations are. No doubt we'll have some KPI soon about using the tool.


If anything it's less productive because people have to parse all that nonsense.

I was gobsmacked to hear a friend say that their work guidance is to use ChatGPT to write letters to external clients, for example. I know for sure I'd be insulted if someone sent me paragraphs of text to read that were created from a sentence-long prompt. I'd rather have the prompt; my time is valuable as well.


Exactly right. If you increase entropy you need energy to reduce it back. It would be more valuable to take the crap that humans have put together incoherently and summarize it. (Perhaps someone should put a GPT on the other end in order to read it.)

I honestly don’t know why we’re so obsessed with having LLMs generate crap. Especially when they’re very capable of reducing, simplifying. Imagine penetrating legal texts, political bills, obtuse technical writing, academic papers and making sense of those quickly. Much more useful imo.


The number of otherwise very smart people who completely lose the ability to think critically when it comes to "AI" is really interesting to me.

I'm not anti-AI; I've recommended that we use it at work a few times where it made sense and was backed by evidence/benchmarks. But for essentially any problem that comes up, someone will try to solve it with ChatGPT, even if it demonstrably can't do the job. And these are not business folks; these are engineering leaders who absolutely have the capability to understand this technology.


I think the more common case is to have a handful of bullet points and some notes and ask ChatGPT to put them into a coherent letter for an external customer with the goal of XYZ. I've done similar things and it is a huge timesaver. I still have to edit it, but it gives me a start that's probably on par with what a junior engineer would write as a first draft.


Ahhhh, but they're pasting the 3-page email into ChatGPT ("summarize this"). The future is here.


Yeah that's one of the insane things that will happen.

Very soon everyone will in effect "hide" behind an agent that will take all kinds of decisions on one's behalf - everything from writing e-mails and proposals to suing someone, making financial decisions, and being a filter that transforms everything going in or out.

I can't imagine this world, really. How the hell are people going to compete or stand out? Doesn't it seem that what little meritocracy existed will soon drown in noise?


I was scared about organizations doing this and losing their connection to the humans they serve.

The realization that individuals will also have this barrier to the world is even scarier.

If it goes that way we could be looking at a change to society on the level of social media, again. Mad.


Wouldn't be surprised if that was the next Outlook feature.

Cue someone making some horrible error because some crucial information didn't survive the ChatGPT->ChatGPT round trip.


Actually, this was in an Azure hackathon some time ago: https://devpost.com/software/amabot



I write emails and put them into ChatGPT and ask it to make them more concise or point out issues. There's no utility in asking ChatGPT to needlessly expand the text...


You'll just have people reversing it into a summary on the other end, kind of like a "text" chat where both sides are using text-to-speech and speech-to-text instead of having a phone call.


What ERP are you using?

We've found some early success selling to companies with older "long-tail" ERP's. I've been finding a new one every day.


It's a proprietary ERP, completely custom. I think it was deployed through an acquisition. The problem isn't the ERP, it's the business. "We want custom processes", but they hire the cheapest developers possible to maintain the ERP and then complain about bugs. "We're agile™" … but have had the same inefficient processes for the last 3 years. Cargo-cult org: the CEO was talking about black swans during COVID … even though Nassim Taleb explicitly said COVID wasn't a black swan event.


I’ve learned that the most important writing skill is to figure out what you’re trying to say — this is a rather important prerequisite to writing well.

Naively asking a chatbot to write for you does not help with this at all.

It would be interesting to try to prompt ChatGPT to ask questions to try to figure out what the user is trying to write and then to write it.


The big question: If this is truly secure and private, can people use it to generate things related to porn or violence?


"Private and secure"

From Microsoft?

Ha.



