I have been working on getting ChatGPT to answer questions that equity research analysts and investors would like answered from SEC filings. The application uses a combination of hybrid text search and LLMs for completion, and does not rely much on embedding-based distance search.
A core assumption underlying this is that LLMs are already pretty good and will continue to get better at reading texts. If provided with the right thing to read, they will do very well on 'reading comprehension'.
Open-ended writing is more susceptible to errors, especially for questions related to finance. For example, Google's revenues are just as likely to be 280.2 billion as 279 billion in a probabilistic model that guesses the next part of the sentence "Google's revenues for FY 2022 are ...".
So this leaves us with the main problem to solve: serving the right texts to the LLM, a.k.a. text search.
Once the right text is served, we can generate pretty much anything in the text on the fly: income statements, CEO comments, accounts payable. For example, try `can you get me Nvidia and AMD's income statement from March 2020 ?`, as here: https://imgur.com/gallery/H8Vfd5X
A few more examples (Apple's sales in China, Google's revenue by quarter): https://imgur.com/a/oCCay3o
Currently, the application supports ~8k companies that are registered with the SEC. PDFs are still a work in progress, so Tesla etc. don't work as well.
The stack is Next.js on Supabase, so Postgres's inbuilt text search does a lot of the heavy lifting.
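For a flavour of what that looks like, here's a minimal sketch with supabase-js; the table and column names are made up, not the actual schema:

```ts
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_KEY!);

// Hypothetical table/columns; Postgres full-text search does the matching.
const { data, error } = await supabase
  .from("filing_sections")
  .select("company, heading, body")
  .textSearch("body", "greater china sales", { type: "websearch", config: "english" });
```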
If one thinks of the bigger picture, we can extend/improve this to PDFs, the entire universe of stocks, and more. In other words, a big component of what CapitalIQ, FactSet, Bloomberg and Reuters do can now be generated on the fly, accurately, for a fraction of the cost.
Generating graphs (gross margin rising over time, etc.) is just one step further, and metrics like EV/EBITDA yet another step, as one can call a stock pricing API for each report date.
I would guess a number of LLM applications follow a similar process: ask a question --> LLM converts it to a query --> data lakes/bases --> searching and serving texts --> answer.
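In code, that loop is roughly the following; every name here is illustrative, not the actual implementation:

```ts
// Sketch of the ask -> query -> search -> answer pipeline. All stubs.
type Query = { company: string; formType: string; period: string };

async function questionToQuery(question: string): Promise<Query> {
  // An LLM call turns the natural-language question into a structured query.
  return { company: "AMD", formType: "10-Q", period: "2020-03" };
}

async function searchTexts(q: Query): Promise<string[]> {
  // Hybrid text search over the database returns the chunks worth reading.
  return [];
}

async function readAndAnswer(question: string, texts: string[]): Promise<string> {
  // A second LLM call does the 'reading comprehension' over the served texts.
  return `Answer based on ${texts.length} passages`;
}

export async function answer(question: string): Promise<string> {
  const query = await questionToQuery(question);
  const texts = await searchTexts(query);
  return readAndAnswer(question, texts);
}
```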
Goes without saying, I would appreciate any feedback, especially from those who are building stuff that looks architecturally similar :) !
I've been trying something similar with parliamentary debates. They're long-winded, often full of empty speech, and a chore to read.
The LLMs are able to home in on the details and provide interesting responses to prompts like "What questions were asked of the minister that they failed to address?" and "What should the opposition leader have mentioned in their response that the minister would have found difficult to answer?"
I think one thing you can try is to figure out what lies where; chunking arbitrarily will not work as well as chunking by headings, for example.
For example, if the question is: "What should the opposition leader have mentioned in their response that the minister would have found difficult to answer?"
An embedding-based search will find it fairly difficult to match this against a text. Based on my experience, you have to figure out what a non-answer is first (I don't think that's easy, but GPT-4 is very good at a lot of language-based stuff).
You can try: Q1 (question), A1 (answer), then prompt GPT with "do you think A1 answers Q1?" and save the result.
And then: Q1 (question), A1 (answer), Q2 (follow-up), and again ask "based on the above, do you think A1 answers Q1?", saving that to a db as well. A sketch of the check follows below.
You can then augment it in the code with your own knowledge of how politicians lie, using certain words etc. :) to improve on what GPT-4 might miss...
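A minimal sketch of that check, assuming the openai client; the prompt wording is just a placeholder:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// Ask the model whether A1 actually answers Q1, then persist the verdict.
async function answersTheQuestion(q1: string, a1: string): Promise<boolean> {
  const res = await openai.chat.completions.create({
    model: "gpt-4",
    messages: [{
      role: "user",
      content: `Q1: ${q1}\nA1: ${a1}\n\nDoes A1 actually answer Q1? Reply "yes" or "no" first, then one sentence of reasoning.`,
    }],
  });
  const verdict = /^yes/i.test((res.choices[0].message.content ?? "").trim());
  // ...save { q1, a1, verdict } to your db here
  return verdict;
}
```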
> "We're definitely moving forward with strong measures to improve the situation, and while progress might not be immediately visible, we're fully engaged in this essential journey."
Could you ask an LLM if it was persuaded by a speaker on one side of a debate, as a method of evaluation? I.e. the bot's before-and-after opinion, based on fine-tuning with the pro and con arguments?
I was also thinking about a society-of-bots type application where you could have autonomous bot researchers, debaters, judges and an audience. Would be interesting to feed in the topics and grab some popcorn.
I think you could ask an LLM if it was persuaded, but I don't think you'd get meaningful data from it.
Leaving aside the bots that are trained to answer "as an LLM I do not have opinions...", it's going to be a very basic probabilistic yes or no based on tone and the number of pros and cons, rather than knowledge of the surrounding political context and higher-order reasoning about the accuracy of the claimed pros and cons.
Yeah, the bot has both no initial position and no internal opinion on facts that an argument could operate on. It might write you a cogent response, but it's fundamentally not a question an LLM can meaningfully answer.
I like this question. I've always wanted to be able to compute, somehow, the "partial derivative of x with respect to y," where x is a proposition and y is an argument or a piece of evidence. It seems to me as though we're closer than ever before. The models are still fairly opaque, but I'm hoping the interpretability research will succeed and allow such introspection!
If you tell the LLM it is conservative, it will report it was persuaded by the conservative speaker; if you tell it it is a liberal, it will say it was persuaded by the liberal speaker. It will be happy to roleplay at being persuaded, but I don't know why that would be useful?
I once had a colleague who used ChatGPT to summarize our then employer's SEC filings. Results were, putting it mildly, mixed. The best case was a slightly less biased version of the "shareholder letters" (read: propaganda pieces) published around the same time.
What ChatGPT completely missed was stuff like omissions (I'll kind of give it a pass here: how can software analyze the absence of something without having access to supplemental documents? It still shows how dangerous it is to rely only on an LLM for such analysis) and, more importantly, the connections between certain tidbits. And here it became outright dangerous: ChatGPT didn't provide anything meaningful on risks and financials.
The tidbit it missed, one of the most important ones at the time, was a huge multi-year contract given to a large investor in said company. To find it, including the honestly hilarious amount, one had to connect the disclosure of an unspecified contract with a named investor, the specifics of said contract (which did not mention the investor by name), the amount stated in a financial statement in the document and, where ChatGPT obviously failed completely, knowledge of what said investor (a pretty (in)famous company) specialized in. ChatGPT didn't even mention a single one of those data points. Fun fact: said contract covered a significant chunk of the amount the investing company had invested to begin with. And all that during a time in which the financial stability of the reporting company was at least questionable. Oh, and ChatGPT didn't even flag that risk (cash and equivalents on hand divided by burn rate per year is simple maths), or repeat the exact passage in which the SEC filing said that the survival of the reporting company was in doubt.
In short, without some serious prompt work, and without including additional data sources, I think ChatGPT is utterly useless for analyzing SEC filings; even worse, it can be outright misleading. Not that SEC filings are incredibly hard to read: with some basic financial knowledge, and someone pointing out the highlights based on a basic understanding of how those filings are supposed to work, you are there.
I do not really imagine this as something that does all the investment work and makes a decision.
Instead, the mental model is that you have an army of people who can read texts really well, as in 'reading comprehension' as they call it in English tests... This army can get you information on the fly.
Investment research involves a lot of back-and-forth reading, fetching tables and drawing conclusions, which in turn might not have much to do with stock price performance, but there's a whole industry of financial information and news for that :)
So currently, the scope is to make information widely available, beyond what FactSet and CapIQ offer, and even that's a long way away :)
I think the challenge with using ChatGPT to summarize or read factual data lies in the probabilistic nature of LLM outputs, so your experience is what you should expect from LLMs. Though my understanding of OP's answer is that instead of using OpenAI to read documents directly, they use OpenAI to generate queries to read the documents instead.
It seems like a system incorporating an LLM would be good at parsing a document to match all investors mentioned with the amounts invested and spot any major differences. That a generalized tool can't do that out of the gate doesn't seem surprising.
I tried this as a little hobby weekend project but found that after a while it would start hallucinating answers, even if it had previously gotten them right. It didn't even take that long sometimes: I'd ask a question about revenue, then liabilities, then ask it to sum some revenue numbers, and the answers would just start to be wrong.
I wouldn't yet feel comfortable with this without some automated reconciliation, which to my mind defeated the point of my hobby project, but I'm curious if you've seen different? No doubt you'd expect this to improve over time though.
You can try it out for yourself... :) Here's an example that asks for AMD's cash and makes an arbitrary calculation on total liabilities. The AI is smart enough to sum up everything above equity and gets the numbers right, without any hallucination:
Total current liabilities: 7,572
Long-term debt, net of current portion: 1,714
Long-term operating lease liabilities: 393
Deferred tax liabilities: 1,365
Other long-term liabilities: 1,787
Total liabilities: 12,831
One of the advantages of data from CapIQ/Refinitiv is that you're not just pulling data from a single report; rather, the data has been curated across time from multiple historical reports, so that historical income statements, balance sheets, footnote data etc. spanning many years can be generated.
When you say that generating graphs of gross margin, EV/EBITDA etc. is just one step further, are you talking about generating those based on just a single report's information, or are you combining information from multiple years to, for example, show gross margin trends over 10 years and EV/TTM EBITDA?
I am talking about comparing multiple reports, i.e. gross margin trends over 10 years, EV/TTM EBITDA etc. across several reports. Currently only financials are possible, but the ratios depend on stock prices, so we are working on that!
You can think of it like this: you now have an army of readers that can go through tables really quickly.
FactSet, CapIQ etc. use a combination of automation and manual entry to fit these tables into a homogenized schema so that they can be saved, compared etc. So if you want to get Apple's Greater China sales from 2020, you would be lucky if they decided to create an item for that. https://imgur.com/a/bp2hb7n
Here are two examples, AMD's revenues and AMD's revenue outlook, using beatandraise.com:
https://imgur.com/a/61jqiUk
I doubt you can get AMD's own outlook on CapIQ, for example.
Context sizes mean getting 100s of reports in one call is not possible, but multiple iterations will still do the trick. So in effect, you can actually create a dataset like FactSet's at a much lower cost, more comprehensive, and customized to what the user wants, if you see my point... :)
It would be possible, for example, to get AMD's revenues over time like this. It's tedious because of context size, but it's unrestricted, so you can get whatever datapoint you want...
If we run this query over an API for income statements for all 8k companies, we pretty much have all income statement, balance sheet and cashflow items, shares outstanding etc. Add stock price data, and that gives you EV/EBITDAs, P/Es and all that stuff.
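As a sketch, that's just the per-report extraction run in a loop; `extractIncomeStatement` here stands in for whatever single-report call you already have:

```ts
// Hypothetical per-report extractor (e.g. the functions-API call discussed below).
declare function extractIncomeStatement(
  ticker: string,
  period: string
): Promise<Record<string, number>>;

// One LLM call per report keeps every call inside the context window.
async function buildDataset(tickers: string[], periods: string[]) {
  const rows: { ticker: string; period: string; items: Record<string, number> }[] = [];
  for (const ticker of tickers) {
    for (const period of periods) {
      rows.push({ ticker, period, items: await extractIncomeStatement(ticker, period) });
    }
  }
  return rows;
}
```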
Looks cool and seems like a valuable tool! I really like the idea of LLMs that give you rich answers like this.
I'm curious how you're accurately extracting the data though. Are you prompting it to respond in a JSON format, using OpenAI's functions, or something else? How do you ensure you have the correct labels, dates, values, etc.?
Nice. How often do functions fail? I haven't played around with them yet so no idea about reliability.
In regards to extracting the correct data, is that done through the function definitions? You specify all the fields you want to extract in the function and then let GPT go ham?
E.g. operating income, interest income, interest expense, etc.
The functions tend to fail when the prompt is complex and the user asks for a lot of fields; typically the last field in the JSON is not closed, i.e. it's missing a `}`. I guess OpenAI is aware of it. It doesn't fail often enough to have to write a workaround, at least not yet.
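(If it ever did fail often enough, the workaround could be as small as this, assuming the failure mode really is just the unclosed final brace:)

```ts
// Try a strict parse first; on failure, retry with the missing brace appended.
function parseFunctionArgs(raw: string): unknown {
  try {
    return JSON.parse(raw);
  } catch {
    return JSON.parse(raw + "}");
  }
}
```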
So, like a lot of applications, the problem boils down to being able to serve the right text. You have something that can read and do basic inference... You need to tell it what to read so that it can answer your question. But it can only read 16k tokens (roughly 12k words at best). So that's the basic problem. And as it's universal, i.e. a problem across applications, it's going to get better, and information will be a lot easier to get access to...
So (if I get it right) you are using the LLM to convert the human-language question into (presumably) code that is the basis for running "normal" searches and returning text.
Avoiding any fears of hallucinations.
But earlier you say "LLMs will get better at comprehension". So are you using an LLM to mark up the original text in some way?
Yes, that's right, I avoid hallucinations that way. In addition, I mark up existing texts so that the LLM knows what it is reading even if it is a piece of a larger text. So, for example, if you need to get Apple's Greater China sales for each quarter in 2020... https://imgur.com/a/oCCay3o
I am not making 4 calls with the entire text; instead, I get the pieces of each report that best match the question.
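The markup itself can be as simple as a header prepended to each stored chunk; this format is made up, just to illustrate the idea:

```ts
// Each chunk carries enough context that the model knows what it is reading,
// even when the chunk is a small piece of a much larger filing.
const chunk = [
  "[Company: Apple Inc. | Filing: 10-Q | Period: Q2 FY2020 | Section: Segment Information]",
  "Greater China net sales were ...",
].join("\n");
```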
There are additional challenges. For instance, GPT-4 struggles to know the difference between the words guidance and outlook, which mean the same thing, but somehow they don't for GPT-4.
When I say they become better readers, I meant it in a general sense, as in better in the case above. You basically have someone who can read through tables really well, and that can change investment research fundamentally, which is a lot of reading tables and graphs :)
A few things on prompting:
1. "Get me google cloud revenues" fails, because somehow GPT-4 thinks I am talking about an entity called Google Cloud and not Google :)
2. So in order to fix it, you can either ask "Get me google's cloud revenues" or "get me google cloud revenues from google's results"...
As you can see, in spite of all the training on the real world, GPT-4 thinks Google Cloud is more of an entity than Google is, based on that question :)
This product would be a fantastic honeypot for front-running researcher interest in particular investments.
I don't see anything in your privacy policy that gives me comfort that my (even anonymized) interest in a particular firm isn't feeding your own signals.
Even if your policy promised it, I'd want to see technical controls implemented, since the incentive to leverage information on searches would be so high.
Hi, thanks for bringing that concern up. I shall keep it in mind and change things based on feedback from customers if it is an issue.
Typically, people search for a company after a price move rather than before.
And these are searches on publicly available data, i.e. data that is not proprietary and has already been filed with the SEC.
Needless to say, none of the chat traffic is used or will ever be used to feed any trading signals for anyone.
> Typically, people search for a company after a price move rather than before.
People search for a company for a reason, generally to inform a thesis towards buying or selling, and regardless of their thesis, there will generally be a price move when they act.
> And these are searches on publicly available data, ie data that is not proprietary and already filed with the sec.
Of course. But the fact that someone is interested is not public, so you are crowdsourcing indications of interest.
Effectively you have a leading indicator (however soft) for order flow.
> Needless to say, none of the chat traffic is used or will ever be used to feed any trading signals for anyone.
Point is, it's not "needless" to say, it's "needful". And something other than "trust me" would be in order.
I am using OpenAI's latest functions API, so you can get it to return arguments that ensure you get JSON back; it works pretty well most of the time.
The JSON is then used to fetch a report from a database.
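Roughly like this, as a sketch; the function and field names are illustrative rather than the production schema:

```ts
import OpenAI from "openai";

const openai = new OpenAI();

// The model fills in the function arguments, which come back as a JSON string.
const res = await openai.chat.completions.create({
  model: "gpt-4-0613",
  messages: [
    { role: "user", content: "can you get me AMD's income statement from March 2020 ?" },
  ],
  functions: [
    {
      name: "fetch_report",
      description: "Fetch a statement for a company and period from the database",
      parameters: {
        type: "object",
        properties: {
          ticker: { type: "string", description: "Company ticker, e.g. AMD" },
          period: { type: "string", description: "Report period, e.g. 2020-03" },
          statement: { type: "string", enum: ["income", "balance", "cashflow"] },
        },
        required: ["ticker", "period", "statement"],
      },
    },
  ],
  function_call: { name: "fetch_report" },
});

// These arguments are the JSON used to look up the report in the database.
const args = JSON.parse(res.choices[0].message.function_call!.arguments);
```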
PDF parsing was more tedious than I would have liked at this stage, so I stuck to the SEC, which requires that companies file in a text format :) so that helped.
I used Poppler on a DigitalOcean droplet, but the sheer variety of company PDFs, especially European companies, some of which have to be OCRed, meant results were not really uniform. GPT still does very well, but not as well as on text documents directly. So in short, this is next on the list...