This is what my local LLaMA 13B Q5 had to say on this:
> There are two ways to approach this problem. First, we can count each person by gender and then add them up. Second, we can use the given information about the siblings and their genders to find a quicker solution.
> Let's go with the second method:
> 1. Sally has one sister (herself).
> 2. Each brother has two sisters. So, there are 3 sisters for each of her brothers.
> 3. Since there are three brothers, there are a total of 3 x 3 = 9 sisters for them all.
> 4. Now that we know Sally has one sister and the rest belong to her brothers, we can subtract her share from the total: 9 - 1 = 8 sisters.
I am running the 7B and it gave:
Q: "Sally (a girl) has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?"
A: "If Sally has 3 brothers, and each brother has 2 sisters, then the total number of sisters is:
3 brothers x 2 sisters per brother = 6 sisters
So, Sally has 6 sisters in total."
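For what it's worth, the intended answer is 1: each brother's two sisters are Sally plus one other girl. A tiny Python check of that reading (assuming the usual interpretation that every brother's sisters are all the girls in the family):

    # Sanity check by construction, not by LLM. Assumes the usual reading:
    # every brother's sisters are all the girls in the family.
    def sisters_of_sally(sisters_per_brother: int = 2) -> int:
        total_girls = sisters_per_brother  # each brother sees all the girls
        return total_girls - 1             # Sally is one of them; the rest are hers

    print(sisters_of_sally())  # -> 1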
The official deprecation was made in Dec 2023. Removing it in the next feature update is too soon. Lots of people still have working WMR units, so this feels super cheap.
I think the plan was for the Southwest to be given take-off clearance quickly before the private jet arrived, but the controller got distracted by a route amendment that came in for another flight and got their priorities wrong. In the US it's not uncommon for a 'cleared to land' to be issued while another aircraft is lining up and waiting for an immediate take-off clearance.
In the EU / Canada this isn't allowed (landing clearance is only given if the runway is completely clear), but then the US pushes a lot more aircraft through its busy airspace this way. San Diego is a busy airport with just a single runway (due to the geography) near a couple of military bases and has a steep approach over the city - if there had been some ground fog like there often is at KSAN then this could have ended very differently.
Also, you can see in the video that due to the steep approach, the SWA was at the beginning of the runway, but the jet on approach would have landed further down, as the touchdown point there is offset. It still would have been dangerous, but not a certain collision.
Yea. The incoming jet was on a 5-mile final. Being a small jet it's "slow", so that 5-mile final is 3-5 minutes. You can see at 19:04Z the Southwest jet gets the "line up and wait". 19:07Z is the missed approach. If ATC had given them the departure clearance relatively quickly, there would have been no issue, as Southwest would have hit the gas and been long gone... but ATC didn't.
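Napkin math on that timeline (the 120 kt groundspeed is just an assumed ballpark for a small jet on final, not from the incident data):

    # rough timing for a 5 NM final
    final_nm = 5
    groundspeed_kt = 120   # assumed approach groundspeed for a small bizjet
    minutes = final_nm / groundspeed_kt * 60
    print(f"{minutes:.1f} min")  # -> 2.5 min; call it 3-5 with deceleration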
It's not uncommon for aircraft to get on the runway and take off while another aircraft is on final. If you want a good video of how this more typically works (and sometimes doesn't), here's one from an incident at SFO about 2 months ago: https://www.youtube.com/watch?v=DrRGde5J8mo
In both of the SFO cases, they got the "go around" from ATC much earlier and no one got too close. Keep in mind that SFO is extremely busy and lands on two parallel runways at the same time: https://www.youtube.com/watch?v=YsKhZm78DuY
They don't land planes at KSAN in the fog, because of the steep descent. So it would have ended very differently, because that Citation would have been diverted elsewhere.
The real error here was that SWA should have been told to hold short and then expedite once the Citation was clear (as the SWA boys like to do), or to expedite from the get-go. Either would have worked just fine.
Also worth mentioning that while you can give landing clearance onto a runway that is not clear at the time, you still cannot have multiple aircraft on a given runway simultaneously, and there are very strict minimum separation requirements that vary depending on the airport; for KSAN it's 5 NM.
This is an interesting article, and a bit of a mishmash of UI conventions, application ideas for GPT, and actual patterns for LLMs. I really do miss Martin Fowler's actual take on these things, but using his name as some sort of gestalt brain for Thoughtworks works too.
It still feels like a bit of a Wild West for patterns in this area, with a lot of people trying lots of things; it might be too soon to be defining terms. A useful resource is still something like the OpenAI Cookbook, which is a decent collection of a lot of the things in this article but with a more implementation bent.[1]
The area that seems to get a lot of idea duplication currently is providing either a 'session' or a longer-term context for GPT, be it with embeddings or rolling prompts for these apps. Vector search over embedded chunks is something that still seems to be missing from vendors like OpenAI, and you can't help but wonder whether they'll eventually move it behind their API with a 'session id'. I think that was mentioned as being on their roadmap for this year too. The lack of GPT-4 fine-tuning options just pushes people more toward the Pinecone, Weaviate, etc. stores and chaining up their own sequences to achieve some sort of memory.
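The rolling pattern most of these apps converge on is pretty simple. A minimal sketch of it in Python, where embed() is a toy stand-in for whatever embedding model/endpoint you use and the "store" is just an in-memory list rather than Pinecone/Weaviate:

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Toy stand-in for a real embedding call (OpenAI, SentenceTransformers, ...):
        # a character-frequency vector, just so the sketch runs end to end.
        v = np.zeros(256)
        for ch in text.lower():
            v[ord(ch) % 256] += 1
        return v / (np.linalg.norm(v) or 1.0)

    memory: list[tuple[np.ndarray, str]] = []   # a real vector store in production

    def remember(chunk: str) -> None:
        memory.append((embed(chunk), chunk))

    def recall(query: str, k: int = 3) -> list[str]:
        q = embed(query)
        # cosine similarity (vectors are already normalised), best matches first
        scored = sorted(memory, key=lambda item: float(q @ item[0]), reverse=True)
        return [chunk for _, chunk in scored[:k]]

    def build_prompt(user_msg: str) -> str:
        # stuff the top-k recalled chunks into the prompt as "session memory"
        context = "\n".join(recall(user_msg))
        return f"Relevant notes:\n{context}\n\nUser: {user_msg}"

    remember("The quarterly report is due Friday.")
    remember("Sally prefers email over Slack.")
    print(build_prompt("When is the report due?"))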
I've implemented features with GPT-4 and functions, and so far it feels useful for 'data model' style use (where you're bringing JSON about a domain noun, e.g. 'Tasks', into the prompt), but it's pretty hairy when it comes to pure functions. The tuning they've done to get it to pick which function and which parameters to use is still hard to get right, which means there isn't a lot of trust that it's going to be usable. It feels like there needs to be a set of patterns or categories for 'business apps' that are heavily siloed into just a subset of available functions, making it task-specific rather than the general chat agent we see a lot of. The difference in approach between LangChain's chain-of-thought pattern and just using OpenAI functions is sort of up in the air as well. Like I said, it still all feels like we're in Wild West times, at least as an app developer.
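To be concrete about the 'data model' style, the shape I mean is roughly this (a sketch against the chat-completions function-calling format as of mid-2023; the update_task schema is entirely made up):

    import json
    import openai

    # One narrowly scoped function the model is allowed to call; keeping the
    # silo small is what seems to make the function choice reliable.
    functions = [{
        "name": "update_task",            # made-up domain function
        "description": "Update a task's status in the task tracker",
        "parameters": {
            "type": "object",
            "properties": {
                "task_id": {"type": "string"},
                "status": {"type": "string", "enum": ["todo", "in_progress", "done"]},
            },
            "required": ["task_id", "status"],
        },
    }]

    resp = openai.ChatCompletion.create(
        model="gpt-4-0613",
        messages=[{"role": "user", "content": "Mark task 42 as done"}],
        functions=functions,
        function_call="auto",
    )

    msg = resp["choices"][0]["message"]
    if msg.get("function_call"):
        args = json.loads(msg["function_call"]["arguments"])
        # hand the call off to your own code; the model never executes anything
        print(msg["function_call"]["name"], args)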
Agreed, that is a good resource for sure. For tooling I like https://promptmetheus.com/ but any pun name gets bonus points from me.
> For in-context learning, I think it is fair to expect 100k to 500k context windows sooner. OpenAI is already at 32k.
It has been interesting to see that window increase so quickly. For LLM context the biggest thing is the pay-per-token constraint if you don't run your own model, so I have to wonder whether that is what will stick around given how this is trending. Throwing the whole context up on every call (effectively idempotent requests) seems to make it likely that OpenAI will encroach on the store side as well and do sessions.
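Back-of-the-envelope on why resending a full context on every call gets expensive fast (the per-token rate below is an assumed ballpark for a 32k-context model, so treat it as illustrative):

    # rough cost of resending a full context window on every call
    context_tokens = 32_000
    prompt_price_per_1k = 0.06         # assumed USD rate per 1k prompt tokens
    calls_per_session = 20

    cost = context_tokens / 1000 * prompt_price_per_1k * calls_per_session
    print(f"${cost:.2f} per session")  # -> $38.40 just for re-sent context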
I've set this up in a staging environment of one of our apps to take a look. The staging environment we use for one-offs is on Heroku (we can stand one up/down quickly), and the first issue is that a lot of the 'easy to deploy is a feature' PaaS platforms bill by web/worker size restricted by memory rather than by pure virtual CPU power. Render etc. all do this as well, and the memory headroom is low.
This means that you often have to deal with, say, 512 MB per instance, and then, if using something like Puma, work out how to split concurrency vs memory footprint. What I'm finding is that Ruby 3.2's YJIT loves memory, so I have to trade that off, which means less concurrency per process. Benchmarking it quickly, the ~15% gains I might get per thread aren't worth having to drop to just 2 threads on a 512 MB instance versus the 3 threads I can run with YJIT disabled.
I think it's really neat and will continue to track it, but the performance-for-memory trade-off isn't quite there yet for our app profile. Not sure if others will find the same; I guess it depends on whether or not their production environment is a PaaS with low memory headroom.
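To make that trade-off concrete, the napkin math looks something like this (every memory figure is an illustrative assumption picked to match the scenario above, not a measurement):

    # napkin math for threads per instance on a memory-limited PaaS dyno
    instance_mb = 512
    base_process_mb = 300      # assumed Rails process footprint without YJIT
    yjit_overhead_mb = 60      # assumed extra resident memory once YJIT warms up
    per_thread_mb = 70         # assumed incremental heap growth per Puma thread

    def max_threads(with_yjit: bool) -> int:
        used = base_process_mb + (yjit_overhead_mb if with_yjit else 0)
        return (instance_mb - used) // per_thread_mb

    print(max_threads(with_yjit=False))  # -> 3
    print(max_threads(with_yjit=True))   # -> 2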
OT(ish), but a t3.medium on EC2 with 2 vCPUs and ~4 GB of memory is $33 a month on-demand, $20/mo on a reserved instance, and $10/mo spot.
I'm mostly done with Heroku. With one kinda big app left, all other envs and projects are now just on AWS, without the Heroku tax.
And I was a big Heroku fan, but their recent decisions made me shop around. I do miss the price ($$) of metal in a DC, but not the price (:clock:) of metal.
The 7GB runs great on a 3080Ti. I am getting a lot of 'ValueError: Max tokens + prompt length' errors with larger files, though. Can this GitLab client also replace the vocab.bpe and tokenizer.json config like Copilot's? Thanks for your work on Fauxpilot, really enjoying playing with it.
I believe right now the VSCode extension just passes along the entire file up to your cursor [1] rather than trying to figure out how much will fit into the context limit – it's definitely still very early stages :)
It would be pretty simple to run the contents through the tokenizer using e.g. this JS lib that wraps Huggingface Tokenizers [2] and then keep only the last (2048-requested_tokens) tokens in the prompt. If they don't get to it first I may try to throw this together soon.
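A rough version of that truncation in Python (using the transformers tokenizer rather than the JS wrapper above, and "gpt2" as a stand-in for whatever tokenizer actually matches the model being served):

    from transformers import AutoTokenizer

    # Keep only the tail of the file so prompt + requested completion fits the
    # model's context window. "gpt2" is a stand-in; use the tokenizer that
    # matches whatever model Fauxpilot is actually serving.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")

    def truncate_prompt(text: str, context_limit: int = 2048,
                        requested_tokens: int = 128) -> str:
        budget = context_limit - requested_tokens
        ids = tokenizer.encode(text)
        if len(ids) <= budget:
            return text
        # keep the last `budget` tokens, i.e. the code nearest the cursor
        return tokenizer.decode(ids[-budget:])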
This seems to work ok on a 3080Ti, even under WSL2. Nice!
For larger files I'm getting errors for maximum token + prompt length - is there an easy way to tweak these limits in the client? I think the Copilot client needed some overrides for Fauxpilot for this, so I'm hoping the GitLab client has that too.
> There are two ways to approach this problem. First, we can count each person by gender and then add them up. Second, we can use the given information about the siblings and their genders to find a quicker solution.
> Let's go with the second method:
> 1. Sally has one sister (herself).
> 2. Each brother has two sisters. So, there are 3 sisters for each of her brothers.
> 3. Since there are three brothers, there are a total of 3 x 3 = 9 sisters for them all.
> 4. Now that we know Sally has one sister and the rest belong to her brothers, we can subtract her share from the total: 9 - 1 = 8 sisters.
>
> So, Sally has 8 sisters.
:|