Hacker News | abstrct's comments

I’m impressed the article managed to not mention Nathan Fielder even once.


No one is allowed in the cockpit if there’s something wrong with them, so if you’re here, you must be fine.

One of the all-time great seasons of television.


I'd nearly call it journalistic negligence. In fact, I think I will.


The most limiting factor I’ve come across is hitting the context window. Eventually your eager new employee starts to forget what you’ve taught them, but they’re too confident to admit it.


> Eventually your eager new employee starts to forget what you’ve taught them, but they’re too confident to admit it.

Seems very realistic!


No, it would be realistic if, after two weeks on the job, they started telling you how to run the company.


It should already have published a series of self-help books by that point.


Are there methods to "summarize what they've learned" and then replace the context window with the shorter version? This seems like pretty much what we do as humans anyway... we need to encode our experiences into stories to make any sense of them. A story is a compression and symbolization of the raw data one experiences.


Yeah, that's a fairly well-studied one. Most of these techniques are rather "lossy" compared to actually extending the context window. The most likely "real solution" is going to be using various tricks plus fine-tuning on longer context lengths to just extend the window.

Here's a bunch of other related methods; a rough sketch of the first one follows the list:

Summarizing context - https://arxiv.org/abs/2305.14239

Continuous fine-tuning - https://arxiv.org/pdf/2307.02839.pdf

Retrieval-augmented generation - https://arxiv.org/abs/2005.11401

Knowledge graphs - https://arxiv.org/abs/2306.08302

Augmenting the network with a side network - https://arxiv.org/abs/2306.07174

Another long-term memory technique - https://arxiv.org/abs/2307.02738
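
To make the first of those concrete, here's a minimal sketch of summarize-and-replace, assuming an OpenAI-style chat client; the helper names and the token budget are made up for illustration:

    # Minimal sketch: when the conversation outgrows a budget, compress the
    # older turns into a summary and keep only the recent turns verbatim.
    from openai import OpenAI

    client = OpenAI()
    MAX_HISTORY_TOKENS = 3000  # rough budget before we compress (illustrative)

    def rough_token_count(messages):
        # Crude approximation: ~4 characters per token on average.
        return sum(len(m["content"]) for m in messages) // 4

    def compress_history(messages):
        old, recent = messages[:-4], messages[-4:]
        summary = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system", "content": "Summarize this conversation, keeping facts, decisions, and open questions."},
                {"role": "user", "content": "\n".join(f"{m['role']}: {m['content']}" for m in old)},
            ],
        ).choices[0].message.content
        return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent

    def chat(messages, user_input):
        messages.append({"role": "user", "content": user_input})
        if rough_token_count(messages) > MAX_HISTORY_TOKENS:
            messages = compress_history(messages)  # lossy, as noted above
        reply = client.chat.completions.create(model="gpt-4", messages=messages)
        messages.append({"role": "assistant", "content": reply.choices[0].message.content})
        return messages

As noted, this is lossy: whatever the summarizer drops is gone for good.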


This is a fantastically useful comment. Thank you filterfiber :)


Is there a realistic way to actually increase the context window?


Yes! The obvious answer is to just increase the number of positions and train for that. This requires a ton of memory, however (attention scales with the square of context length), so most are currently training at 4k/8k and then fine-tuning to longer contexts, similar to many of the image models.

However, there's been some work to "get extra mileage" out of the current models, so to speak, with rotary-position tricks and a few others. These, in combination with fine-tuning, are the current method many are using at the moment, IIRC.

Here's a decent overview: https://aman.ai/primers/ai/context-length-extension/

RoPE - https://arxiv.org/abs/2306.15595

YaRN (based on RoPE) - https://arxiv.org/pdf/2309.00071.pdf

LongLoRA - https://arxiv.org/pdf/2309.12307.pdf
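
As a toy illustration of the interpolation trick behind the first link, here's RoPE with rescaled positions in plain PyTorch; the shapes and the 4k-to-16k numbers are illustrative, not any particular model's:

    # Rotary position embeddings (RoPE) with position interpolation: positions
    # are rescaled so a model trained at 4k positions can attend over 16k.
    import torch

    def rope_angles(seq_len, dim, base=10000.0, scale=1.0):
        # scale < 1.0 squeezes longer sequences into the trained position range.
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        positions = torch.arange(seq_len).float() * scale
        return torch.outer(positions, inv_freq)  # (seq_len, dim/2)

    def apply_rope(x, angles):
        # x: (seq_len, dim); rotate each consecutive pair of channels.
        x1, x2 = x[..., 0::2], x[..., 1::2]
        cos, sin = angles.cos(), angles.sin()
        return torch.stack([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1).flatten(-2)

    q = torch.randn(16384, 128)                           # queries at 16k positions
    angles = rope_angles(16384, 128, scale=4096 / 16384)  # interpolate into 4k range
    q_rotated = apply_rope(q, angles)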

The bottleneck is quickly going to be inference. Since the current transformer models need memory proportional to context length squared, the requirements go up very quickly. IIRC a 4090 can _barely_ fit a 4-bit 30B model in memory with a 4096-token context length.

From my understanding, some form of RNN is likely to be the next step for longer contexts. See RWKV as an example of a decent RNN: https://arxiv.org/abs/2305.13048


I’ve absolutely explored this idea but, similar to lossy compression, sometimes important nuance is lost in the process. There is both an art and a science to recalling the gently compacted information and being able to recognize when it needs to be repeated back.


If there were something like objects in OO programming, but for LLMs, would that solve this?

Like a Topic-based Personality Construct where the model first determines which of its “selves” should answer the question, and then grabs appropriate context given the situation.
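
Something like that can already be prototyped with plain prompt routing. A toy sketch, where the personas and the llm() helper are entirely hypothetical:

    # Classify the question first, then load only that persona's context.
    PERSONAS = {
        "lawyer": "You are the legal 'self'. Context: contracts, compliance notes...",
        "engineer": "You are the technical 'self'. Context: architecture docs...",
        "marketer": "You are the marketing 'self'. Context: brand guidelines...",
    }

    def answer(question, llm):
        topic = llm(f"Which one of {list(PERSONAS)} should answer this? "
                    f"Reply with one word.\n\nQuestion: {question}").strip().lower()
        persona = PERSONAS.get(topic, PERSONAS["engineer"])  # sane fallback
        return llm(f"{persona}\n\nQuestion: {question}")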


Look up "frames"; it's an old concept that also influenced OOP.


The animal-brain equivalent isn't summarizing a context window to account for limited working memory. It's never leaving training mode to go into inference-only mode. The learned models in animal brains never stop learning.

There is nothing stopping someone from keeping an LLM in online-training mode forever. We don't do that because it's economically infeasible, not because it wouldn't work.
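
Schematically, that could look like the sketch below, written with Hugging Face-style calls on a small stand-in model; doing a gradient step per request on a real LLM is exactly the economically infeasible part:

    # Every interaction also becomes a training step (online learning).
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # tiny stand-in for a real LLM
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    def serve_and_learn(prompt):
        inputs = tok(prompt, return_tensors="pt").input_ids
        output_ids = model.generate(inputs, max_new_tokens=50)  # inference, as usual
        # ...then immediately learn from the full exchange (never leave training mode).
        loss = model(output_ids, labels=output_ids).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return tok.decode(output_ids[0], skip_special_tokens=True)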


Putting too much information in the context window is counter-productive in my experience. A low signal-to-noise ratio tends to increase the likelihood of model hallucinations, and we don't want that!

What works in my experience is structuring the task like a human-driven workflow, breaking it down into small steps. Each step can be driven by a small prompt, relevant document fragments (if RAG is used), and condensed essays/tutorials/guides written by a powerful LLM (ideally, GPT-4 pre-Turbo).

Using this approach, you can stay well below the 8k-token limit even on the most demanding tasks.

(Big contexts are leaky on all LLMs anyway.)
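
A minimal sketch of that decomposition, where retrieve() and llm() stand in for whatever RAG index and model you use, and the steps themselves are invented for illustration:

    # Each step gets its own tiny prompt plus a few retrieved fragments,
    # so no single call comes anywhere near the 8k-token limit.
    STEPS = [
        "Extract the customer's actual question from the input below.",
        "List which product areas the question touches.",
        "Draft a reply using only the provided documentation fragments.",
    ]

    def run_pipeline(ticket, retrieve, llm):
        state = ticket
        for instruction in STEPS:
            fragments = retrieve(state, k=3)  # only a handful of relevant chunks
            prompt = (f"{instruction}\n\n"
                      f"Relevant fragments:\n{fragments}\n\n"
                      f"Input:\n{state}")
            state = llm(prompt)  # each call stays small and focused
        return state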


What about some generation-augmented retrieval-augmented generation setup, where all your conversations are indexed for regular text search, and you use the LLM's language knowledge to generate relevant search phrases, the results of which are included in the current prompt?
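
A rough sketch of that loop, using SQLite's built-in full-text search as the "regular text search" half; llm() is a stand-in for the model call:

    # The LLM writes the search phrases itself; a plain FTS index over past
    # conversations answers them; the hits get pasted into the prompt.
    import sqlite3

    db = sqlite3.connect("conversations.db")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS convo USING fts5(text)")

    def answer(question, llm):
        queries = llm(f"Give 3 short search phrases for finding past "
                      f"conversations relevant to: {question}").splitlines()
        hits = []
        for q in queries:
            phrase = '"' + q.strip().replace('"', '') + '"'  # quote for FTS5
            rows = db.execute("SELECT text FROM convo WHERE convo MATCH ? LIMIT 3",
                              (phrase,)).fetchall()
            hits += [r[0] for r in rows]
        context = "\n---\n".join(hits)
        return llm(f"Past conversation excerpts:\n{context}\n\nQuestion: {question}")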


I would imagine that daily "training" here involves something more like RLHF than just appending to a big prompt.


I think you'll need to save good responses (and bad responses that you fixed?) and regularly run more training passes.


Yeah, especially with a large knowledge base I find it important to keep a log of prompts/responses and perform team reviews of both. It’s honestly creating more work than it’s saving at the moment, with the hope that it’ll be more helpful down the road. On the plus side, it’s made the team more interested in tasks around technical documentation and marketing material, so still a win!
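
For what it's worth, a minimal sketch of that kind of log: each exchange gets a review verdict, and the approved (or human-fixed) pairs get exported for later training passes. The schema and file names are made up:

    # Log prompts/responses with a review verdict; export the good pairs.
    import json, sqlite3

    db = sqlite3.connect("llm_log.db")
    db.execute("""CREATE TABLE IF NOT EXISTS log (
        prompt TEXT, response TEXT, verdict TEXT, fixed_response TEXT)""")

    def record(prompt, response, verdict="unreviewed", fixed=None):
        db.execute("INSERT INTO log VALUES (?, ?, ?, ?)",
                   (prompt, response, verdict, fixed))
        db.commit()

    def export_training_set(path="finetune.jsonl"):
        rows = db.execute("""SELECT prompt, COALESCE(fixed_response, response)
                             FROM log WHERE verdict IN ('good', 'fixed')""")
        with open(path, "w") as f:
            for prompt, response in rows:
                f.write(json.dumps({"prompt": prompt, "completion": response}) + "\n")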


The solution is RAG


Funny but also true in real life :-(

I'm starting to feel like a one-eyed king among the blind.

Sometimes I even remember when I told people specific things.


Is this insightful? I feel like this is trying to help me be a better founder, but I don’t know what to do with this advice.

Yes, I can be successful personally without social media. Can I successfully launch and market a product without it, though?


I agree with you. It depends on which area you want to be professional in.


I will always miss those I’ve lost to benzos. Some of my favourite people that I’ll never again share a new memory with.


Heh, funny you say that. Schemaverse was built for my master's thesis, which explored whether the application layer and data layer could be successfully merged in a way that improved data consistency and integrity, and thus arguably improved security. I'll save you the 90-page read: short answer, yes, but scaling becomes an absolute nightmare.


You're welcome! Feel free to drop by #Schemaverse on Libera.Chat if you have any questions.


One of the players dockerized it all too: https://github.com/frozenfoxx/docker-schemaverse

I highly recommend running your own instance for battles between classmates or employees.


The very tiny server is getting a bit crushed at the moment. If you actually want to try it out, check the tutorial tomorrow when the traffic goes back to normal.

https://schemaverse.com/tutorial/tutorial.php


I encountered the same error message, but it seemed to come from a broken link instead. From the homepage [0], while signed in, I clicked on "How to play" right under the query entry box. It sent me to this page, which is a different URL from the one you linked: https://wiki.github.com/Abstrct/Schemaverse/how-to-play

0: https://schemaverse.com/tw/index.php


Ah, it does look like GitHub changed their wiki URL scheme. Thanks for the info!

That should link directly to the tutorial (link above) instead of the wiki anyway.


Heh, OK, I did not expect to see my old project sitting on the front page <3 The server is getting gently hugged to death atm, but I'll try to keep it responding.

Some added info...

- The entire thing is open source: https://github.com/Abstrct/Schemaverse/

- This was discussed on Hacker News 10(!) years ago: https://news.ycombinator.com/item?id=3969108

- Some players have written more code to play the game than I wrote to build it

- We have a tournament every year at DEF CON

- Postgres is awesome


> - We have a tournament every year at DEF CON

The same guy pretty much wins every year with minor modifications to his code (okay, except the last two years, with the whole COVID thing). That said, he's an extremely cool guy and very eager to train his potential competition. Please please please take him up on the offer.

I suggest that anyone show up to the Defcon Is Cancelled party that he co-organizes.


Yeah, he's pretty unstoppable lately, but there have been a couple of new players starting to improve a lot too.

I'll happily share his secret. He recognized that it's not about any single metric; it's all about optimizing for the trophies: https://github.com/Abstrct/Schemaverse/tree/master/trophies


Heads up: you're still pointing to Freenode for the IRC channel. You'd probably want to change that to point at Libera.Chat; the channel already exists there.


Thanks! Totally forgot to update that. You can find us in #schemaverse on Libera.Chat


Thanks for creating this; it helped me improve my PostgreSQL and SQL 8 years ago.


That's awesome! I'm glad to hear it helped. I was personally so sick of every database course I took using the same `department` and `employee` tables that I wanted something fun instead.


Great idea, and FYI, the "How to play" link in the game points to a wrong URL in your GitHub wiki.


Those should be fixed now. Thanks!


This is a pretty smart purchase. With it, MasterCard is getting a fairly mature technology stack (by cryptocurrency standards, at least), an incredible amount of data (mostly attribution data relating to addresses and transactions), and a team that’s experienced with the industry (which has a massive void of skilled workers).

