Show HN: USST – A protocol to reduce LLM context redundancy by 98.5% (gist.github.com)
2 points by mgopanna 22 days ago | 1 comment
I’ve been working on a primitive called User-Segmented Session Tokens (USST).

The Problem: Today, if a teacher (or lead dev) wants 50 students (or junior devs) to use an LLM with the same deep context (e.g., a 50-page curriculum or a complex repo), all 50 users have to re-upload and re-tokenize that context themselves. It’s redundant, expensive, and forces everyone onto a high-tier subscription.

The Solution: USST allows a "Sponsor" (authenticated, paid account) to run a Deep Research session once and mint a signed Context Token. Downstream users (anonymous/free tier) pass this token in their prompt. The provider loads the pre-computed KV cache/context state without re-processing the original tokens.
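
To make the flow concrete, here is a minimal sketch of the mint/redeem cycle in Python. The claim names (sponsor_id, context_id, exp) and the HMAC scheme are just my illustration; the actual wire format is defined in the linked spec.

    # Sketch only: hypothetical claim names, provider-held HMAC key.
    import base64, hashlib, hmac, json, time

    PROVIDER_KEY = b"provider-side secret"  # never leaves the provider

    def mint_context_token(sponsor_id, context_id, ttl_s=86400):
        # Sponsor-side: provider signs a pointer to the cached context.
        claims = {"sponsor_id": sponsor_id, "context_id": context_id,
                  "exp": int(time.time()) + ttl_s}
        payload = base64.urlsafe_b64encode(json.dumps(claims).encode()).decode()
        sig = hmac.new(PROVIDER_KEY, payload.encode(), hashlib.sha256).hexdigest()
        return payload + "." + sig

    def redeem(token):
        # Student-side request path: verify the signature, then load the
        # KV cache keyed by context_id instead of re-tokenizing the docs.
        payload, sig = token.rsplit(".", 1)
        want = hmac.new(PROVIDER_KEY, payload.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(sig, want):
            raise ValueError("invalid signature")
        claims = json.loads(base64.urlsafe_b64decode(payload))
        if claims["exp"] < time.time():
            raise ValueError("token expired")
        return claims  # provider attaches the precomputed context state here

    token = mint_context_token("teacher-42", "curriculum-sha256-abc123")
    print(redeem(token)["context_id"])

One open design note: the token likely has to bind the exact model version as well, since a precomputed KV cache is only valid for the weights that produced it.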

    Decouples payment from utility: the Sponsor pays the heavy compute; users pay only their own inference.

    Privacy: users don't need the Sponsor's credentials, just the token.

    Efficiency: removes the "Linear Bleed" of context re-computation.

I wrote up the full architecture and the "why" here: https://medium.com/@madhusudan.gopanna/the-8-6-billion-oppor...

The Protocol Spec / Repo is the main link above.

Would love feedback on the abuse vectors and how this fits with current provider caching (like Anthropic’s prompt caching).
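
For reference, the closest existing mechanism works like this: Anthropic's documented prompt caching marks a long prefix as cacheable so repeat calls skip re-processing it, but the cache entry stays scoped to the calling API account. A hedged sketch (the model ID and file name are placeholders):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env
    curriculum = open("curriculum.txt").read()  # stand-in for the shared context

    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",
        max_tokens=512,
        system=[{
            "type": "text",
            "text": curriculum,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }],
        messages=[{"role": "user", "content": "Explain problem 3 on page 12."}],
    )
    print(response.content[0].text)

USST is essentially asking what happens if that cache entry becomes addressable by a signed bearer token instead of staying an account-local optimization.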



I wanted to share the economic model that drove me to build this. I call it the "Redundancy Tax."

When you look at the hidden costs of "Per-Seat" architecture in an education setting, the numbers get large very quickly. I broke down the cost of redundant context re-processing:

The Baseline:

    Target: ~20M connected classrooms (secondary/tertiary globally).

    Volume: 1,000 high-value interactions per classroom per year (a conservative estimate for active AI tutoring).

    The Waste: Re-processing a ~35k-token context window for every single student query instead of reusing the cached state.

The USST Math: Shifting from "Raw Mode" (everyone tokenizes everything) to "USST Mode" (Sponsor tokenizes once, students reuse) gives the following (a back-of-envelope script after this list reproduces the numbers):

    We see a ~98.5% reduction in incremental token load.

    That saves roughly $0.432 per interaction in compute costs.

    $0.432 × 1,000 interactions × 20M classrooms ≈ $8.6 billion annually.
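
Here is that arithmetic as a script. The ~525 incremental tokens per query is my assumption (it's the value implied by the 98.5% figure), and the per-token price is backed out from the $0.432 savings rather than quoted from any provider:

    # Back-of-envelope check of the figures above.
    CONTEXT_TOKENS  = 35_000                  # shared curriculum context
    QUERY_TOKENS    = 525                     # assumed per-student increment
    PRICE_PER_TOKEN = 0.432 / CONTEXT_TOKENS  # implied by $0.432/interaction

    reduction = 1 - QUERY_TOKENS / (CONTEXT_TOKENS + QUERY_TOKENS)
    saved  = CONTEXT_TOKENS * PRICE_PER_TOKEN   # $0.432 per interaction
    annual = saved * 1_000 * 20_000_000         # interactions x classrooms

    print(f"token reduction: {reduction:.1%}")      # -> 98.5%
    print(f"annual savings: ${annual / 1e9:.2f}B")  # -> $8.64B
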
The Grid Impact: Beyond the money, this is an infrastructure-stability issue. A simultaneous classroom start (e.g., 10:05 AM) currently looks like a ~1 megawatt spike to the grid. With shared context tokens, that drops to a ~15 kilowatt blip (just the inference delta; the same ~98.5% reduction applied to power draw).

We don't need 100x more chips to solve this; we just need a protocol that stops treating every user session as a blank slate.



