
I have gotten in arguments with people who over-deploy Redis. Redis is cool, I don't dislike it or anything, but a lot of the time when people use it, it actually slows things down.

Using it, you're introducing network latency and serialization overhead. Sometimes that's worth it, especially if your database is falling over, but a lot of the time people use it and it just makes everything more complex and worse.

If you need to share cached data across processes or nodes, sometimes you have to use it, but a lot of the stuff I work with is partitioned anyway. If your data is already partitioned, you know what works well a lot of the time? A boring, regular hashmap.

Pretty much every language has some thread-safe hashmap in there, and a lot of them have pretty decent libraries to handle invalidation and expiration if you need those. In Java, for example, you have ConcurrentHashMap for simple stuff, and Guava Caches or Caffeine Caches for more advanced stuff.

Even the slowest [1] local caching implementation will almost certainly be faster than anything that hits the network; in my own testing [2], Caffeine caches have sub-microsecond `put` times, and you don't pay any serialization or deserialization cost. I don't think you're likely to get much better than maybe sub-millisecond times with Redis, even in the same data center, not to mention that if you're caching locally, that's one less service you have to babysit.
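To make that concrete, here's roughly what the in-process option looks like with Caffeine (a minimal sketch; the User type and loadUser are made up for illustration):

    import java.time.Duration;
    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    // In-process cache: no network hop, no serialization.
    Cache<String, User> cache = Caffeine.newBuilder()
        .maximumSize(100_000)                      // bounded, size-based eviction
        .expireAfterWrite(Duration.ofMinutes(5))   // TTL-style expiration
        .build();

    cache.put("user:42", user);                         // sub-microsecond in practice
    User u = cache.get("user:42", id -> loadUser(id));  // compute-on-miss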

Again, I don't hate Redis, there are absolutely cases where it's a good fit, I just think it's overused.

[1] Realistic, I mean; obviously any of us could artificially construct something that is as slow as we want.

[2] https://blog.tombert.com/posts/2025-03-06-microbenchmark-err... This is my own blog, feel free to not click it. Not trying to plug myself, just citing my data.



My trick is saying no to Redis, full stop. Every project where it was used as a cache only, it developed retention and backup requirements, and every project where it was used as a key-value store, someone built a relational database on top of it.

There’s nothing worse than when someone does the latter. I had to write a tool to remove deletes from the AOF log because someone fucked up the ordering of operations big time trying to pretend they had proper transactions.


I love Redis, but my rule is that we should be able to flush the redis data at any time without any problems. Any code that makes that unfeasible is rejected.


I've never done it IRL, but I've always wanted to delete my company's Redis instances and see what happens, chaos monkey style. If your service breaks because it expected the cache to be there, or your database immediately goes down because of too many requests, you're going to have a bad time _eventually_.


Redis does support persistence, so there are valid use cases where you expect the data to be around.


This is something one could/should simulate in test.


Yes, this design rule is very useful


I don't get it

I'm using Redis only for temp state data, like a session (when I can't use a JWT).

Or when I have to scale and need a warmed up cache

Is that bad now?

I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.


If sessions die when your system reboots, that means you can't reboot the system (update the service) without breaking whatever any users were currently doing on your site or in your software. That does sound bad to me and like a bad fit for Redis the memory cache. (I know it can do persistence optionally but that's what the person above you was complaining about: this is not what it's good at)

Why not use a regular database for this (can be as simple as an sqlite file, depending on your needs), or the default thingy that comes with your framework or programming language? This is built into everything I've ever used, no need to reinvent session storage or overengineer the situation with jwt or some other distributed cryptographic system and key management


> Why not use a regular database for this (can be as simple as an sqlite file, depending on your needs)

A lot depends on the scale and load pattern (e.g. the ratio of active to inactive sessions). For a small scale, SQLite could be a good choice.

Storing sessions in a regular DB (say Postgres) can be more expensive (hardware-wise) than in Redis, and there are cases where the load is high enough to matter but the budget is not unlimited (to use a DB at any cost). Also, redundancy with a Redis cluster is easier than with Postgres. I don't think Redis is always better, but at some load patterns it is.

> or the default thingy that comes with your framework or programming language?

The default PHP session store is files in /tmp - works for a home page, but if load is high it explodes (millions of files in /tmp is too slow to work with).


> This is built into everything I've ever used

Ah, but in the trendy microservices world it isn't in many micro frameworks; you have to reinvent it


I didn't know what you meant so I looked up micro frameworks. Wikipedia has a page named Microframework and lists 23 examples. I don't have time to dive into each of them and most items aren't links (so not sure how relevant they are), but

- I know Flask and it has sessions

- It also lists three frameworks for PHP, which has sessions built into the language (session_start() is what I use in any project that needs a session system)

- Expressjs is one of the few others with a Wikipedia page. Looking into that, it says it requires some middleware for having sessions, which seems not only well-supported, but there is also a package from the authors of Expressjs themselves called express-session. It's technically not in the framework, but the authors provide it and clearly keep it in mind when developing the framework, so you don't have to DIY that

I can't conclude this isn't a common feature in microframeworks :p


Most of the options you're talking about are client-side sessions, and even then they're limited. That's certainly the case in Flask, FastAPI, Starlette.

Compare that to say, Django, Laravel, etc.


> I'm also wondering right now why there is no local cache with p2p self discovery and sync. Should be easier than deploying an extra piece of software.

The whole design space for this type of API is weirdly under-explored, but there are some well-supported mainstream solutions out there.

Fundamentally, Redis ought to be a NuGet library, a Rust crate, or something like it. It's just a distributed hash table; putting it onto its own servers is a bit bizarre if the only need is caching.

Microsoft's Service Fabric platform and the Orleans library both implement distributed hash tables as fundamental building blocks. Both can trivially be used "just" as a cache to replace Redis, and both support a relatively rich set of features if you need more advanced capabilities.

Of course, there's Scala's Akka and the Akka.NET port also.


I wonder if you're thinking of (things like) Hazelcast?

It is a JVM-based "shared cache", so it can be used to transparently share results of expensive queries - but also to share sessions. It mostly just works, but the free version has some issues when one upgrades data models.

I know half the people here probably loathe the JVM, but once one is aware of one implementation, I guess it should be possible to find similar things for .NET and maybe also Go and Python.


I think you could make Garnet work as a library. Or, at the very least, use FASTER/Tsavorite KV for that instead.


Garnet, like Redis, is explicitly designed to be remotely accessed over the network, which is frankly disappointing and derivative.

Microsoft could do better than that!

For example, Azure App Service could use an out-of-process shared cache feature so that web apps could have local low-latency caches that survive app restarts.


> Garnet, like Redis, is explicitly designed to be remotely accessed over the network

I know, but it is written in a sane language so my suggestion was that you can literally reference a project and make it into an embedded database. But then again, I would've tried Tsavorite/FASTER KV first.


I prefer caching in memory, but a major limitation once you have more than one process is invalidation. It's really only easy for stuff you can cache and just expire on time, not for stuff you need to invalidate. At that point you need to communicate between your processes (or all of them need to listen to the DB for events).


Yeah, if you need to do things across processes then something like Redis or memcached might be necessary.

The thing that bothers me is people adding it in places that don't make sense; I mentioned in a sibling thread that I've seen people use it as a glorified global variable in stuff like Kafka streaming. Kafka's stuff is already partitioned, you likely don't gain anything from Redis compared to just keeping a local map, and at that point you can just use a Guava Cache and let it handle invalidation in-process.
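For example, something like this per instance (a sketch; AggState is a hypothetical value type for whatever you're aggregating per key):

    import java.util.concurrent.TimeUnit;
    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    // One cache per process. Kafka already routes a given key to a single
    // consumer instance, so there is nothing to share over the network.
    Cache<String, AggState> localState = CacheBuilder.newBuilder()
        .maximumSize(50_000)
        .expireAfterAccess(10, TimeUnit.MINUTES)  // invalidation handled in-process
        .build();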


Not just across concurrent processes, but also serial ones. Externalizing a cache into something like Redis lets you bounce your process with no reload time. You can get around it for some things like web sessions with a signed cookie, but that opens up expiration and invalidation issues.

But that doesn’t work for caching non-trivial calculations or intermediate state. There’s a sweet spot for transitory persistence.


I think the crazy thing is people think Redis is the only thing that caches in memory.

You could throw a bunch of your production data in SSAS Tabular and there you go, you have an in-memory cache. I've actually deployed that as a solution and the speed is crazy.


> need to listen to the DB for events

You could store the key->version mapping separately, and read said version. If the cached version is lower, it's a cache miss.

Of course, evicting something from the cache (due to memory constraints) is a bit harder (or less efficient) in such a setup.
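A rough sketch of that versioning idea in Java (all names hypothetical; the `versions` map stands in for wherever the shared key->version data actually lives, e.g. the DB):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class VersionedCache {
        record Entry(long version, String value) {}

        // Shared key -> version store; writers bump this on every update.
        // In reality it lives somewhere all processes can read it.
        final Map<String, Long> versions = new ConcurrentHashMap<>();
        final Map<String, Entry> cache = new ConcurrentHashMap<>();

        String get(String key) {
            long current = versions.getOrDefault(key, 0L);
            Entry e = cache.get(key);
            if (e == null || e.version() < current) {    // stale or missing: miss
                e = new Entry(current, loadFromDb(key)); // hypothetical loader
                cache.put(key, e);
            }
            return e.value();
        }

        String loadFromDb(String key) { return "..."; }  // stand-in
    }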


I wonder if there are language-neutral alternatives to Infinispan.


I've seen the same, like when I just mentioned caching, a teammate would hear "implement Redis".

Then I would have to explain: "no, we have caching stuff 'in process', just use that; our app will use more RAM, but that's what we need".


I'm a fan of memcache specifically because ALL it can do is be a cache. No one can come in later and add a distributed queue to it. In-memory caching is also underrated, I agree. Using a hashmap and a minuscule TTL (like 5 seconds) can have huge performance benefits depending on your traffic, and it takes like 5 minutes to code up.
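For reference, the 5-minute version is roughly this (a sketch in Java; names made up):

    import java.time.Duration;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    class MicroCache<K, V> {
        private record Timed<V>(V value, long expiresAt) {}

        private final Map<K, Timed<V>> map = new ConcurrentHashMap<>();
        private final long ttlNanos;

        MicroCache(Duration ttl) { this.ttlNanos = ttl.toNanos(); }

        V get(K key, Function<K, V> loader) {
            Timed<V> t = map.get(key);
            if (t == null || System.nanoTime() > t.expiresAt()) {
                // expired or missing: recompute and stamp a fresh deadline
                t = new Timed<>(loader.apply(key), System.nanoTime() + ttlNanos);
                map.put(key, t);
            }
            return t.value();
        }
    }

    // usage: var hits = new MicroCache<String, Long>(Duration.ofSeconds(5));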


an antipattern I've observed when giving system design interviews is that a lot of people, when faced with a performance problem, will throw out "we should add a caching layer" as their first instinct, without considering whether it's really appropriate or not.

for example, if the problem we're talking about is related to slow _writes_, not slow reads, the typical usage of a cache isn't going to help you at all. implementing write-through caching is certainly possible, but has additional pitfalls related to things like transactional integrity between your cache and your authoritative data store.


It's a super common "new to SRE" behavior to overindex on caching as a silver bullet, especially because literally every DB has mechanisms to scale reads fairly easily. In my experience, Redis is often needed when you have a DB team that doesn't want to put in the effort to scale reads.


Or when the devs don’t want to rewrite their schemata in a way that would massively reduce I/O requirements.

Then when you lose a cache node, the DB gets slammed and falls over, because when the DB team implemented service-based rate-limiting, the teams cried that they were violating their SLOs so the rate limits were bumped waaaay up.


> throw out "we should add a caching layer" as their first instinct, without considering whether it's really appropriate or not

Could be worse: you could have met me! I used to laugh at caching and thought that if your website is so slow that you need a caching layer (Wordpress comes to mind), you're just doing it wrong: perhaps you're missing indexes on your database, or you simply can't code properly and made it more complex than necessary (I was young, once). Most of my projects are PHP scripts invoked by Apache, so they have no state and compute everything fresh. This is fine (think <30ms typical page generation time) for 95% of the types of things I make, but in more recent years I had two projects where I really struggled with that non-pragmatic mentality. I spent long hours experimenting with different write strategies (so data wouldn't change as often and MariaDB's built-in optimizations would kick in more), indexes on low-cardinality columns, indexes on combined columns in specific orders, documenting with each query which index it requires and maps to, optimizing the queries themselves of course, in one experiment writing my own on-disk index file to search through some gigabytes of geospatial data much faster than the database seemed able to, and upgrading the physical hardware from HDD to SSD...

Long story short, I now run Redis and the website is no longer primarily bound by computation power but, instead, roughly equally by bandwidth

I'm still very wary of introducing Redis to projects lest I doom them: it'll inevitably outgrow RAM if I indiscriminately stick things in there, which means turning them off (so far, nearly no links or tools on my website ever turned 404 because they're all on a "keep it simple" WAMP/LAMP stack that can do its thing for many years, perhaps search-and-replacing something like mysql_query() with mysqli->query() every five years but that's about the extent of the maintenance)

So anyway, I think we're in agreement about "apply where appropriate", but I figured I'd share the counter-example of how one can also be counterproductive in the other direction. There is something to be said for the pragmatic people who consider/try a cache, which often does help, even if there's often a different underlying problem and my perfectionism wouldn't like it


I appreciate that you came around, but I think it's important to highlight this common misunderstanding of the role of caching in achieving scaling beyond certain hard boundaries. You previously thought caching was bad and that you should instead look at your indexes... which are just a different kind of cache! I see this disconnect a lot, especially in the ruby/rails world, where people think the bottleneck is going to be the framework, while ignoring that the actual path to scale is caching in various forms.


Hmm, I was thinking about that while writing the post but I'm not sure that a data structure is the same as a cache, at least not in the standard meaning of the word where you save a pre-rendered version of something like a website (like what Wordpress needs), or store data closer to where you need it (like a CDN). Some databases, I think SQLite is one of them, don't even have a plain format but store all data in the b-tree.

I guess that, to me, a cache must always be redundant: you can delete its contents and lose no data, i.e., start again 'cold' and get back to the warm state algorithmically. That would make the sqlite thing (if I remember it correctly) definitely not a cache to me because the tree == the data. If I add a secondary btree in a database (also in sqlite), however, then that could be emptied with no loss of the original data, and I guess I can see the argument that it prepares the data for quicker consumption (if I'm wording that right) and so it's kind of a cache? Not sure about it though :D but I see what you mean!


A traditional cache is one form of the cache I'm referring to. What I'm arguing is that any form of "cheating" an algorithm is caching. In this case, the base case is a full-table scan. If I "cheat" and pre-calculate specific queries and store that ahead of time, that's a cache.


> an antipattern I've observed when giving system design interviews is that

It's an interview though. Most people just watch youtube videos and "copy and paste" the answer.

In a way, it's the format of the interview that's the problem. Similar to leetcode-style interviews, a lot of the time we're not checking for what we need.


> Similar to leet code style interviews a lot of the times we're not checking for what we need.

right, all interview formats are imperfect...but some are more or less imperfect than others.

a crucial difference in my mind, is that leetcode-on-the-whiteboard style interviews correspond quite poorly to the actual day-to-day job of coding.

a well-prepared system design question, on the other hand, does correspond fairly well to part of the actual job - we have an existing system, it has a performance problem, or needs a new feature we didn't anticipate originally that requires the design to be reworked in some way. and we're sitting in a meeting trying to come up with what we think will be the Least Bad option.

(importantly, the question I like to use is not a greenfield "design Tinder for dogs" / "design Youtube for cats" type of question, because as you say those can be reduced to a formula that candidates can regurgitate, instead I'm intentionally asking a question about a brownfield system that I summarize for them first)

ultimately, that's what I'm probing for with that interview style - I don't particularly care whether a candidate arrives at some "right answer" or not, I'm looking for "do I want to sit in a meeting with you and try to hammer out a design to some non-trivial problem that you and me and other people on the team will then go and implement?"


> a crucial difference in my mind

> a well-prepared system design question

> importantly, the question I like to use

i.e. you're just saying your version is better (than the general one). Even if I agree with you, that's not the point. The point is that the water is polluted in >80% of the places, so anyone coming to drink the water will be wary regardless.

> reduced to a formula that candidates can regurgitate

> instead I'm intentionally asking a question about a brownfield system that I summarize for them first

You'll get the filtered version even if you claim your water is clean. The problem isn't the question.

> do I want to sit in a meeting with you and try to hammer out a design

i.e. it's not a technical skills test but a behavioral or personal-preference interview. As above, but to make it 200% clear: I've never done ANY real-world system design the way I would in ANY interview. So you're not likely to get this outcome either.

In the real world someone is going to get this task, e.g. a principal engineer, and they're going to come up with some draft (maybe ask for help for a bit) and then hold a meeting to discuss / refine it. No one is creating these diagrams live with other people, unless there's some place with enough "architect-level" engineers that have nothing to do. Furthermore, it'd be really expensive if all the stakeholders were present. What you do get is the principal filling different gaps based on different discussions, potentially over a week or so.

The discussion potentially happens over many short segments as well, e.g. "should we add a cache (to the performance engineer)" ... (2 hours later) ... "I think we need a WAF, thoughts? (to the security engineer)".

In conclusion you'd not want to sit in a meeting with me and try to hammer out a design because:

- I'm trying to force myself to do something I don't do (and no one does), so it's not the real me

- I'm under pressure from the interview and the broken situation, so I'm behaving differently

- I've had so little time to consider your "unique brownfield" scenario that I'm always going to go with safe options instead of approaches that are more novel or closer to my personality, i.e. again not me


Disagree on this one. In an interview there is no "the answer"; it's a dialogue. I've interviewed a lot of people, often using performance-related questions, and trust me, there are lots of candidates whose only answer to those is "add a cache", even after multiple follow-up questions or hints like "is there anything else that can be done?", "try thinking outside the box", "what can be done with the database itself", etc. Only a novice interviewer will be fooled by the first answer. If you cannot demonstrate more solutions after that, it shows that you clearly have no experience or problem-solving ability; finding that out is the whole point of the interview, not checking whether you have studied a set of common questions.

btw, "scale up" is the second most common answer from those who can't provide better solutions. :)


> and trust me, there are lots of candidates whose only answer to those is "add a cache", even after multiple follow-up questions or hints

My point isn't that the interview can't weed out bad candidates. That's in a way the easy part. The problem is it can't identify not-bad candidates.

The interview is broken because of how standardized it is. It's like a certain game genre and most people will play it the same way. It's more like a memory test.

> In an interview there is no "the answer", it's a dialogue.

It pretends to be, or you assume it is. There are numerous 'tutorials' / videos / guides on system design; it's >90% rehearsed. So again, my point is the interviewee is trained and will give you the standard answer even if you deviate some. There are just too many risks otherwise. If I had a more novel approach, I'd risk the interviewer not understanding, or taking longer than the allocated time to finish.

Especially in big tech: interviewers are trained to look for "signals", not whether you're good or bad. They need to tick certain boxes. Even if you have a "better" answer, if it's outside the box it fails.


At this point, what format of interview isn’t a problem?


> At this point, what format of interview isn’t a problem?

The original one.

Is a surgeon going to have a surgery test before they get hired?

Is a chef going to live stream cooking a dish and then for the interviewer to virtually "taste" it before deciding on a hire?


Chefs actually are subject to being tasted before hire. It doesn’t happen in a 1hr interview, no. But the owners of a restaurant either know the chef’s cooking from previous experience or invite them in to cook.

Doctors go through excessive amounts of additional schooling and board certification to prove they know how to e.g. cut open a body.

Maybe you’re asking for the software industry to become more formalized in skillset requirements? Or maybe engineers should bring their portfolio instead?


> Chefs actually are subject to being tasted before hire. It doesn’t happen in a 1hr interview

Exactly. What you're describing is closer to doing a take-home and then explaining it.

> Doctors go through excessive amounts if additional schooling and board certification to prove they know how to e.g. cut open a body.

So not part of the interview? It's not like there aren't certifications in the industry, e.g. AWS, Microsoft, Java, etc.

> Maybe you’re asking for the software industry to become more formalized in skillset requirements? Or maybe engineers should bring their portfolio instead?

I doubt "formalization" helps (see above; some of it exists, and those are off the mark just as well). In essence, leetcode is the "formalized" requirement. It's just a bad one.

I'm asking to come back to common sense.

Again, doctors don't cut open a body live as part of the interview. So even if I have my AWS certifications (the equivalent of your example), the interviewer still asks me to do a live system design interview. How is that the same?


Well, that's probably because caching is generally the answer to all scaling problems. Once you hit the theoretical wall of performance, all you can do is cheat. And that generally means caching.


In most cases it's not about the speed; it's about data sharing for containers or distributed systems. Filesystem or in-memory doesn't work. I agree that in most cases a normal database is enough, though.


Yeah, I mentioned that: if you need to share stuff between processes or different nodes, then maybe Redis might be a fit.

But I've seen people use Redis as a glorified "global variable" for stuff like Kafka streaming. The data is already partitioned, it's not going to be used across multiple nodes, and now you've introduced another service to look at and made everything slower because of the network. A global hashmap (or cache library, like previously mentioned) would do the job faster, with less overhead, and the code would be simpler.


We use an event database (think Kafka) as our source of truth, and we've largely shifted away from Redis and Elasticsearch in favor of local in-memory singletons. These get pretty big too, up to 6GB in some cases for a single mapping. Since it's all event-based data, we can serialize the entire thing to JSON asynchronously, along with the stream event numbers specific to that state, and save the file to S3. On startup we can restore the state for all instances and catch up on the remaining few events. The best part is that the devs love being able to just use LINQ on all their "database" queries. We do, however, sometimes have to write these mappings to be lean enough to fit in memory for tens of millions of entries, e.g. keeping only the one property we use for a query and then doing a GET on the full object in Elasticsearch.
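That snapshot/catch-up pattern, sketched in Java for consistency with the other examples in this thread (the original is .NET; all names here are hypothetical, and the JSON/S3 plumbing is elided):

    import java.util.function.Consumer;

    interface Event {}
    interface EventLog { void replayFrom(long fromEventNumber, Consumer<Event> handler); }

    // Snapshot = state plus the stream position it was built from,
    // serialized together (e.g. to JSON on S3) so they stay consistent.
    record Snapshot<S>(long lastEventNumber, S state) {}

    abstract class Projection<S> {
        volatile S state;
        volatile long lastEventNumber;

        // Taken asynchronously, off the hot path.
        Snapshot<S> snapshot() { return new Snapshot<>(lastEventNumber, state); }

        // On startup: restore the snapshot, then replay only the tail of the log.
        void restore(Snapshot<S> snap, EventLog log) {
            state = snap.state();
            lastEventNumber = snap.lastEventNumber();
            log.replayFrom(lastEventNumber + 1, this::apply);
        }

        abstract void apply(Event e);  // fold one event into the in-memory state
    }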


Redis is needed to share data with other microservices, which are possibly written in a different language.

Polyglot teams: when you have a big data pipeline running in Java but need to share data with services written in Node/Python.

If you don't have multiple isolated microservices, then Redis is not needed.



