Can I ask a stupid question about this? I’ve always wondered what this quote really means. It seems obvious that the author deems “cache invalidation” to be badly or awkwardly named, but is that really the joke? Isn’t cache invalidation a pretty straightforward term? Maybe I don’t feel the awkwardness as much as a native English speaker would? Or is it literally that cache invalidation is hard? Isn’t cache invalidation on a completely different level than naming things, both conceptually and in difficulty?
If you're not careful with what you cache, you end up with bugs unless you invalidate the cache carefully. And doing that carefully is in itself surprisingly difficult and error-prone, unless you come up with a comprehensive scheme, at which point you probably no longer have what we call a cache but more like a secondary index with performance/concurrency problems of its own.
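As a toy illustration (a minimal Python sketch; all the names are made up), here is the bug in its purest form: a cache that is never invalidated happily serves stale data the moment the underlying value changes.

```python
# Minimal sketch of the classic stale-cache bug (all names hypothetical).
users = {"alice": "alice@example.com"}  # the source of truth
_cache = {}                             # the "optimization"

def get_email(user):
    if user not in _cache:              # cache the lookup...
        _cache[user] = users[user]
    return _cache[user]                 # ...but never invalidate it

print(get_email("alice"))         # alice@example.com
users["alice"] = "a@new.example"  # the underlying value changes
print(get_email("alice"))         # still alice@example.com: stale
```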
It doesn't seem hard until you've been there and given up.
I think the same is true for naming. If you told someone new to programming that naming is the most difficult part, they'd laugh at you and continue using quickly thought up, confusing names that cause them to introduce bugs because they later misunderstand their own code.
> Or is it literally that cache invalidation is hard?
It's literally that cache invalidation is hard. You can also think of "cache invalidation" as a substitute for concurrent programming: keeping multiple related threads of logic synchronised.
> Isn’t cache invalidation on a completely different level than naming things, both conceptually and in difficulty?
I'd say it's on a conceptually different level, but not less difficult, and because "naming things" looks easy, it's more insidious.
From my point of view, "naming things" is a substitute for architecting software instead of coding your way out of dead ends whilst inventing a lot of off-the-cuff names in the process. (Of course, in the current climate, such ad-hoc solutions are now the standard, called "design patterns", and people consider names that have "Factory" in them twice to be perfectly normal.)
No, naming things is really hard. Well, giving things (functions, variables, classes, algorithms) _bad_ names is easy; giving them good names is hard. For instance, when did you last see a class called <something>Manager? Then consider the fact that “manager” means absolutely nothing.
do you know a better name for "window manager"? i don't mean to hold that up as a paragon of great naming, just genuinely curious what you'd call it.
like, i'm no fan of AbstractFactories (or classes for that matter), but i never quite got this sentiment. to me, "manager" suggests that you have a bunch of resources that should be centrally managed (created/freed, whatever), and there's something (perhaps an object) that handles that.
ofc it's really generic, and a more concrete name should be used if possible, but i wouldn't say it's meaningless
No, it is not intended to be badly or awkwardly named. It really is the case that properly invalidating caches is surprisingly hard to do and is the root cause of many, many bugs. Of course, the term is intended to be taken broadly: "cache invalidation" includes everything from CPU-level cache coherence issues on multi-processor systems to the maintenance of ACID compliance in a database, and probably even cases where a mutable variable is reused incorrectly.
I don't think this quote implies that cache invalidation is badly named. It's just one hard thing to do. To me the point of this quote / the joke is the off-by-one error applying to this list itself.
To me, the original quote, "There are 2 hard problems in computer science: cache invalidation and naming things" is intended to be striking by putting naming things at the same level of difficulty as cache invalidation, while naming things might seem an easy problem… at first. The point of the quote is to be a warning on the fact that while cache invalidation is notoriously hard, naming things is hard too.
> To me, the original quote, "There are 2 hard problems in computer science: cache invalidation and naming things" is intended to be striking by putting naming things at the same level of difficulty as cache invalidation, while naming things might seem an easy problem…
Ah, that makes sense to me, thanks. I had just figured the difficulty levels were the other way around, so the “punchline” couldn’t work. Guess I’m glad the most complicated caches I work with are mostly “if it’s at least this old, refresh it” ;)
"this old" according to what clock? What if the refresh fails or times out? Do you keep the old value there until a refresh succeeds?
etc, etc, etc.
Even that simple example is not remotely easy if "if it's at least this old, refresh it" is an actual requirement. Thankfully most of the time it's not, and the requirement is actually "refresh it every once in a while, about this often, but none of this is particularly important so it doesn't have to always work"
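For what it's worth, here is a rough sketch (Python, hypothetical names, one possible policy among many) of how those questions can be answered in practice: a monotonic clock to dodge wall-clock jumps, and serving the stale value when a refresh fails.

```python
import time

class TTLCache:
    """ "If it's at least this old, refresh it", with explicit answers to
    the awkward questions: monotonic clock, stale-on-error fallback."""

    def __init__(self, fetch, ttl_seconds):
        self.fetch = fetch      # hypothetical loader; may raise on failure
        self.ttl = ttl_seconds
        self.value = None
        self.fetched_at = None  # monotonic time, immune to wall-clock changes

    def get(self):
        now = time.monotonic()
        if self.fetched_at is None or now - self.fetched_at > self.ttl:
            try:
                self.value = self.fetch()
                self.fetched_at = now
            except Exception:
                if self.fetched_at is None:
                    raise  # nothing cached yet, nothing to fall back on
                # refresh failed: keep serving the old value until one succeeds
        return self.value
```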
> Or is it literally that cache invalidation is hard?
Yes. It's a problem that repeats across the computing stack, can cause maddening edge-case errors, and doesn't (can't, in the general case) have a clear perfect solution.
In addition to what others wrote in their replies, I think that in a broader sense "cache invalidation" actually refers to a fundamental balance you need to strike when programming: what data/information you store vs. what you calculate on the fly. Whenever you store the results of some calculation, that's de facto caching the calculation.
As to naming things, my feeling is that good naming tends to go hand in hand with good abstractions, a good model of the world; this is not as easy as "copying" relations from the real world into the computer (your stereotypical "cat is-a animal", which may result in surprising problems), but about finding models and ideas that are at the same time simple, elastic, and robust at representing some core essential concepts of the real world. Sorry I can't give specific examples; those moments are often surprisingly vague and local. Also, I may just be overinterpreting this quote...
Naming things is a true challenge too, at the same level of complexity as cache invalidation (if not more), since it deals with the psychology of the user/programmer, patterns of thinking, etc.
Both cache invalidation and naming are easy on the surface but surprisingly complicated and prone to error.
Cache invalidation has a risk of both false positives (something got evicted from the cache but shouldn't have) and false negatives (something should have been evicted but was not) and the effects of errors can be invisible, annoying, or disastrous, depending on the situation. Incorrect cache invalidation usually results from incorrect dependency management.
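A toy sketch (Python, invented names) of what that dependency management looks like, and of where the false negatives sneak in:

```python
# Dependency-tracked invalidation, reduced to a toy (all names hypothetical).
cache = {}       # entry -> cached value
dependents = {}  # underlying key -> entries computed from it

def cache_put(entry, value, deps):
    cache[entry] = value
    for key in deps:
        dependents.setdefault(key, set()).add(entry)

def write(key):
    # Evict exactly the entries that depend on the written key.
    # If an entry's deps set was recorded incompletely, it survives this
    # loop: a false negative, i.e. stale data served later.
    for entry in dependents.pop(key, set()):
        cache.pop(entry, None)

cache_put("report:2024", "<rendered report>", deps={"sales", "costs"})
write("sales")
print("report:2024" in cache)  # False: correctly invalidated
```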
Naming is hard because names have a conflicting combination of requirements: names need to be short (they will be repeated often), expressive, accurate, unchanging, and easy to remember. It's often quite difficult to find a name for a concept that fits all those requirements. My company has spent years finding the right names for certain core concepts, but that is time well spent.
Throwing away cached data may not be hard, but knowing (actually, ‘guessing’ often better describes it) what to throw away is.
“First in, first out” may seem fine at first sight, but if your access pattern is periodic, it may mean you just threw away the data you need next and kept around data you won’t need for 11, 10, 9, 8,… months. Also, why throw away data that’s needed, statistically, once a second, and keep data that was read in for a one-off query?
Also, quality of service might affect caching choices. If you need room on a factory floor, you don’t move the fire extinguisher to the back room, even though you know you likely will not use it.
Similarly, you might want to prefer caching data of web pages more likely to be visited, or even that of customers paying more.
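Here's a toy simulation (Python, nothing authoritative) of that mismatch: one hot key needed every round, plus a stream of one-off keys. FIFO periodically evicts the hot key simply because it was inserted long ago; LRU keeps it resident.

```python
from collections import OrderedDict

def simulate(policy, capacity=3, rounds=20):
    """Toy workload: a hot key touched every round, plus one-off keys."""
    cache, hits = OrderedDict(), 0

    def access(key):
        nonlocal hits
        if key in cache:
            hits += 1
            if policy == "lru":
                cache.move_to_end(key)     # refresh recency on hit
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict head of the queue
            cache[key] = True

    for i in range(rounds):
        access("hot")            # needed again next round
        access(f"one-off-{i}")   # never needed again
    return hits

print("FIFO hits:", simulate("fifo"))  # hot key keeps getting evicted
print("LRU  hits:", simulate("lru"))   # hot key stays resident
```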
Cache invalidation is not about effectively caching immutable values (where FIFO or least-recently-used may be valid solutions) but about the problem of caching mutable values. The hard part is ensuring correctness without having to clear all caches everywhere whenever something changes.
The challenge is properly ensuring that when a value changes (and thus any cached copies become invalid), every place where that value might be cached invalidates [that part] of the cache promptly, because otherwise other parts of the system will see stale/wrong/conflicting data, which generally results in 'fun'. And it appears everywhere, from memory reads in a single multicore processor (where one core might change a variable that another core has cached) to globally distributed data storage systems with eventual consistency, to state shown on the user's screen while the underlying data is being mutated by someone/something else.
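One of the many schemes for this (sketched in Python with made-up names) is version stamping: rather than pushing invalidations to every cache, each cache revalidates its copy against a cheap authoritative version counter on read.

```python
# Version-stamped cache revalidation, as a toy (all names hypothetical).
store = {}     # key -> value at the source of truth
versions = {}  # key -> version at the source of truth

def write(key, value):
    store[key] = value
    versions[key] = versions.get(key, 0) + 1  # every cached copy is now stale

class LocalCache:
    def __init__(self):
        self.copy = {}  # key -> (version it was cached at, value)

    def read(self, key):
        cached = self.copy.get(key)
        if cached is not None and cached[0] == versions[key]:
            return cached[1]  # version matches: copy is still valid
        self.copy[key] = (versions[key], store[key])  # revalidate
        return store[key]

a, b = LocalCache(), LocalCache()
write("x", 1)
print(a.read("x"), b.read("x"))  # 1 1
write("x", 2)                    # no cache is notified explicitly...
print(a.read("x"), b.read("x"))  # 2 2  ...revalidation catches it anyway
```

The catch, of course, is that every read now touches the authoritative counter, trading the invalidation problem for a (cheaper) coherence check; there is no free lunch here.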
Caching isn't just hard; it adds complexity whose costs and flaws are often obscure, especially across separately changing components and over time. Caching facts (events) isn't really caching, but a legitimate copy that remains true forever.
Naming things is "hard", as names tend to stick forever. Later, names may miss the moving target of recent changes.
Off-by-one errors: the joke is that the list itself is off by one. You either spend extra effort reducing their likelihood up front, or get dragged into hours or days of trying to decipher a failure only to discover it was an off-by-one error. Programming needs to be exact to be correct, and very few people consistently avoid such subtle flaws in code logic.
When you have to explain the joke, it's not funny anymore! :D
To be honest with all the tooling around programming languages nowadays I haven’t encountered a bug caused by off-by-one error in years. The other two though ... yeah still true :)
Oh, that's true of course ... I interpreted it as "rule 0xF is the most important" automatically, heh.
Also, I know that hex is cooler, but "0xF" is 50% longer than "15"; is it really better? :) Of course in this kind of artistic/prosaic text it doesn't matter, express yourself and so on, but I find it really annoying in code.
Some people think that in C, the integer literal 0xff is more "byte-like" than 255, when they are really exactly the same (both have type 'int'). Pet peeve.
> Some people think that in C, the integer literal 0xff is more "byte-like" than 255, when they are really exactly the same (both have type 'int'). Pet peeve.
You are technically correct.
However, the meaning of a number used for bitwise operations or other relationships to binary can often be more _immediately_ obvious in hex. Certain regularities that are present in base 2 are also present in base 16, since the base is itself a power of 2.
The simplest example is the one you gave: 255 = 0xFF = 0b11111111, where the intent is very often a mask. However, things get less immediately obvious with larger numbers: 0xFFFFFFFF = 4294967295. So it's essentially about clarity, and even for the case of 255, which is obvious to most, hex still solidifies the intended meaning.
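To illustrate (Python syntax for brevity; the readability argument is identical in C, where both spellings are plain ints):

```python
flags = 0xDEADBEEF  # arbitrary example value

low_byte  = flags & 0xFF        # visibly "keep the lowest 8 bits"
low_byte2 = flags & 255         # same value, intent less obvious at a glance
word      = flags & 0xFFFFFFFF  # visibly a 32-bit mask
word2     = flags & 4294967295  # have fun counting the digits

# Each hex digit maps to exactly 4 bits, so the bit structure stays readable.
assert low_byte == low_byte2 and word == word2
```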
Obviously in this article it's pure geekery and there is no objective defense, but i love it anyway :P
> Some people think that in C, the integer literal 0xff is more "byte-like" than 255
When I'm doing bitwise arithmetic, or poking hardware registers, I think in hex. 0xff/255 isn't a good example because that one is easy offhand, but a lot of other values take longer to "parse" in base 10. It depends on the context whether base-10 or base-16 literals are easier for humans to read and parse.
I have the same kind of website. I once showed it to an employer and he almost laughed.
It's like the internet has been overtaken by an army of advertisers and wannabe Photoshop artists.
I'm curious about UX, because sometimes it's aimed at making consumers feel good, and sometimes at making them productive. I'm wondering if UX is backed by science/engineering.
> I'm wondering if UX is backed by science/engineering.
Yes it is, and it has been for a long time. Time-and-motion studies were conducted throughout the 20th century and even in the 19th. (There was a recent HN posting about such a study of aircraft cockpits in the 40s or 50s and the need for adjustable seating, relating to air crashes.) Don Norman published one of the first great books on the subject (The Design of Everyday Things) — check out the nuclear plant photo!
There was deep study of the efficacy of design in print media, which, fascinatingly, settled on different design points in different countries, though the most scientific such studies waited until computers were readily available.
Early GUIs were controversial and UX studies going back to the early 70s by people like Stu Card and by Fitts have been both surprising and influential. Xerox PARC where Card worked even hired anthropologists.
UX researchers are pretty important today. All the “big guys” (FAANG etc.) employ them, and less well-funded companies do as well.
Engineering? Well companies spend a lot on their sites and devices. You can decide on that part.
If that is true, one has to imagine that their skills are being employed differently than they used to be. It feels to me like UX used to be about serving the user's interests, which is definitely not something modern UX paradigms seem to concern themselves with.
On web sites the goal is typically to tune the user experience to achieve the company’s goals. Sometimes it may help the company if it takes you longer.
> If that is true, one has to imagine that their skills are being employed differently than they used to be. It feels to me like UX used to be about serving the user's interests, which is definitely not something modern UX paradigms seem to concern themselves with.
Modern UX paradigms are very much "how do we make the user interact with our product in the way that we intend?" (usually the way that goes through as many ad-laden paths as possible).
They just do research, but the application of those skills is mainly for for-profit corporations, so no, it is not about serving the user's interests. It is about guiding users to do the most profitable thing.
Maybe you're thinking more of traditional GUIs/UIs? UX is pretty much fad/consultancy/design-driven at this point, though companies also exploit big-data analytics in order to annoy users as much as possible without making them hit "X" (and lately, anecdotally, even that is failing).
That is true for some web sites (perhaps for FB, YouTube and the like too — feels like it, but I don’t know). Activity like you describe is really the province of the advertising consultants. But there are a lot of other products out there beyond web sites.
My GF did her PhD work in AI/learning (the field looked different in the early 90s) and has been a Sr. Researcher at Microsoft, Google, LinkedIn, FB, and Amazon, and didn’t work on web sites at any of them. She also worked for some smaller (though still big) companies; some of those were web sites, but none ad-supported.
The issues ranged from “how do people use this feature” to “how do we find a way for new, different kinds of people to use our product? How can it fit what they want?”
She comes from an era that predates the simplistic approach called "AI" these days. Today it's basically quasi-automated generation of the Expert Systems decision trees of the 80s/90s.
A better way to think about it in this context, I suppose, is that it's all about the human's cognitive models and how you can (1) get a handle on what works and (2) figure out why, so you can reproduce it. A lot of her work for the past couple of years seems to be on the former; a few years ago I would have called her studies "anthropological".
I'm not an expert in this area (though my AI work dates back to that era as well), so I'm just going on snippets of what she says about her work, none of which, of course, can include anything confidential.
There is a large amount of social-engineering exploitation in UX, and it comes down to: how can I exploit the person viewing the content? The main question in UX design is: how can I get the user to tap what I want? And it's all very simple.
The main exploit you want is a reaction. Next time you're commuting on public transport, if you spot someone looking at social media on their phone, watch their body language. As they scroll the endless pages of guff, you can start to learn their body language and how they react. With this you can start to psychoanalyze the user's reactions, and you can then use this to trigger a person to flip into an emotional state, allowing you to socially engineer the person, especially within a UX model. The news media do this all the time; you can easily cripple a person with just two sentences of text and a picture. It's why a good troll is so effective.
If you spark a response, you've got their attention, exactly as the downvote I received shows. Another example of a human exploit in the world we live in. That downvote can flip the person's emotional state to a feeling of "oh no", because my "value of 69" is now "68". A pointless number, a pathos exploit which can be manipulated by bots.
A lot of science and engineering goes into UX. Unfortunately, the actual goals of the company might not be aligned with good UX for the end users, and other pressures (design team, marketing team, time/money, etc.) often drive important parts of UX away.
Increasing user productivity and usability by any % doesn't directly translate to revenue. Steering users to premium features or ad click-throughs does directly translate to revenue.
A thing about these fast, static, minimalist websites (which I love) is that they use a monospaced font (which I don't love) instead of a proportional one, which I find easier to read. See http://antirez.com/ for another example.
It's not a criticism but an observation: giving up on so much modern junk seems to mean discarding something I consider good, namely better fonts.
I spent a lot of time thinking about this when I redesigned my own website. I really like the narrowness of Iosevka, and couldn't find a good proportional font to match its look and feel. So I just used Iosevka for the body text too. (I am currently experimenting with a quasi-proportional variant, Iosevka Aile, but I am not quite sure how I feel about it.) It looks weird, especially with justification, but I think I'll probably keep it.
I also like monospaced fonts, but using them for articles is a bit unusual. Justifying monospaced fonts seems wrong on some levels, but who am I to judge.
In my browser only the spaces between words get stretched, spaces between letters are not affected.
If you care about your readers, justifying any font is wrong. Justification makes the text “look nice” but harder to read; it should be avoided as much as possible.
Does it make the text harder to read? A nice side effect of justifying text is that it makes it easier to determine where a paragraph ends (or a line break occurs), especially if there's no vertical spacing between paragraphs.
But justification needs to be paired with word breaks, otherwise lines with long words will look unnatural.
Most (all?) books I have at home follow these two rules, and if they didn't they would look amateurish to me. Of course the web is a different beast, but if the line length is reasonable, I think the same rules can apply.
No, it’s all automated. The open source world knows of TeX but there’s a plethora of proprietary justification algorithms used in commercial typesetting.
This is one of the first things you learn in any Web accessibility class:
"Centered or justified longer pieces of text can be hard to read as well. Justified text adds space in between words that can cause rivers of white space through lines making reading difficult for some users with dyslexia. If hyphenation is supported this can reduce this effect but hyphenated words can be a barrier for many readers."
https://www.w3.org/WAI/tutorials/page-structure/styling/#tex...
"Sometimes full justification makes reading more difficult because extra space between words causes “rivers of white” making it difficult to track along a line of text, or less space between words makes it difficult to distinguish separate words."
https://w3c.github.io/low-vision-a11y-tf/requirements.html#j...
"Many people with cognitive disabilities have a great deal of trouble with blocks of text that are justified (aligned to both the left and the right margins). The spaces between words create "rivers of white" running down the page, which can make the text difficult for some people to read. This failure describes situations where this confusing text layout occurs. The best way to avoid this problem is not to create text layout that is fully justified."
https://www.w3.org/TR/WCAG20-TECHS/G169.html
"Fully justifying text can also present problems for people with dyslexia, where the large uneven spaces between words and sometimes letters within words can create what’s been termed “rivers of white” that run down the page and also make the line of print hard to follow. Readers find it more difficult to find the end of sentences and can repeatedly lose their place."
http://mediaaccess.org.au/accessibledocumentservice/2015/08/...
"A lot of people seem to love justified text, arguing that it contributes to the feeling of a more consistent page layout. From a strictly accessibility-focused perspective however, justified text creates large uneven spaces between letters and words that make reading a little more difficult for all users, and even more so for users with dyslexia."
https://dboudreau.tumblr.com/post/84344543792/avoid-justifie...
I have in the past built websites where I assumed that Part A would be the most popular part and so put a lot of effort and work into Part A. Less effort was put into Parts B, C, D etc.
When I launched it and let it run for a while, it turned out - thanks to analytics - that Part B was the runaway success that was getting lots of search-engine traffic and 90% of the visits to the site.
Had I not had analytics then I'd not have known that and would not know which parts of the site people valued. As a result I put the larger part of my focus into Part B instead.
It allows optimizing many things such as the UX, the speed and size of your product, and your own time.
Examples:
* If you see that one button is used 10x more than another button, you can re-order the buttons.
* You can remove buttons that are never used (= faster load times, fewer bytes).
* You can drop features that are never used and focus on features that are often used, saving your own time.
The other approach is "I already know what is best, I don't care what others think". This is also useful because it can let you break out of a local optimum.
Improve conversion rates.
Create a better user experience for all the users.
Understand which traffic sources are valuable so you can make better business decisions.
Know if something is broken on your site.
Are you implying that all analytics used in the world are implemented just to boost the site owner's ego?
> Are you implying that all analytics used in the world are implemented just to boost the site owner's ego?
The context of this comment is personal blogging. I’m implying that analytics used on personal blogs are implemented just to boost the site owner's ego, or at least I can’t find any other reason.
Fair point, I don't see the use of in-depth analytics for personal blogs either; maybe knowing the number of readers is helpful for telling whether your articles are read by anyone at all.
I don't like the vertical spacing of the font. While it makes the page shorter, it's bad for reading. Getting a lot of text on one page is good for programming (getting an overview, quick access to areas), but for reading, a sequential activity, it's generally worse.
I liked Fabien's old site, with the images that faded in when you moused over them to give you some idea of what a given article was about before you clicked on it. I don't remember it as being particularly slow to load, though I wasn't timing it or anything.
Not a big deal; I doubt I'm conscious of the design of a blog for more than a second or two after I load it, but I'm not sure the change is an unalloyed good.
I really enjoy Fabien's articles - really great fascinating stuff.
I have to wonder about the choice of custom PHP to generate the pages, especially if you are going to start from scratch when the intention is drastic simplicity?
I can understand the rationale of picking PHP many years ago to hand-roll something, but these days there are a lot of excellent static site generators which might be a better choice? E.g. Jekyll is very simple and flexible in my experience, and you can hand-roll some HTML + CSS and get something looking and behaving (e.g. URL structure) how you like very easily. I have had less success with Hugo (in terms of URL structure etc), but I know a lot of people prefer it to Jekyll for its golang-based compile performance.
Am I missing something for the rationale for PHP here?
It would seem that most of the rules here are editorial/design decisions rather than technical restrictions really. I guess perhaps there is the legacy issue of loads of old articles that are written in PHP that would need to be migrated, but that sounds like something that can be trivially automated and would only need to be done once.
Not sure about the OP, but I have a small devlog which is basically just a Bash script which I run whenever I need to write a new post. The script takes a bunch of markdown files in a folder and spits out HTML files (common index.html + files of each individual post) with a header/footer. This HTML is then served using a web server.
I just tried visiting the Jekyll website to see how to get started; the documentation page has the following under the Getting Started section --
1) Install Ruby and RubyGems.
2) Understand what Gems, Gemfiles, and Bundler(?) are.
3) Tells me about its community for some reason.
4) Finally there's a page which tells me how to make a 'Hello World' blog.
5) 10 more pages explaining what Liquid, layouts, includes, data files, assets, and deployment are.
6) I still don't know how to make a new post.
Most static site generators carry too much cognitive load. I just want a bunch of plain HTML files generated (ideally in <1s + no JS) from a bunch of markdown files; I don't really care about 99% of the features these things have. I would rather spend a few hours and write a simple script to do it myself instead of wading through the documentation looking for the relevant parts.
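For what it's worth, the whole exercise can fit in a couple dozen lines. A sketch in Python (the paths and template are made up; assumes the third-party `markdown` package, `pip install markdown`):

```python
#!/usr/bin/env python3
"""Minimal static site generator sketch: markdown files in ./posts
become standalone HTML files in ./site."""
import pathlib
import markdown  # third-party: converts markdown text to HTML

TEMPLATE = "<!doctype html><title>{title}</title><body>{body}</body>"

out = pathlib.Path("site")
out.mkdir(exist_ok=True)
for src in sorted(pathlib.Path("posts").glob("*.md")):
    body = markdown.markdown(src.read_text(encoding="utf-8"))
    page = TEMPLATE.format(title=src.stem, body=body)
    (out / f"{src.stem}.html").write_text(page, encoding="utf-8")
    print("wrote", src.stem + ".html")
```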
I left Octopress (a flavor of Jekyll) for those reasons[1]. Pelican[2] is what I use now. It takes a dir of markdown files and makes a decent website. It was even easy for me to write a no-nonsense theme without being good at Python.
My website [0] exists by the same rules. The only differences are that I use a Makefile and Bash scripts relying on the GNU coreutils and xml2/2xml [1] to generate static pages from my template, and I use git to synchronize it with my server. Oh, and I write these in Emacs :).
I actually rather miss the commenting on Fabien's site. IIRC, he initially kept the comments underneath the older articles, but now those are gone too. On his site at least, the comments were frequently quite insightful or at least asking interesting questions.
Yes, there's also spam and trolling in comments, but eliminating them entirely feels like a waste and cuts off an important way of interacting with the readers. At least personally, I'm far more likely to ask a question or to point out a mistake in a comment than I am to send off an e-mail.
> Legal name, or online identity? I don't feel comfortable revealing my legal name online.
Name that a reader can recognize as something that identifies you to the collective. There's a 0% chance that he cares what your parents wrote on a paper at the hospital.
His contrast is too low even for my normal eyes. I did not notice that at first, though, until I turned off the Dark Reader extension to view the website colors as designed.