
I strongly recommend "A Philosophy of Software Design". It basically boils down to measuring the quality of an abstraction by the ratio of the complexity it contains vs the complexity of the interface. Or at least, that's the rule of thumb I came away with, and it's incredible how far that heuristic takes you. I'm constantly thinking about my software design in these terms now, and it's hugely helpful.

I didn't feel like my code became better or easier to maintain after reading other programming advice books, including "Clean Code".

A distant second recommendation is Programming Pearls, which had some gems in it.



Implicitly, IIRC, the optimal ratio is 5-20:1. Your interface must cover 5-20 cases for it to have value. Any fewer, and the additional abstraction is unneeded complexity. Any more, and your abstraction is likely too broad to be useful/understandable. The specific example he gives is the number of subclasses in a hierarchy.

It’s like a secret unlock code for domain modeling. Or deciding how long functions should be (5-20 lines, with exceptions).

I agree, hugely useful principle.


This is a good rule of thumb, but what would be a good response to the argument for adding interfaces because "what if a new scenario comes up in the future"?


The scenario NEVER comes up in the future the way it was originally expected. You'll end up having to remove and refactor a lot of code. Abstractions are useful only when used sparingly, and when they don't try to handle something that doesn't even exist yet.


When doing the initial design start in the middle of the complexity to abstraction budget. If you have 100 “units of complexity” (lines of code, conditions, states, classes, use cases, whatever) try to find 10 subdivisions of 10 units each. Rarely, you’ll have a one-off. Sometimes, you’ll end up with more than 20 in a group. Mostly, you should have 5-20 groups of 5-20 units.

If you start there, you have room for your abstraction to bend before it becomes too brittle and you need to refactor.

Almost never is an interface worth it for 1 implementation, sometimes for 3, often for 5-20, sometimes for >20.

The trick is recognizing both a “unit of complexity” and how many “units” a given abstraction covers. And, of course, different units might be in tension and you have to make a judgement call. It’s not a silver bullet. Just a useful (for me at least) framing for thinking about how to manage complexity.


Even one use case may be enough. E.g., if one class accepts another, a protocol (in Python parlance) SupportsSomething can decouple the two classes and carve out the exact boundary. The protocol can be used for creating a test double (a fake) too.
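As a sketch of that idea (all names here are hypothetical, not from any particular codebase), a Protocol lets the consuming class depend only on the one method it actually needs, and a fake can satisfy it structurally for tests:

```python
from typing import Protocol


class SupportsNotify(Protocol):
    """Hypothetical protocol: the single method OrderService needs."""

    def notify(self, message: str) -> None: ...


class OrderService:
    # Depends on the narrow protocol, not on a concrete mailer class.
    def __init__(self, notifier: SupportsNotify) -> None:
        self._notifier = notifier

    def place_order(self, item: str) -> None:
        self._notifier.notify(f"order placed: {item}")


class FakeNotifier:
    """Test double: satisfies SupportsNotify structurally, no subclassing."""

    def __init__(self) -> None:
        self.messages: list[str] = []

    def notify(self, message: str) -> None:
        self.messages.append(message)


fake = FakeNotifier()
OrderService(fake).place_order("book")
print(fake.messages)  # ['order placed: book']
```

The boundary is exactly one method wide, so the "interface" carries almost no complexity of its own even with a single production implementation.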


If you own the code base, refactor. It's true that, if you're offering a stable interface to users whose code you can't edit, you need to plan carefully for backward compatibility.


"We'll extract interfaces as and when we need them - and when we know what the requirements are we'll be more able to design interfaces that fit them. Extracting them now is premature, unless we really don't have any other feature work to be doing?"


Maybe some examples would clarify your intent, because all the candidate interpretations I can think of are absurd.

The sin() function in the C standard library covers 2⁶⁴ cases, because it takes one argument which is, on most platforms, 64 bits. Are you suggesting that it should be separated into 2⁶⁰ separate functions?

If you're saying you should pass in boolean and enum parameters to tell a subroutine or class which of your 5–20 use cases the caller needs, I couldn't disagree more. Make them separate subroutines or classes.
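To make the contrast concrete (hypothetical toy functions, not from the book), here is the flag-parameter style the comment argues against next to the separate-subroutine style it recommends:

```python
# Flag-parameter style: one function, caller selects the case with a bool.
def format_name(first: str, last: str, last_first: bool) -> str:
    if last_first:
        return f"{last}, {first}"
    return f"{first} {last}"


# Separate-subroutine style: each use case gets its own name.
def format_name_western(first: str, last: str) -> str:
    return f"{first} {last}"


def format_name_sorted(first: str, last: str) -> str:
    return f"{last}, {first}"


print(format_name_western("Ada", "Lovelace"))  # Ada Lovelace
print(format_name_sorted("Ada", "Lovelace"))   # Lovelace, Ada
```

In the second style each call site reads as a statement of intent, and neither function carries a branch for a case its caller never uses.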

If you have 5–20 lines of code in a subroutine, but no conditionals or possibly-zero-iteration loops, those lines of code are all the same case. The subroutine doesn't run some of them in some cases and others in other cases.


That function covers 2⁶⁴ inputs, not cases. It handles only one case: converting an angular value to (half of) a cartesian coordinate.


Sounds like you haven't ever tried to implement it. But if the "case" you're thinking of is the "case" narnarpapadaddy was referring to, that takes us to their clause, "Any fewer [cases], the additional abstraction is unneeded complexity." This is obviously absurd when we're talking about the sin() function. Therefore, that can't possibly have been their intended meaning.


The alternative and more charitable interpretation, of course, is that a single function like sin() is not what said GP meant when using the word "interface". But hey, don't let me interrupt your tilting at straw men, you're doing a great job.


Appreciate the charitable interpretation. Both “complexity” and “abstraction” take many different forms in software, and exceptions to the rule-of-thumb abound, so it’s easy to come up with counterexamples. Regardless, thinking in terms of complexity ratios has been a useful perspective for me. :)

IMO, a function _can_ be an interface in the broadest sense of that term. You’re just giving a name to some set of code you’d like to reuse or hide.


Think of it more like a “complexity distribution.”

Rarely, a function with a single line or an interface with a single element or a class hierarchy with a single parent and child is useful. Mostly, that abstraction is overhead.

Often, a function with 5-20 lines, or an interface with 5-20 members, or a class hierarchy with 5-20 children is a useful abstraction. That’s the sweet spot between too broad (function “doStuff”) and too narrow (function “callMomOnTheLandLine”).

Sometimes, any of the above with a >20:1 complexity ratio is useful.

It’s not a hard and fast rule. If your complexity ratio falls outside that range, think twice about your abstraction.


And with respect to function behavior, I’d view it through the lens of cyclomatic complexity.

Do I need 5-20 non-trivial test cases to cover the range of inputs this function accepts?

If yes, function is probably about the right level of behavioral complexity to add value and not overhead.

If I need only 1 test, or if I need 200 tests, it’s probably doing too little or too much.
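As a rough illustration of that counting heuristic (a hypothetical function, not from the thread), each independent branch adds roughly one non-trivial test case, so branch count approximates the test budget:

```python
def shipping_cost(weight_kg: float, express: bool) -> float:
    """Hypothetical example: a handful of branches -> a handful of tests."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    # Flat rate under 1 kg, then 2.0 per additional kg.
    base = 5.0 if weight_kg < 1 else 5.0 + 2.0 * (weight_kg - 1)
    return base * 2 if express else base


# Roughly one test per branch:
assert shipping_cost(0.5, express=False) == 5.0    # light parcel
assert shipping_cost(3.0, express=False) == 9.0    # heavy parcel
assert shipping_cost(0.5, express=True) == 10.0    # express doubles it
try:
    shipping_cost(-1, express=False)               # invalid weight
except ValueError:
    pass
```

A function needing only one test has no behavior worth naming; one needing hundreds is bundling many cases that might be better split apart.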


That's not what cyclomatic complexity is, and if you think 5–20 test cases is enough for sin(), open(), or Lisp EVAL, you need your head examined.


You’re right, I suggested two different dimensions of complexity there as a lens into how much complexity a function contains. But I think the principle holds for either dimension.

I don’t think you need only 20 test cases for open(). Sometimes, more than 20 is valid because you’re saving across some other dimension of complexity. That happens and I don’t dispute it.

But the fact that you need >20 raises the question: is open() a good API?

I’m not making any particular judgment about open(), but what constitutes a good file API is hotly contested. So, for me, that example is validation of the principle: here’s an API that’s behaviorally complex and disputed. That’s exactly what I’m suggesting would happen.

Does that help clarify?


Yes, open() is a good API. I can't believe you're asking that question! It's close to the Platonic ideal of a good API; not that it couldn't have been designed better, but almost no interface in the software world comes close to providing as much functionality with as little interface complexity, or serving so many different callers or so many different callees. Maybe TCP/IP, HTTP, JSON, and SQL compete along some of these axes, but not much else.

No, 20 test cases is not enough for open(). It's not even close. There are 36 error cases for open() listed in the Linux man page for it.

What constitutes a good file API is not hotly contested. It was hotly contested 50 years ago; for example, the FCB-based record I/O in CP/M and MS-DOS 1.0, TOPS-20's JFN-based interface, and OS/370's various access methods for datasets were all quite different from open() and from each other. Since about 35 years ago, every new system just copies the Unix API with minor variations. Sometimes they don't use bitwise flags, for example, or their open() reports errors via additional return values or exceptions instead of an invalid file descriptor. Sometimes they have opaque file descriptor objects instead of using integers. Sometimes the filename syntax permits drive letters, stream identifiers, or variables. But nothing looks like the I/O API of Guardian, CP/M, Multics, or VAX/VMS RMS, and for good reason.



