I suspect this is related to why a string quartet is the right number of musical voices. Two violins, viola, and cello give you a very fulfilling number of separate ideas to track without overwhelming you.
I think you're taking the metaphor about a string quartet as a "conversation among equals" too literally.
In terms of perception, I'm not sure there's much of a relationship to a human conversation. To make things equal, the string players would need to take turns soloing while the others wait more or less silently to respond, each with their own solo response. You'd be bored out of your gourd if string quartets were written that way.
But more to the point, the vast majority of time in a string quartet is devoted to two or more of the players producing phrases of music in parallel, and that is musically coherent and pleasing to the players and audience. Most humans cannot track two humans speaking in parallel at all. That alone tells us that music cognition is a very different phenomenon than speech cognition.
In short, I'm not sure why a string quartet would be considered the optimal genre for humans to produce music together. And even if it is, the reasons why are even less likely to have anything to do with the protocols around human speech cognition, and certainly not with some bizarre equivalent of the "theory of mind" associated with the musical phrase produced by one of the instruments[1].
1: Small digression-- In Elliott Carter's 2nd String Quartet he actually started with a concept that each instrument was a kind of "character" in a play among the quartet. In this case, the problem with OP's metaphor becomes obvious even in the introduction-- the homogeneous timbre of a string quartet makes it difficult to hear the differences among the characters. (IIRC even Carter admitted this.)
I recall that Charles Rosen wrote somewhere that one of the reasons the string quartet took off in the classical period was that it allowed the playing of all the notes in a dominant seventh chord without double stops, although this is probably a better explanation for the relative paucity of string trios in the output of Mozart (1) and Beethoven (0). The establishment of four parts as the "standard" scoring for vocal ensembles can be traced back to the 15th century.
On the other hand, the second and more famous dining (and conversation) club founded by Dr Johnson originally had 9 members, and gradually grew from that to dozens, although many, including Johnson, may not have been entirely happy with the expansion.
Counterpoint may leave too much implied with only two voices. With four or more voices, one must increasingly break or relax the rules that promote voice independence: either use parallel motion, where additional voices simply double some other line (they can't all be independent, there's too bleeding many of them!), or drop voices for a thinner texture, for example where there are five instruments but only three or four of them are sounding together most of the time. That's a long way of saying that around three to four voices is ideal if you want independent lines. (They're not really independent, like two people shouting past one another; in good counterpoint there's a weird mix of the voices working together while each still manages to stand out.) Better than taking this claim on faith, though, would be to compare, say, Bach's two-part inventions to works that have more voices.
For those who do not know counterpoint: a voice (a horizontal line of music, traditionally sung) can make only three motions relative to another voice (move closer, move apart, or hold steady). Combine that with limited voice ranges (a doubling of frequency or so) and limited interval choices (seven or so within an octave, i.e. a frequency doubling). The voices sit very close to one another but only rarely cross. On top of all that come various rule systems that forbid or frown on such things as the tritone, parallel fifths, and so on into the weeds, such that with more than a few voices you quickly run out of valid options for all the voices to move independently.
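To make the three motions and one of the classic prohibitions concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: pitches are MIDI-style semitone numbers, the function names are made up, "hold steady" is read as "the gap between the voices does not change", and the fifths check is a simplification that flags any two consecutive fifths where both voices move. Real rule systems are far more detailed.

```python
def relative_motion(a_from, a_to, b_from, b_to):
    """Classify how the gap between two voices changes over one step.

    Pitches are semitone numbers (an assumed MIDI-style encoding).
    """
    before = abs(a_from - b_from)
    after = abs(a_to - b_to)
    if after < before:
        return "closer"
    if after > before:
        return "apart"
    return "steady"


def is_parallel_fifth(a_from, a_to, b_from, b_to):
    """Simplified check: both voices move, and they are a perfect fifth
    apart (7 semitones, octaves ignored) both before and after the step."""
    both_moved = a_from != a_to and b_from != b_to
    fifth_before = abs(a_from - b_from) % 12 == 7
    fifth_after = abs(a_to - b_to) % 12 == 7
    return both_moved and fifth_before and fifth_after


# C4 -> D4 against G4 -> A4: the gap stays a fifth, so this is the
# pattern the rule books frown on.
print(relative_motion(60, 62, 67, 69), is_parallel_fifth(60, 62, 67, 69))
```

With two voices there is one such pair to check per step; with four voices there are six, and with five there are ten, which is one way to see why the valid options dry up so fast.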
Also the traditional barbershop quartet for a cappella.
Interestingly, I like the 5-piece versions of all 3 of these: add a keyboardist to the rock band, a piano or harpsichord to the classical string or woodwind quartet, a female vocalist to the a cappella group. Having two leads lets you do much more intricate countermelodies and harmonies.
A string quartet consists of 4 tonally adjacent instruments, and is thus much more like 4 humans talking.
A "classical" rock band consists of 4 utterly different instruments from a tonal perspective, and is thus nothing like 4 humans talking. Same thing for jazz - and it's why you can have multiple instruments performing simultaneously and in ways that are not obviously connected to each other.
"Music for 18 Musicians" by Steve Reich is probably one of the masterworks of the second half of the 20th century.
Any vaguely disco-adjacent band will have more than 4 people on stage because there will be at least keyboards and horns in addition to drums, bass, guitar and vocals. Even a band that simply adds an additional person playing percussion to a typical 4-piece exceeds your limit, yet that addition can wonderfully enhance the music.
If you haven't seen any of those bands, then that's your loss, but it provides no reason to try to generalize about the right size for a live band.
Vocals are often a person who is also playing an instrument. So in a 4-person band you can have up to four voices, lead and rhythm guitar (or maybe keyboard), drums, and bass.
Edit: I thought you linked to yet another famous band. People keep doing that... 99% of bands a normal person would see in normal life don't even have a Wikipedia page.
However, your link looks to be about jazz.
A common rock band is rarely good with more than 4 members: people lose unity, it's just technically harder, and the players are rarely professionally trained.
Honestly, I wouldn't know anything about rock bands because I don't listen to rock at all, so you could be very right. I just responded because this thread was talking about string/barbershop quartets, which are almost definitely 4 parts because that is the minimum number of people required to make a 7th chord, and then again because you didn't know what a Big Band was, which I suppose is very understandable, but as somebody who grew up around them and playing in them, it's super foreign to me.
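The 7th-chord arithmetic can be spelled out in a few lines of Python. This is only a sketch under stated assumptions: pitch classes are numbered 0 to 11 with C as 0, each player or singer sounds one note at a time (no double stops), and the names `DOMINANT_SEVENTH`, `chord_pitch_classes` and `players_needed` are invented for the example.

```python
# Semitone offsets from the root: root, major third, perfect fifth, minor seventh.
DOMINANT_SEVENTH = (0, 4, 7, 10)


def chord_pitch_classes(root, intervals=DOMINANT_SEVENTH):
    """Return the distinct pitch classes (0-11, C = 0) of a chord built on `root`."""
    return sorted({(root + step) % 12 for step in intervals})


def players_needed(root, intervals=DOMINANT_SEVENTH):
    """One player per distinct pitch class, assuming one note per player."""
    return len(chord_pitch_classes(root, intervals))


# G7 (root = 7) contains G, B, D and F.
print(chord_pitch_classes(7), players_needed(7))
```

A plain triad, (0, 4, 7), comes out at three under the same assumptions, which is the gap between a trio and a quartet that the dominant seventh argument leans on.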