Most books don't sell only a dozen copies (countercraft.substack.com)
224 points by herbertl on Sept 10, 2022 | 133 comments



Pretty long post for "I know that I know nothing".

However, there is gold in the comment section: Kristen McLean actually throws some numbers at us [1]. "66% of those books from the top 10 publishers sold less than 1,000 copies over 52 weeks". Well, uh, that's what I thought. Interesting nonetheless.

[1] https://countercraft.substack.com/p/no-most-books-dont-sell-...


I know that I know nothing is a very important realization, though, and if a lot of words helps someone else realize that they know nothing, too, it's probably worth it.


Agreed! Now I have that Operation Ivy ear worm stuck in my head.

https://youtu.be/hOqvWiDKv_w

Listen at your own risk.


Great quote. Huge band!


you da real MVP


Sure as hell isn’t for me.


Sure, but don’t click the publish button. There is already so much noise out there that we shouldn’t actively drown out people who actually do know something.


Completely disagree. There is value in someone laying out the facts and saying "What does this mean? No idea!" Second, these posts are usually a vehicle for someone who actually knows something, as seen in the comments on this article. Finally, it's their slice of the internet and they're free to publish whatever they feel.


So many people think they know something wrongly. It’s a breath of fresh air when someone doesn’t pretend they do.


Actually I think the post highlights that even the statistic you quoted means less than one thinks it means. For example, some of the books included may have been only a few weeks on the market. Some titles may be niche books with the expected lower sales volume calculated in the price. BookScan only covers a certain percentage of print sales in the US, the total sales could be more than double that.


No, it's a very good stat. I think they mean measured after 52 weeks, and note they exclude self-published books; it's the crème and still suffers from huge Pareto (as expected)


“data showing at least 1 unit sale over the last 52 weeks coming from publishers of all sizes, including individuals.”

“Collectively, 45,571 unique ISBNs appear for these publishers in our frontlist sales data for the last 52 weeks (thru week ending 8-24-2022).“

Both descriptions mean books sold for 1 of the 52 weeks would be included. You would need to see how quickly sales spike and fall off to see how much those statistics underestimate sales over a book's first year. Similarly, they cover 16,000 retailers; however, it could be that some of these books, like say textbooks, are primarily sold outside of these channels.


I used to work for a big name scientific publisher, and we published a lot of super-niche research monographs that didn't sell many copies. What a lot of people don't understand is that our bread and butter came from university libraries. We published a lot of books not because they sold individually, but because libraries wanted them. They like collections!


The comment should have been the post.

And this is the unsurprising TLDR:

"The long and short of it is publishing is very much a gambler's game, and I think that has been clear from the testimony in the DOJ case. It is true that most people in publishing up to and including the CEOs cannot tell you for sure what books are going to make their year."


Maybe this is splitting hairs but this didn't make publishing seem like gambling to me. It sounds like if I can publish enough titles at a low enough cost per title I'm guaranteed to make money.


You're guaranteed to make little money, yes. The big publishers make most of their revenue from a small number of titles that make publishing any other title viable.


My first self-published book sold maybe half a dozen copies, and there even was one refund. The reason is simple. It is a bad book.

The next one sold none. Which is expected since I made it openly available from my site. It hit about half a million downloads in the first year. Alas, most of them were from one IP somewhere in Czechia so I guess someone's spider got stuck in the web. But real people downloaded it too, about 12 000 copies. I even got a positive review from John D. Cook of which I'm extremely proud https://www.johndcook.com/blog/2020/04/24/programming-langua...

I'm now working on the Geometry for Programmers book for Manning https://www.manning.com/books/geometry-for-programmers. This is my first collaboration with a publisher and so far it looks like it will become a moderate success. The early access sales are good, the reviews are encouraging. Even so, I would expect first-year sales to be in the thousands, maybe even the low tens of thousands if we're lucky, but no more. Definitely not enough to live off.

So while the post doesn't reveal any specific statistics, it does agree with my experience. Most books might indeed sell in very low numbers but they are not really published and maybe not even really books. If you're working with a well established publisher, you will get some moderate sales numbers guaranteed, and even a small chance to go big.


Interesting. Does your book cover geohashing? I've been curious how points and polygons get converted to geohashes and how searches using them work. I currently use PostGIS as a black box, but I haven't been able to find any books or articles that really explain in depth how they work.


There's a great blog written by someone who wrote a very efficient geohashing algorithm in go/assembly that may help a little bit with understanding it.

https://mmcloughlin.com/posts/geohash-assembly

The simplified TLDR is:

1. Convert lat and long to something that's represented by bits.

2. Interleave these bits such that lat and long bits alternate in the sequence.

3. Optionally, but seems to be the standard, encode the bits into the custom base32 character set.
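For anyone who wants to see those three steps as code, here is a minimal Python sketch. It is not the implementation from the linked post (that one is Go/assembly); the names `quantize` and `geohash_encode` are made up for illustration, and it assumes the usual geohash convention that the first bit comes from longitude.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (omits a, i, l, o)


def quantize(value: float, lo: float, hi: float, nbits: int) -> int:
    """Step 1: map a value in [lo, hi] to an nbits-wide unsigned integer."""
    cell = int((value - lo) / (hi - lo) * (1 << nbits))
    return min(cell, (1 << nbits) - 1)  # clamp value == hi into the last cell


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    total_bits = 5 * precision        # each base32 character carries 5 bits
    lon_bits = (total_bits + 1) // 2  # longitude gets the extra bit when odd
    lat_bits = total_bits // 2
    lon_q = quantize(lon, -180.0, 180.0, lon_bits)
    lat_q = quantize(lat, -90.0, 90.0, lat_bits)

    # Step 2: interleave, most significant bit first, starting with longitude.
    code = 0
    for i in range(total_bits):
        if i % 2 == 0:
            bit = (lon_q >> (lon_bits - 1 - i // 2)) & 1
        else:
            bit = (lat_q >> (lat_bits - 1 - i // 2)) & 1
        code = (code << 1) | bit

    # Step 3: cut the interleaved bits into 5-bit groups and map to base32.
    chars = []
    for k in range(precision):
        shift = 5 * (precision - 1 - k)
        chars.append(BASE32[(code >> shift) & 0b11111])
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, 5))  # u4pru
```

Because the high-order bits come first, a shorter hash is a prefix of the longer hash for the same point. That is what makes prefix lookups usable for "what is near here" searches, though points on either side of a cell boundary can end up with very different prefixes.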


Thanks for the link. I would pay good money for a book that explains in plain English and a few Go/Python examples how these are implemented and the theory behind them.



My car broke down, so I had to take a long intercity bus trip for the first time in 15 years. After a short stop at a station, the driver started the bus again but forgot to turn the interior lights back on, so I couldn't read. I waited a few minutes, but he didn't turn them on, so I had to get up and ask him in the cabin. Coming back, I realized mine was the one and only reading light on: none of the other 100 passengers had theirs on, so they couldn't possibly be reading, unless electronically. 15 years earlier, as I remember it, there would be at least 3 lights on most of the time, 5 at most, but being the only one made me feel lonely.


The overwhelming majority of e-readers nowadays have a backlight, right? And reading "electronically" is still reading. But I guess I see your point: reading "traditionally" isn't something that many people do. Just like sitting to listen to an album from start to finish, which is something I did a lot when I was a teenager.


Seeing e-readers is very rare here in Brazil, and the only people I've seen using them were law students/professionals (to avoid the weight of paperbacks). Reading books never attracted many people around here, at least from what I know from life experience, but it was more common in the past for sure. I've been to the US/Europe for comparison. In the early 2000s there were big pro-reading campaigns here, but from what I've seen they fostered mostly religious books, self-help and Harry Potter. Personally, what I don't like is the poverty of imagination, and also the false experts all around, citing authors and works they mostly haven't read (Wikipedia is so much easier and faster, you know?) - I call these people wikiexperts or plainly fraudsters.


Audiobooks are the new travel book. I'm old school and like reading my books, but everyone else I know consumes them as audiobooks the majority of the time.


I don't think I've ever met a person who has listened to an audiobook. Not really a thing here, at all, at least for now. Maybe it happens, like the podcast craze is happening right now. (Brazil)


Reading on a bus would make me barf in 5 minutes


That bites. I can read on rough seas.


That’s easier. Problem is when the word you are reading is constantly tossed back and forth. Following that makes your head hurt.

On the sea it is low-frequency noise, so it is much easier.


The whole 2% makes 95% of all revenues doesn't just apply to books... it certainly is true for the video game industry, and I suspect, most of the industries being sold online.


On video game sales, I find it interesting how successful games can be while being 90% the same as an existing established game.

I’m not coming at this from some kind of ethics/copyright perspective, I think it’s totally fine to rehash the same idea. I’m just surprised they sell so well when I look and think “why would I buy this, I basically already have it”. The particular example I’m thinking of is a series called Overcooked which is a co op time management game and right now one of the hot sellers on steam is called Plate Up which seems to be basically the same thing at a similar level of polish.

The real learning I guess is to not be discouraged by existing products, because they don’t seem to prevent you from succeeding even when you can’t quite define why yours is better than the rest.


Isn't that pretty similar to how people consume everything else? With books, people tend to pick genres. "Regency era humorous romance novels with a cinnamon roll love interest," "pop science economics books," "supernatural murder mystery." Heck, "Adventure about a character reincarnated in a fantasy world with video game mechanics" is still broad enough to be an entire genre (hello out there, all you LitRPG people!). In that context, a second time management cooking game doesn't seem at all odd to me. Certainly it's a breath of fresh air from first person shooter #918738.


To me Plate Up seems probably as similar to Overcooked as Mafia is to GTA (maybe even less similar, actually). Yeah both are sort of about a similar thing with a similar third-person view but after that similarities end. If you don't care about a genre all games in it are the same to you.

Definitely no need to be discouraged if a game you want to make already exists, but "same game with different polish" is not the case with games in question...


I've played Overcooked 1 and 2 with my girlfriend, just played the Plate Up demo with her, and I think you're right. Mario and Sonic games prob look the same to ol' gramps up there, too.

Overcooked has specific arcade levels, each with their own infuriating gimmick. You either can beat it or you can't.

Plate Up, despite having similar mechanics (place food onto stove to cook it), is quite different. You have to take orders yourself, clean up after customers, and you win money that you use to upgrade the kitchen. That alone makes it a very different game than Overcooked's arcade levels.

I'll probably buy it.


Plateup at the top level is a simple cooking and upgrading your kitchen game.

It can also quickly turn into a Factorio-type game once you start automating the cooking of the food. Which sounds boring, if all you’re doing is serving, but it then turns into a game almost similar to Guitar Hero once you get into the overtime rounds because you’re having to put into muscle memory all of your actions with higher intensity in the later rounds.

Then there’s the insanity of automating cooking /and/ serving.


Add to that interactivity with Twitch stream viewers who can visit and order and even pay with tangible money, adding another twist and a layer of appeal to viewers and especially streamers. That fundamentally changes the game IMO, turning a game to play with friends into a game to stream and participate in.


> The real learning I guess is to not be discouraged by existing products

This applies to the real world a lot too. If you have a lot of competitors in your space, it's a good sign because the market has already been validated. It's obviously best to be the first in a good market but it's hard to predict whether the market is good when you're the only one playing.

In games I think the principle applies even more because gamers often want to play more of the same and once they finish all the Overcooked content another game that fills that need will work. Plate Up has also gotten popularized by many streamers and YouTubers playing it.


It's funny that your first thought is "why would I buy this, I basically already have it."

I think a much more common reaction people have to titles is "why would I buy this, it isn't like what I already have."


Ofc this depends on the industry, people in infrastructure for example prefer battle tested tools so breaking into the market is harder and requires a significant investment. Same with game engines, people don't write them these days.


Yeah power law distributions are everywhere nowadays. Nassim Taleb talks about this a lot and uses book sales as a reference, which is cheeky since he’s doing it in his own best seller books.


Anyone writing for publishers outside the top 10 (tbh, big 5) and KDP, isn't writing to make money - at least initially. They are writing because they like writing books. The vast majority of authors really just want people to read the book they often spent an enormous amount of time creating.


classic 20-80 rule


but more like 2-95 rule


You get somewhere there if you apply it recursively; "and of the top 20%, 20% have 80% of the 80% sales" and so on.
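To put numbers on the recursion, here is a small Python sketch. It assumes the same 20/80 split holds at every level, which is an idealization, and `recursive_pareto` is just a name picked for this example.

```python
def recursive_pareto(levels: int, top: float = 0.2, share: float = 0.8):
    """Return (fraction of titles, fraction of sales) after each nested split."""
    return [(top ** n, share ** n) for n in range(1, levels + 1)]


for titles, sales in recursive_pareto(3):
    print(f"top {titles:.1%} of titles -> {sales:.1%} of sales")
# top 20.0% of titles -> 80.0% of sales
# top 4.0% of titles -> 64.0% of sales
# top 0.8% of titles -> 51.2% of sales
```

So two levels of 20/80 give roughly "4-64", which is in the neighborhood of but flatter than "2-95". Getting all the way to 2-95 would take a steeper split than 20/80 at each level.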


My father wrote technical books through the 80's and 90's. I remember him being very happy about a book that sold in the low tens of thousands of copies (I'm guessing 20k or so). I know some of them flopped. Even as a child I knew that it did not make sense financially... stay up til 4am writing for a solid year or more (he had a regular job as well), so that you can get maybe 5-10% royalty on something that likely will sell only a few thousand copies on average. The ephemeral nature of computer books makes it even worse.


Having written a bunch of books on Flash and PHP back in the early 2000s, my experience is very much aligned with this.

For a relatively young guy the income was welcome, and after the first book did well I got a pretty decent advance for the subsequent books. They all sold enough to pay back that advance and so I got quarterly royalties on top too, but given the amount of toil it required - and quick turnaround times with new Flash releases, meaning many late nights - I would have been better compensated doing pretty much anything else.

That said, money isn't the only, or even necessarily the most important or fulfilling, reward on offer. I got the writing gig because I was spending a lot of my free time on various Flash-related forums answering questions and helping people with their projects. I did that for the sheer joy of helping others along in their learning journey, and saw writing books as a massive extension of that.

It very much mirrors my experience of the public school system in the UK: teachers are chronically overworked and underpaid, but do it anyway because it's something of a calling. In fact, that probably applies to a bunch of other public sector roles too, not least the NHS primary care roles.


You don't write tech books for the royalties, you write them for the consulting opportunities they lead to.


1) There are so many much easier and faster ways of networking and getting clients than writing a book. 2) If the primary motivation were consulting gigs then there wouldn't be so many books written by people who work at FAANG-level companies, who I'm guessing are not on fiverr on weekends.


It's like any hobby. Sometimes people are able to take their hobby full time after a massive success. Most of the time it's a hobby that provides satisfaction.


I forget if there’s a term for this, but I once read (and subsequently discovered) that a large portion of disagreements are simply because people are working with different definitions for things.

This seems like an example of that: depending on how you define “book” these claims are accurate or not.


But the point is that most people will take the literal definition of "book". They wouldn't differentiate between hardcover, paperback, ebooks, audiobooks. If an "expert" said "90% of books sell less than 99 copies", without any context, most people will assume that the book sold less than 99 copies across all its manifestations and in a lifetime. The article sheds light on the fact that there's more to that statement than what it means at first glance.


Yes, I've also struggled to pause many disagreements just to clarify terms, ironically because the other person didn't get what I was doing since there is no common word for this sort of disagreement.

If anyone has a good one, please chime in.


I think it's called semantic disagreement. I have a background in philosophy, and I suspect a lot of disagreements are the result of disagreements over the meanings of a word, and the fact that we are using words differently is so very easily missed when we do disagree.

For example, we can argue endlessly over whether esports are 'sport', but it's really quite simple. You tell me what you mean by 'sport', and then I'll tell you whether or not esports are a sport.

In some cases, it might be a little weirder. For example, because of the discovery of the nature of visible light, 'light' now has more than one meaning attached, and these meanings are incompatible. If it's dark in a room, we might say "there is no light", but on the other hand it could be bathed in EM waves outside the visible spectrum (such as the cosmic background radiation), in which case it's full of 'light'. Light has these two meanings: one to do with whether I can see or not, and the other to do with EM waves. Context can help determine which meaning is being used.

As we grow and learn and communicate, we pick up how words are used by other people, and we try to match our usage to the way everyone else uses these words. But sometimes we don't pick up quite the same meanings even though there's significant overlap (semantic extension). There lies confusion and a great deal of pointless disagreement. I think it's mostly pointless (from a philosophical perspective) to debate about what a word should mean. If there's disagreement, stipulate how you're going to use words, or invent new ones for the conversation, and move on.


In high school debate, it was referred to as "grounds"- essentially, laying out definitions for terms important or relevant to the topic established the terrain, to use a battle metaphor.

If you're using the same words but with different meanings, it's like two opposing forces not being on the same field of battle.

Two ships passing in the night is another metaphor, and is probably more common in English as a phrase.


Sounds like a version of what I've heard called "talking past each other".


Semantic mismatch?

Miscontextualization?


I like the suggestions, but I do think it needs to be a bit shorter to go viral. Riffing on this, how does "we've misdefined this" sound?


Why not just “let’s define our terms so we don’t talk past each other? What do you mean by X?”


#letsdefineourtermssowedonttalkpassedeachother?whatdoyoumeanbyx? is sorta hard to type out, a shorthand term may lend itself to a common understanding of the situation. Like "a picture of yourself" got coined as a selfie, it seems obvious now through constant use, but there was a time when the phenomenon existed and we had a lexical gap in describing it.


“It’s semantics” is the briefest way I usually describe this phenomenon.


This and context. It is weird finding oneself in an extended argument only to find out you were talking about two entirely different contexts. Saturday Night Live's "Emily Litella" skits on the weekend news[1] used this situation to good effect.

[1] https://www.youtube.com/watch?v=fZLeaSWY37I


Is “idiolect” the term you’re looking for? Everyone has an idiolect, which is the language unique to the speaker.


Sometimes, yea. But I more mean there’s an argument or debate and the two sides simply don’t have the same facts. And twenty minutes in you find out, “oh you thought I meant 22.1. I was talking about 24.0!”


There's a reason contracts start out by defining terms.


I really don't like "fact-checking" articles like this which don't contain many useful facts, only pedantry. The first comment (by Kristen McLean from NPD BookScan) is much more interesting than the article itself:

>>>0.4% or 163 books sold 100,000 copies or more

>>>0.7% or 320 books sold between 50,000-99,999 copies

>>>2.2% or 1,015 books sold between 20,000-49,999 copies

>>>3.4% or 1,572 books sold between 10,000-19,999 copies

>>>5.5% or 2,518 books sold between 5,000-9,999 copies

>>>21.6% or 9,863 books sold between 1,000-4,999 copies

>>>51.4% or 23,419 sold between 12-999 copies

>>>14.7% or 6,701 books sold under 12 copies

So, ~66.1% or 2/3 of books in their dataset sell under a thousand copies.
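The bucket counts can be checked with a few lines of Python. This sketch uses only the counts quoted above; the tuple layout and variable names are arbitrary choices for the example.

```python
# (lowest copies, highest copies or None for open-ended, number of ISBNs)
buckets = [
    (100_000, None, 163),
    (50_000, 99_999, 320),
    (20_000, 49_999, 1_015),
    (10_000, 19_999, 1_572),
    (5_000, 9_999, 2_518),
    (1_000, 4_999, 9_863),
    (12, 999, 23_419),
    (0, 11, 6_701),
]

total = sum(count for _, _, count in buckets)
under_1000 = sum(count for _, high, count in buckets
                 if high is not None and high < 1_000)

print(total)                         # 45571
print(f"{under_1000 / total:.1%}")   # 66.1%
```

The total matches the 45,571 unique ISBNs mentioned in the quoted comment, so the percentages are shares of ISBNs in that frontlist slice, not shares of all books.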


The pedantry was intended to point out that there is plenty of room for publishers to mislead when they don't detail how the data is collected. When you don't have access to the data, it is usually the best one can do.

Even the comment by Kristen McLean has limits, though they are much more forthcoming about what the data includes. That said, I think they summed it up best when they said publishing is a gambler's game. That being said, whether the outcome is good or bad for a gambler depends upon how much they invested and the return across all of those bets. Their data does not venture into financial aspects. At best, it gives us an idea of the minimum number of units sold in a particular subset of the market.


There were many useful facts that I hadn't considered, to be honest, especially regarding what was counted (probably print sales tracked by BookScan) and how (unique ISBNs).

Also, whenever some party brings up stats to make some point, I think it's fair to examine their figures, methodology and so on and just be a bit pedantic about it. And this was a claim made in a major antitrust case. The facts, and the general pedantry, show some serious likely issues and raise some important questions about the figures a party in a major trial made, and which were then - mostly uncritically - echoed all over the place.

Regarding Kristen McLean's comment:

The "in their dataset" is a very important detail that shouldn't be overlooked. The dataset for this is the book sales figures of the top 10 publisher published books. They get the numbers from partnered retailers (some 75% of retailers according to the article), and it only covers print copies, not ebooks, not audio books. It does not cover direct sales to larger organizational buyers, like library systems, either.

And it's grouped by unique ISBN, not by title. As the article points out, most books come in many editions (hardcover, paperback, etc), each of which has their own ISBN. The article author illustrates this by telling about his own book, which is one book with 4 different unique ISBNs (though one was for an ebook, and one for the audio book, both of which wouldn't be covered by this dataset anyway).


The author's first point was that anything with a unique ISBN is treated as a different book.

> From a sales tracking perspective, books are published in multiple formats, each with different ISBNs. I wrote one novel, but from a title count POV I actually published 4 books: hardcover, paperback, ebook, and audiobook. Other books have even more formats (mass market version, movie tie-in editions, etc.) and because they all have different ISBNs, they all have different sales figures.

I would like to see stats that collapse across different versions of (largely) the same text (including new versions or editions of text books, and re-releases that include some special commentary, etc.)
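As a rough illustration of what "collapsing" would mean, here is a Python sketch. Everything in it is hypothetical: the work IDs, formats and unit counts are made up, and it assumes someone has already mapped each ISBN to the work it belongs to, which is probably the hard part in practice.

```python
from collections import defaultdict

# Hypothetical per-ISBN rows: (work_id, format or edition, units sold).
# These numbers are invented for illustration, not taken from BookScan.
records = [
    ("novel-a", "hardcover", 900),
    ("novel-a", "paperback", 2_400),
    ("novel-a", "ebook", 1_500),
    ("novel-a", "audiobook", 600),
    ("textbook-b", "2nd edition", 300),
    ("textbook-b", "3rd edition", 800),
]


def collapse_by_work(rows):
    """Sum units across every format/edition that shares a work_id."""
    totals = defaultdict(int)
    for work_id, _fmt, units in rows:
        totals[work_id] += units
    return dict(totals)


per_work = collapse_by_work(records)
print(per_work)  # {'novel-a': 5400, 'textbook-b': 1100}
```

In this toy data, six ISBNs that would each land in a sub-5,000 bucket collapse into two works, one of which clears 5,000. That is the kind of shift a per-title view could produce, though how large it is in the real data is unknown from the thread.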


I think those data points work together with the "let's think this through for a bit" approach of the article. Particularly, note that the stats that you quoted in your comment are for a 52-week period and include everything that could be considered a "book." I think it's quite logical that 2007's Farmer's Almanac would sell less than 12 copies in the last year, or some vanity-published, typo-laden sci-fi novel of the type my family would sometimes get for me for Christmas (because if it's sold by the Amazon dot Com then it can't be that bad, right?).


I was just about to comment this as well,

I think that 66% selling under 1,000 is just as damning as the "less than 12 copies" thing making the rounds from the DOJ interpretation

Books ain't it!


I dunno, seems that the “spirit” of these comments is correct. I’ve self-published half a dozen small books that have only low double digit sales. The comment that goes through the data is also really interesting.


Thus my self-publishing goal - to sell more than 12 copies. Well 14. Don't want to be superstitious !


I bet friends and family can help achieve such an ambitious goal, unless you are a really unlikeable person.


In my experience friends and family often tend to want to be given copies—which they may or may not read.


That only means that sales aren't your forte. Good friends and family members can be sold many copies each — which they may or may not read.


I guess not. I don’t really want to push friends and family to buy things from me they don’t really want.


Who has 14 friends and family? I have like one friend and two family.


My goal is to sell one to someone I don't know. Friends don't count.


Aha - the sub-clause that drags down my lofty ambition ...


Good article and then a data-rich comment from Kristen McLean. This article makes me feel better as an author. 30 years ago, the first 2 books I wrote for the scientific publisher Springer-Verlag didn’t sell many copies, although I have received a lot of nice feedback about the first book over the years.

After writing 10 books for traditional publishers, all fairly niche technical books, I switched to self publishing and my readership has gone up dramatically (I think). When my publisher returned book rights to me, I released the second edition of my Java AI book under a Creative Commons license. For the 5 years that I tracked downloads from my web site (https://markwatson.com) I averaged 300 downloads a day over 5 years, and the book PDF was also downloadable from many other sites. I imagine that most people downloading a free PDF only read a small part of my open content books, but I have no data on that.

Currently I distribute on the leanpub.com platform (which I recommend!!) and I get about 50 free downloads for each time a person decides to voluntarily pay for one of my open content books. The exception is my Common Lisp book for which about 1 in 20 people choose to pay for it.

Writing is a lot of fun and it has opened the door to meeting and sometimes becoming friends with some amazing people.


> It is true that most people in publishing up to and including the CEOs cannot tell you for sure what books are going to make their year.

Sadly, this is why books by celebrities or with a TV or movie tie in are the first to be published and have the most marketing spend.


I would expect that most people who try to write a book and sell it will sell exactly zero copies. They will successfully give a few copies to their friends, and Mom, and that's about it.


Ooooffff

This started well but it seems like a crap take at tweetsplaining

> When people reference book sales, they’re typically talking using NPD Group’s BookScan numbers

The guy is an author and this is the meaning of sales he pulls first? really?

Number of sales is what your publisher will tell you. Because they owe you money for every sale done. Because a sale is how many times people clicked the 'buy' button (be it digital, Amazon, or went into a bookstore).

"BookScan numbers" LOL

> there are dramatic differences between 1) lifetime sales, 2) sales in the first 12 months after publication, and 3) sales in any random calendar year.

By the context it's obvious what they mean FFS. This guy can't interpret a tweet and came up with this clickbait crap?


The thing about that dozen copies quote that everyone seems to gloss over is “trade titles”; every time I've seen that phrase anywhere else in a publishing context it's been distinguishing, in paperbacks, from mass market titles.


"Lies, damned lies, and statistics"


> "Lies, damned lies, and statistics"

There's a good book on this called "Damned Lies and Statistics" that I recommend to people.[1,2]

[1]: https://archive.org/details/damnedliesstatis00best

[2]: https://www.ucpress.edu/book/9780520274709/damned-lies-and-s...


There's also a much earlier "How to Lie with Statistics" (published in 1954). I read it in high school (45 years ago)

https://www.amazon.com/How-Lie-Statistics-Darrell-Huff-ebook...


HTLwS was written as tobacco-lobby propaganda.

<https://www.refsmmat.com/articles/smoking-statistics.html>


I read it like 2 years ago and was very impressed how relevant it was. 10/10 required reading.


Someone may write a technical book for a niche community. The book may help others, build a portfolio etc. It doesn't need to earn lots of sales to be satisfying or useful.


I agree.

I've published three books with a publisher, and the one I'm most proud of and which got the best reviews (19 ratings on Amazon, all 5 stars) is the most niche, and got the lowest sales numbers.

https://smile.amazon.com/Parsing-Perl-Regexes-Grammars-Recur... if anybody is interested.


A fair number of books in this vein are also published electronically in some form free of charge. There may also be a physical book because why not. I’ve done that myself.


I was very confused by the title. I read it a few times just thinking it was poor English. Had to click through to the article to find out why.

Why was the leading "No" removed?


Another misleading statement is that most books sell a thousand or fewer copies. This makes no delineation between fiction, which is generally intended for a large, broad audience, compared to non-fiction, which is more specific and more books are published as non fiction. Technical, reference books are expected to sell far fewer copies compared to teen fiction.


A histogram of how many books sold a determined amount last year would be better to explain what the article means.


This “fact-checker” headline template is quite tired by now.


I was shocked after reading that link, most books sell fewer than 5000 copies.


Because people have friends?


And because public libraries sometimes buy uninteresting books to fill their quotas.


Thomas had never seen such garbage before.


TLDR; author doesn't find the numbers shared in some Twitter gossip as plausible, but has no better data than their own gut feeling from being in the industry.

There doesn't seem to be much to take home here, other than that the original tweet isn't clear about its own denotation or veracity.


>>>0.4% or 163 books sold 100,000 copies or more

>>>0.7% or 320 books sold between 50,000-99,999 copies

>>>2.2% or 1,015 books sold between 20,000-49,999 copies

>>>3.4% or 1,572 books sold between 10,000-19,999 copies

>>>5.5% or 2,518 books sold between 5,000-9,999 copies

>>>21.6% or 9,863 books sold between 1,000-4,999 copies

>>>51.4% or 23,419 sold between 12-999 copies

>>>14.7% or 6,701 books sold under 12 copies

- Kristen McLean from NPD BookScan


BookScan isn't a reliable source of information unfortunately. It only counts when a book's ISBN is physically scanned over a scanner (no ebook, audio book, libraries, specialty sales, etc) and also only covers 75% of retail in general. Generally, you can bet BookScan largely undercounts by a very wide margin.


The above numbers are from her response in the comments on the original article. It's worth reading to understand how those numbers are derived; she explains in depth.


I would be interested to see Kindle data for self-published books.


The majority of books written don't even get published, so the majority of books are definitely read by fewer than 12 people.


This way of looking at it might justify the statistic, but at the cost of making it uninteresting.


Interesting enough to me, and pretty relevant to a claim like "book-writing is rarely commercially worthwhile". No "points scored" against the publishing industry, but point-scoring is for shallow people.

For publishing, I wonder how many copies of little-bought books are read, and how many are printed -- both probably quite different to the number sold. And I also wonder how the outcome distribution compares to venture capital outcomes, and what predictor variables are useful. "Harry Potter" is a famous case of prediction being difficult (or at least badly done?) but you can probably get some signal from author (writing history, other celebrity), genre.


What counts as a book? If a book that's not published is a book, what about a collection of my notes and memoirs on my blog? It got read by loads of people; should it count toward the statistics?


This ambiguity around what a book is seems like an artificial one. Go to a bookstore or view the catalog in Apple Books. Those are books, even the $0.99 micro stories one might find on, e.g., Kindle. Anything else might be a book, but probably not in a way that is useful or constructive to analyze in the context of sales.


A sample of 75% of retail sales is a substantially larger sample than, say, the estimates by the US Census.

If you believe that BookScan isn't reliable, then you also have to believe that the US Census data is totally crap.


> Because this is clearly a slice, and most likely provided by one of the parties to the suit, I decided to limit my data to the frontlist sales for the top 10 publishers by unit volume in the U.S. Trade market. My ISBN list is a little smaller than the one quoted in the DOJ, but the principles will be the same.

> The data below includes frontlist titles from Penguin Random House, Simon & Schuster, Hachette Book Group, HarperCollins, Scholastic, Disney, Macmillan, Abrams, Sourcebooks, and John Wiley. The figures below only include books published by these publishers themselves, not publishers they distribute.


When you limit your data to those published by (fairly) large publishers, you've already skewed the data irreparably. Most of them won't even look at a book unless an agent brings it to them, and most agents won't represent most would-be authors.

On the other hand, some technical books don't require agents, and O'Reilly has to be a very large publisher in terms of books sold.

Some other categories don't, either -- I know someone who publishes "cozy mysteries" through a real publisher (not a giant one), and she doesn't have an agent.


> On the other hand, some technical books don't require agents, and O'Reilly has to be a very large publisher in terms of books sold.

I think you are drastically over-estimating the share of developers that read software development related books.

Out of the 150+ software engineers I worked with on a daily basis throughout my career so far, I can guarantee you not even 5% have ever read programming books (and I work at a FAANG, not your average mom and pop shop).

It's a niche market.


Really? I can't think of any dev I've worked with who didn't at least have some reference books handy. Though to be fair, it's been 10 years or so since I've worked in-person with people on a daily basis, so maybe my impression is just way out of date.

I still buy the occasional programming book, but nothing like I used to now that we have all the online resources.


> some reference books handy

Looking at a random list [1] of O'Reilly books, I can see 3 categories:

- The ones for beginners, like "Learning Python" or "JavaScript: The Definitive Guide",

- The ones that will be outdated even before reaching the shelves of a library, such as "Kubernetes in Action" or "Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow",

- The ones that are more about concepts such as "Clean Code".

I can't see any of those being used as a reference book. The Internet and the official documentation are the reference books.

[1] https://www.toptechskills.com/top-tech-books/


At least half the people at my current job buy technical books because we get a monthly book allowance. Whether they finish reading them is another story.

In my experience this is pretty common. Almost everywhere I’ve worked has had some kind of training budget, and most places have had fairly well attended book clubs.


I think O'Reilly has really tried to adapt to the online revolution. I don't know how well it's worked.


Thinking over my last few programming book purchases, they're really more or less textbooks. I get them for the structure they provide to in-depth learning, which usually doesn't work so well with online materials.


Well, you have a different 150 than I do, I'm afraid, and I was also at a FAANG. A really high percentage of people I know have O'Reilly books on their shelves.


Also, the distribution is skewed. Between my wife's collection and my own, we have owned at least 120 O'Reilly books.


It used to be that Addison-Wesley was the unrivaled king of CS publishing. If you saw the AW logo on the cover, you knew it was gonna be a rock solid book. Sadly, at some point they seem to have slacked off on their standards a bit, and now it looks like they play second fiddle to O'Reilly. (I don't know if at present there's much of an advantage in quality either way.)


Is this the moment in the conversation where someone steps in to commend Fred Brooks and David Parnas?


With such a large portion under 1,000, it would be nice to see it broken down more. Was it more like 20, or 900?


Well, the distribution would almost have to be skewed to the lower side given the general trend (but obviously this isn't a very scientific method).


Can you just plot the histogram and guesstimate based on the distribution?


Before you try to interpret these numbers, you should be aware:

- the numbers are for a 12 month period of sales, not for books published in a particular 12 month period (see below for why this matters)

- some of those books were published near the start of the 12 month period, so the count represents their first 52 weeks of sales

- some of the books were published in the last week of the 12 month period, so the count represents their first couple of weeks of sales

- some of the books were published almost a year before the start of the period, in which case the count represents the number of sales within the last couple of weeks of their first year of sales (sales >12 months after publishing don't count as 'frontlist' so aren't included in these numbers)

tl;dr the % figures towards the bottom of the list are probably too pessimistic
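
A toy illustration of the point above, as a sketch only: it assumes a 52-week reporting window and that a title counts as frontlist for exactly its first 52 weeks on sale. The function name and the week numbering are hypothetical, not anything BookScan defines.

```python
WINDOW_WEEKS = 52     # length of the reporting window
FRONTLIST_WEEKS = 52  # assumption: a title is "frontlist" for its first 52 weeks


def weeks_counted(pub_week):
    """Weeks of a title's frontlist sales that fall inside the window.

    pub_week is the publication week relative to the window start:
    0 means the first week of the window, negative means before it.
    """
    start = max(0, pub_week)                             # first week that can be counted
    end = min(WINDOW_WEEKS, pub_week + FRONTLIST_WEEKS)  # first week that no longer counts
    return max(0, end - start)


for pub_week in (-50, -26, 0, 26, 50):
    print(f"published at week {pub_week:>3}: {weeks_counted(pub_week)} weeks counted")
```

Under these assumptions only a title published right at the start of the window gets a full 52 weeks counted; everything published earlier or later contributes a partial year, which pushes titles toward the lower buckets.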


McLean's comments are spot-on, if you read them carefully. She describes herself as a "numbers gal" and she is.

Limiting it to the top publishers immediately leaves out lots of books. But OK, these are major players who are putting their own resources on the line for some books, so that's a valid slice.

For that 14.7% that sold under 12 copies: as a self-published author, I have to say, "Why not my book instead of that crap?"

The problem, of course, is that they didn't expect it to sell that few copies when they printed it. They didn't say, "Hey, this one looks like 10 copies or so. Let's go with it!"


What do "book" and "sell" mean here, given the article's excellent explanation of how those terms vary wildly? Are those figures per year or lifetime for the books?


This reminds me of the Jordan Peterson thing where, when he's asked "Do you believe in God", he replies with "Depends on what you mean by believe and God".


To be fair to JP, he's not always right but in this instance, he does have a point.

I believe in physics but I also believe that the map is not the territory. Go back a little over a century and the model of the atom was the plum pudding. We now know better but our own models are most certainly not correct. Thus I'm believing in something I know to be faulty on some level.

Similarly with God, each religion's conception is different. If you believe in a pagan religion you believe in lots of different gods. If you believe in a monotheistic religion you believe in one God.

It's very difficult to have a conversation free of misunderstanding on abstract matters unless you have a good grasp of the underlying concepts your conversational partner embodies in a word and vice versa. With a subject as touchy as religion, it's prudent to define terms early.

Likewise, I think asking for definitions of 'book' and 'sold' is a perfectly valid question.


Do we have similar statistics for the App Store?


Writing a book in 2022 is like opening a DVD rental shop...



