What ebook production problems are self-publishers facing?

Driven by curiosity (as always), I’ve just spent a large part of my lunch break browsing through various forums[1], trying to get a handle on what problems self-publishers are facing when they are creating their ebooks.

My impression is that, unlike what I expected from the work and challenges I face making ebooks for a traditional publisher, styling and formatting isn’t a major issue—formatting problems seem limited to edge cases. I’m assuming this is because most self-publishers are doing novels with very simple style needs.

The problems people seem to be facing, in no particular order:


The unevenly distributed ebook future

(This is the fifth post in a series on the publishing industry’s new product categories.)

Data serves the status quo.

Anything new or undiscovered by definition does not have a data footprint. Existing data collection and filtering techniques have biases that do not take the unknown or unfamiliar into account.

Unless you have a clear theory and a well-designed experiment to prove or disprove it, the only thing more data will tell you is that your preconceptions and existing biases are correct. With enough noise, your brain will find it easy to ‘discover’ patterns and correlations that support whatever it is you want supported. Data, on its own, serves your worldview.

This is the problem with almost all analytics systems in common use. Unless you are running a tightly controlled experiment, the only thing data will do is paint you a general picture of the status quo; it’ll give you the shape of, say, your web traffic—the ‘sources’ of the nameless mass that fills your comment threads with tripe—but it won’t help you discover any of the ‘whys’. Why are they here? Why did they read it? Why did they comment? Why did they (or didn’t) come back?

Why didn’t they buy my book?

To pretend that an A/B test can tell you why a reader decided not to buy the ebook edition of a footballer’s biography is to accept a worldview that is incompatible with the very act of publishing longform prose in the first place.

For a simple A/B test to be able to tell you why a reader made the decision not to choose a book or a format you have to believe that the human mind is a simplistic machine, driven entirely by pre-programmed responses to external stimuli, to be hacked by an enterprising grifter. A mind like that is never going to comprehend, let alone enjoy, an extended piece of text. A humanity like that would never have risen out of the mud to read or write books.

You can A/B test small theories and small issues, but it is not an experimental model that will help you find answers to complex questions or understand complex problems.

Before we do anything else, when we have an issue, we need to come up with a theory—an idea for how things work that you can then explore and try to prove or disprove.

Then you need to figure out an experiment that specifically disproves that theory, which is sometimes next to impossible because we in publishing don’t have access to the environment where the experiments need to be implemented and run.

If this method seems slow and awkward (the only conclusive result you can have is partial disproval, not confirmation), then that’s because it is. It’s also the only way to know. Anything else is guesswork.


It’s a classic quote that is tailor-made for the modern internet: short, facile, glib, simplistic to the point of being useless.

The future is already here — it’s just not very evenly distributed.

—William Gibson

The problem with the line is that it’s using the term future as a shorthand for technology and the changes it engenders—equating it with progress.

It has a simple message: progress remains a one-dimensional timeline (past → present → future), but places, markets, and cultures are unevenly distributed along that timeline. Crap countries are stuck in the past. Good countries have a head start on the future.

As such it isn’t much of an improvement over the standard progress myth. In fact, it makes it worse by adding a dollop of neo-colonialism into the mix. “They are savages because they just haven’t had their share of our ‘future’ yet—not because a broken global economic system is holding them in debt-slavery”.

The publishing industry has bought into this idea wholesale. Some publishing markets are, according to this worldview, further ahead on the progress timeline than others. It also implies that advancement along the timeline is inevitable, even if it progresses at varying speeds. Romance and other genre fiction tend to dominate ebook sales and so must have more ‘future’. Non-fiction less so and must therefore have less ‘future’ and more of that crippling ballast called ‘past’. Big mainstream titles hit the ebook market in seemingly unpredictable ways. Some garner decent ebook sales while others seem to sell only in print. There, the ‘future’ seems to be randomly distributed, like a stress nosebleed over a term paper.

This, obviously, implies that the ebook will either eventually dominate universally or at least capture the same large percentage uniformly across the market.

I don’t think that’s going to happen.

The various publishing markets differ in fundamental ways that won’t be changed by ebooks. As others have said, ‘ebooks are terrific and haven’t changed a thing’.

Some will switch entirely to ebooks. Some partially. Some almost not at all.


If you’re going to generalise about readers, try not to generalise too much and stick to specific tastes and behaviours. Anybody claiming or even implying that an entire age group or economic class broadly behaves in the same way clearly hasn’t been observing book buyers for a long time. Claiming that those under twenty-four prefer print or that the more affluent prefer ebooks is useless even if it were true (probably isn’t) because those categories are too broad for us to guess what sort of books they are buying. Knowing that buyers of a specific genre prefer one format over another is clearly more useful than finding out that two-thirds of the young people who couldn’t avoid your survey didn’t like ebooks. One is actionable. The other isn’t.

It would be even better if we were able to make an educated guess of how a genre’s readers break down into behaviour groups:

  • Does a single kind of reader dominate? (casual readers, heavy readers, blockbusters only, etc.)
  • Or, is the readership more varied than that?
  • Is the distribution of the kinds of readers reliable across the genre or do sub-genres or individual titles differ substantially?

We are largely working blind here and unless you manage to get a critical mass of readers to buy from you directly and then read the books in an environment you control (good luck with that), it will be impossible to get even vaguely accurate guesses.


Some titles aren’t going to sell well as ebooks and there isn’t anything we can do about it except pray they turn into blockbusters. Because, if the title does turn out to be a blockbuster, you can always pay for a proper ebook version once the money starts rolling in.

The converse also holds true for ebook-heavy genres where the credo “ebook-first, print if popular” might well be printed above the door of every publisher (self- or other-) in the future.

If you have a title that is:

  • Visually rich.
  • Or, poses an ebook production challenge in some way.
  • And, is likely to appeal mostly to a print buying audience (this can happen for a variety of reasons).

Then the logical action to take is to quite simply not make an ebook version. Unless a high-quality ebook is an almost free byproduct of your production workflow, spending money on creating an ebook version of a title like that is likely to be a waste of money.

Conversely, print will not be viable for some markets within the industry, generally those that are dominated by ebook readers or that have been thoroughly disrupted by apps and websites.

Either way, the single biggest concern publishers should have is to figure out ways to either discover or change the composition and shape of their readership. Making decisions on digital production will be next to impossible without that knowledge.

The Checklist: fix iBooks image handling

Mike Cane suggested that I put together a checklist of problems that need to be fixed in ebook format handling so those at fault could be made accountable.

So here goes the first entry:

Hey Apple! Fix iBooks’ image handling! Because it is totally broken.

When building an ebook with images you have two options today for how to prepare images for iBooks:

  1. None of them will display properly
  2. Some of them will display properly, seemingly at random

The difference lies in a metadata toggle called “ibooks:respect-image-size-class”. It may say respect-image-size, but it lies! A more appropriate name would be respect-image-size-sometimes-if-I-feel-like-it-and-you-sacrifice-a-chicken.

But I guess that would be too long. Maybe the name used is the abridged version.

Of course, Apple’s crazy madcap plan for worldwide sensible image sizing might have worked if every ePub production app in the world followed Apple’s rules, but they don’t. So, please stop.

So, here’s my suggestion to Apple: just render images like the browser does, paying attention to the attributes on the element and its style declarations. Don’t try to be smart because when you fail (which iBooks does frequently) you just look extra dumb.
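As a rough illustration of what ‘render it like the browser does’ would mean, take an image like the one below. The file name, alt text, and dimensions are made up for the example. A browser takes the width from the attribute, caps it at the width of the container because of max-width, and keeps the aspect ratio because of height: auto. Nothing else is needed.

<img src="images/map.png" alt="Map of the harbour"
     width="600" height="400"
     style="max-width: 100%; height: auto;" />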

Just say no to ebook CSS and JS

You think I’m joking?

One of the biggest issues publishers face with ebook production is the somewhat adversarial attitude ereader and app vendors have taken towards publisher stylesheets.

Publisher styles are largely overridden by default at Kobo, B&N, and in Aldiko. Even iBooks requires you to slot in a set of proprietary meta tags before it respects your font and image decisions.

Problems with vendor stylesheet overrides:

  • They are inconsistent from platform to platform, vendor to vendor, and device to device.
  • They are partial and only cover a portion of the meanings and structures you can express in even basic HTML.
  • Some of them are only applied when the ebook is loaded through the vendor’s service, not when opened directly by an app (I’m looking at you Kobo).
  • They make regular CSS unpredictable. CSS is a complex language so mixing regular CSS in with a set of aggressive overrides can have unforeseen results, like low-contrast element colours, invisible text, badly sized or even indecipherable images, and more.

Vendor overrides are one of the biggest time sinks in ebook development. Because they are only partial, we have to include our own styles, but the mix is often unpredictable. The number of basic things that break, seemingly randomly, in Kobo, iBooks, or whatever random RMSDK-based ereader you have this week is too high to be disregarded. So, we test, and fix, and test, and fix.

(The simpler your stylesheet is, the less you have to test, obviously. Which is a decent motivation to make your styles as minimal as you can. Unfortunately, too many publishers and authors are dead set on forcing ebooks to mimic the styles of their print edition, filling it with all sorts of stylistic crap that’s patently inappropriate to digital. I draw the line at drop caps, which just can’t be done properly in an accessible way in digital. They are a trite Victorian affectation that compromises readability.)

To make matters worse, a lot of the ebooks publishers are releasing are full of insane crap that even the worst hack web developer wouldn’t dream of trying to pull off. Like making every element of a book either a P or a SPAN.

Which is a practice that makes the noise people at the IDPF make about non-XML HTML5 being tag soup pretty ridiculous. XHTML is just as capable of non-semantic tag soup as HTML. Oh, and it also makes any claim of EPUB3’s superior accessibility rather silly, as there’s no way for screen readers to tell which P is supposed to be a heading and which is actually a paragraph. Complex accessibility features are meaningless if all the publisher gives you is an indistinct blob of tags.
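To make the P-and-SPAN problem concrete, here is a made-up fragment of the kind of markup I mean, followed by the same content with actual structure. The class names and text are invented for illustration and aren’t taken from any particular publisher’s file.

<!-- Tag soup: everything is a P or a SPAN -->
<p class="chapter-title"><span class="big-bold">Chapter One</span></p>
<p class="body"><span class="ital">It was a dark night.</span> The rain had not stopped.</p>

<!-- The same content with structure a screen reader can use -->
<h1>Chapter One</h1>
<p><em>It was a dark night.</em> The rain had not stopped.</p>

With the styles switched off, the first version collapses into two identical-looking paragraphs. The second still reads as a heading followed by a paragraph.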

Then there’s the tendency of some systems to output ebooks where the only styles come in the form of style attributes on every single element, making any attempt to work with the styles of the ebook impossible.
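A hypothetical sketch of what that output tends to look like (the values are invented, but the pattern is the point):

<p style="font-size: 2em; font-weight: bold; text-align: center;">Chapter One</p>
<p style="text-indent: 0; margin: 0;">First paragraph.</p>
<p style="text-indent: 1.5em; margin: 0;">Second paragraph.</p>

There is no class or element to hook a rule onto, so changing the paragraph indent means editing every single P in the book.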

The biggest problem with these ebooks is that everybody thinks they are okay because they look okay when opened. Headings are bold and large, because that’s a bit of CSS most vendors respect. Quotes are indented. Italics are italicised. The basic structure of the ebook looks preserved and the stupid crap in the ebook is ignored. Vendor overrides basically work for crap ebooks. From both the publisher’s and the vendor’s perspective this is a success. The vendor is happy because an atrocious ebook file is made readable and a large portion of their inventory remains sellable. The publisher is happy because they are short-sighted fucks who just got away with not giving a flying toss about ebooks and feel fine about making zero investment in the biggest growth area in publishing since the introduction of the paperback (yes, they are morons).

But, they are both wrong. That ebook is broken and needs to be fixed. It’s inaccessible to screen readers. It’s an opaque blob to text analysis like Amazon’s X-Ray. It’s an indecipherable mess to search engines (which are going to be damn important in the future). An ebook that doesn’t have structure is broken and unacceptable.


I propose a conditional surrender

The more I discover about existing publisher ebook production processes, the more I talk to people ‘on the inside’, the clearer it becomes that a substantial portion of existing ebook inventory is quite simply rubbish. No structure. Crap stylesheet. Broken markup.

So I propose that ereader vendors simply turn all publisher styles off and never even consider enabling JavaScript. Considering how much of a mess these clowns are making of basic markup and CSS, how likely do you think it is that they can do JavaScript safely?

Not bloody likely at all.

In exchange, what we need you to do is to improve your built-in stylesheets. We need you to support common markup practices like figures and captions, headings and subheadings, horizontal rules that don’t look like a 90s flashback and so on. Best if you support them both in markup patterns and as class-based microformats.

WordPress’s classes for captions and images with .alignleft .alignright and the like are a good start. As are common microformats such as hAtom and hNews.
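For reference, the classic WordPress caption markup looks roughly like the fragment below. The file name and caption text are placeholders, and the exact attributes vary between WordPress versions and themes, so treat this as a sketch of the pattern and not a spec.

<div class="wp-caption alignleft">
    <img src="photo.jpg" alt="A photo" />
    <p class="wp-caption-text">The image's caption</p>
</div>

A built-in stylesheet that knew what to do with .wp-caption, .wp-caption-text, .alignleft, and .alignright would handle a large share of captioned images without any publisher CSS at all.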

And if you can support basic HTML5 structures such as:

<header>
    <h1>Heading</h1>
    <h2>Subheading</h2>
    <p>Tag line</p>
</header>

Or:

<figure>
    <img blablabla />
    <figcaption>The image's caption</figcaption>
</figure>

If you manage to render every bit of those patterns appropriately (e.g. subheadings, tag lines, captions, etc.), that would be nice as well.

Oh, and don’t forget some nice styles for tables. Standard syntax highlighting for CODE elements would be a bonus.


–You aren’t serious?

I absolutely am. The key here is a full-featured built-in stylesheet that correctly styles all major structural elements of the book. This would mean that the only thing you need to do to make sure an ebook is okay is to load it and see. If it looks like a heading it will be a heading, etc. Everything will be what it looks like. Books with crap, inaccessible structure will have crap, inaccessible styles and so be exposed immediately. Books that are properly structured will look as great as the vendor and reader (with their chosen settings) intended.

It would do to ebooks what RSS and SEO did to websites.

(In case you weren’t around in the web industry a decade or so ago: the structural quality of web development tools and CMSes didn’t begin to improve until client apps that required structural quality began to be important, namely RSS/Atom readers and search engine crawlers. Before that most tools generated markup that was an atrocious mess of tables and font tags. Any publisher who thinks search engines won’t be important to ebooks is very mistaken.)

Ebook production would be dramatically cheaper and simpler, largely consisting of making sure that the structure of the ebook is preserved throughout the editorial process. Little to no testing required and people can focus on bikeshedding the cover design instead.

—You can do this already by just not including any styles in your book.

That only solves my problem (production costs) and it only solves it if my book is only plain text with a few headings, italics, bold, and maybe some quotes. Existing built-in stylesheets are inadequate to the job.

It doesn’t solve the problem of how to motivate publishers to improve their ebooks without making them unreadable. By robbing the faux-headings and the like of their styles you surface the blobby soupy nature of the book without destroying it. And knowing how crap the ebook in general will be, it’d probably still look a lot nicer than if you enabled all styles and let the publisher’s incompetence shine through.

The built-in stylesheets provided by vendors cover too little of what an ebook needs. Add in figures, captions, table styles, code highlighting, some structural awareness—headers, footers, article, and the like—a few microformats, and some nice horizontal rules and we’d be mostly sorted.

Then, once you’ve got your built-in stylesheet in order, just turn off all publisher CSS completely and tell everybody to go and fix their fucking ebooks.