What ebook production problems are self-publishers facing?

Driven by curiosity (as always), I’ve just spent a large part of my lunch break browsing through various forums[1], trying to get a handle on what problems self-publishers are facing when they are creating their ebooks.

My impression is that, unlike what I expected from the work and challenges I face making ebooks for a traditional publisher, styling and formatting isn’t a major issue—formatting problems seem limited to edge cases. I’m assuming this is because most self-publishers are doing novels with very simple style needs.

The problems people seem to be facing, in no particular order:


Just say no to ebook CSS and JS

You think I’m joking?

One of the biggest issues publishers face with ebook production is the somewhat adversarial attitude ereader and app vendors have taken towards publisher stylesheets.

Publisher styles are largely overridden by default at Kobo, B&N, and in Aldiko. Even iBooks requires you to slot in a set of proprietary meta tags before it respects your font and image decisions.

Problems with vendor stylesheet overrides:

  • They are inconsistent from platform to platform, vendor to vendor, and device to device.
  • They are partial and only cover a portion of the meanings and structures you can express in even basic HTML.
  • Some of them are only applied when the ebook is loaded through the vendor’s service, not when opened directly by an app (I’m looking at you, Kobo).
  • They make regular CSS unpredictable. CSS is a complex language so mixing regular CSS in with a set of aggressive overrides can have unforeseen results, like low-contrast element colours, invisible text, badly sized or even indecipherable images, and more.

Vendor overrides are one of the biggest time sinks in ebook development. Because they are only partial, we have to include our own styles, but the mix is often unpredictable. The number of basic things that break, seemingly randomly, in Kobo, iBooks, or whatever random RMSDK-based ereader you have this week is too high to be disregarded. So, we test, and fix, and test, and fix.

(The simpler your stylesheet is, the less you have to test, obviously. Which is a decent motivation to make your styles as minimal as you can. Unfortunately, too many publishers and authors are dead set on forcing ebooks to mimic the styles of their print edition, filling them with all sorts of stylistic crap that’s patently inappropriate to digital. I draw the line at drop caps, which just can’t be done properly in an accessible way in digital. They are a trite Victorian affectation that compromises readability.)

To make matters worse, a lot of the ebooks publishers are releasing are full of insane crap that even the worst hack web developer wouldn’t dream of trying to pull off. Like making every element of a book either a P or a SPAN.

That practice makes the noise people at the IDPF make about non-XML HTML5 being tag soup pretty ridiculous: XHTML is just as capable of non-semantic tag soup as HTML. Oh, and it also makes any claim of EPUB3’s superior accessibility rather silly, as there’s no way for screen readers to tell which P is supposed to be a heading and which is actually a paragraph. Complex accessibility features are meaningless if all the publisher gives you is an indistinct blob of tags.

Then there’s the tendency of some systems to output ebooks where the only styles come in the form of style attributes on every single element, making any attempt to work with the styles of the ebook impossible.
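To make the contrast concrete, here is a made-up fragment showing the same chapter opening three ways. The class names (`chapter-title`, `big-bold`, `body-text`, `italic`), the style values, and the sample text are all invented for illustration and are not taken from any particular publisher's files.

```html
<!-- 1. Tag soup: everything is a P or a SPAN. A screen reader sees
     three paragraphs and cannot tell that the first is a heading. -->
<p class="chapter-title"><span class="big-bold">Chapter One</span></p>
<p class="body-text">It was a dark and stormy night.</p>
<p class="body-text"><span class="italic">Really</span> stormy.</p>

<!-- 2. Inline-style output: the same lack of structure, and now the
     styles cannot be changed without touching every element. -->
<p style="font-size: 1.6em; font-weight: bold;">Chapter One</p>
<p style="text-indent: 1.5em;">It was a dark and stormy night.</p>
<p style="text-indent: 1.5em;"><span style="font-style: italic;">Really</span> stormy.</p>

<!-- 3. Structured markup: the elements say what the content is. -->
<section>
    <h1>Chapter One</h1>
    <p>It was a dark and stormy night.</p>
    <p><em>Really</em> stormy.</p>
</section>
```

With the right CSS all three can be made to look the same on screen, but only the third tells software what is a heading and what is emphasis.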

The biggest problem with these ebooks is that everybody thinks they are okay because they look okay when opened. Headings are bold and large, because that’s a bit of CSS most vendors respect. Quotes are indented. Italics are italicised. The basic structure of the ebook looks preserved and the stupid crap in the ebook is ignored. Vendor overrides basically work for crap ebooks. From both the publisher’s and the vendor’s perspective this is a success. The vendor is happy because an atrocious ebook file is made readable and a large portion of their inventory remains sellable. The publisher is happy because they are short-sighted fucks who just got away with not giving a flying toss about ebooks and feel fine about making zero investment in the biggest growth area in publishing since the introduction of the paperback (yes, they are morons).

But, they are both wrong. That ebook is broken and needs to be fixed. It’s inaccessible to screen readers. It’s an opaque blob to text analysis like Amazon’s X-Ray. It’s an indecipherable mess to search engines (which are going to be damn important in the future). An ebook that doesn’t have structure is broken and unacceptable.


I propose a conditional surrender

The more I discover about existing publisher ebook production processes, the more I talk to people ‘on the inside’, the clearer it becomes that a substantial portion of existing ebook inventory is quite simply rubbish. No structure. Crap stylesheet. Broken markup.

So I propose that ereader vendors simply turn all publisher styles off and never even consider enabling JavaScript. Considering how much of a mess these clowns are making of basic markup and CSS, how likely do you think it is that they can do JavaScript safely?

Not bloody likely at all.

In exchange, what we need you to do is to improve your built-in stylesheets. We need you to support common markup practices like figures and captions, headings and subheadings, horizontal rules that don’t look like a 90s flashback and so on. Best if you support them both in markup patterns and as class-based microformats.

WordPress’s classes for captions and images with .alignleft .alignright and the like are a good start. As are common microformats such as hAtom and hNews.

And if you can support basic HTML5 structures such as:

<header>
    <h1>Heading</h1>
    <h2>Subheading</h2>
    <p>Tag line</p>
</header>

Or:

<figure>
    <img blablabla />
    <figcaption>The image's caption</figcaption>
</figure>

If you manage to render every bit of those patterns appropriately (e.g. subheadings, tag lines, captions, etc.), that would be nice as well.

Oh, and don’t forget some nice styles for tables. Standard syntax highlighting for CODE elements would be a bonus.
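A rough sketch of how such a built-in stylesheet might start, offered as an illustration rather than anything a vendor actually ships. The `header`, `figure`, `figcaption`, `table`, `hr`, and `code` selectors follow the elements mentioned above, and `.alignleft`, `.alignright`, and `.wp-caption-text` are WordPress's class names. Every value (sizes, margins, widths) is a placeholder assumption.

```html
<style>
/* Hypothetical vendor defaults: every value here is a placeholder. */

/* Heading, subheading, and tag line grouped in a header */
header h1 { font-size: 1.8em; margin: 0 0 0.2em; }
header h2 { font-size: 1.2em; font-weight: normal; margin: 0 0 0.2em; }
header p  { font-style: italic; margin: 0 0 1.5em; }

/* Figures and captions, including WordPress-style caption text */
figure { margin: 1.5em 0; text-align: center; }
figure img { max-width: 100%; height: auto; }
figcaption, .wp-caption-text { font-size: 0.85em; margin-top: 0.5em; }

/* WordPress-style alignment classes */
.alignleft  { float: left;  margin: 0 1em 1em 0; }
.alignright { float: right; margin: 0 0 1em 1em; }

/* Tables: no border colour is set, so lines use the text colour */
table { border-collapse: collapse; margin: 1.5em auto; }
th, td { border-bottom: 1px solid; padding: 0.3em 0.6em; text-align: left; }

/* A quiet horizontal rule */
hr { border: 0; border-top: 1px solid; width: 30%; margin: 2em auto; }

/* Code: monospace only. Syntax highlighting needs the code to be
   tokenised first, so it is left out of this sketch. */
pre, code { font-family: monospace; }
</style>
```

Leaving the border colours unset means they fall back to the current text colour, so rules and table lines should remain visible when the reader switches to a night theme.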


–You aren’t serious?

I absolutely am. The key here is a full-featured built-in stylesheet that correctly styles all major structural elements of the book. This would mean that the only thing you need to do to make sure an ebook is okay is to load it and see. If it looks like a heading, it will be a heading, and so on. Everything will be what it looks like. Books with crap, inaccessible structure will have crap, inaccessible styles and so be exposed immediately. Books that are properly structured will look as great as the vendor and reader (with their chosen settings) intended.

It would do to ebooks what RSS and SEO did to websites.

(In case you weren’t around in the web industry over a decade or so ago: the structural quality of web development tools and CMSes didn’t begin to improve until client apps that required structural quality began to be important, namely RSS/Atom readers and search engine crawlers. Before that most tools generated markup that was an atrocious mess of tables and font tags. Any publisher who thinks search engines won’t be important to ebooks is very mistaken.)

Ebook production would be dramatically cheaper and simpler, largely consisting of making sure that the structure of the ebook is preserved throughout the editorial process. Little to no testing required and people can focus on bikeshedding the cover design instead.

—You can do this already by just not including any styles in your book.

That only solves my problem (production costs) and it only solves it if my book is only plain text with a few headings, italics, bold, and maybe some quotes. Existing built-in stylesheets are inadequate to the job.

It doesn’t solve the problem of how to motivate publishers to improve their ebooks without making them unreadable. By robbing the faux-headings and the like of their styles you surface the blobby soupy nature of the book without destroying it. And knowing how crap the ebook in general will be, it’d probably still look a lot nicer than if you enabled all styles and let the publisher’s incompetence shine through.

The built-in stylesheets provided by vendors cover too little of what an ebook needs. Add in figures, captions, table styles, code highlighting, some structural awareness—headers, footers, article, and the like—a few microformats, and some nice horizontal rules and we’d be mostly sorted.

Then, once you’ve got your built-in stylesheet in order, just turn off all publisher CSS completely and tell everybody to go and fix their fucking ebooks.

Proprietary ebook formats versus DRM

Micah made this here statement on Twitter the other day, articulating neatly what a lot of us have been thinking for a while now:

Very true.

It’s something that has bothered me for years and years. I spent years arguing against the use of proprietary formats in interactive media academia (they were unnaturally fond of what was then Macromedia Director). Then proprietary ebook formats became my bugaboo. But tilting at windmills hasn’t gotten me anything but heartache and a reputation for being a bit of a jerk. I’ve now accepted the fact that proprietary formats are always going to be with us. If it doesn’t bother the buyer it doesn’t bother the buyer, simple as that. But I’ve often tried to figure out a good framework for discussing and analysing this dynamic between proprietary and standard formats. What’s the best way to think about this and find a way to combat proprietary formats?

One angle is that standardisation lowers cost for producers and lets them make more interesting products, but that’s not likely to sway Amazon, who value the flexibility and power of an owned format and don’t bear the costs of production. And customers generally don’t care since they might not even benefit at all from lower production costs (some producers would just use the opportunity to increase their margins).

The other angle is interoperability and modularity, which increases the flexibility and value of the ecosystem as a whole. But that also changes the power dynamic in the ecosystem in less than predictable ways, something that the big dogs in the system won’t like. When you’re the biggest there’s no such thing as good unpredictable change. Amazon’s system is mostly vertically integrated anyway, leaving little room for interop. And many opportunities for really lucrative interoperability have been throttled in the crib by Apple’s stringent iOS policies. (Why ebook vendors aren’t doing more interesting things on Android where they aren’t held back by the platform owner’s policies is beyond me, but that’s a blog post for a different day.)

Then I stumbled upon the super obvious way to look at the problem. So obvious that it’s embarrassing that I haven’t pursued it as a serious argument before. Yeah, I know, I can be thick sometimes.


I didn’t hit upon it directly, but Micah’s above tweet did remind me of something I’d just read. From The Technique of the Novel – A Handbook on the Craft of the Long Narrative by Thomas H. Uzzell:

Ask any novelist in trouble with his plot what he intends the effect to be and he will answer something like this: “I intend to show that love between two such people is impossible.” This is material, not effect. Effect would be, say, the pathos or tragedy felt by the reader in a narrative about two people vainly attempting happiness in marriage. Amateurs in any art talk in terms of materials; professionals, in terms of effects.

Effects are subjective experiences; materials are objective experiences. Effect is response; materials are stimulus. Effects are the emotional qualities of things.

It’s features versus benefits all over again. In this context the materials the novelist uses are the features and the effects on the reader are the benefits. A writer should not think in terms of the materials (what you write) but in terms of the effects (how the writing affects the reader).

It ties into an adage from marketing: features are meaningless to buyers; they need to be told how they benefit. But if there’s one thing I learned from my friends in marketing back in my software days it’s that this principle applies everywhere. No marketer can gloss up a Frankenstein monster app pieced together out of departmental hobby horses. Most software is a confusing turd made out of disparate components by a bunch of socially inept developers who can’t think in terms of user benefit. Moreover, they don’t really care about the user. Most developers think in terms of abstract beauty of the code and architecture, conceptual integrity of the components, and of ticking off checkboxes in a feature list. They don’t give a toss about the experience unless you can itemise it as a development checklist.

Bringing this back to ebooks…

Those who are trying to shift the market away from proprietary formats can’t try and market their way out of the problem. A tactic used by some is to harangue critics like me for pointing out important flaws in the EPUB ecosystem, but silencing critics won’t address the flaws. It will not change the fact that as a whole, the EPUB ecosystem offers readers fewer benefits than the Kindle ecosystem.

Offering equal benefits will not be enough to sway consumers. To change the status quo you need to outclass Amazon massively in benefits.

And in case you were wondering, Readium SDK is a feature, not a benefit. What you do with it is what’s going to count.


My suggestion is simple: focus on the benefits Amazon can’t replicate.

If a reading app feature turns out to be a competitive advantage, Amazon is likely to copy it with ease.

Rendering or interactivity features aren’t likely to make a difference because sensible publishers will focus on making their titles cross-platform compatible. Amazon’s rendering and interactivity features are going to dominate as the lowest common denominator.

You can’t beat Amazon on selection or price. The Kindle’s ease of use is going to be hard to top. Their customer service is far above what others offer.

The one thing the EPUB ecosystem can offer that Amazon can’t, is tight interoperability between unrelated ebook vendors, services, and reading apps.

That’s it. That’s your only card to play.

  • A major retailer could implement Readmill’s OAuth integration API. Imagine buying an ebook from B&N, Kobo, or Google and having it automatically load into your Readmill library. Awesome, right? It would be even better if your Kobo library automatically synced your purchases with your Readmill library and vice versa. You wouldn’t even need new standards to do this, just the will to implement.
  • Apple could change its policies to allow in-app ebook browsing and purchase and enable more integration and interoperability in ebook reading apps.
  • We need a high-quality web-based ebook reading app that integrates with a host of relevant services. Most attempts I’ve seen to date are buggy, unusable, bare-bones, or half-arsed.
  • Apple could implement something like Readmill’s OAuth API, letting retailers securely send ebooks to iBooks. Or, again, better yet if they implement a library syncing API.
  • App developers could standardise rendering, including how overrides behave and how pagination affects existing web standards.

Basically, what I want is for the EPUB side of the ebook market to put their money where their mouth is. So far, they only seem to support EPUB because it isn’t Amazon, and they don’t take any advantage of the biggest benefits of standardisation: interoperability and modularity.

As I said, the one major advantage of a standard format is interoperability. The obsession ebook vendors have with silos and their antipathy towards easy interop is crippling their only competitive advantage over Amazon, the one big thing they can use to increase the benefit a reader gets from their ecosystem. Being able to easily mix and match reading apps with retail services and have them integrate tightly is something Amazon can’t replicate.

Copying Amazon’s vertically integrated stack when your only sensible strategy is interoperability is, quite frankly, insane.

With the way B&N, Kobo, Google, and Apple have been behaving, it’s a miracle that Amazon still doesn’t own more than 80% of the market.

Then again, what little headway they have made was largely due to illegal collusion.

B&N, Kobo, Google, and Apple separately can never compete with Amazon on price or range.

If I could hook all of their ebook retail services to Readmill so that all of my purchases are automatically added to my library, then I, as a consumer, can begin to treat them as a single market. Not having to worry about whether any given ebookstore is compatible with my chosen reading app makes me less resistant to try them all. Impulse purchases become more likely.

Together they can offer a competitive price and range. A book that isn’t available in Kobo might be available in B&N, Google Play, or the iBookstore. Proper interoperability will convert more readers away from the Kindle and so increase EPUB sales, to everybody’s benefit.

And, as I’ve been saying, the benefit is what counts.