Thursday, February 14, 2013

EPUB Zero: a radically simple(r) e-book format

Earlier this week, I participated in the W3C Workshop on Electronic Books. A common theme was the complexity of the EPUB3 specification, how difficult it was to implement, and how few implementations exist.

These ideas were expressed most forcefully by Daniel Glazman (his slides are available as a PDF). I'd been familiar with some of his thoughts, as he'd posted extensively about his experience with EPUB3 while implementing an EPUB3 editor. His rant about the absurd number of navigation files particularly resonated with me. Why do we need a manifest, a spine, an NCX, a nav document, landmarks, and guides?

What would an e-book look like if we tried to avoid as much complexity as possible? The idea wouldn't go away. I try to avoid abstract thinking, so my natural reaction was to build a sample book and see what happened. I'm in the middle of that process now, with Moby-Dick, of course. Let me know if you want me to email you a copy.

PRINCIPLES


The goal is for an e-book to be as simple as possible, and as close to the web as possible. Is it possible to make an e-book without any e-book-specific features? Do we need anything beyond bog-standard HTML5, CSS, JavaScript, SVG, MathML, and media? I'd like to find out.

Another goal is to make authoring easier. I wonder how much of the complexity of previous e-book specifications was there to make life easier for reading system developers, who have of course been the major participants in the standards bodies (not that there's anything wrong with that!).

INSIDE EPUB ZERO


The Container


An EPUB Zero file is a zipped folder containing only content files. It's identified by the file extension. There is no mimetype, no META-INF, no container.xml. So the zip process is much simpler: your operating system's zip command should work without changes. None of this
zip -v0X $FOLDER mimetype
zip -vr $FOLDER * -x $FOLDER mimetype
bullshit.
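For comparison, here is a minimal Python sketch of packing an EPUB Zero with the standard-library `zipfile` module: every file under the book folder goes into the archive under its relative path, with no special first entry and no uncompressed `mimetype`. The function name `pack_epub_zero` is hypothetical, and so is the `.epubz` extension in the usage line; the post doesn't say which extension would identify the format.

```python
import os
import zipfile


def pack_epub_zero(folder: str, out_path: str) -> None:
    """Zip every file under `folder` into `out_path`, keeping relative paths.

    Hypothetical helper: no mimetype entry, no META-INF, no container.xml,
    just the content files.
    """
    out_abs = os.path.abspath(out_path)
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                full = os.path.join(root, name)
                if os.path.abspath(full) == out_abs:
                    continue  # never add the archive to itself
                # Store names relative to the book folder, with forward slashes.
                arcname = os.path.relpath(full, folder).replace(os.sep, "/")
                zf.write(full, arcname)
```

Usage: `pack_epub_zero("moby-dick", "moby-dick.epubz")`. Nothing here is order-sensitive, which is exactly what dropping the `mimetype` requirement buys.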

The Package


The heart of an EPUB Zero is the index.html file. The reading system (if one were ever to exist) would look inside the zip for index.html. This file provides navigation (via the nav element), defines the order of content documents (via the nav element!), and contains document metadata (see below). Not all content documents need to be in the nav element; if they're in the zip, you can reference them via links and have an "out of spine" item. The same goes for images, audio, and video (perish the thought).
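A reading system following this rule would need very little code to find the reading order. The Python sketch below (standard library only) opens the zip, reads `index.html`, and takes the `href` of every link inside the first `nav` element, in document order. Assumptions not in the post: `index.html` sits at the root of the zip, the first `nav` is the table of contents, and a link to a fragment (`chapter-002.html#s1`) counts as its containing document. `NavLinkCollector` and `reading_order` are hypothetical names.

```python
import zipfile
from html.parser import HTMLParser


class NavLinkCollector(HTMLParser):
    """Collect the href of every <a> inside the first <nav> element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0      # how deeply nested we are inside the first <nav>
        self.done = False   # set once the first <nav> has been closed
        self.hrefs = []     # hrefs in document order

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if tag == "nav":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag == "nav" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True


def reading_order(book_path: str) -> list[str]:
    """Return content documents in the order the nav in index.html lists them."""
    with zipfile.ZipFile(book_path) as zf:
        index_html = zf.read("index.html").decode("utf-8")
    parser = NavLinkCollector()
    parser.feed(index_html)
    parser.close()
    order = []
    for href in parser.hrefs:
        doc = href.split("#", 1)[0]  # "chapter-002.html#s1" -> "chapter-002.html"
        if doc and doc not in order:  # each document appears once in the order
            order.append(doc)
    return order
```

Links outside the `nav` (a colophon link in the body, say) are ignored, so they behave as the "out of spine" items described above.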

An open question is how to define what happens when the book first opens. You might not want to see a complex table of contents as soon as you open a book. Perhaps if the nav element is hidden, the reading system would then just open the first document referenced by nav.
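The rule floated here is easy to state in code. A sketch, assuming "hidden" means the HTML `hidden` attribute on the first `nav` element (a nav hidden with CSS `display: none` is not detected). `start_document` is a hypothetical name; it takes the text of `index.html` rather than the zip so it can be tried on its own.

```python
from html.parser import HTMLParser
from typing import Optional


class FirstNavInfo(HTMLParser):
    """Record whether the first <nav> is hidden, and the first link inside it."""

    def __init__(self) -> None:
        super().__init__()
        self.seen_nav = False
        self.in_nav = False
        self.nav_hidden = False
        self.first_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag == "nav" and not self.seen_nav:
            self.seen_nav = True
            self.in_nav = True
            # `hidden` is a boolean attribute: its mere presence hides the element.
            self.nav_hidden = any(name == "hidden" for name, _value in attrs)
        elif tag == "a" and self.in_nav and self.first_href is None:
            self.first_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "nav":
            self.in_nav = False


def start_document(index_html: str) -> str:
    """Choose the document to show when the book is first opened.

    If the nav is hidden, open the first document it references;
    otherwise show index.html itself (nav and all).
    """
    info = FirstNavInfo()
    info.feed(index_html)
    info.close()
    if info.nav_hidden and info.first_href:
        target = info.first_href.split("#", 1)[0]
        if target:
            return target
    return "index.html"
```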



Metadata


I'm unsure how to handle metadata. My first thought was to use the head of index.html:

<head>
    <meta charset="utf-8"/>
    <title>Moby-Dick</title>
    <meta name="dcterms.creator" content="Herman Melville"/>
    <meta name="dcterms.title" content="Moby-Dick"/>
    <meta name="dcterms.identifier" content="x9780000000000"/>
    <meta name="dcterms.language" content="en"/>
    <meta name="dcterms.modified" content="2013-02-14"/>
    <meta name="dcterms.publisher" content="Harper &amp; Sons"/>
</head>
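Pulling these values back out is straightforward with the standard-library HTML parser. A minimal Python sketch, assuming every relevant entry uses the `dcterms.` name prefix shown above; since a term such as `creator` can appear more than once, each term maps to a list of values. `DctermsMeta` and `read_metadata` are hypothetical names.

```python
from html.parser import HTMLParser


class DctermsMeta(HTMLParser):
    """Gather <meta name="dcterms.*" content="..."> pairs from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.metadata: dict[str, list[str]] = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        name, content = a.get("name") or "", a.get("content")
        if name.startswith("dcterms.") and content is not None:
            # A term may repeat (e.g. several creators), so keep a list per term.
            term = name[len("dcterms."):]
            self.metadata.setdefault(term, []).append(content)


def read_metadata(index_html: str) -> dict[str, list[str]]:
    """Return {term: [values]} for every dcterms.* meta element."""
    parser = DctermsMeta()
    parser.feed(index_html)
    parser.close()
    return parser.metadata
```

The parser unescapes attribute values, so `Harper &amp; Sons` comes back as `Harper & Sons`. Anything richer than flat name/value pairs (roles, refinements, an ONIX link) is where this approach starts to strain.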

That strikes me as inadequate. Would it be enough for the very simplest cases, in conjunction with some sort of link to an ONIX record? The middle ground seems dangerous, since there is always "one more thing…" we want to handle.

Content Documents


All content documents are HTML5, which of course can contain SVG and MathML. I reserve the right to use the XML serialization for any EPUB Zero I produce :)

WHAT IS AN EBOOK?


When you go to a website, you navigate through the content by clicking on links, going from page to page. What makes an e-book different is that the sequence of pages is defined ahead of time, and the reading system helps the reader navigate. Does this mean that a plain web browser won't work as a (packaged) e-book reader, without some sort of extension or scripting?
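The behaviour a plain browser lacks is small once the sequence exists. A sketch of the next/previous lookup that a reading system (or a browser extension or script) would add, assuming the reading order has already been read from the nav element as a list of file names; `neighbours` is a hypothetical name.

```python
from typing import List, Optional, Tuple


def neighbours(order: List[str], current: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous, next) for `current` within the reading order.

    `current` may carry a fragment ("chapter-002.html#s1"); only the file
    name matters for moving between documents.
    """
    doc = current.split("#", 1)[0]
    if doc not in order:
        # An "out of spine" document: reachable by links, but not part of the sequence.
        return None, None
    i = order.index(doc)
    previous_doc = order[i - 1] if i > 0 else None
    next_doc = order[i + 1] if i + 1 < len(order) else None
    return previous_doc, next_doc
```

Everything else (rendering, links, scrolling) is what the browser already does; the predefined sequence is the only e-book-specific piece here.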

WHAT NEXT?


I haven't thought about how accessibility would work in this context, or digital rights management (could a digital signature work with this file structure?), or many other things. Maybe the ultimate answer is that we need the complexity of the existing specs. But I'd prefer to be convinced that this is too simple, rather than assume that what we have is just right.

Dave (writing, of course, as a private citizen)

20 comments:

  1. Interesting concept. I'm a big fan of the KISS ideology. I think one of the reasons the current EPUB standard is so sprawling is to keep the new players in line. The industry is beginning to accept the idea that EPUBs are just websites. Maybe with that acceptance there will be a return to existing web standards, rather than trying to reinvent the wheel.

  2. This model looks very promising. There could be some implied package-ness if content-document-level metadata were placed in the head (stating that this content document is a component of an EPUB package, with pointers to the package identifier and content document identifiers). This would allow each content document both to stand alone and to be part of the package.

  3. Very good idea and something that I have been experimenting with myself.
    I think the best approach is a classic graceful degradation - if you open ePUB Zero in a browser it should just work.
    If you open it in a browser with Javascript support, perhaps all the reader functionality can be overlaid by rearranging the DOM a little bit?

  4. dave, this is all you should need for moby dick:
    > http://www.gutenberg.org/cache/epub/2701/pg2701.txt

    -bowerbird

  5. I was also attending the IDPF/W3C workshop and I'm glad that you created this blog to discuss things.

    Daniel raised many interesting points, but most of them come down to a single thing: compatibility. We kept many things in EPUB 3.0 for compatibility reasons (too many to list), had to settle for less than ideal solutions for other things (references to Working Drafts for many specs, support for JS), and made some bad decisions too (metadata).

    Navigation/Package

    You mention that the navigation file provides both the navigation (NCX in EPUB 2, navigation document in EPUB 3) and the order of the documents (spine in OPF in both EPUB 2 & 3).

    I'm perfectly fine with that but it's not clear if you mean that both the navigation and the order should be available in the <nav> element or if you consider that the navigation document is good enough for both.

    While I agree that NCX and guide should be deprecated (which they are in EPUB 3) and I doubt the real usefulness of the manifest, I still see a clear benefit in separating the spine from the table of contents.

    A spine defines the order of the documents while navigation provides a set of links to various resources, including specific fragments of a document.

    Instead of limiting both the spine and navigation to HTML5 documents, I believe that we should allow HTML5 plus all core media types for images, audio, and video to appear in both elements.

    That said, we don't need a separate OPF element just for the spine, a separate list in the same navigation document would work fine.

    Metadata

    I'm not a fan of your proposal for metadata, repeating the meta element all the time is every bit as ugly as what we have in EPUB 3.
    I liked how things worked in EPUB 2.0 better: a parent metadata element where all the child elements are the metadata.

    To avoid the messiness of an ID/IDref system, we'd use additional child elements to refine things.

    As for external resources, we should rely on the link element with the proper media types and link relationships.

    The current suggested list of rel values in the EPUB 3.0 specification reflects a poor understanding of how links work.
    "marc21xml-record", "mods-record", "onix-record" and "xmp-record" should all be replaced by a generic "record" relationship along with the proper media type (which would make it extensible to anything else that can be identified with a media type, such as OPDS for example).

    The EPUB specification should also mention the link registry that was officially created with RFC5988, and use URLs as the proper extension mechanism.

    Finally, and this might be a little controversial, I believe that any metadata that we include in the package (which would be the navigation document too in EPUB Zero), can also be used the same way in any content document to define document-level metadata.

    What happens when you open such a file

    Displaying the navigation document when you first open such a file sounds like a bad idea: that's not what the user would expect to see.

    Opening the first document sounds like a much better idea. That said, one of the things that I really hate right now in EPUB 2.0/3.0 is that many publishers make the cover a non-linear item that you never see when you open a book, and sometimes can't even reach through navigation.
    For some books (comics for example), the cover is every bit as part of the experience as the rest of the book. I'd like to see a mechanism that at least let the user decide if they'd like to see the cover or not when they open a book.

    1. Hadrien, thanks so much for the thoughtful reply. I understand many decisions in EPUB3 were made based on compatibility with earlier versions. EPUB Zero is a little thought experiment: would it be possible to throw away everything except the content documents? The answer is probably no, but it's fun to think about.

      I'll write a more detailed reply later. If the primary goal of EPUB Zero was to make things as simple as possible, a secondary goal was to see if we could avoid non-web vocabularies. Given the need for both a spine and nav, could these things be expressed in HTML5? Could we express needed metadata in HTML5?

      By the way, I fear the battle to deprecate the NCX is lost, for now. Too many retailers are demanding we keep it even in nominal EPUB3 files.

      Dave

  6. Dave, your thought experiment sounds a lot like existing alternatives HPub and Zhook. These haven't particularly taken off which is to me a relevant datapoint.

    Of course the main enemy of simplicity is backwards compatibility. But we don't need a new format for this; we just need to agree that we don't need the benefits of that backwards compatibility, then stop using the parts that are only there for it, and et voilà, we have a simpler EPUB. For example, if you want to avoid non-web vocabularies, then with EPUB 3 we can use just the nav document, which is already HTML5, and forget about the NCX. Ditto guides. We don't have to step back and ask "could these things be expressed in HTML5?" - we know they can; the work to figure it out is already done. We could make some of the redundant manifest data structures optional... there's no need for manifest and spine and nav if all the information is in the nav and nothing extra is provided by the rest. This was even discussed in the EPUB 3 WG, but again backwards compatibility led to a decision not to go there.

    The big one is allowing HTML, i.e. not requiring XHTML. Since EPUB 3 reading systems will be built on browser engines, there is no reason in principle we couldn't relax this constraint and immediately achieve far more alignment with the rest of the Open Web Platform. But it would have dire consequences for toolchains that want to manipulate EPUB files, and since much of the publishing industry uses XML heavily - and what we need most of all is more tools, for which reliable formats are good, not bad - this one is tricky for me. But this is not about simplicity: it is more complicated to have two serializations of content, HTML and XHTML, not simpler. I can't believe, for example, that Daniel Glazman would think that his job of writing BlueGriffon would have been easier if the content wasn't XHTML.

  7. Dave, the thought experiment is great and worth pushing along. The full ePub3 spec makes it potentially difficult and expensive to produce content and the minimal core is hard to find. The e-retailer reading systems are all so different and incomplete they take hours of laborious testing. Daniel's objections to ePub3 are pretty spot on. I think he stole my list! Rebooting the format is needed.

    So the concept of ePub0 is a set of core packaging corrections, and the biggest challenges seem to be start-up and navigation.

    The index.html page is the opening page, and the nav item may be hidden. That allows the index page to have a cover image, title page text, a promo, or anything else. The reading system must read the nav item to know where to go next. The index nav serves as both a variable spine and the TOC.

    With this approach, the index nav list could contain the full section sequence for linear material, or just one link to a start page where books use extensive internal linking. We use this heavily with textbooks: the spine is a big list of linear="no" items and the TOC links only to units.

    An ePub is the wrong place to manipulate content, and that should never be a reason for making the format more complex. However, is there any reason why ePub Zero can't optionally be XHTML5, or does that remove the concept of zero?

    I would like to take a shot at creating an ePub0 for more navigationally complex material. It sounds like a dream come true for sophisticated presentation material that can move seamlessly from online to a secure package.

    Not trying to force things but the specification at present is something like:

    1. An ePub Zero must be a zip package of HTML5/XHTML5 with no errors (it must pass HTML5 validation tests)
    2. The opening page must be named index.html
    3. The index.html page will be displayed and must have a nav element which may be hidden
    4. The index.html page can be listed as an item in the nav element. It can appear in any position in the nav.
    5. The nav list defines the next-previous default navigation as well as the reading system presented TOC.
    6. Document metadata is given by the meta statements in the index file, as DC terms (can we say there must be at least a title, an identifier, and a date?).
    7. The loading rule for the reading system is simply: if the epub zip package contains an index.html, open the index file and read the nav list. Wait for the user.

    I know this started as a thought experiment, but it would be very straightforward to incorporate into AZARDI, as we already have an HTML5 packaging and reading mode. It just means the reading system has to look for *.opf or index.*

    Once the spec list is finalized we will include it in AZARDI Desktop. If you can get us the final rule set we can have it ready in AZARDI 19 Desktop due out in around four weeks.

    1. OK, I have a few questions then, Richard:

      2. The opening page must be named index.html
      3. The index.html page will be displayed and must have a nav element which may be hidden
      4. The index.html page can be listed as an item in the nav element. It can appear in any position in the nav.

      With these rules, it seems that you're in favor of an index.html that serves as the first "content" document, rather than a pure "navigation" document, since you not only want to display the document but also reference it in the nav element.

      I'm more in favor of a pure replacement of OPF+navigation by a single navigation file that must be named index.html.
      The purpose of this file would still be to package the resources together into a publication (publication metadata, reading order of the documents and navigation), but it wouldn't display any kind of content (cover, introduction etc.)

      5. The nav list defines the next-previous default navigation as well as the reading system presented TOC.

      By "next-previous default navigation" you mean navigating through linear items in the spine, right?
      If that's the case, then I agree with your point.

    2. Hi Hadrien, on the matter of the first page opening and displaying something the first time the document opens, I was trying to follow Dave's idea that it is pure web.

      The ePub'aholic in me says use index.html as a poor man's OPF, but the objective of ePub Zero seems to be that this could be put up as a web site and still behave well, or loaded into a neutral Web App and just do its thing.

      I talked to Deepak, who leads AZARDI development (and just about everything else here), and he didn't think there would be much problem with using it as a combined nav and display page. I thought the real issue would be the reading system reopening at the last reading position, but he said that would not be a problem, as everything in the package is processed at load time. Looking up the pages in the zip package, and even getting metadata from them, is not a big overhead (on the desktop, anyway). For example, he does this with fixed-layout AZARDI, processing all the HTML files to create the layout plan behind AZARDI's ability to display asymmetrical page sizes and orientations.

      Another choice is to use all the html pages in the nav and use HTML5 hidden attribute to equate to linear=no. That means the reading system just uses the text in the nav component for the creation of the reader visible toc (without the hiddens) but still has full knowledge of the section extent. This has positive implications for accessibility as well.

      Anyway, it is great to see the various ideas tossed around and shared. It does seem that a really good reading system, more flexible and powerful than ePub3, will emerge.

      As soon as a spec is substantially set we are ready to program. With simple/complex ideas like this having a testing environment really helps.
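Richard's idea of using the HTML5 `hidden` attribute in place of `linear="no"` can be sketched as follows. Assumptions not in the comment: the attribute goes on an `li` (or any element) inside the first `nav`, a link counts as hidden if it or any ancestor inside the nav carries the attribute, the nav's own `hidden` is ignored, and the markup closes its tags (as the XML serialization would). The spine keeps every document with a linear flag; the reader-visible TOC keeps only visible links. All names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

VOID_TAGS = {"area", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}


class NavEntries(HTMLParser):
    """List (href, hidden) for each link inside the first <nav>."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[bool] = []  # inherited hidden flag per open element in nav
        self.in_nav = False
        self.done = False
        self.entries: List[Tuple[str, bool]] = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if tag == "nav" and not self.in_nav:
            self.in_nav = True
            self.stack = [False]  # the nav's own `hidden` is not inherited
            return
        if not self.in_nav:
            return
        hidden = self.stack[-1] or any(name == "hidden" for name, _ in attrs)
        href = dict(attrs).get("href")
        if tag == "a" and href:
            self.entries.append((href, hidden))
        self.stack.append(hidden)

    def handle_endtag(self, tag):
        if not self.in_nav or tag in VOID_TAGS:
            return
        self.stack.pop()
        if not self.stack:  # the closing </nav> has been reached
            self.in_nav = False
            self.done = True


def spine_and_toc(index_html: str) -> Tuple[List[Tuple[str, bool]], List[str]]:
    """Spine: every linked document with a linear flag (hidden -> linear="no").
    TOC: only the visible links, in order, as the reader would see them."""
    parser = NavEntries()
    parser.feed(index_html)
    parser.close()
    spine: List[Tuple[str, bool]] = []
    seen = set()
    for href, hidden in parser.entries:
        doc = href.split("#", 1)[0]
        if doc and doc not in seen:
            seen.add(doc)
            spine.append((doc, not hidden))
    toc = [href for href, hidden in parser.entries if not hidden]
    return spine, toc
```

The reading system still knows the full extent of the publication from `spine`, while the reader sees only `toc`.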

  8. So it seems like this could be even simpler, and I'm wondering if it'd be practical.

    Why not have the entire book (except non-linear documents) be in the index.html? No need for a spine element then. Seems like it'd be simpler, easier to author, and more like the web. Would this have too much of a negative impact on performance?

    1. I think that the whole linear/non-linear model for the spine sucks, but that we still need a spine.

      Take Richard's example for a textbook: lots of documents with only a bunch of them as linear documents.

      Instead of referencing all these documents in the spine as non-linear items, only the linear documents should be referenced.
      All the other documents would be accessed either from these browsable documents, or through a table of contents.

      Of course, for something like that, we'd have to allow links to non-spine items.
      As long as we can link to items that are not in the spine, then we don't really need the concept of linearity in the spine.

    2. I think we are in danger of conflating four distinct abstractions here:

      1. Separation of a long-form publication into more than one top-level document, primarily for implementation efficiency but also to facilitate assembly/reassembly from parts. "Moby Dick" could maybe be represented as one HTML document but that doesn't mean every publication could or should be.

      2. Linear reading order of a publication, needed if it is in multiple parts (fundamental purpose of a spine)

      3. Enumeration of resources in a publication (combination of spine and manifest, IMO a bit more complex than necessary in EPUB by how the packaging of OpenDocument ODF was grafted underneath OEBPS)

      4. Table of contents of a publication. We have Nav Document so I think we are good to go here.

      While in many simple publications some of these may have identical information and may thus seem redundant that is not always the case. For example the TOC may omit some things that are present in start-to-end reading order.

      To me the main opportunity for simplification in this area with EPUB 3.x/4 (what I see as the target of "EPUB Zero" thinking) is around #3. But the specific way to go should IMO depend on what the new W3C System Application WG does for packaged native apps built from Web Standards, as if we were going to make a big change it would likely be a good idea to align/converge with that effort, assuming it is successful (so that packaged apps and packaged documents have the same packaging). So far for example it looks to be headed towards a JSON manifest as with Chrome Packaged Apps. But it seems premature for us to go too far towards designing something that would not necessarily end up aligned with that work.

      A secondary opportunity would be simplifying the data structure for spine. Some formats have "magic names" where the spine comes from names of pieces like "xxx_01.html", "xxx_02.html" but this is poor architecture and brittle. I think if we reimagined packaging and then set ourselves to define the minimal way to represent a spine data structure we would get something cleaner. But as per above it may be premature if we want apps and docs to share packaging (apps don't need spine - one thing that distinguishes an app from a document is that only the latter has a reading order or really any deterministic means to assemble/reassemble its constituent parts).

      Secondarily, we have an opportunity to simplify authoring in the simple cases by letting information be inferred. For example, arguably the ZIP file itself can stand in for the manifest. Having a manifest (which again was done for OpenDocument ODF first, then applied to EPUB) helps avoid accidental accretion of "Junk DNA", because ZIP is analogous to a file system folder. But it could be the rule that if the manifest is not present, then the ZIP contents = the manifest. Similarly, it could be the rule that if the spine is not present, then the Nav Doc (TOC) defines the spine (as per your EPUB Zero straw man). Similarly, you could have the rule that if only an index.html is present, then that is the only content item. I consider such simplifications secondary because, given the need to still handle the 4 logically distinct abstractions above, software that needs to manipulate or render EPUB files would need more code, not less, to recognize and handle these special cases. And there is the potential for mistakes and errors. But simplifying authoring could be sufficient motivation to incur this extra cost.
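Bill's fallback rules can be sketched as one small function; each rule is an extra branch every consumer must carry, which is his point about software needing more code, not less. Assumptions: the caller already has the zip's entry names and, if there is one, the document order from the nav; "only an index.html" is read as "index.html is the only HTML document". `infer_package` and its parameters are hypothetical.

```python
from typing import List, Optional, Tuple

CONTENT_SUFFIXES = (".html", ".xhtml", ".htm")


def infer_package(zip_names: List[str],
                  manifest: Optional[List[str]] = None,
                  spine: Optional[List[str]] = None,
                  nav_order: Optional[List[str]] = None) -> Tuple[List[str], List[str]]:
    """Fill in a missing manifest and spine using fallback rules."""
    # Rule 1: no manifest -> the ZIP contents are the manifest (skip directory entries).
    if manifest is None:
        manifest = [name for name in zip_names if not name.endswith("/")]
    # Rule 2: no spine -> the Nav Doc (TOC) order defines the spine.
    if spine is None and nav_order:
        spine = list(nav_order)
    # Rule 3: still no spine, and index.html is the only HTML document -> it is the sole item.
    if spine is None:
        documents = [name for name in manifest if name.lower().endswith(CONTENT_SUFFIXES)]
        if documents != ["index.html"]:
            raise ValueError("cannot infer a reading order")
        spine = ["index.html"]
    return manifest, spine
```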

    3. Bill, I don't really think we need to have these 4 different abstractions, 2 are good enough.

      1 and 2 can be the same thing: a spine is a list of documents that a user will browse through if they just hit the "next page" button from start to end.

      The difference with EPUB 2/3 would be the removal of non-linear items, that are only necessary because we enforce restrictions on spine items (we're having the same discussion on the EPUB WG mailing list as you know).

      I would also recommend making other media (images, video and audio files) first-class citizens that can be referenced in such a spine.

      I'm against declaring that the spine is unnecessary as long as you have a navigation document: they serve a different purpose.

      I believe that #3 (which is essentially the manifest) is unnecessary, but maybe reading system developers would argue otherwise.

      As for #4 we do have the navigation document, but as you know, this might get more complex soon with things such as panel by panel navigation.

      Now the real question is: do we need a separate file and format (OPF) for just metadata and spine?
      Probably not; all of this could fit into a single file (HTML5?) where we'd also have the navigation available.

    4. Okay, let me be clear on what I'm suggesting.

      Instead of having the spine as a "list of documents that a user will browse through if they just hit the 'next page' button from start to end", move the contents of all those documents into a single one (the index.html). So, just like the web, when you scroll down/hit next page you see more of the document until you reach the end. In combination with removing the need for non-linear items in the spine, this would remove the need for the spine entirely. (Right?)

      As far as the reasons for not doing that, I'm not really sure what benefit ePub is getting from easy assembly/reassembly from parts - could you elaborate on that?

      I'm pretty sure anything that could be done in the current system could also be done with this change - it's a fairly simple transformation, and I'm not sure what would prevent the information contained by having a break between documents from being equally well represented by XHTML.

      I think it makes the format simpler, makes it easier to author, and makes it more like the web, so it seems to be exactly in line with the goals of EPUB Zero, at least.

    5. Hi Peter,

      In the old days, when dinosaurs roamed the earth, we did put all the content of our ebooks in a single file. But that caused problems for reading systems, which couldn't digest a 5MB html file all at once.

      Today, I think that having a single content file would be very limiting. In some cases, each component of a book may have different metadata, a different design, and maybe even different geometry. For example, some books are a mixture of fixed-layout and reflowable sections. This would also remove the ability to have images or SVG as a primary content document.

      The Zhook format (http://ochook.org/) (thanks, Bill, for letting me know about this) does use a single content file.

      Dave



    6. I think the idea of having an epub contained in a single HTML file is a nice one, and Bill's suggestion on how this could be handled as an alternative to manifest and zip-as-manifest makes sense. In a single HTML file, images and SVG would behave pretty much as they do in a browser today. If images and SVG files are linked from the content flow rather than from some special file list, that is a nice thing -- and they could still be tagged as 'primary' according to the schema in question. Metadata can be added to the 'primary' parts of the HTML using meta+itemprop (I think), and I can't see why this excludes mixtures of fixed layout and flow (and scroll), or different designs on parts of the document? Even if it does, some added constraints for a single-file epub are acceptable.

      There is a case for constructing the epub out of individual parts, but that is (or at least should be) a question of asset handling. The main thing is that it should be optional if the epub is aggregated by the production system, or by the reader system at runtime.

      I also like the idea that epub zero should be aligned with the kind of manifests used with apps. An enhanced epub should declare up front what sorts of APIs it needs, or whether it needs to access external URIs, etc. Also, like Chrome, we could distinguish between hosted epubs and packaged epubs, to support a book-in-browser scenario.

  9. dave said:
    > But that caused problems for reading systems,
    > which couldn't digest a 5MB html file all at once.

    actually, the limit is 300k. not quite the same thing.
    plus i doubt you actually have many 5mb .html files.

    but it's plain stupid to let inferior coders hold us back.


    > Today, I think that having a single content file
    > would be very limiting.

    you'll need to justify that statement better than you have.

    -bowerbird
