Thursday, July 25, 2013

Straw-Man Spec

Consider this a straw-man outline of the spec, as I envision it.

What’s inside?

  1. An e0 file has file extension .e0
  2. An e0 file is a zipped folder that contains an index.html file.
  3. index.html is an HTML5 content file, and shall contain at least one nav element, unless index.html is the only HTML5 content file.
  4. By default, the first nav element in index.html serves as both primary navigation (hereafter referred to as "toc") and as an indication of the reading order (hereafter referred to as "spine", in keeping with EPUB terminology). If an li element in the first nav in index.html is not otherwise marked, the referenced file should both appear in the reading system's primary navigation interface and be part of the linear reading order.
  5. If a file should be omitted from toc, but remain in the spine, use the hidden attribute on the list item. Among other things, this means that, in most cases, a web browser (that knows nothing of e0) opening up index.html will do the right thing.
  6. If a file should be omitted from the spine, but remain in the primary toc, add role="toc" to the list item. We’re hoping this is quite rare.
  7. If a file should be omitted from both the spine and toc, it should not appear in nav in index.html. There’s no obligation for all content documents to appear here.
  8. An e0 reading system should not open index.html directly, but display the first content file referenced in the first nav in index.html (unless there's no nav in index.html). Ideally this first content file would be the book cover.
  9. Document metadata lives in index.html. Section metadata can live in the individual content files. The metadata vocabulary is being discussed; for now use meta in the html head.
  10. If there is a cover image, it should be referenced in index.html via link rel="cover"
  11. Either the HTML or the XHTML serialization of HTML5 is allowed.
  12. MathML and SVG in HTML5 documents are allowed.
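
To make rules 4 through 8 concrete, here is a minimal sketch of how a reading system might derive the toc, the spine, and the opening file from the first nav in index.html. It is an illustration only, not a reference implementation: the names NavParser and read_nav and the sample file names are invented for this example, it looks only at the innermost li around each link, and it assumes list items carry explicit closing tags.

```python
from html.parser import HTMLParser


class NavParser(HTMLParser):
    """Collect one record per link found inside the first nav element."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0     # greater than 0 while inside the first nav
        self.nav_done = False  # True once the first nav has been closed
        self.li_stack = []     # attribute dicts of the currently open li elements
        self.items = []        # records: {"href", "hidden", "role"}

    def handle_starttag(self, tag, attrs):
        if self.nav_done:
            return
        if tag == "nav":
            self.nav_depth += 1
        elif self.nav_depth and tag == "li":
            self.li_stack.append(dict(attrs))
        elif self.nav_depth and tag == "a" and self.li_stack:
            href = dict(attrs).get("href")
            if href:
                li = self.li_stack[-1]  # assumption: only the innermost li matters
                self.items.append({
                    "href": href,
                    "hidden": "hidden" in li,  # rule 5: out of toc, still in spine
                    "role": li.get("role"),    # rule 6: role="toc" means out of spine
                })

    def handle_endtag(self, tag):
        if self.nav_done:
            return
        if tag == "nav" and self.nav_depth:
            self.nav_depth -= 1
            if self.nav_depth == 0:
                self.nav_done = True
        elif tag == "li" and self.li_stack:
            self.li_stack.pop()


def read_nav(index_html):
    """Return (toc, spine, start) for the text of an index.html file."""
    parser = NavParser()
    parser.feed(index_html)
    toc = [i["href"] for i in parser.items if not i["hidden"]]
    spine = [i["href"] for i in parser.items if i["role"] != "toc"]
    # Rule 8: open the first file referenced in the nav; with no nav,
    # index.html is the only content file, so open it instead.
    start = parser.items[0]["href"] if parser.items else "index.html"
    return toc, spine, start


SAMPLE = """<!DOCTYPE html>
<html><head><title>Sample</title></head><body>
<nav><ol>
  <li hidden><a href="cover.html">Cover</a></li>
  <li><a href="ch1.html">Chapter 1</a></li>
  <li><a href="ch2.html">Chapter 2</a></li>
  <li role="toc"><a href="glossary.html">Glossary</a></li>
</ol></nav>
</body></html>"""

toc, spine, start = read_nav(SAMPLE)
print(toc)    # ['ch1.html', 'ch2.html', 'glossary.html']
print(spine)  # ['cover.html', 'ch1.html', 'ch2.html']
print(start)  # cover.html
```

An item marked with both hidden and role="toc" ends up in neither list here; the outline above does not say what should happen in that case, so treat that behaviour as a guess.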

What’s not?

  1. There’s no mimetype file, META-INF folder, or container.xml file. Rather than using container.xml to point to the “root,” the reading system just needs to find index.html.
  2. There’s no special method of zipping.
  3. There’s no package file. index.html serves many of the same purposes, but tries to avoid duplication and non-HTML vocabularies.
  4. There’s no manifest. The computer can figure out what files are in the zip, and can probably figure out what kinds of files they are.
  5. e0 does not have a CSS profile. 
  6. Landmarks and guides are omitted. We already know where the toc and cover are. If you want the reader to begin reading at a specific place, put that place at the beginning of the book!
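
As a rough sketch of points 1 and 4, the following shows one way a reading system could locate index.html and work out file types without container.xml or a manifest. The function names find_index and infer_manifest and the archive contents are made up for this example. The rule of preferring the index.html closest to the archive root is an assumption as well; the outline above does not say what to do when a zip contains more than one.

```python
import io
import mimetypes
import zipfile


def find_index(names):
    """Pick the index.html closest to the archive root (an assumed tie-break)."""
    candidates = [n for n in names if n.split("/")[-1] == "index.html"]
    if not candidates:
        raise ValueError("no index.html in archive")
    # Fewest path separators wins; ties are broken alphabetically.
    return min(candidates, key=lambda n: (n.count("/"), n))


def infer_manifest(zf):
    """Map every file in the zip to a media type guessed from its name."""
    manifest = {}
    for info in zf.infolist():
        if info.is_dir():
            continue
        media_type, _ = mimetypes.guess_type(info.filename)
        manifest[info.filename] = media_type or "application/octet-stream"
    return manifest


# Build a small e0-style archive in memory; plain zipping, nothing special.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("book/index.html", "<!DOCTYPE html><title>Sample</title>")
    zf.writestr("book/ch1.html", "<p>Chapter one.</p>")
    zf.writestr("book/css/book.css", "body { margin: 1em; }")
    zf.writestr("book/notes/index.html", "<p>Notes.</p>")

with zipfile.ZipFile(buf) as zf:
    root = find_index(zf.namelist())
    manifest = infer_manifest(zf)

print(root)  # book/index.html
for name, media_type in manifest.items():
    print(name, media_type)
```

Guessing from file extensions is the cheapest approach; a stricter reading system could sniff file contents instead, at the cost of more work on its side.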

Principles

  1. Simplicity
  2. Avoid duplication
  3. Use HTML vocabularies wherever possible
  4. Make life easier for the content creator, even if that makes life harder for the reading system developer

4 comments:

  1. That looks like a good starting position. We will modify the test cases and the AZARDI test implementation accordingly.

    Notes on the first test cases vs. the spec.

    3. We haven't got a test case with a lonely index.html. We will prep one so reading systems can handle that case as an exception.

    4. OK.

    5. Good approach. We will update the test cases accordingly.

    6. OK. That is currently implemented.

    7. This is the major change. We cheated with the first test case and converted the E0 spine and TOC into ePub3 nav structures in AZARDI. These will now become their own thing.

    8. That is how we handled the first test implementation.

    9. The metadata should be relatively easy to update once it has been decided.

    10. OK. That is a small change.

    11. We are using both but not with fully controlled test cases. We will update accordingly.

    12. Linked with 11 we will get the test cases done accordingly.

    So we will kick into E0 test cases and test implementation round 2.

  2. I'd like your pros and cons regarding the @data-type. O'Reilly Media has settled on @data-type to make HTMLBook based on HTML5, which I think is a very interesting approach. http://www.balisage.net/Proceedings/vol10/html/Kleinfeld01/BalisageVol10-Kleinfeld01.html Why use @data-e0-type instead of @data-type?

  3. Some comments/questions:

    1. The Moby Dick e0 sample contains an index.html file in a sub-folder in the zip archive (MobyDick-e0/index.html). If the search for index.html allows it to be in sub-folders, what if there are 2 or more index.html files (e.g. A/index.html and index.html)?

    2. If the index.html search is for the first index.html file, what if an inner index.html file occurs before a top-level index.html (e.g. Part1/index.html, Part2/index.html, index.html).

    3. In the index.html, the ToC navigation in the sample files is marked as hidden="hidden". This means that:

    3a. the use case of extracting and placing on a web server does not work as the index.html page shows the cover (if specified) and not the ToC, so the user cannot navigate to the sub-pages;

    3b. the e0 document reader needs to add this hidden ToC to a navigation/ToC pane, instead of just relying on HTML rendering (this complicates the "it just works in a browser" use case);

    3c. authors need to provide two ToC documents -- one with the navigation in the index.html page and another in the place where the book's ToC is placed. This is why ePub3 has moved to using an HTML5-based navigation document and linking it in the OPS spine.

    4. The spec mentions meta/link metadata, but what about the RDF and microdata HTML specs? These allow the metadata to be annotated inside the HTML body and avoid duplication. NOTE: I agree that the book metadata should be in the main root file (index.html).

  4. dave-

    here's an alternate formulation -- for my "ezub" format -- a meat-and-potatoes man to knock out your straw-man.

    1. an .ezub file has file extension .ez, _when_ it is zipped. however, it could just be a simple folder of files.

    2. an .ezub is a folder with one (and only one!) .html file, named anything you like, with an .html extension.

    3. anything.html (or, if it's what you like, index.html) is an html5 content file that contains the entire book.

    4. no "nav" element is required, or desired, since most people don't even know what that is. (including me.)

    5. any file in the folder is a part of the book, and will be offered for display, even if it is _not_ in anything.html.

    6. use headers h1-h6, which are assembled into a table of contents, if one's not included, (but it should be.)

    7. you don't need to know anything about a "spine" or "nav order" or "role=" or "link rel=", or any of that crap.

    8. start with a title-page, including the cover image. follow it with a linked table of contents to major sections.

    9. put "metadata" in a section headered "metadata" or "metadata for this book", or something similarly clear.

    10. forget that xhtml crap. use html5. anything html5 is ok. 10 points is more than enough for us to put here.



    1. no mimetype file, meta-inf folder, or container.xml file. a "reading system" just needs to find the html file.

    2. no special method of zipping, since there's no reason to introduce any possible point of failure, is there?



    1. _real_ simplicity. not the artificial fake-cheese kind. because the false simplicity tastes just like shit.

    2. avoid duplication. unless someone wants a belt _and_ suspenders. then i guess that is up to them.

    3. use html5 all the time. until something better comes along. because there's always something better.

    4. make life easier for everyone. except the greedy middlemen who are parasites living off complexity.
