Sunday, February 24, 2013

index.html and "One Big File"

The heart of our format (which I'll call e0, after Richard Pipe) is index.html. The book opens to index.html, and the reading system shows us what's there (unless it's marked hidden, as HTML5 allows us to do).

So it's entirely possible to create an .e0 file that consists only of index.html in the zip. However, I think that requiring all content to be in index.html is too restrictive. Partly I'm just lazy; having split hundreds of single-file e-books into chunks in 2007–2008, I don't want to do the reverse. But some books aren't uniform from start to finish.

I think the clearest example is metadata. If we put book metadata in meta elements in index.html, we can put chapter-specific metadata in meta elements in each chapter file. EPUB can't do that! In a short story collection, the editor can be cited in index.html, and each author in the component HTML file containing their work.
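To make the metadata idea concrete, here is a minimal sketch of how a reading system might combine the two levels. It assumes, hypothetically, that metadata is plain `<meta name="…" content="…">` pairs and that a chapter's values override the book's values for the same name; e0 defines no such rule, and the `editor` name is only illustrative.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs from one HTML document."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # <meta ... /> also arrives here via the default handle_startendtag.
        if tag == "meta":
            a = dict(attrs)
            if a.get("name") and a.get("content") is not None:
                self.meta[a["name"]] = a["content"]


def collect_meta(html_text):
    """Return the name -> content metadata found in html_text."""
    parser = MetaCollector()
    parser.feed(html_text)
    parser.close()
    return parser.meta


def effective_meta(book_html, chapter_html):
    """Book-level metadata (index.html) with chapter-level values layered on top.

    The override rule is an assumption for illustration, not part of any spec.
    """
    merged = dict(collect_meta(book_html))
    merged.update(collect_meta(chapter_html))
    return merged
```

For a collection whose index.html names the editor and whose story file names its own author, `effective_meta` reports both: the editor from the book, the author from the story.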

Styles may differ between sections, and books may even alternate between reflowable and fixed layouts. Portions of books may be mixed together; I'd rather drag the new file into a folder and make a single edit to my nav file (in index.html) than paste a bunch of new content into a giant file. And heaven forbid you muck up the nesting of divs in a forty-thousand-line file!
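One way a reading system could treat the nav in index.html as the reading order is sketched below. It assumes, hypothetically, that order is simply the document order of the links inside `<nav>`, with fragments (`#…`) ignored, repeats dropped, links outside `<nav>` ignored, and index.html always read first. Under those assumptions, adding a chapter really is one new `<li>` in the nav.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag


class NavOrder(HTMLParser):
    """Record the files linked from <nav>, in document order."""

    def __init__(self):
        super().__init__()
        self.nav_depth = 0  # > 0 while inside a <nav> element
        self.files = []

    def handle_starttag(self, tag, attrs):
        if tag == "nav":
            self.nav_depth += 1
        elif tag == "a" and self.nav_depth > 0:
            href = dict(attrs).get("href")
            if href:
                path, _fragment = urldefrag(href)  # "ch2.html#s3" -> "ch2.html"
                if path and path not in self.files:
                    self.files.append(path)

    def handle_endtag(self, tag):
        if tag == "nav" and self.nav_depth > 0:
            self.nav_depth -= 1


def reading_order(index_html):
    """Derive a file reading order from the nav in index.html (assumed rule)."""
    parser = NavOrder()
    parser.feed(index_html)
    parser.close()
    return ["index.html"] + [f for f in parser.files if f != "index.html"]
```

Nested entries such as `story2.html#part2` collapse into their file, so sub-sections don't change the order of files.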

So make One Big File if you like. But someone else's books may benefit from having more than one.






8 comments:

  1. I agree with you, Dave; I think we need to find the right balance.

    Multiple files means many things:
    - easier for the RS to render
    - mix of reflowable and fixed layout content documents
    - potentially mix of media if we lift the restrictions for media in spine (images, audio and video)
    - non-linear navigation (for which I don't think we need non-linear items in the spine)
    - ability to reference some semantics right from the spine and/or navigation
    - content level metadata
    - different styles on the @page model or other elements

    I also strongly believe that constraints are part of a good design (one good example: the six constraints of the REST architectural style) and that even if we aim for something simple yet powerful with this exercise, defining EPUB Zero too loosely would produce many undesired side effects (there's a good list of them in Bill's comment in the previous post).

  2. you wasted your time splitting all those e-books
    because of the requirement requiring an e-book
    be split up so brain-dead viewer-programs could
    process them, like chopping up a piece of steak so
    your 92-year-old grandpa will be able to gum 'em.

    and i don't buy your arguments, or hadrien's either
    -- and in some cases i think they're _bad_ideas_ --
    but if you're set on e-books with multiple .html files,
    you can use the "prev/next rel" thing for navigation.

    (plus you should be including real navigation anyway;
    that is, _actual_links_ to the previous/next chapters.)

    -bowerbird

    1. I think using rel="next" would be harder than defining a reading order.

  3. dave said:
    > I think using rel="next" would be
    > harder than defining a reading order.

    since you are typing everything by hand,
    well then, yes, it might well be, dave. ;+)

    i make tools to take care of the drudgery,
    so that would be a piece of cake for me...

    you're gonna give people an authoring-tool,
    right? that's how you make stuff _simple_...

    -bowerbird

  4. Using links in each file can be an alternative, but isn't it better to know the structure of the publication without parsing all of our files?

    In a dynamic world (the Web) I would argue that link@rel="next" is better, since content can change all the time and this is more flexible than checking and changing a centralized file all the time.
    In a packaged document (EPUB), expressing things at the package level seems to be a more reasonable choice, since each individual document won't evolve on its own; the whole publication will.

    I wouldn't completely rule it out as an option (although I slightly prefer having everything defined in the index.html), but reading system developers can probably provide more insights.

  5. I think we need to distinguish between modularity for production purposes and modularity for consumption. Why not use link@rel="next" to establish relations between sub-publications in the package, where index.html is the first? Then each sub-publication would handle its own reading order according to the principles of single-file EPUBs. In complex publications, like textbooks, cramming everything into a "single book" is not really the future of publishing anyway, so I think it would be a good idea for e0 to establish mechanisms for modularity in some way.

  6. hadrien said:
    > but reading system developers
    > can probably provide more insights.

    i put this out of order because i _am_
    just such a "reading system developer".

    and the things i am telling you are things
    that i know will work, because i've _made_
    them work, time and time again, so the
    f.u.d. you emit, i can see through easily.

    and indeed, i am about to gear up with
    a large campaign to roll out my work, so
    you'll soon see the proof of the pudding.

    which is why i really don't care whether
    you choose to listen to me or ignore me.


    > Using links in each file can be an alternative, but
    > isn't it better to know without parsing all of our files
    > what the structure of the publication will be?

    it's no big deal to process "all of our files"
    to determine the order of the files, since
    we'd just have to read the first 2k of each.

    but if it would make _you_ feel better to
    have an explicit list, you could make one.

    in the interest of simplicity, however, you
    should not _require_ it from everyone else.

    you could also have your packaging tool
    create this file-order-list at compile time.


    > In a dynamic world (the Web) I would
    > argue that link@rel="next" is better,
    > since content can change all the time
    > and this is more flexible than checking
    > and changing a centralized file all the time.

    any workflow that relies on "checking and
    changing a centralized file" -- even once,
    let alone "all the time" -- is _badly_ flawed.

    you guys really need to leave the "by hand"
    mentality back in the 20th (12th?) century.


    > In a packaged document (EPUB)
    > expressing things at a package
    > seems to be a more reasonable choice
    > since each individual document
    > won't evolve on its own,
    > the whole publication will.

    just can't shake the multi-file mentality,
    can you? meanwhile, project gutenberg
    has always been able to make its e-books
    using a single .html file. it's not that hard.

    ***

    kjartan said:
    > cramming everything into a "single book"
    > is not really the future of publishing anyway

    the computer is extremely adept at splitting.
    and joining. this is much ado about nothing.

    -bowerbird

  7. So the reasons given for not having one big file seem unrelated to the goal given in the first post, of being as simple as possible and as close to the web as possible.

    Adding metadata to parts of the book for short stories is nice; I'm not sure it's worth the extra complexity. (And EPUB3 can do better than that: it can attach metadata to any element. So if you used that method, you could accomplish the same thing with one big file.)

    For short stories, I'd much rather see a system where you can read them as a book, but can also split them up into individual EPUBs, and store them in separate folders or delete them individually. That'd be even more complex, but it should be modular enough that it doesn't complicate books that don't use it.

    Fixed layout can be redesigned; I don't think it should dictate the structure of an EPUB.

    What I'd really like to see is two separate, related specs, EPUB for Authors and EPUB for Devices, with free, open-source software to convert from the author format to the device format.
