Athena: What an offline web reading experience may look like
With the latest set of web technologies coming down the W3C/WHATWG pipeline it is now possible to create top-of-the-line responsive experiences that can also work as offline applications.
The HTML5 web is more than capable of competing with native applications. Chrome and Windows apps have shown that they can be as capable as native apps, if we let them. What needs to happen now is for developers to shift to thinking about the web in terms of application logic rather than the rules we want the web to play by.
Athena is a proof of concept for such an application. It uses ServiceWorkers to cache application resources, and Polymer and a suite of custom web components to handle layout and application structure.
This article discusses the rationale for Athena and how it has been implemented. It represents my ideas and opinions but it is not prescriptive; rather it embraces the Perl motto: There's more than one way to do it (TMTOWTDI). The only required part of an Athena publication is the service worker... the UI and content display are up to you.
Browser support considerations
Whatever way we choose to create and show the content must meet the following requirements:
- It must support current versions of IE, Opera, Firefox and Chrome. It should also work with the 2 prior versions of each browser
- It must provide keyboard and touch alternatives for mouse navigation
- If the content scrolls beyond the visible area on screen there must be an icon, or another indicator, to show the text overset (maybe something like what Adobe InDesign does with overset text frames)
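The overset requirement in the last item can be sketched with the standard `scrollHeight` and `clientHeight` element properties. This is only an illustration: the function names and the `has-overset` class are hypothetical, and the icon itself would come from a style sheet rule attached to that class.

```javascript
// Returns true when an element's content is taller than its visible box.
// Works with anything exposing scrollHeight/clientHeight, as DOM elements do.
function isOverset(element) {
  return element.scrollHeight > element.clientHeight;
}

// Adds or removes a (hypothetical) CSS class that a style sheet can use to
// display an overset icon on the element.
function markOverset(element) {
  element.classList.toggle('has-overset', isOverset(element));
  return element;
}
```

In a page this could be called as `markOverset(document.querySelector('article'))` after the content loads and again whenever the window is resized.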
The work done at W3C, WHATWG and ECMA TC-39, coupled with browser vendors' adoption, has made the browser a better development environment. Libraries like jQuery were initially created to speed up CSS interactions and to smooth over the differences in CSS rendering and Javascript support among browsers. Because of this standardization and the requirements we set up above we can drop older versions of browsers and concentrate on our application, not the workarounds.
This flies in the face of people telling you that we should go back as far as possible in supporting users. Not all computers can upgrade browsers or even operating systems. In most instances I would agree but if we are trying to push the envelope then we should use the best available technology without consideration for older versions that limit functionality (I'm looking at you Internet Explorer 8).
Athena's technology stack also makes it hard to polyfill for older browsers. ShadowDOM and ServiceWorkers have limited (or non-existent) polyfill support and that makes them work only with modern, evergreen browsers.
The reference implementation uses the following technologies, listed with the browser support information for each:
Technology | Support information |
---|---|
Polymer | Polymer browser support |
ServiceWorker | Is ServiceWorker Ready? |
Other technologies have different support requirements that are outside the scope of this article.
Remember: You get what you paid for
Because the specifications used in this project (web component specifications and ServiceWorkers) are not finalized, developers can (and should) expect changes... that's the price we pay for working with the newest stuff. But it also allows us to tighten the feedback loop to the spec writers, tell them what works, what doesn't work and what we'd like to see going forward.
The Extensible Web Manifesto says more about how spec writers and application developers should interact with each other.
Inspiration
Bibliotype and a related article with code available on Github
Webstock '13: Craig Mod - Subcompact Publishing.
Hi combines elements of Twitter and the open web. When you first start you are required to enter a 20 word snippet of text and to allow the site to capture your location (it adds weather data to the location for some random reason.) This is called a moment.
You are then allowed to create longer form content related to the moment you initially created. Other users in the application can ask you to expand on the moment; whether you do so or not is your decision.
Flipboard is a Windows and mobile application that collects, curates and delivers long(er) form content.
In A next-generation digital book Mike Matas presents ideas and concepts for a digital book or book-like application. These are fully interactive books that take advantage of multimedia and advanced mobile device features to make reading a more engaging experience. None of the things shown in the video is impossible using web technologies, so why haven't we done so already?
Sarah Groff-Palermo cares a lot about putting data art on the web. Books should be as much art as technological endeavors. Her ForwardJS presentation mixes art and code in one interesting product.
Craig Mod's essays:
- Books in the age of the iPad
- Embracing the digital book
- The collaborative book
- A Simpler Page
- Post-Artifact Books and Publishing
- The shape of our future book
- Platforming books
- Subcompact Publishing
- Hi: Narrative Mapping the World
Hosting, technology and components
What are the parts of Athena? What are they used for?
Hosted on Github
Athena publications are initially hosted on Github Pages for the following reasons:
- Whenever you create a gh-pages website you automatically get SSL. Serviceworkers will only install and work on SSL enabled websites
- Because the website is just another branch of the repository we can set it up so that edits are pushed directly to the production publication
- You can still assign your own domain name to the website or choose to keep the github.io domain name
- The basic Github functionality is free for public repositories. If you want private repositories then there's a cost
Service Worker
The core of an Athena publication is a scoped service worker that will initially handle the caching of the publication's content. We take advantage of the multiple cache capability available with service workers to create caches for individual units of content (like magazine issues) and to expire them within a certain time period (by removing and deleting the cache).
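As a rough sketch of the expiry idea, the helper below picks out caches older than a given number of days. It assumes a hypothetical naming convention of `athena-issue-YYYY-MM-DD` (one cache per issue, named after its publication date); neither the convention nor the function names come from Athena itself.

```javascript
// Assumed (hypothetical) convention: one cache per issue, named
// 'athena-issue-YYYY-MM-DD' after the issue's publication date.
var ISSUE_CACHE_PATTERN = /^athena-issue-(\d{4}-\d{2}-\d{2})$/;

// Returns the cache names whose issue date is more than maxAgeDays before now.
// Names that don't follow the convention are never reported as expired.
function expiredCacheNames(cacheNames, now, maxAgeDays) {
  var maxAgeMs = maxAgeDays * 24 * 60 * 60 * 1000;
  return cacheNames.filter(function (name) {
    var match = name.match(ISSUE_CACHE_PATTERN);
    if (!match) {
      return false;
    }
    var published = Date.parse(match[1] + 'T00:00:00Z');
    return now - published > maxAgeMs;
  });
}

// Deletes the expired caches. cacheStorage is the service worker's global
// `caches` object (or anything with the same keys()/delete() methods).
function purgeExpired(cacheStorage, now, maxAgeDays) {
  return cacheStorage.keys().then(function (names) {
    return Promise.all(
      expiredCacheNames(names, now, maxAgeDays).map(function (name) {
        return cacheStorage.delete(name);
      })
    );
  });
}
```

Inside a service worker, `purgeExpired(caches, Date.now(), 90)` could be called from the activate handler to drop issues older than roughly three months.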
For publications needing to pull data from specific URLs we can special case the requests based on different pieces of the URL allowing to create different caches based on edition (assuming each edition is stored in its own directory) or resource type.
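The special casing described above might look like the following sketch. It assumes a hypothetical layout in which each edition lives under `/editions/<edition-id>/`; the cache names and the `cacheNameFor` function are illustrative, not part of Athena.

```javascript
// Hypothetical layout: /editions/<edition-id>/... holds one edition per
// directory. The cache names below are illustrative only.
var IMAGE_EXTENSIONS = /\.(png|jpe?g|gif|svg|webp)$/i;

// Chooses a cache name from the pieces of the request URL.
function cacheNameFor(url) {
  var pathname = new URL(url).pathname;
  var edition = pathname.match(/\/editions\/([^\/]+)\//);
  if (edition) {
    // Everything that belongs to an edition shares that edition's cache.
    return 'athena-edition-' + edition[1];
  }
  if (IMAGE_EXTENSIONS.test(pathname)) {
    // Images outside any edition are cached by resource type.
    return 'athena-images';
  }
  return 'athena-content';
}
```

A fetch handler could then open `caches.open(cacheNameFor(event.request.url))` before storing a response. Any cache created this way also has to be listed among the expected cache names, otherwise cleanup code that deletes unknown caches would remove it.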
Serviceworkers have another benefit not directly related to offline use. They give every request for our content a speed boost by eliminating the network roundtrip. If the content is in the cache, the only limit is how quickly the hard drive can retrieve it.
This is what the serviceworker code may look like:
// This is one best practice that can be followed in general to keep track of
// multiple caches used by a given service worker, and keep them all versioned.
// It maps a shorthand identifier for a cache to a specific,
// versioned cache name.
// Note that since global state is discarded in between service worker restarts,
// these variables will be reinitialized each time the service worker handles an
// event, and you should not attempt to change their values inside an event
// handler. (Treat them as constants.)
// If at any point you want to force pages that use this service worker to
// start using a fresh cache, then increment the CONTENT_CACHE_VERSION value.
// It will kick off the service worker update flow and the old cache(s) will be
// purged as part of the activate event handler when the updated service worker
// is activated.
var CONTENT_CACHE_VERSION = 1;
var CURRENT_CACHES = {
  'content': 'athena-content-v' + CONTENT_CACHE_VERSION
  // We can also create caches for each individual chapter or
  // unit of content.
  // We can also cache images and media separately and expire them
  // on a different schedule
};
self.addEventListener('install', function(event) {
  var urlsToPrefetch = [
    './content/pre_fetched.txt',
    './content/pre_fetched.html',
    // We can also fetch remote content for our cache(s)
    'https://www.chromium.org/_/rsrc/1302286216006/config/customLogo.gif'
  ];
  // All of these logging statements should be visible via the "Inspect"
  // interface for the relevant SW accessed via chrome://serviceworker-internals
  console.log('Handling install event. Resources to pre-fetch:', urlsToPrefetch);
  event.waitUntil(
    caches.open(CURRENT_CACHES['content']).then(function(cache) {
      return Promise.all(urlsToPrefetch.map(function(urlToPrefetch) {
        // It's very important to use {mode: 'no-cors'} if there is any
        // chance that the resources being fetched are served off of a server
        // that doesn't support CORS
        // (http://en.wikipedia.org/wiki/Cross-origin_resource_sharing).
        // In this example, www.chromium.org doesn't support CORS, and the
        // fetch() would fail if the default mode of 'cors' was used for the
        // fetch() request. The drawback of hardcoding {mode: 'no-cors'} is that
        // the response from all cross-origin hosts will always be opaque
        // (https://slightlyoff.github.io/ServiceWorker/spec/service_worker/
        // index.html#cross-origin-resources)
        // and it is not possible to determine whether an opaque response
        // represents a success or failure
        // (https://github.com/whatwg/fetch/issues/14).
        var request = new Request(urlToPrefetch, {mode: 'no-cors'});
        // fetch() followed by cache.put() is used here because cache.addAll()
        // rejects opaque responses.
        return fetch(request).then(function(response) {
          return cache.put(request, response);
        });
      }));
    }).then(function() {
      console.log('All resources have been fetched and cached.');
    }).catch(function(error) {
      // This catch() will handle any exceptions from the
      // caches.open()/fetch()/cache.put() steps.
      console.error('Pre-fetching failed:', error);
    })
  );
});
self.addEventListener('activate', function(event) {
  // Delete all caches that aren't named in CURRENT_CACHES.
  // While there is only one cache in this example, the same logic will
  // handle the case where there are multiple versioned caches.
  var expectedCacheNames = Object.keys(CURRENT_CACHES).map(function(key) {
    return CURRENT_CACHES[key];
  });
  event.waitUntil(
    caches.keys().then(function(cacheNames) {
      return Promise.all(
        cacheNames.map(function(cacheName) {
          if (expectedCacheNames.indexOf(cacheName) === -1) {
            // If this cache name isn't present in the array of "expected"
            // cache names, then delete it.
            console.log('Deleting out of date cache:', cacheName);
            return caches.delete(cacheName);
          }
        })
      );
    })
  );
});
self.addEventListener('fetch', function(event) {
  event.respondWith(
    caches.match(event.request)
      .then(function(response) {
        // Serve the cached copy when there is one; otherwise go to the network.
        return response || fetch(event.request);
      })
      .catch(function(error) {
        // This catch() will handle exceptions that arise from the match() or
        // fetch() operations. Note that a HTTP error response (e.g. 404)
        // will NOT trigger an exception. It will return a normal response
        // object that has the appropriate error code set.
        console.error('Error in fetch handler:', error);
        throw error;
      })
  );
});
Limitations
As powerful as service workers are they also have some drawbacks. To prevent man-in-the-middle attacks they can only be served through HTTPS (you cannot install a service worker on a non-secure server).
There is limited support for the API (only Chrome Canary and Beta and Firefox Nightly builds will work.) This will change as the API matures and becomes finalized in the WHATWG and/or a recommendation with the W3C.
Even in browsers that support the API the support is not complete. Chrome uses a polyfill for elements of the cache API that it does not support natively. This should be fixed in upcoming versions of Chrome and Chromium (the open source project Chrome is based on.)
We need to be careful with how much data we choose to store in the caches. From what I understand the amount of storage given to offline applications is shared among all of an origin's offline storage mechanisms (IndexedDB and the service worker caches, among others) and this amount is not consistent across all browsers. Furthermore I am not aware of any way to increase this total amount or to specifically increase the storage assigned to serviceworkers.
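One way to keep an eye on how much space a publication uses is the StorageManager API (`navigator.storage.estimate()`), in browsers that implement it. The sketch below is an assumption about how that could be wired up, not something Athena does today; the function names are hypothetical and the code falls back to a message when the API is missing. It only reports usage and does not increase the quota.

```javascript
// Formats a { usage, quota } estimate (both in bytes) as a readable string.
function describeEstimate(estimate) {
  var usedMb = estimate.usage / (1024 * 1024);
  var quotaMb = estimate.quota / (1024 * 1024);
  var percent = estimate.quota > 0 ? (estimate.usage / estimate.quota) * 100 : 0;
  return usedMb.toFixed(1) + ' MB of ' + quotaMb.toFixed(1) +
    ' MB used (' + percent.toFixed(1) + '%)';
}

// nav is the browser's navigator object. Resolves with a description of the
// origin's storage usage, or with a fallback message when the StorageManager
// API is not available.
function reportStorage(nav) {
  if (!nav || !nav.storage || typeof nav.storage.estimate !== 'function') {
    return Promise.resolve('Storage estimate not available in this browser');
  }
  return nav.storage.estimate().then(describeEstimate);
}
```

In a page, `reportStorage(navigator).then(console.log)` would log the current figures, which can help decide how many issues or media files to keep cached.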
The future
As part of the serviceworker family of specifications we will be able to match native applications with push notifications and background content synchronization using open web APIs.
JSON package file
Taking a cue from the package.opf epub package specification I've come up with a basic JSON definition for a publication package. We picked JSON as our package format because it is easier to write, easier to validate (using tools like jsonlint) and can easily be parsed by all existing browsers (according to caniuse.com).
The other advantage is that we can easily customize our package file to match the needs of our specific publications.
The basic publication.json may look something like this:
{
  "publication": {
    "metadata": {
      "pub-type": "book",
      "title": "New Adventures of Old Sherlock Holmes",
      "pub-info": [
        {
          "pub-date": "20141130",
          "pub-location": "London",
          "publisher": "That Press, Ltd"
        }
      ],
      "authors": [
        {
          "firstName": "Sherlock",
          "lastName": "Holmes"
        }
      ],
      "editors": [
        {
          "role": "Production Editor",
          "firstName": "John",
          "lastName": "Watson"
        }
      ]
    },
    "structure": {
      "content": [
        {
          "title": "Introduction",
          "type": "introduction",
          "location": "content/introduction.html"
        },
        {
          "title": "Chapter 1",
          "type": "chapter",
          "location": "content/chapter1.html"
        },
        {
          "title": "Chapter 2",
          "type": "chapter",
          "location": "content/chapter2.html"
        }
      ]
    }
  }
}
I've left the format deliberately vague because I believe this needs many iterations to become the strong format that it needs to be.
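One possible use for the package file is to derive the list of URLs the service worker should prefetch, so the list doesn't have to be maintained by hand. The sketch below assumes the package has a `publication.structure.content` array whose entries carry a `location` field; the function name `urlsFromPackage` and the trimmed sample package are only for illustration.

```javascript
// Builds the list of URLs to prefetch from a parsed publication package.
// Assumes each content entry has a `location` relative to the publication root.
function urlsFromPackage(pkg) {
  return pkg.publication.structure.content.map(function (item) {
    return './' + item.location;
  });
}

// A trimmed-down sample package, used here only to show the expected shape.
var samplePackage = {
  publication: {
    metadata: { 'pub-type': 'book', title: 'New Adventures of Old Sherlock Holmes' },
    structure: {
      content: [
        { title: 'Introduction', type: 'introduction', location: 'content/introduction.html' },
        { title: 'Chapter 1', type: 'chapter', location: 'content/chapter1.html' }
      ]
    }
  }
};

var urlsToPrefetch = urlsFromPackage(samplePackage);
console.log(urlsToPrefetch);
```

In a real publication the package would be loaded with `fetch('./publication.json')` and parsed with `response.json()` before being handed to the function.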
UI
The UI is one of the points where I'm struggling. Athena herself doesn't (meaning I don't) really care about what front end platform/library/flavor of the week you choose for the User Interface. I've chosen three experimental interfaces for introducing Athena: Polymer, Angular and a plain HTML interface using Bootstrap (or maybe Foundation).
One big problem that I need to research is the routing portion of a web application and whether I can route external pages through the framework and control where the content is displayed. Another option would be to use some aspects of Polymer and mix them with a plain Bootstrap or Foundation site and eschew the web application side. It's too early in the process to decide.
The Polymer version provides a glimpse of what a Polymer-based application may look like. It also uses athena-document, a custom element that wraps a markdown transformation engine for display on the web. There shouldn't be major problems doing the same thing with LaTeX and other document formats and there's nothing that says we can't use these web components in non-Polymer applications.
Content: format and metaphors
In my blog I've written about Paged Media and Generated Content for paged media and about creating a print @media style sheet. They both refer to printed content, either by creating PDF directly (using Paged media) or adapting the web content for printing (using @media rules tailored for print).
Athena doesn't want to be a print platform but a starting point to test whether offline web apps can compete with native platforms and existing digital content standards. That said it should be possible to create paged media style sheets to at least create a good PDF for print and a high quality version for archival storage.
See Book Metaphors Online for a more thorough discussion on this subject.
Format
I've written before about HTML and its role as the final language for publication. I will only summarize the article I just linked.
HTML is a powerful language full of capabilities and, alongside CSS3 and Javascript, provides the foundation of modern sites and applications.
HTML is not an easy language to author. Depending on the variant of HTML you're writing (XHTML or regular HTML) you have to follow different rules.
The default HTML5 is too permissive; it allows the worst tag soup markup; the same markup that has been allowed by browser vendors in an effort to be competitive. It is nice to authors but makes parsing the content much harder than it needs to be.
XHTML5 syntax (best explained in this HTML5 Doctor article by Bruce Lawson) provides stricter guidelines for authors that may turn some people off from HTML altogether: attributes must be quoted, all tags must be lowercase and all elements must be closed, including <img> and <br> tags. The benefit is that the stricter rules make parsing content and developing new technologies around it easier.
Because of these difficulties I present 4 solutions to create content that easily transforms to XHTML5 content. I don't go into too much detail of each solution, just enough to give you an idea of what it is.
- Markdown is a text to (X)HTML conversion tool designed for writers. It refers both to the syntax used in the Markdown text files and the applications used to perform the conversion
- AsciiDoc is a text document format for writing notes, documentation, articles, books, ebooks, slideshows, web pages, man pages and blogs. AsciiDoc files can be translated to many formats including HTML, PDF, EPUB and man pages
- HTMLBook is an open, XHTML5-based standard for the authoring and production of both print and digital books. It is currently under development
- DocBook, DITA and TEI are some examples of XML vocabularies that can be converted to HTML
Athena doesn't really care what you use to create your content as long as you provide well formed HTML5 created with XHTML5 syntax.
Book metaphors online
Does it make sense for Athena to use book metaphors?
Most of these metaphors use jQuery and jQuery plugins/addons
For the simplest of book interfaces we can just use one of the scripts below to build a pagination setup that requires the reader to click on either a page number or an arrow.
If the script doesn't incorporate it already, we can then build a keyboard navigation interface by creating a small script that listens for arrow key presses and navigates forward or backward based on the key pressed.
- http://flaviusmatis.github.io/simplePagination.js/
- http://cssdeck.com/labs/quick-and-simple-pagination
- http://designshack.net/articles/css/building-a-custom-css3-pagination-user-interface/
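The small keyboard navigation script mentioned above could look something like this sketch. It does not use the API of any of the pagination scripts listed; the function names, the starting page and the page count are hypothetical, and the comment inside the listener marks where the chosen script's own "go to page" call would go.

```javascript
// Maps a KeyboardEvent.key value to a page movement:
// +1 for forward, -1 for backward, 0 for keys we don't handle.
function pageDeltaForKey(key) {
  if (key === 'ArrowRight' || key === 'PageDown') {
    return 1;
  }
  if (key === 'ArrowLeft' || key === 'PageUp') {
    return -1;
  }
  return 0;
}

// Computes the page to show next, clamped to the range 1..pageCount.
function nextPage(currentPage, key, pageCount) {
  var target = currentPage + pageDeltaForKey(key);
  return Math.min(Math.max(target, 1), pageCount);
}

// Browser-only wiring; skipped when there is no document (for example in Node).
if (typeof document !== 'undefined') {
  var currentPage = 1;
  var totalPages = 10; // hypothetical page count
  document.addEventListener('keydown', function (event) {
    currentPage = nextPage(currentPage, event.key, totalPages);
    // Here we would ask the pagination script to display currentPage.
  });
}
```

Keeping the key mapping in its own function makes it easy to add touch gestures later by mapping swipes to the same +1 and -1 movements.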
Full examples
Turn.js and Bookblock present complete book-like interfaces. They use jQuery and, in the case of Bookblock, additional libraries that have to be cached and may present issues when working with Polymer and other web component libraries.
Use cases for Athena publications
These are the three main use cases I see for Athena publications. The first two are based on short publication loops. The third use case is based on what media and resources will serve the story best. A fourth option, enhancing existing content, lets us choose which parts of the Athena toolkit we'll use with the content we're working on; at the very least we can convert the project into an offline-capable application.
Early access content
The Early Access Publications idea is based on existing programs like Manning's MEAP and O'Reilly's Early Release programs where the book content is published as soon as it's ready (and sometimes as soon as the author is done writing it.)
We can do multimedia books (see below for more information about how I envision interactive books) and the multimedia work can be done in parallel to the writing or it can all be done in a collaborative fashion (Github private repo or similar version control system.)
The advantage of this kind of publication is that it tightens the feedback loop between readers, reviewers, editors and authors. It also allows for collaborative editing: whoever has access to the git repository can make changes and accept changes coming from the community (whether this repository is public or private.)
O'Reilly Media uses Ilya Grigorik's book High Performance Browser Networking as a case study on the benefits of this tighter loop.
Serial Publications (magazines and the like)
Serials are periodical publications. Magazines are the ones that come to mind first but they are not the only ones. Shorter content like Atavist books and stories, or the longer content available from O'Reilly Atlas, would also fit this model, with the added advantage of offline access.
This way a book is never really done. We can continue to work on stories and tell new stories as long as we want to and the stories can get that continual polish that makes for a good reading experience. If we need/want to, we can also provide CSS Paged Media Stylesheets that will allow us to create a PDF version of the text/images we make available.
Interactive books
When I was thinking about interactive books there were two that came to mind: the Defiance companion iBook and Al Gore's Our Choice as presented at TED in 2011.
Before all the new CSS, HTML5 and Javascript technologies became mainstream it was very difficult (if not outright impossible) to create experiences like the ones above.
Now the almost impossible is merely difficult. The technologies in those books are available as open web APIs at different levels of standardization and you can create experiences equivalent to the applications that you run on your mobile devices.
Enhancing existing content
The easiest way to start using Athena is to add the offline ServiceWorker to an existing application. This process is fairly simple:
- Create a ServiceWorker script that caches the required files
- Link the service worker to the main page in your application
- Test the offline experience and overall functionality of your project
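The second step above, linking the service worker to the main page, is done with the standard `navigator.serviceWorker.register()` call. The sketch below wraps it in a small helper; the helper name, the `./sw.js` file name and the `./` scope are assumptions chosen for illustration.

```javascript
// Registers a service worker if the browser supports it.
// nav is the browser's navigator object; resolves with the registration,
// or with null when service workers are not supported.
function registerServiceWorker(nav, scriptUrl, scope) {
  if (!nav || !('serviceWorker' in nav)) {
    return Promise.resolve(null);
  }
  return nav.serviceWorker.register(scriptUrl, { scope: scope });
}

// Browser-only wiring; './sw.js' and './' are hypothetical values.
if (typeof window !== 'undefined') {
  registerServiceWorker(window.navigator, './sw.js', './')
    .then(function (registration) {
      if (registration) {
        console.log('ServiceWorker registered with scope:', registration.scope);
      } else {
        console.log('ServiceWorker is not supported in this browser.');
      }
    })
    .catch(function (error) {
      console.error('ServiceWorker registration failed:', error);
    });
}
```

Because the helper resolves with null instead of failing when service workers are unavailable, the page keeps working as a regular online site in browsers without support.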
Copyright Considerations
When working with Athena content we have a fairly open hand as to what resources we fetch and the sources we fetch resources from. What copyright restrictions do we face when accessing and then caching content?
There is nothing that would stop me from doing this when defining the cache content:
var urlsToPrefetch = [
'./content/introduction.html',
'./content/pre_fetched.html',
// We can also fetch remote content for our cache(s)
'http://chimera.labs.oreilly.com/books/1230000000345/ch12.html',
'http://chimera.labs.oreilly.com/books/1230000000345/apa.html'
];
In the links above the content originates from O'Reilly's Interactive Data Visualization for the Web. In this case, the content is already available free of charge (and I own both the printed and ebook versions) but it illustrates a point: unless you're serving your content behind authentication a serviceworker can do whatever it wants with it. The cache will not expire until you install a newer version and the content will remain in the cache as long as the cache lives.
It follows the "with serviceworker comes great responsibility" theme regarding serviceworkers or, as Jake Archibald puts it, "Serviceworker treats you like an adult". The Serviceworker will allow you to do a lot of things but you're responsible for what you do with it.