Protocol-centric Thinking

I am an Internet-protocols guy; my most recent IETF task was re-editing the JSON spec with more attention to interoperability (RFC7159), which gives you a feeling for the nuts-and-bolts-ness of my world-view. The Internet itself is just a set of rules about what bits co-operating programs can send back and forth to each other on the wire.

This page is an attempt to think out loud about AR-Discovery issues in a bits-on-the-wire way. It’s informed by my work on developing a “walkumentary” which runs on Google Glass: a location-based way of attaching stories to the world.

I’m a newcomer to the AR community so my thinking may be a bit out of tune; I confess to liking things that are simplistic and abstraction-free. Also, I’ve observed that those are the kinds of things that tend to succeed on the Internet. I believe passionately in 80/20 points.

JL: There is also a rich history of layering / tunneling protocols, for better (IP over packet radio) and worse (SOAP over HTTP). Protocol Piggybacks is a page to continue developing these concepts in the context of AR delivery, particularly with the goal of enabling HTTP interactions to be carried over links with issues of latency, bandwidth, scalability, and other constraints relative to dynamic AR experience delivery.

Invent a new protocol or re-use an existing one?
I think that to the extent AR discovery re-uses existing protocols, it empowers developers by letting them use existing libraries and tools.

CP comment: Agree, re-use rules, but "extensions" are often a necessary/good step when the "as is" fails.

What are the candidate protocols?
To the extent that AR apps can live with request/response messaging patterns, HTTP is overwhelmingly attractive for reasons of scalability, tooling, and libraries that are already available for every programming language.

CP comment: Today most is done with HTTP. The primary handicaps are speed (not fast enough) and overhead (too much for the potentially very small "patches" of info requested). Also, much of the time the large data assets (3D models, videos) are going to be streamed to the device.

You also get the benefit of a large engineering community developing authentication, authorization, encryption, and privacy technologies which all ship ready to work with HTTP. My instinct is that AR Discovery should be constructed along RESTful principles and implemented with HTTP except in those cases where it can’t.

CP comment: I'll bounce your suggestion with one of the earliest cloud-based AR architects and ask him to give us his considered opinion on this. He's on vacation at the moment but I'll bring him in as soon as possible.

In the case that you need a persistent connection, WebSockets would be the obvious candidate. However, note that a persistent connection is relatively fragile, scales poorly, and bypasses the rich caching infrastructure that has grown around HTTP.

Subsequent discussion assumes a RESTful protocol implemented with HTTP.

What needs to be in a request?
At a bare minimum, a location: latitude/longitude/altitude. You could probably hit an 80/20 point with just this and nothing else. Other things that would be helpful include:
 * radius of interest, i.e. “don’t show me anything more than 20m away”. [but watch out; in many cases, AR servers are going to have a better idea about the visibility of augmentations]
 * identity of the user-agent
 * payload formats that the user-agent can handle
 * a text query, of some sort. “I'm here, and I’m interested in anything having to do with Banksy”
 * an image query: “I can see this. What augmentations are available for it?”
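To make the "radius of interest" idea concrete, here is a minimal sketch of how a server might filter candidate augmentations by distance from the client. It uses the standard haversine great-circle formula and ignores altitude. The function names and the "lat"/"lon" dictionary keys are assumptions made for illustration, not part of any proposal on this page.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def distance_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/long points (haversine)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(client, augmentations, radius_m):
    """Keep only augmentations no more than radius_m from the client.

    `client` is a (lat, lon) tuple; each augmentation is a dict with
    hypothetical "lat" and "lon" keys. Altitude is ignored in this sketch.
    """
    lat, lon = client
    return [a for a in augmentations
            if distance_m(lat, lon, a["lat"], a["lon"]) <= radius_m]
```

As the caveat in the list says, a real server may well override a client-supplied radius when it knows more about what is actually visible.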

Can requests be accomplished with HTTP GET?
One hopes so, because GET is defined as safe and idempotent, and as a consequence you get a rich caching infrastructure, in particular sophisticated things like ETags, which in practice are hugely valuable to help with scaling.
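As a rough illustration of why ETags help with scaling, the sketch below shows the revalidation handshake from the server's side: a client that already holds a copy sends its ETag in If-None-Match and gets back an empty 304 instead of the full body. The function names are hypothetical, hashing the body is just one common way to derive an ETag, and the header matching is deliberately simplified (a real server must also handle lists of tags and `*`).

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive an ETag from the response body (one common approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(request_headers: dict, body: bytes):
    """Return (status, headers, body) for a GET, honouring If-None-Match.

    Simplified: only an exact single-tag match is recognised.
    """
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # Client's copy is still current: no body needs to be sent.
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, body


resource = b'{"augmentations": []}'

# First request: the client has nothing cached and gets the full response.
status, headers, payload = handle_get({}, resource)

# Later request: the client revalidates with the ETag it was given.
status2, _, payload2 = handle_get({"If-None-Match": headers["ETag"]}, resource)
```

The second exchange carries no body at all, which is where the savings come from when many clients repeatedly ask about the same location.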

On the other hand, if you use GET, that means you can’t send much of a structured payload along with your query; you’re pretty well limited to the URI query string and HTTP headers, both of which are essentially name/value pairs. Having said that, you can do a lot with name/value pairs. However, you probably can’t do image queries with GET.

Strawman thinking:
 * Simple AR Discovery requests are HTTP GETs to service-provider-defined URIs. Discovery service providers can obviously design their own URI spaces, including value-added query languages.
 * Complex AR Discovery requests are HTTP POSTs whose Content-Type and message body constitute a query.
 * Recommend the use of the existing User-Agent and Accept headers.
 * Specify a new X-AR-Location header to provide client location.
 * Maybe specify a new X-AR-Radius header
 * Maybe specify a new X-AR-Query header to hold value-add queries
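The strawman above could look roughly like this from a client's point of view. This sketch only builds the request objects with Python's standard urllib.request and does not send anything. Several details are assumptions made for illustration: the comma-separated "lat,lon,alt" value for X-AR-Location, the unit (metres) for X-AR-Radius, the User-Agent string, and the use of a JSON body for the POST form.

```python
import json
import urllib.request


def simple_discovery_request(uri, lat, lon, alt, radius_m=None, query=None):
    """Build a simple AR Discovery request: an HTTP GET with AR headers."""
    headers = {
        "Accept": "application/json",
        "User-Agent": "example-ar-client/0.1",    # hypothetical client name
        "X-AR-Location": f"{lat},{lon},{alt}",    # assumed value format
    }
    if radius_m is not None:
        headers["X-AR-Radius"] = str(radius_m)    # assumed to be metres
    if query is not None:
        headers["X-AR-Query"] = query
    return urllib.request.Request(uri, headers=headers, method="GET")


def complex_discovery_request(uri, query_body: dict):
    """Build a complex AR Discovery request: a POST whose body is the query."""
    data = json.dumps(query_body).encode("utf-8")
    headers = {"Content-Type": "application/json", "Accept": "application/json"}
    return urllib.request.Request(uri, data=data, headers=headers, method="POST")


get_req = simple_discovery_request(
    "https://ar-provider.org/discover", 27.1751, 78.0421, 171,
    radius_m=20, query="Banksy")
post_req = complex_discovery_request(
    "https://ar-provider.org/discover", {"text": "Banksy"})
```

Passing either object to urllib.request.urlopen would send it. The discovery URI here is a placeholder, since the strawman leaves URI design to each service provider.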

What is needed in the response to a query?
A list of augmentation resources that are of interest, based on the query. What is the absolute bare minimum that you need to make this useful? Strawman thinking:
 * A URI that identifies the augmentation.
 * A short human-readable label that describes it.

Other things that are non-essential but might be useful include:
 * The exact lat/long/altitude of the augmentation
 * The Content-Type of the augmentation [but well-written clients will handle this being wrong]
 * The byte size of the augmentation [but well-written clients will ignore this]
 * Lengthier descriptions or artist’s statement about the augmentation
 * Thumbnails, icons, other visual aids to identify the augmentation
 * Identifier of the provider of the augmentation; probably a URI.

An AR Response could be a super-simple JSON vocabulary comprising a list of responses, each of which is a JSON object that is required to have a URI and label and may optionally include other information.

    {
      "version": "1.0.8",
      "augmentations": [
        {
          "link": "https://ar-provider.org/pillar-of-fire.arml",
          "label": "A pillar of fire over the Taj Mahal"
        },
        {
          "link": "https://ar-provider.org/back-story.mp3",
          "type": "audio/mp3",
          "size": 235135,
          "label": "Listen to the history of the street you’re in"
        }
      ]
    }
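A client consuming this vocabulary might look something like the sketch below. It keeps every entry that carries the two required members ("link" and "label", as in the example) and passes optional members through untouched. Silently skipping malformed entries rather than rejecting the whole response is an assumption made here for robustness, not something the strawman specifies, and the function name is hypothetical.

```python
import json


def parse_ar_response(text):
    """Return the usable augmentation entries from an AR Response document.

    An entry is usable if it is a JSON object with string "link" and
    "label" members; anything else is skipped (a lenient policy assumed
    for this sketch).
    """
    doc = json.loads(text)
    usable = []
    for item in doc.get("augmentations", []):
        if not isinstance(item, dict):
            continue
        if not isinstance(item.get("link"), str):
            continue
        if not isinstance(item.get("label"), str):
            continue
        usable.append(item)
    return usable


sample = """
{
  "version": "1.0.8",
  "augmentations": [
    {"link": "https://ar-provider.org/pillar-of-fire.arml",
     "label": "A pillar of fire over the Taj Mahal"},
    {"link": "https://ar-provider.org/back-story.mp3",
     "type": "audio/mp3", "size": 235135,
     "label": "Listen to the history of the street"},
    {"link": "https://ar-provider.org/no-label.arml"}
  ]
}
"""

for aug in parse_ar_response(sample):
    print(aug["label"], "->", aug["link"])
```

The third entry lacks a label and is dropped, so only the first two are printed.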