The multimodal future is still voice-first

One of the most difficult things for technology industry observers to do is to hold simultaneously in their minds the possibility that multiple “hot” new technologies will actually succeed. The temptation is always to pit one trend against the other and determine which will win. The truth is that the future typically involves the mashing up of more than one of these buzzwords once they have gone through their respective hype cycles. Mobile and social. Open source software and closed app stores. QR codes and NFC. AR and VR. And so on. But even where technology trends intersect, overlap and blend together, keystone technologies tip the scales and give the future a shape. Voice is one of those keystone technologies.

Nowhere do I feel the need to clarify that things will be “both and” rather than “either or” more strongly than in the realm of contextual computing – voice interfaces, messaging, chatbots and predictive GUIs. After all, the adoption of human-human messaging (whether c2c or b2c) is a direct ramp to chatbots. Voice interfaces are really just a form of chatbot, and they can return a GUI menu for user confirmation. You can already ask Bixby to identify what it is you are looking at in your camera viewport. Apple Watch can automagically suggest actions on the Siri watch face that you could actively invoke with your voice and vice versa. The future of contextual computing is clearly multimodal.


SiriKit’s multimodal responses

Brian Roemmele – the Rafiki of voice – has coined this entire category of computing “voice-first.” It is a term that has proliferated far and wide on the interwebs as a rallying flag for the emergent voice interface tribe. Having spent time working from the messaging piece backwards at HeyNow and Layer, I was always a believer in voice, but I really latched onto the idea that voice-first didn’t mean voice-only. And it doesn’t. Brian has been very vocal about the need for other modalities alongside voice, and about the fact that we aren’t suddenly going to stop using screens or typing altogether.

Yet there is something about voice in particular that feels different, and it wasn’t until yesterday’s Siri section in the WWDC keynote that I was able to really put my finger on it. Apple demoed Siri Suggestions on Monday – where Siri begins to learn about actions you take in apps and makes contextually relevant suggestions as to what next actions you might want to take at a given point in time (context being a function of past usage patterns and the current state of your machine). And while this represents a laudable improvement to the way iOS helps you make use of apps, it lays bare the limitations of an approach that does not put voice at the center of human computer interaction, however multimodal it may end up being.

Siri Suggestions


Smartphone GUIs are paradoxically “single tracked” in that they demand your full attention, and yet smash your attention into dozens of pieces across apps, notifications and other stimuli. Even the most perfectly tuned GUI – with options and actions triaged ruthlessly by your own personalized context such as we are seeing with Siri Suggestions – at the same time absorbs you completely in the machine’s understanding of the world such that you can’t do anything else and bombards your eyes with stimuli. I’m not sure about you, but even as I go through well-trodden workflows in apps I know inside and out to get stuff done, a sense of anxiety, distraction and mild panic is not far behind the leading edge of my perception. I feel like I am running on an ever quickening treadmill, constantly trying to outrun a robotic Red Queen whose speed and parallelism leave my wetware in the dust. My attention reserves are depleted each time I look at and interact with a screen, no matter how well designed and tuned.

Voice interfaces, on the other hand, are “dual tracked” in that you can do something else while engaging (driving, cleaning, working out, just passing by). And yet funnily enough, this dual tracked nature does not contribute to sensory overload or drowning in multitasking; rather, it focuses all inputs and outputs of the machine into a single, linear thread – just like the way the human mind works. Speaking to a computer and hearing responses – even ones that come with visual affordances – is a development in human computer interaction that most closely resembles the way we think. You can only have one thought at a time, only hear one thing at a time and only say one thing at a time. Indeed thoughts and speech are intertwined in a strange loop with one another, with Broca’s area (our internal voice) both shaping and being shaped by our speech. Do we speak our thoughts? Or do we think in words?

As our attention continues to fragment, even looking at a screen to evaluate Siri Suggestions and acting on that “next best action” is going to strain us. No matter the amount of personalization or context used to visually render options and actions to the user, the attentional price will always be higher than speaking. GUIs will never go away; in fact, in the AR world the entire FOV will be a GUI. But to deal with that overstimulation, the ultimate skeuomorphism will need to emerge for computers to interact with us the same way we think – that is, the same way we talk to each other and ourselves.

We’ll point our camera (or look with our glasses) at a thing and ask our assistant about it. Our assistant may present a notification to quietly nudge us about a recommended next action, but we will engage with it fully with our voice to get an answer to our question or unambiguously express our intent without futzing around with the interface. As we get ready in the morning, we will compose wildly complex queries by speaking a short sentence to our assistant, and have it resolved on our behalf without lifting more brain cells than required to express that need. Voice will be the shortest distance between a user declaring she has a job to be done and the computer working out how to do it for her. And in doing so, voice will become the first interface among equals in our multimodal future.

Siri and all her friends: why it’s SiriOS or bust this WWDC

“A wizard is never late, nor is he early. He arrives precisely when he means to.” – Gandalf

Siri remains the biggest liability turned threat Apple has faced in quite some time. It’s clear from FB, Google IO and Microsoft Build (let alone the blistering pace of progress of Amazon Alexa) that Apple needs to move quickly to close the gap before it’s too late. And while it’s clear that digital assistants don’t quite matter yet on mobile, the day when users begin to change their purchase behavior on the basis of assistance is drawing near. One can’t help but feel that this WWDC is a make or break moment for Siri and for Apple.

What began as a multi year lead has given way to a serious deficit compared to erstwhile competitors Alexa, Cortana and Google Assistant. iPhone user satisfaction with Siri is dramatically lower than their overall satisfaction, but that should not comfort the paranoid in Apple’s executive team. Safe for now, Apple’s formidable iOS ecosystem stands to face serious competitive pressure when the basis of competition shifts underneath their feet as it appears to be doing in the case of assistance. Left without a major upgrade in capability as a platform in its own right rather than simply an appendage to iOS, Siri will be alone in its fight against the other assistants, fighting with a vastly smaller data corpus and with far less mature cloud and data practices internally.

Apple struggles with all things cloud services, machine learning and data. Siri unfortunately relies quite heavily on all three, and as a result, its ability to even correctly transcribe my words lags the field significantly. This will always be an issue, and while Apple basically needs to build or buy its way out of this deficiency, its strategic ace in the hole does not necessarily require it to beat Alexa or Google Assistant’s voice to text capability overnight. By leveraging the power of arguably the most important and robust consumer facing developer platform, community and economy in history, iOS, Apple can bring to bear an ecosystem that will unlock differentiated, delightful conversational user experiences.

Rather than renting space on Alexa or Facebook Messenger, iOS developers can leverage the master assistant as a sort of “router” to assistance experiences totally owned and controlled by that business. Siri gains new superpowers to help get users’ jobs done: asking for help from the App Store, iOS’ crown jewel. By leaning on its developer community rather than trying to be itself the smartest AI out there, Apple can securely, richly and sustainably deliver the science fiction style digital assistants we’ve envisioned. We have Alexa, Erica, Cortana, Eno, Luvo, Cleo and more, and so rather than winning a platform war with a better product, Apple can win it with a superior ecosystem. It can win it with SiriOS.

SiriOS – Siri operating system – would more or less be a rewritten SiriKit, sans the domain guardrails and with the capability for some sort of developer (and possibly user) defined intents and ontologies. The new Siri “applet” would require the app to be installed on the device. From third party audio apps to shopping experiences and beyond, giving developers the type of flexibility afforded by other assistant platforms would return Apple to pole position, if only in the nick of time. Rather than being threatened by Alexa, Apple would find Amazon becoming its best friend in the voice world. There’s no need to go so far as Android’s recently announced ability to set other assistants like Alexa and Cortana as the primary. “Hey Siri, ask Alexa to order some more paper towels,” is winning the war without firing a shot.
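To make the “router” idea a little more concrete, here is a minimal sketch of what handing a request off to an app-owned assistant could look like. To be clear, this is not SiriKit or any real Apple API: the `AssistantRouter` class, the `register` and `route` methods, the “ask <app> to <request>” phrasing rule and the `alexa_handler` function are all hypothetical names I made up purely for illustration.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical type: an app-owned handler takes the request text and returns a reply.
Handler = Callable[[str], str]


class AssistantRouter:
    """Toy 'master assistant' that hands requests to handlers owned by installed apps."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, app_name: str, handler: Handler) -> None:
        # Apps register under the name users will say out loud (case-insensitive).
        self._handlers[app_name.lower()] = handler

    def route(self, utterance: str) -> Optional[str]:
        # Assumed phrasing: "ask <app> to <request>", with an optional wake phrase.
        match = re.match(
            r"^(?:hey siri,?\s+)?ask (\w+) to (.+?)[.!?]?$",
            utterance.strip(),
            flags=re.IGNORECASE,
        )
        if match is None:
            return None  # Not a hand-off request; the master assistant keeps it.
        app_name, request = match.group(1).lower(), match.group(2)
        handler = self._handlers.get(app_name)
        if handler is None:
            return None  # No installed app has registered under that name.
        return handler(request)


def alexa_handler(request: str) -> str:
    # Stand-in for an experience fully owned and controlled by the third party.
    return f"Alexa: OK, I will {request}."


router = AssistantRouter()
router.register("Alexa", alexa_handler)
reply = router.route("Hey Siri, ask Alexa to order some more paper towels")
print(reply)
```

The point of the sketch is only that the master assistant does not need to understand the request itself. It needs to recognize who the request is for and get out of the way.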

As things progress, one can imagine usage of Siri beginning to climb, helping Apple with that voice to text problem by providing useful data for Siri to learn from. Apple Business Chat is another fascinating new piece of the puzzle, whereby we could see a convergence into a multimodal experience that not only mixes text, voice and rich GUI, but human and bot interactions as well. And as discovery of new Siri apps gets more robust, the ability for users to interact with Siri apps without the core iOS app needing to be downloaded may come to the fore as Apple dabbles around things like app thinning, but core and (allegedly) declarative UI frameworks. Things start getting very interesting for iOS as SiriOS becomes a powerful abstraction leak that gets to the core of how we use computers.

Siri needs to become the preferred voice UI platform for consumers and developers, and a point of aggregation of the user experience which Apple can control entirely. In doing so, they stand poised to be the ones to spike the assistance football. Yet again, Apple goes last.

The Conversational Economy

Markets are conversations. Trade routes pave the storylines. Across the millennia in between, the human voice is the music we have always listened for, and still best understand.

— The Cluetrain Manifesto, 1999

Long ago it was obvious that markets were conversational. You’d visit the bazaar, browse the wares and meet the merchants. You might have a relationship with the shopkeeper who refilled your weekly staples or the cobbler that fixed your shoes. In the early industrial economy, you might have dealt with traveling salesmen for a number of different products. You talked, you bought, and they remembered you.

In the past, each sale, each “conversion,” was highly dependent upon how the conversation went. Yes, the product itself had to be good, but also: was the salesperson knowledgeable? Did they help me find what I needed? Did I trust them, and did they hear me? Did they remember what kinds of things I liked and disliked? Did they serve my particular needs, even as those needs evolved over time?

Similarly, customer retention was a function of how that conversation evolved over time. Individual conversations with the salesperson, shopkeeper or craftsman constituted an ongoing “conversation” out of which your purchases organically emerged. Purchases were bookends to parts of an ongoing conversation, and as that conversation was sustained, in good faith and with trust on both sides, so too was your loyalty to that merchant.

Over the past half century of mass production and mass marketing, these conversations have been distorted and fragmented. In a world where physical distribution — of both products and of media — required massive scale, the business models that naturally arose to govern the exchange of stuff were often impersonal, uniform and alienating.

Mass marketers became experts in creating one-size-fits-all messages delivered by a handful of media gatekeepers to promote one-size-fits-all products carried by a handful of mega retailers. When marketing spoke, customers listened. These media and commerce channels enjoyed a tight symbiosis which primarily served the purpose of one-way communications from businesses to customers.

In the mass marketing era, the customer conversation didn’t go away, but it became diluted across every TV & print ad, every coupon, every unsatisfactory purchase, every support call where they sat on hold, and every email complaint gone unanswered. Even as advanced targeting capabilities became available with the rise of Google and Facebook, companies spoke to their customers as befit the media they were doing it with: as audiences. In many ways, online advertising has simply amplified the existing distortions in the customer conversation presented by the mass marketing era; the relationship between the business and the customer became even more lopsided.

Even when communication channels like mail-in feedback, customer service lines and email support were made available, customers rarely felt heard and frequently felt like they were being given the runaround. How fun is it to navigate a phone tree when you urgently need to talk to a human? For millennials and gen-Z, merely being forced to talk to a salesperson or a support rep on the phone – a communication medium not even reserved for one’s immediate family – is a few steps short of torture.

Fortunately for consumers, the cracks that had begun to appear in this system with the rise of Amazon have become a slow-motion collapse of the mass-marketing status quo over the last few years. Don Peppers and Martha Rogers, authors of the seminal 1993 book The One to One Future, were prescient to notice how the internet was accelerating trends towards a more personalized, individualized approach to marketing and sales. Instead of looking at markets simply in terms of psychographic segments and market share, they proposed companies think about their business as a collection of relationships with individual customers, one by one, and over the long run. Their warning to companies in 1993 rings even truer today:

Don’t be confused, however, by the fact that technology, to date, has not made it easy for your customers to communicate their ideas, feelings and suggestions to you. Don’t let a momentary accident of technological history convince you that your customers don’t have individual feelings and suggestions they would like to communicate to you, if it were as easy for them as it is for you.

Because, lo and behold, the end of that “momentary accident of technological history” is upon us.

Rising expectations by customers around the holistic customer experience are well established across industries. Media and entertainment, ever the canary in the internet coal mine with a product reducible down to pure ones and zeroes, showed us that people want what they want when they want it – not just what they’re given. Amazon gave it to us with low prices, two-day shipping, easy returns, proactive customer service and personalized recommendations. Through their tech-enabled business models and customer-centric practices, the companies of tomorrow are already displacing the giants of yesteryear.

Technology and new business models have coincided to deliver better customer experiences and in doing so have raised the bar for every other industry. People are frustrated when their banking app is slower and more cumbersome than Uber’s. Why should they care about how difficult regulatory and legacy code issues are to overcome, or about internal bureaucracy? Between 2014 and 2016 alone, the percentage of customers who reported they had stopped doing business with a company after a bad experience jumped from 76% to 82% (KPCB, Ovum). Customers judge companies by the ever-rising gold standards of customer experience, and large swaths of the Fortune 1000 have already begun to wake up to this reality.

The ubiquity of social media and the increasing role of word-of-mouth referrals in the purchase process both amplify the customer experiences people have across their networks as well as drive customers’ desire for authentic communications with companies. People are connecting with one another more frequently and transparently. And with always-on smartphones, our connectivity is real-time by default. No longer can businesses hide in their corporate ivory towers, blanketing the airwaves with their carefully crafted, one-way messages. In the same way that customer expectations are shaped by their experiences with other companies, so too are they shaped by the new ways they interact with the world and with their friends.

So what are most businesses to do? How can companies — old and new — keep up with the ever-rising tide of customer expectations? We at Layer believe the answer lies in another mega-trend precipitated by the mobile revolution.

As the smartphone install base matures, a powerful pattern has emerged in the way people use their devices: messaging is consistently the #1 thing people use their phones for. It only makes sense that a device whose ancestor was exclusively used for communication, and which was dubbed by Steve Jobs in the iPhone keynote as an “internet communicator,” would manifest the fundamental human need to connect and communicate.

Modern messaging apps, by their nature, are used not simply as a means of sending messages but as a means of maintaining a conversation. That means a nearly constant loop of notifications, checking one’s phone, and responding, all the while maintaining a relevance that no other type of app notification can match. These notifications, when implemented correctly, are constantly being opted into by the user. So-called “over-the-top” (OTT) messaging is able to go far beyond mere text, and can incorporate voice, video, and a whole host of entirely programmable interactive message elements.

What Operator pioneered with its concierge shopping experience over rich messaging, others are taking to the next level. Laurel & Wolf and Havenly connect customers to interior designers who help them transform their homes (and sell them furniture and accessories). Trunk Club connects you over rich messaging to a stylist with whom you collaboratively craft a custom outfit to fit your style and taste. Accolade Health cuts through the red tape of the healthcare bureaucracy by matching employees with their own personal health advisor.

The Layer-powered Trunk Club experience for stylists (left) and customers (right)

These companies are cultivating a differentiated, defensible customer relationship by anchoring their customer conversation in today’s communication medium of choice: messaging. Whether customers are talking to human reps, automated text or voice interfaces, or some combination, the UX metaphor of messaging is the anchor that companies are settling on. Beyond just chat, the companies that define the conversational economy are combining rich, interactive messages, synchronous voice and video, and powerful agent-side customer service dashboards to help their employees be more effective and efficient. The upstarts are not alone – established giants like Staples and Bank of America have gotten the message (😉). The race is on.

Rich messaging is now Amazon’s default customer service option on mobile

Companies that foster one-to-one, direct and personalized customer relationships will stand a chance in a game defined by platform giants like Amazon and agile incumbents like Walmart. Those that do not will go the way of the media companies that the internet has already hollowed out or destroyed outright. Using a rich, branded messaging experience as the backbone of the customer conversation is going to be table stakes.

But the Conversational Economy is about so much more than surviving digital disruption. It is about one-to-one technologies allowing us to return to personal and personalized commerce. It means we’ll get “mass customization at scale,” as Don Peppers put it, where customers are treated as human beings and companies are no longer guessing as to how to serve their needs. And as Trunk Club and others have demonstrated with their custom clothing services, the personalized future is about more than just raw data. It is about a conversation.

The revolution is here, and it has a voice.


Snapchat Spacetime: Together when we’re apart

It would seem to me that Snapchat’s core job to be done is allowing us to be together when we’re apart. This is consistent across both one to one & one to many Snaps, as well as Stories. Metaphorically speaking, it’s a blend of teleportation and time travel (hear me out).

In a Snap, people are able to communicate certain things, particularly emotions, far more efficiently and effectively than with simple text. What might take 15 messages of back and forth to describe to a friend or loved one could be accomplished in a quick photo with a caption or a 4 second video message. If one party is busy, Snaps will accumulate and provide an immersive and serialized update for the receiver. When going back and forth on Snapchat every few minutes with someone, it can almost feel like being next to one another. Snaps are usually for the closest of friends or for particularly relevant moments to be shared privately with more casual ones. Snapchat’s fully private side forms the glue of countless friendships and small groups, allowing people to show their friends what they before could only struggle to tell.

Not only are people able to momentarily suspend the feeling of geographic separateness from someone they’re talking to on Snapchat, but they are also able to maintain a thread of communication asynchronously that still feels synchronized. This time shifted intimate communication provides an experience that live video chat or streaming alone cannot achieve. By packing so much about where we are — literally and emotionally — into such a compact and effortless package, Snapchat chips away at our separation in space and ventures to give us back some time together.

Stories are doing the same job with a slightly different graph, and it is this aspect of Snapchat that probably poses the most immediate threat to Facebook engagement. Snapchat Stories have been wildly successful, and more recently have been widely copied. The audience for Stories is all of your mutual Snapchat friends. This group is significantly larger than the group that engages most frequently together on the private side of Snapchat, but it is likely on average a lot smaller than most people’s Facebook friends list. It is composed of pretty good friends, old friends, and perhaps friends that have moved away. We want to feel like we’re keeping in touch and up to date with these people, but not talking to them every day.

Stories allow us to take the same principles of Snaps – the immediate capture of a moment in space and time, the playful creative elements, the immersive sense of jumping into someone’s experience as Evan Spiegel put it – and combine them with a more explicit element of storytelling and putting forward your “face” to your friends. That face, like Snapchat’s lenses and filters, can be whatever you want it to be that day. And by threading individual Snaps into a story, you are able to do exactly what the feature’s name suggests – tell a story about your experience. Your friends that tune in to your story will get an update from the real you, not because it is tied to a profile of everything you’ve ever done, but because it represents who you are right here, right now. Snapchat Stories are the heir to AIM away messages, only this time it’s backwards; Stories are “here” messages. Friends can keep in touch without exchanging messages every day, through subtle gestures of watching each other’s stories and occasionally commenting and striking up a conversation. Don’t be surprised to see Snapchat roll out a private version of Stories to finally fix the clusterfuck that is group chat.

Fellow observer Alex Danco had an awesome piece a little while back framing the shift happening in social called “From pull and push to here and now,” (come on, how cool is that title?) that you should all check out. He describes a new paradigm emerging in social communication and content that is principally characterized by Snapchat, but also would include things like Twitch and Houseparty. With things like ubiquitous mobile with high-speed internet being taken for granted, front facing cameras and the advent of ephemeral content, the new kids on the block are competing for the “here and now.” And while from a content consumption perspective there are many other players and factors to consider, I think Snapchat is clearly the best positioned to define how we relate to the here and now with the people that matter to us. As things like augmented reality come into play moving forward, Snapchat will likely further warp spacetime to bring people closer together. Or as Steve Jobs said, make a dent in the universe.


After initially hitting publish, Nikhil made a great point about Discover that I think also applies to Live Stories:

The jumping into experiences of others far from you extends beyond the private side of Snapchat. Discover is, for better and sometimes for worse, a window into the pop culture zeitgeist and has so far been fairly immune from the most pernicious filter bubble effects that tend to develop elsewhere. Live Stories invoke the breathtaking experience of being somewhere else and feeling what the people there are feeling. Spectacles will only intensify this kind of “tourism” as we are able to capture more and more, whenever the moment arises.

On Snapchat, there is here, and it is always right now.


Originally published on Medium

Ghost in the machine: Snapchat isn’t mobile-first — it’s something else entirely


“Oh, you think darkness is your ally? You merely adopted the dark, I was born in it, molded by it.” — Bane

It’s tempting to think of Snapchat as a part of the app revolution, as one of the shining examples of mobile-first design that has defined our smartphone age.

This is of course true to an extent, but seeing Snapchat take its place at a consistent #1 or 2 in the US App Store alongside Facebook and Google’s main properties (and the other flavors of the week) somewhat obscures what is actually going on here.

Snapchat is not mobile-first, and it’s not really an app anymore. Nor is it a meta-app platform at this point like Facebook Messenger is angling to become (at least not yet). Snapchat is a true creature of mobile, a living, breathing embodiment of everything that our camera-enabled, networked pocket computer can possibly offer. And in its co-option of smartphones into a true social operating system, we see the inklings of what is beyond mobile.

When I open Snapchat up to the camera, I can’t shake the feeling that the ghost is banging on the glass, trying to break out into the world.


As we come up on year 8 of the app economy, it’s absolutely remarkable to think about just how far we’ve come. Mobile has completely reshaped old industries, created new ones, and turned the entire computing world on its head.

Companies from all sectors have met their end (or become shells of their former selves) for failing to think “mobile-first” — a term coined by Luke Wroblewski that has defined the age as much as “lean” and “design-thinking.” Most consumer-facing and many B2B verticals are being driven by companies that have designed or adapted their customer experiences to fit a smartphone dominated world.

And yet — like all great waves in technology — the ground shifts beneath the feet of even those who have aligned themselves around the dominant ethos.

Peter Wagner and Martin Giles astutely wrote about these very rumblings last year in “Mobile First, But What’s Next?” They coined the term “authentically mobile” to distinguish services that are not only tailored for the mobile world, but that so thoroughly leverage the unique capabilities of mobile devices that they could literally not exist without them.

Where mobile-first companies take the new, portable form factor and riff on things that were more or less possible but limited in some way on the desktop, authentically mobile companies are truly creating experiences that would either be impossible or entirely meaningless without a networked supercomputer in our pockets.

A classic example of authentically mobile would be Uber, which without a location-enabled computing device always on our person (on both sides of the 2-sided marketplace), would almost certainly not exist. Wagner and Giles’ table here summarizes the shift:

Credit: Peter Wagner and Martin Giles

It’s clear that Snapchat is extremely well described by column #3 – particularly with regard to its emphasis on collection – and if there were a column #4, it would be straddling the line. The “emphasis on collection” couldn’t describe Snapchat – an app which famously defaults to its camera – any more perfectly. CEO Evan Spiegel recently characterized Snapchat as primarily “a camera company.”

The Feed

No user-interface metaphor is as widely associated with the idea of “mobile first” design as the scrollable feed – whether it’s standard reverse chronology or algorithmically driven. One need only observe people on public transit with their necks craned over their phones, flicking up endlessly, to feel just how pervasive feeds have become in our daily lives.

Outside of the big social players, the feed is found in countless other mobile apps ranging from productivity to personal finance. But although the smartphone form factor suits the feed incredibly well — from the focused screen size to the portability that has allowed content consumption to consume all the idle moments of our lives — it wasn’t born on mobile.

We began to see feeds everywhere towards the end of the desktop browser heyday, with the most important feed obviously being Facebook’s. In a way, Facebook made the browser wars irrelevant by essentially itself becoming the browser — the jumping off point for how we experienced the web. And despite intense skepticism from Wall Street, Facebook has been wildly successful in porting the News Feed over to mobile.

Adam Gale has a nice summary of just how handsomely this mobile bet has paid off for Facebook:

Indeed, Facebook (which includes WhatsApp and Instagram) is essentially a mobile company. Revenues on the platform jumped 70% year on year in the first quarter of 2016 (to $4.4bn, out of $5.4bn total revenues), having grown 82% the previous quarter. Mobile income now represents 82% of the business.

Just as Facebook was making this transition, and right when the iPhone’s camera gained the capability to take acceptable photos, a more pure, focused version of the Facebook News Feed emerged: Instagram. You post a few Instagram photos per week. Then you spend a lot of time scrolling through and looking at content, much like you would with the Facebook blue app. Instagram’s simple design, creative constraints and s̶u̶s̶p̶i̶c̶i̶o̶u̶s̶l̶y̶̶ consistently beautiful content make it a delightful mobile experience, and in many ways the crown jewel of Facebook’s attention empire.

Instagram is the pinnacle of Wagner & Giles’ “emphasis on presentation” hallmark of mobile-first. Instagram has long since eclipsed Facebook’s mindshare in the younger generation, and the acquisition has been hailed as one of the greatest in the history of technology. Facebook’s dominance over the feed metaphor is essentially complete and uncontested.

But we are beginning to see some cracks appear in both Facebook and Instagram. Earlier this year (ironically?) the Twittersphere was abuzz over a report in Bloomberg about sinking original (i.e. user generated) sharing on Facebook in what the company refers to internally as “context collapse.”

Anyone who has been on Facebook for a long time probably didn’t need numbers to back up the general feeling that they and their friends weren’t posting big photo albums from the weekend’s events anymore, let alone sharing a cool song on someone else’s wall. VentureBeat reported around the same time that Instagram engagement had dropped a whopping 40% in 2015.

I take the Instagram numbers with a grain of salt, as they don’t entirely pass the sniff test. Still, while Instagram continues to grow (it recently passed Twitter in a big way) and maintains a very privileged place in mediating our social hierarchies, people (especially young people) seem to be posting less frequently and are starting to spend their time elsewhere. It remains to be seen if Instagram’s algorithmic feed will fix this.

To be sure, Facebook and Instagram are still part of people’s hourly (ok, every 15 minutes) routine of “checking your phone,” but I don’t think anyone can deny that their apparent evolution into more passive consumption experiences raises a few red flags.

Physics — back to the “now”

So what exactly is going on here? The numbers support the idea that Facebook and Instagram are wobbling a little in the US, and I think it’s reasonable to look at Snapchat’s continued explosive growth in users & engagement as one of the causes.

But why exactly are the two scions of the feed and the lynchpins of a mobile-first empire seemingly struggling to drive people to share their lives? Perhaps the task of constantly manicuring a persistent online identity — of carefully considering what effect your digital exhaust will have on your ego — is beginning to weigh on people. Both Facebook and Instagram are supposed to be arenas for the best version of yourself, and with each post you are putting something out into the ether to be judged both now and forever.

Mark Zuckerberg is famous for his extreme views on the singularity and persistence of our identity, going so far as to say that “having two identities for yourself is an example of a lack of integrity.” Consuming the feed exacerbates some of our darker insecurities which, in turn, put a ton of pressure on our contributions to it.

As everyone with a mom who made the family stop for a picture at every turn while on vacation can attest to, the urge to photograph all of the best moments of our lives is nothing new, but social media has turned this up to a fever pitch such that if it’s not posted, a moment might as well have not happened.

Before joining Snapchat as a researcher in 2013, Nathan Jurgenson wrote an essay called “Pics and It Didn’t Happen” that sheds some light on the chickens that are finally coming home to roost. He begins one of the most poignant sections here with a quote from Susan Sontag:

As Susan Sontag wrote in On Photography,

there is something predatory in the act of taking a picture. To photograph people is to violate them, by seeing them as they never see themselves, by having knowledge of them they can never have; it turns people into objects that can be symbolically possessed.

Sontag notes that this makes for a nostalgic gaze, an understanding of the world as primarily documentable. For those who live with status updates, check-ins, likes, retweets, and ubiquitous photography, such an understanding is near inescapable. Social media have invited users to adopt a sort of documentary vision, through which the present is always apprehended as a potential past. This is most triumphantly exemplified by Instagram’s faux-vintage filters.

I don’t think it’s so much the simultaneous massaging and crushing of our egos that is weighing on the mobile-first giants of the feed. Snapchat Stories certainly have a component of performance and voyeurism that probably never goes away in social.

Rather, as we drown in an over-abundance of content destined for an archive that has lost its meaning, the immediacy and intimacy of platforms like Snapchat and plain old messaging have given us an island of engagement with the present moment.

Jurgenson absolutely nails it when he says:

By being quick, the temporary photograph is a tiny protest against time.

In contrast, the feeds are crushing in their insistence that we are constantly living to relive the past.

The ghost in the machine — a sign of what’s to come

Countless people have observed (and often lamented) Snapchat’s “bad UX/UI” according to generally accepted design practices on mobile. Where “good design” calls for feature discoverability, Snapchat does almost no hand holding for new users and buries features behind complex gestures and unintuitively placed screens. From pressing on Discover stories to compose a snap that shares and marks up the content, to double filters (hold the first down and then keep swiping through), Snapchat is at once one of the simplest apps of its stature in the world and one of the hardest to learn.

Importantly though, it’s not really the UI that is the “hard” part about learning Snapchat (many have overstated the role of this feature bamboozling in keeping out “the olds”). Rather, the ambiguity around what Snapchat “is” and “what it’s for” is primarily responsible for the incredulity of onlookers and the so-called steep learning curve.

Beyond the visual design practices that have defined the smartphone era, perhaps an even more overarching principle that has guided the critique of mobile apps has been the idea of a core “problem” to be solved, a single organizing principle around which users can rally. Reminiscent of the early days of Twitter, Snapchat has faced questions about what its core use case is, but unlike Twitter, which has arguably been consumed by this dilemma, Snapchat has embraced the ambiguity and essentially responded with 👻.

Snapchat is very difficult to understand, even for those who use it regularly and think about it until their head hurts. The tangible reasons for its incredible success are numerous, overlapping and, at the end of the day, inadequate when compared to the actual feeling and experience of using it.

An interview Evan Spiegel gave to The Verge back in 2013 for the launch of Stories gives one of the best lenses (no pun intended) through which to understand what Snapchat is and what it was about to become. He said, describing the new feature:

When you have a minute in your day and are curious about what your friends are up to, you can jump into their experience. The last snap today will also be the beginning of tomorrow so there’s no pressure to compose a narrative. There’s this weird thing that happens when you contribute something to a static profile. You have to worry about how this new content fits in with your online persona that’s supposed to be you. It’s uncomfortable and unfortunate.

“Jumping into their experience,” I think, is probably the closest thing I’ve heard to a unified theory of what Snapchat is. It connotes an active give and take between friends (and more recently, influencers). It foreshadows the importance of the doodles, stickers and filters that have come to define much of Snapchat, which are more about giving us an excuse to share anything, profound or mundane, than posing for an eternal self portrait. It’s something that only really works when the capture and consumption device are the same, and where the output (vertical photos/videos) fully immerses you in each experience shared with you.

And like all real experiences, these shared “jumpings” are fleeting. We can put a different persona on (with face filters, now literally) each moment and be reborn the next. Snapchat itself feels like it’s constantly pulsing like one of those time lapse videos of cars and city lights. We all go “there” when we get a peek into each other’s lives, but really there’s no there, there.

In this way, Snapchat the “place” is everywhere and nowhere at the same time. The “app” lives as much in our own mind and habits (the latent potential of any moment to be instantly shared, experienced together, and forgotten) as it does on Snapchat’s servers. Rather than looking at the inherent ephemerality of life as a bug like some of its competitors, Snapchat sees it unequivocally as a feature. Without this impermanence, Snapchat would feel like surveillance. Instead, it feels more like teleportation, somehow allowing us to be together when we’re apart.

It’s no surprise that even as Snapchat remains a fraction of Facebook’s size, it has nearly caught the blue giant in terms of photos shared daily. Ben Thompson had a great piece where he posited that tech markets all seem to have a “phonebook” and a “phone” — the phonebook being the grand directory of both people and content, and the phone as the go-to place for actively connecting with the most important people in our lives. In the US, he stated the obvious: Facebook is the phonebook, and Snapchat is increasingly becoming the phone.

This might appear to be a stable stalemate, but I pose the question in light of Facebook’s frantic attempts to get Messenger to catch on in the US: how long can the phonebook live without the phone? Much like Facebook became the browser on the desktop and took its momentum into the mobile-first world, I think we should expect authentically mobile Snapchat to parlay its takeover of the phone into whatever comes next.

Update 6/30: Two interesting new stories I felt I should include here as an addendum

Originally published on Medium