You may be tempted to compare Clips, Apple’s new mobile video app, to Snapchat and Instagram. When it appears in the App Store next month, the slick new app will feature filters, stickers, and a press-to-record UI that makes it easy to capture, edit, and share snippets of video. But Clips is no mere Snapchat clone.
For starters, Clips is a tool, not a platform. Yes, you can use it to make highlight reels of your cat, but you’ll still do your sharing over iMessage (or, more likely, your social network of choice). It’s also smart: Clips uses facial recognition to identify the people in your videos and suggest people to share them with. But what really sets the app apart is a feature Apple calls Live Titles, which turns your spoken words into on-screen text. It also goes one crucial step further: Apple claims it synchronizes the text to the cadence of your voice. If it works as advertised, Clips’ speech-to-text feature will be the easiest method yet for closed-captioning social videos. That makes Live Titles a neat solution to an ironic, and increasingly irksome, problem: The more people use video to communicate, the more they need text to tell them just what those videos are saying.
Blame autoplay. Next time you find yourself browsing your favorite social feed, pay attention to all the videos that play unprompted as you thumb through. Then notice how many are muted. Muted videos are infinitely more courteous than unmuted ones, not to mention easier to watch when riding the subway at rush hour, standing in line at the grocery, or talking with mom on speakerphone, provided they have captions. Text-on-screen has become a crucial barrier to entry not just for media outlets and YouTube stars, but for anyone who wants their social videos to be seen by as many eyeballs as possible. People are consuming more video content than ever (according to Facebook’s latest numbers, half a billion people watch 100 million hours of video on the platform every day); it stands to reason they’re watching more muted content than ever, too.
But captioning videos is still a pain. Your current options for adding text are using third-party apps like Vont (which produce good results but are labor-intensive), animating it ahead of time (which most people don’t know how to do), or finger-typing and scribbling on screen (which doesn’t sync to the action in your video). Facebook recently unveiled a free caption-generating service, but it’s currently limited to English, bedeviled by transcription errors, and only runs on desktops.
Clips promises to make it easier than ever for anyone to annotate their videos. Users can get fancy if they want. They can choose from different typefaces and presentation formats (like whether to use a simple text overlay or a speech bubble), and they can still manually edit captions with punctuation and emoji. But at its most basic level, Live Titles’ job is to listen attentively and spit out precisely timed subtitles in one of 36 languages (for now).
The Rise of Visual Culture notwithstanding, text remains essential to the narratives we all spin with images and video online. A tool that promises to make captioning simpler, more intuitive, maybe even a little fun, is no small thing. If Live Titles is truly effortless to use, Clips could be very successful, in which case Snapchat and Instagram might soon be the ones cribbing from Apple’s playbook.