Video: Best Practices for Live Captioning

By Tim Siglin, Steve Nathans-Kelly
Posted on November 11, 2016

LinkedIn’s Heather Hurford and Streaming Media’s Tim Siglin take a deep dive into the current challenges of closed-captioning live-streamed video in this interview from Streaming Media West 2016.

Tim Siglin: Welcome back to Almost Live here at Streaming Media West 2016. I have with me Heather Hurford, who’s a live video producer at LinkedIn. Heather, thanks for joining us.

Heather Hurford: Thank you, Tim.

Tim Siglin: I understand one of the challenges you’re trying to solve, which is a massive challenge for all of us, is closed captioning for live. There are a bunch of different parts to that. How are you breaking down the problem and looking at the solutions?

Heather Hurford: It really started out of a necessity, a need at LinkedIn to add the captioning to our own internal ambiance needs in an effort to make our work environment more accessible and inclusive. I have a background in language, in IT, and in production, so it was a great project for me to take on. I’d actually worked on adding closed captioning to a nationwide television channel back in the early 2000s, when the FCC mandate came out to add captions to all broadcasts, so I had some experience. I raised my hand, and it turned out to be even more challenging than I thought. I think the hardest thing right now is that the standard that exists in the broadcast world has not been adopted for the online world, and the reasons behind that, as I’m discovering, are not so cut and dried.

Tim Siglin: I go back 18 years in this part of the industry. I remember SMIL files, which were part of what Real implemented, and then you had other tech solutions on Windows Media. We had the format issues. That was one issue. What are the other issues that you’re finding as to why things weren’t adopted or why there’s no standard per se, context, etc.?

Heather Hurford: Live is different than video-on-demand for captioning, so I’m really focused on the live because I think that for on-demand, there actually are some pretty decent solutions out there. There are lots of different file formats that work. Most of the platforms support more than one file format so you’ve got options. When it comes to live, many of the players don’t actually support true closed captioning.

Tim Siglin: So you’re saying the on-demand players from a particular video platform might support the time tags but the live video player does not?

Heather Hurford: Exactly.

Tim Siglin: Okay. So it’s not as easy as the old line 21 days where we did the insertions.

Heather Hurford: No, and that’s exactly the issue: there’s not really a standard that’s been adopted across online media players. YouTube had a solution for a while that had captioners going in and adding captioning information on the back end, and it wasn’t great for the kind of programming that we do at LinkedIn, which is constant narration without a lot of breaks, so it really struggled to keep up. We’d lose huge amounts of text. Then YouTube was one of the first to add support for the 708 standard, which is what the digital–

Tim Siglin: Equivalent of line 21, yeah.

Heather Hurford: Exactly, in the broadcast world. That’s nice because all of the gear that’s out there, the captioning encoders, that’s the information that they spit out. So we added a closed captioning encoder to our signal flow, and when we’re live on YouTube that works great, but many of the other platforms don’t support it. The internal player that we use, for example, doesn’t have any way of taking that data and decoding it in the player. Essentially, what we do as a workaround right now internally is create two streams, one with burned-in captions and one without, and we let the viewer decide which experience they want instead of having that nice toggle back and forth.

Tim Siglin: Toggle back and forth. Okay, so essentially, you’re doing what we used to call time code burn back in the day when we transferred from film, so literally they–

Heather Hurford: It’s open captions.

Tim Siglin: Wow. That’s crazy.

Heather Hurford: Obviously not a–

Tim Siglin: Not an elegant solution.

Heather Hurford: Not an elegant solution. It’s a workaround for now and, like I said, it’s all platform dependent. We just did our biggest conference on Ustream, and Ustream supports the 708 standard, so we were able to get a really nice user experience with true closed captioning.

Tim Siglin: What does the death of Flash do to what you’re doing? One of the beauties of Flash is that, as a player, it inherently had some timed-text capabilities in it. Obviously we’re now moving to HTML5 players. Are you finding the HTML5 players out there better, from the companies that thought about this, or is it still sort of a hodgepodge where some support it and some don’t?

Heather Hurford: I’m finding a hodgepodge. YouTube and Ustream are the only big players that support it that I’ve found in the live space. Others will say they support it but they don’t actually support the 708 standard, and they offer a different solution which oftentimes isn’t a great user experience. I’ve seen some where the captions in there… I use quotes, captions, because it’s actually a scrolling transcript that pops up in a separate window.

Tim Siglin: Window, yeah, exactly.

Heather Hurford: With most of those solutions you are obligated to use the captioning provider that they’re partnered with and the quality, it tends to be not very good. I always like to point out that comprehension is tied to accuracy, so if you and I are talking and you only understand 70% of what I say, this won’t be a very good conversation.
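To make the accuracy point concrete, here is a minimal sketch of one common way to score captions: word-level accuracy, computed as one minus the word error rate. The function names and the sample sentences are illustrative assumptions; nothing in the interview says which metric LinkedIn or its captioning provider actually uses.

```python
def word_edit_distance(reference, hypothesis):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word dropped by the captioner
                            curr[j - 1] + 1,      # extra word in the captions
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_accuracy(spoken_text, caption_text):
    """Fraction of spoken words captured correctly (1 - word error rate)."""
    reference = spoken_text.lower().split()
    hypothesis = caption_text.lower().split()
    if not reference:
        raise ValueError("spoken_text must contain at least one word")
    errors = word_edit_distance(reference, hypothesis)
    # Clamp at zero: heavily padded captions can produce more errors than words.
    return max(0.0, 1.0 - errors / len(reference))


# Hypothetical example: one wrong word out of four spoken words.
print(word_accuracy("the captions keep up", "the captions kept up"))  # 0.75
```

A score like this only counts words; it does not weigh how much a particular error changes the meaning, which is part of why a human in the loop still matters.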

Tim Siglin: Which brings me to an interesting question. Having worked with some speech-to-text solutions in the last decade or so, one of the ideas was we’ll just plug a speech to text engine in there and have it do that and then you put that up for captioning, but in reality, unless it’s trained, you’re getting 65% accuracy. The other option is to have somebody sit there and type it and of course as we’ve all watched live news with those, some of it’s phonetic, etc. You can come back later and clean it up for the on-demand asset, but where is the optimal solution for how to do that and what’s your take on that?

Heather Hurford: There’s actually an in-between, which in the process of adding captioning at LinkedIn I discovered because it is really hard to train traditional transcriptionists to get beyond a certain accuracy level. The really exceptional ones that can get in the 90% range are few and far between and they’re in high demand, so as we scale, as the volume of content that’s being produced increases, that’s not going to work. I found that in other parts of the world people are using voice writers, so a speech-to-text solution where there’s still a human being who’s taking in the content and re-speaking it.

Tim Siglin: Ah, and reading it. Re-speaking it.

Heather Hurford: Re-speaking it into a speech–

Tim Siglin: The system is trained. The speech-to-text is trained for them, so they hear it and that would also help you from a translation standpoint if you went multilingual, I would assume. Okay.

Heather Hurford: Yeah, so it takes care of that accuracy issue. It keeps the human element. Artificial intelligence is not there yet, so you still have a human making a decision and interpreting in those moments where it matters, frankly.

Tim Siglin: That’s fascinating. They may be interpreting somebody from their own language–

Heather Hurford: That’s what we’re doing.

Tim Siglin: Or they may be interpreting somebody from another.

Heather Hurford: We’re doing all-English right now and the accuracy we’re seeing is incredible and the cost is actually… I don’t want to throw numbers around, but it’s actually a lot cheaper than what I was paying for live captioning back in 2002.

Tim Siglin: What we used to get upset with our mothers or grandmothers sitting next to us and telling everything that was being said on TV is now actually turned into something lucrative for a person who can do that.

Heather Hurford: It is. It’s a skill, and here’s the interesting part when it comes to skill: transcriptionists take several years to get trained up, and most of the people who do that work are an aging labor pool. Re-speakers, or voice writers as they’re also known, can be trained in just a matter of months.

Tim Siglin: What percentage of those voice writers have a LinkedIn profile?

Heather Hurford: You know, I don’t know yet but there’s a real business opportunity there in the United States like I said, and I know in Europe where they’re doing… Everything is subtitled and captioned and it’s done in multiple languages. There’s a huge demand and a scale.

Tim Siglin: It’s the kind of thing that you wouldn’t physically have to be in the room to do, either.

Heather Hurford: In fact, what we’re doing is using an EEG encoder with iCap, so our captioning provider is remote. They dial in with a code to that iCap cloud software. We send our program audio there, so they’re just hearing the audio and they’re sending the captioning data back in.

Tim Siglin: Given the fact that it’s a stream and you have multiple seconds of delay anyway, the fact that they’re getting a real-time audio feed, like off of a conference call, means that the captions sync up then to the video.

Heather Hurford: Well, some of EEG’s encoders have a feature where you can introduce even more delay to close that gap.

So that’s what we do. We actually close that gap and get it within five seconds. Sometimes it actually varies and sometimes the captions will be right on and even lead by a second or two which is … I always wonder what the viewer thinks.
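The delay arithmetic described here can be sketched roughly as follows. This assumes a simplified model in which each caption arrives some number of seconds after the audio it describes and the encoder holds the video back by a configurable amount. The function names and the sample latencies are hypothetical, and this is not EEG’s or iCap’s API.

```python
def caption_offset(caption_latency_s, video_delay_s):
    """Seconds by which a caption trails the video; negative means it leads."""
    return caption_latency_s - video_delay_s


def delay_to_hit_target(caption_latency_s, target_lag_s=0.0):
    """Video delay needed so a caption trails by no more than target_lag_s."""
    return max(0.0, caption_latency_s - target_lag_s)


# Hypothetical captioner latencies (seconds) against a fixed 5-second video delay.
video_delay_s = 5.0
for latency_s in (4.0, 6.5, 3.0):
    offset_s = caption_offset(latency_s, video_delay_s)
    state = "lead" if offset_s < 0 else "trail"
    print(f"latency {latency_s:.1f}s: captions {state} by {abs(offset_s):.1f}s")
```

Because the captioner’s latency varies from phrase to phrase while the added delay is fixed, a fast stretch of captioning shows up as captions that lead the picture by a second or two.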

Tim Siglin: The lead can be a strange thing. Eventually in two seconds he’s going to say this. Well, Heather, fascinating conversation. Thank you very much.

Heather Hurford: Thank you, Tim.

Tim Siglin: Again, this has been Heather Hurford from LinkedIn, live video producer talking about the challenges of live video captioning.
