JOSH MILLER: All right, so we are
going to talk about accessibility in the media
space at scale.
We've got Morgen Dye from BBC Studios,
John Luther from JW Player, and Daniel Peterschmidt
from Science Friday.
So some really nice and varying perspectives, I think.
So rather than have them each introduce themselves,
we're going to dive right in, and I'm
going to ask them each to talk first about why
they're dealing with this.
How did it come across your plate?
What's driving accessibility at your organisation?
So Morgen, I'll let you start.
MORGEN DYE: A few things.
I think, first off, we really wanted to be doing this.
Accessibility and creating services that are inclusive
really cuts to the heart of what the BBC is all about.
We've been doing accessibility for a long time.
Another thing is that in television,
there aren't many options.
You have to create captions, for example.
But it's not just the mandates and the regulations;
our partners are insistent on it, too.
JOHN LUTHER: OK.
I think for JW Player, our perspective
is a little bit different, because we're not
a content provider.
We're basically a technology provider.
So for us, the main driver for this is customers.
We have many thousands of customers, and--
I think Kevin touched on this in his presentation--
the scale of how much video is
being created every second of the day is just gigantic.
So as that customer base grows and their content base grows,
so do their needs.
Part of it is just doing the right thing
and making things accessible for everybody.
We're primarily a web technology company, and the philosophy
of the open web-- which I've been involved with
for a long time-- is that it's for everybody.
It should be accessible to everybody.
But beyond that, there are regulations.
We do a lot of business in Western Europe.
Now in the United States, there's
legislation, as most people in this room are probably aware.
You have to be compliant, or you risk getting sued,
or other unpleasant things might
happen to you.
So there's that side of it.
The other side of it is that as our company has evolved,
we're not only a technology company; we're also
becoming a data company.
Video intelligence is this new market and product that
we're starting to develop, where the more we can know about
a video and its context, recognising what's in it,
what it's about, transcription, machine learning--
all these things, which I'll talk about a little bit later
in my presentation.
It's all part of the same philosophy.
We want to know as much as we can about every piece of video
for the customer.
We're not doing anything creepy with this stuff,
not following or tracking anybody around.
So those are the two pieces of it.
Primarily for customers, but secondarily,
just to know more about the content
so that we can make content discovery and all
these other things that video intelligence means to us.
DANIEL PETERSCHMIDT: At Science Friday,
we're much more than a radio show.
We have videos, we have original articles,
and we provide free K-through-12 educational materials
for science classrooms based on
our radio content.
So we were with NPR for a long time, up until about 2013,
and they do transcripts for all their shows.
So when we left, we lost that, and there
was a lot of staff turnover and new people coming in,
and we didn't have transcripts for about a year.
And when that happened, all these educators
came out of the woodwork and were saying, we really
need transcripts for our classrooms
because of [INAUDIBLE] and [? IDEA. ?]
Providing different forms of learning
for kids with disabilities-- visual or hearing
disabilities-- is really crucial for teachers.
So we started reintegrating transcripts once we heard that.
When you're already doing it,
it's just a thing you should be doing,
and you don't expect people to say, hey,
this is great, because it's expected.
But then it goes away, and that's
when you see how great the need for it is.
But yeah, we do transcripts for all our segments,
our videos, and our other audio products.
But education is a huge component for us,
and that's the gateway for a lot of people.
JOSH MILLER: So let's talk about barriers for a second.
In terms of getting things off the ground--
it's more than one person, obviously,
it's more than just you people sitting here.
How do you get people around you to buy into this, whether it
be budget, dev resources--
how do you get everyone on board to say yeah, let's do this?
MORGEN DYE: Again, I think for me,
it was actually quite simple.
I'm very fortunate to work in a company
where this is just part and parcel for what we do
and what we've been doing for a long time.
So honestly, there weren't a lot of fights, if you will,
when it came to getting buy-in on creating
captions or [? AD ?] or whatever it might be.
That's not to say that there weren't challenges, for sure.
Cost is certainly the first thing that everyone sees
and the first thing that everyone has a concern about.
But I think we learned very quickly, actually,
that when it comes to cost, it all sort of just comes out
in the wash, if you will.
I think it made our content more valuable to our clients,
to our customers, and so whatever
investment that we were making, we easily made it back
with our content sales.
So I don't really have any, I guess, tips.
I'll leave it to these guys to see if they have
something more useful there.
JOHN LUTHER: Well, for us, it's a matter of competitiveness,
I think.
As I mentioned, we're primarily a technology provider.
We've always prided ourselves--
our core product, which is our video player,
has been around now for 11 years, I think.
Alicia would know better than me.
She's the product manager.
But we've always prided ourselves
on being the most accessible video player.
We've been the first to integrate
a lot of accessibility technologies,
such as we're the first web video player
to provide captions and all these other things.
So we want to maintain that competitive advantage and be,
again, the most accessible player for everybody,
so that as people come up against these requirements--
with legislation and everything else--
I'm proud to say
I think they come to us first.
And I've actually been emailing with some people
while preparing for this talk,
and they still say you guys do this better than anybody else,
meaning primarily the captioning and [? AD ?]
and WebVTT support and things like that.
So the question was about buy-in?
JOSH MILLER: Yeah.
DANIEL PETERSCHMIT: It was a little bit trickier for us
at Science Friday.
You really don't realise how amazing something is until you
don't have it anymore.
And you realise how much effort it takes and the resources
required.
So those emails from those teachers helped a lot.
Part of it is just starting to do it--
then it's a thing we're doing now,
and we should ask for the budget for it.
But also, for people who
don't have a personal connection to accessibility or disability,
who are just focusing on getting the show out every week,
it can be hard to convince them that this
is a thing we should be doing.
So we have so many stakeholders, with our member stations,
with teachers, parents, students,
everyone who listens to the show, people on the web.
I think once you bring your stakeholders front and centre,
and help people step into their shoes--
build some empathy-- it can go a long way.
But then there is grant money out there for this,
and we're primarily grant-funded,
along with listener contributions.
It does need lead time, though, so you have time
to apply for the grants.
It is expensive, but if you're a nonprofit like this,
there are resources out there.
JOSH MILLER: So you've all mentioned
basically the benefits of doing this stuff, which is awesome.
Because that's what we're always trying to preach,
that there's more than just the cost side of this.
So, Morgen, you mentioned ROI and the cost washing out.
Differentiation, John.
You realise how valuable it is once you lose it.
I mean, there's some really interesting stories here.
So we talk about this idea that social video
has been really valuable for us, because it's
allowed other people to have this aha moment of watching
a video with captions when they had no intention of watching
that video.
And now they're all of a sudden watching, like, oh, wow,
captions are kind of cool.
So you've also alluded to mostly having full support across
the organisation.
But what happens if you have a new producer, a new developer
come in and they've never seen this stuff before?
How do you get them to understand
that this is a good idea, and that this is what we're doing?
DANIEL PETERSCHMIDT: For the web development side of it,
a lot of people-- or at least the people in our organisation--
interact with the web the way sighted and hearing
people do.
And it's hard for them to understand
why we need alt text on images.
And it comes down to getting everyone in the room
and being like, this is a screen reader and this is how--
this is what it's like to experience the web in this way.
And then when you have something that direct in front of people,
I think that's an easy way to--
there's so many dimensions to this.
And I just drop a hint every once in a while.
I put up these worksheets on designing
for accessibility for autistic users or deaf users,
just as design tips.
A lot of people at our work are audio people.
But put it right in front of the kitchen,
and it's something that's hopefully on people's minds.
Like, we're not just designing this for people
who can hear our show.
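As an aside for readers: the alt-text gap Daniel describes is easy to check for mechanically. Below is a minimal sketch, using only Python's standard library, that flags img tags with missing or empty alt attributes in a saved HTML file; the file name is just an example, and fuller open-source auditors such as axe-core and Lighthouse go much further.

```python
# Minimal sketch: flag <img> tags with missing or empty alt attributes.
# Uses only the standard library; tools like axe-core or Lighthouse
# perform far more thorough accessibility audits than this.
from html.parser import HTMLParser

class AltTextChecker(HTMLParser):
    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attrs = dict(attrs)
        src = attrs.get("src") or "<no src>"
        alt = attrs.get("alt")
        if alt is None:
            print(f"Missing alt attribute: {src}")
        elif not alt.strip():
            # An empty alt is valid, but only for purely decorative images.
            print(f"Empty alt (decorative images only): {src}")

if __name__ == "__main__":
    # "index.html" is a placeholder; point this at any saved page.
    with open("index.html", encoding="utf-8") as f:
        AltTextChecker().feed(f.read())
```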
JOSH MILLER: Morgen, do you have analytics around this stuff?
Are you actually measuring how it's being used?
MORGEN DYE: I don't know if we have exact numbers in that way.
There are some interesting anecdotes, though, just from being a British company.
Most of our programming is going to have British accents.
So even those who are not perhaps deaf or hard
of hearing, people really enjoy watching our programming
with captions on.
It's actually one of the first things
I tell people when they're getting into doing captions,
is spend some time actually watching video
with captions on.
Until you immerse yourself in that experience
and spend a lot of time discovering for yourself what
makes a good captioning experience versus a bad or not
so good captioning experience, you really
don't know where to start.
But yeah, I don't have any sort of metrics or numbers
that I can provide on the amount of people using them.
I know that if you speak to our comments team,
one of the first things that you will hear
is, where are the captions?
If something's missing on one of our services
with one of our partners, captions
are always the first thing that are noticed.
If the quality is not as perfect as it should be,
people will notice, and we'll hear about it.
I can speak a lot more into how we
react to those sorts of things.
JOSH MILLER: So, John, differentiation is, I think,
a really interesting one.
If you're already differentiated and ahead,
how do you continue to stay ahead in that space?
JOHN LUTHER: Well, I mean, to address
the first question a little bit, if a developer or a product
manager comes in, I'm actually very
happy to say that the level of awareness of this stuff,
compared to when I started in this business in 2002,
is just so much higher.
People generally know
the difference between a caption and a subtitle
and all these things that, for a long time, people just--
JOSH MILLER: Who'd have thought people cared 10 years ago?
JOHN LUTHER: Yeah, right.
It's like captions-- everything was burned into the video.
There was all this stuff that--
so yeah, there's that.
And then on the other side of it, I think for us,
we've always--
again, I keep coming back to the web standards.
That's very, very important to us not only philosophically,
but just competitively.
So we contribute to standards.
We're W3C members.
I encourage anybody else here to--
if you don't have the budget to be a W3C member,
you should definitely just participate.
There's a lot of community groups in the W3C that
don't require membership.
I work a lot with people from the BBC.
There's just a lot of stuff going on there.
So adhering to the standards, contributing to standards,
just being aware of what's going on in the market, what
people need--
I mean, it sounds simplistic and elementary,
but that's really what we try to do, is just stay up on things.
There's just so many resources out there now,
like the Web Content Accessibility Guidelines.
If people have never read that document,
I encourage you to just read it.
I'll also talk about it a little bit more this afternoon.
The latest version of it was just published last week.
There's open source tools now that
help you audit your website and software to make sure
that you're complying.
There are just so many more resources out there
than there were before, and we're
trying to contribute to them to stay competitive, but also
because it's the right thing to do.
JOSH MILLER: So you're kind of all--
I don't know if lucky is the right word,
but in a nice position to be in an organisation that embraces
it all.
And you all, I'm sure, interact with other content/technology
organisations that may be trying to do something similar.
What are you seeing people do wrong?
Where are you seeing the mistakes?
What are people missing?
MORGEN DYE: I think what I'm seeing-- and what we experienced
a bit at first, as well-- is people treating captions
like just an element in an assembly line.
We're sort of dehumanising the process, and not
enough people are spending enough time stepping
back and thinking about, again, how do these captions--
how are they presented?
What is a good accessibility experience?
People are a bit more just focused on the delivery aspect,
making sure they get to places on time.
And that's obviously critical, but not
enough focus is being spent thinking
about the quality and the accuracy, in my opinion.
DANIEL PETERSCHMIDT: Yeah, quality and accuracy
is super important to us, too.
As a science show, everyone is an expert in something.
And if you get something wrong, then you will know.
They will let you know.
But maybe this isn't such an outward-facing thing.
Besides being the right thing to do,
I think something that's overlooked
is that transcriptions can be really
helpful in developing content.
The biggest part of our transcription budget
goes towards our videos--
not just for the final video product.
Maybe there are two to four people in a video,
and each of those people represents an initial interview
of at least an hour.
Having the transcripts for those immediately
is super helpful for our video producer.
We use it for our social media, to grab things really quick.
One of our partners used to be PRI,
and they would take our transcripts.
On our segment pages, we usually have a few paragraphs,
but they would take our transcripts for that segment,
and someone there would write a whole article.
And since they're our partners, we
could just run that article back on the segment
page and hopefully hook people better that way.
So there's so many in-house, in-development advantages
to having transcripts, besides just the last step.
JOHN LUTHER: The first thing they
do wrong is they use the wrong video player.
[LAUGHTER]
It should be-- no, facetiousness aside,
I think most people now are pretty well clued in
to transcripts and speech-to-text,
at least knowing it's something that they have to do.
And I think audio description is--
and that's not necessarily something
that people get wrong.
They're just not doing it.
And as Kevin mentioned--
and maybe it was in your presentation, I'm sorry--
it's just really important.
I have three kids on the autism spectrum,
and they're very loud.
So my captions are on all the time on my television.
And so as you mentioned, it's very helpful to just have that,
even as a hearing person, because I
can't hear the TV half the time, especially the subtle sounds
coming out.
So I don't necessarily think it's wrong.
It's just there needs to be a lot more awareness about, OK,
you can't just--
yes, the machine transcription.
Everything's getting much better very quickly.
But the audio description piece of it,
as people have mentioned, is extremely expensive.
It's very, very hard to do well and accurately,
but it's something that people just
need to be more aware of and factor in.
You've got to budget for it.
JOSH MILLER: So thinking about now
getting tactical operationally, if you think about--
and I'll give you options so you don't have
to say anything too negative.
But if you think about something you're either
really happy you're doing today that you
wish you had figured out sooner, or something you think
could still be improved, what comes to mind?
What have you figured out well?
MORGEN DYE: I think something that I
wish we would have done much sooner
is figured out how to work with things like captions in-house.
We were highly dependent on external vendors at first.
And we still are, and I would never
recommend transcribing all of your content in-house.
That's quite a lot of work, obviously.
But being able to do just really simple tasks with captions
in-house has saved us a tonne of time.
It's much more efficient and saves us a lot
in terms of operational costs.
And that's just doing basic things like timing offsets,
making quick fixes; if there is a mishear or a misspell,
being able to quickly open that file up, change it, get it back
out to your client or partner.
That's been a huge thing.
I wish we would have started doing that earlier.
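To make that concrete: the "timing offset" fix Morgen mentions is often just arithmetic on every timestamp in the caption file. Here is a minimal sketch, assuming SRT-style HH:MM:SS,mmm timestamps; a real workflow would use a dedicated caption library and more validation.

```python
# Minimal sketch: shift every SRT timestamp by a fixed offset (in ms).
# Assumes well-formed HH:MM:SS,mmm timestamps; results are clamped at 0.
import re
import sys

TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")

def shift(match, offset_ms):
    h, m, s, ms = (int(g) for g in match.groups())
    total = max(0, ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms)
    h, rem = divmod(total, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def shift_file(path_in, path_out, offset_ms):
    with open(path_in, encoding="utf-8") as f_in, \
         open(path_out, "w", encoding="utf-8") as f_out:
        for line in f_in:
            f_out.write(TIMESTAMP.sub(lambda m: shift(m, offset_ms), line))

if __name__ == "__main__":
    # e.g. python shift_srt.py in.srt out.srt 1500
    shift_file(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```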
DANIEL PETERSCHMIDT: On the web development side,
we have these external sites called microsites
that are just for larger projects, which
we code from scratch.
And I just wish I had more knowledge of,
or had done more research on, all the W3C practices.
I really wish my college computer science classes
had included this, because it was nonexistent in my computer
science classes.
And that's where it should start,
because it's hard to hook people later,
once they have set practices.
And just learning it, too--
the W3C standards are great, but there's so much stuff,
it can be hard to know where to start.
And Google Chrome has a pretty decent audit now,
but it's still a huge investment.
And I wish I had invested in that earlier,
but we're doing the best we can.
MORGEN DYE: Yeah, I mean, there's
not a lot in terms of prior education
before you go out into the industry
and start working with this stuff.
People don't go to college to learn
how to create captions.
On my team, we probably have nearly a century's
worth of collective post-production experience,
and not one of us actually knew a thing
about SCC files, DFXP files, or 608 versus 708.
It's just not something that we learned,
and so we had to get into it.
And so it was very daunting at first,
but how cool would it be if people, especially the younger
people coming into the industry, could
have this knowledge up front instead of learning it
on the fly as they go.
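For readers meeting these formats cold, as Morgen's team once did: the web-standard WebVTT format mentioned earlier in the session is the gentlest introduction. A minimal file is just a header plus timed cues; the cue text here is invented for illustration.

```
WEBVTT

1
00:00:01.000 --> 00:00:04.000
Welcome back to the programme.

2
00:00:04.500 --> 00:00:07.000
<v Narrator>Cues can carry speaker labels, too.
```

SCC (CEA-608 data as hex byte pairs) and DFXP/TTML (XML) encode the same idea with broadcast-specific constraints.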
JOHN LUTHER: I still struggle with 608 versus 708.
I think as a company, I wish we would have gotten into it sooner.
So in 2015, Jeroen-- one
of the founders of our company-- and I
said, let's just transcribe everything-- every video.
So in addition to our video player,
we have a video hosting and streaming platform.
It does transcoding, and there's a dashboard.
There are all these video tools in addition to the player
that we have.
And at the time, more and more customers
were starting to use our video platform for hosting and all
this other stuff.
We said, why don't we just transcribe
every single video in there?
And I'll talk a little bit about this later,
but it was shocking when we got the estimate of how much
it would cost, just as a rough back-of-the-envelope figure.
So at the time, I wish we would have
taken a step back from that and said,
you know what, we should just build this thing ourselves.
We should make one of these and sell it and market it,
because I think it would have been a good business for us
to be in.
But at the time, we were just like, too expensive.
We'll get to it later.
We'll come back to it.
I wish we would have just not paid the cost for someone else
to do it, but developed a core technology to do it ourselves.
Those things are now coming to market.
Plenty of companies are doing it,
but I wish we would have done it a little bit more ourselves.
DANIEL PETERSCHMIDT: Another thing
that I've talked about with our education lead
is that we're an English-language show,
and Spanish-language content is like the next frontier for us.
We want it to be.
And I think in 2014, it was
one in 10 students in the US who were Spanish speakers.
And by 2020, it's going to be something like one
in four, one in three.
We started producing Spanish-language short videos
that are like experiments you can do at home with your kids,
and making them with no voice, just text in there,
so we can easily swap it out for other languages.
But that's an obvious limitation right now with our audience.
And that can go beyond video.
I don't know what this could turn into--
like maybe having voice actors redo
our show in another language, since it's just
an interview show.
These are things that are really far down the line,
but what are other ways we can get into other languages?
JOSH MILLER: Awesome.
We're going to take some questions
before we run out of time.
Yeah?
Please.
AUDIENCE: Hi.
I'm Ashley Edwards from the New Jersey Department of Education.
And accessibility is very new to us.
We're one of the states that
recently got sued by the federal government
for not having accessible websites,
so we are trying to be extra accessible.
And one thing we're just starting
is accessibility with videos.
So I know you said that you'd never recommend transcribing
all of your content in-house, but is machine transcribing
actually better than humans transcribing?
Or do you hire other humans?
What is your advice?
MORGEN DYE: Just to clarify my point,
I wouldn't recommend transcribing in-house when you're
at the scale that we are.
We're delivering literally thousands
of hours of content per year.
That becomes highly unmanageable.
I can't speak to how much content
you'd have on your website, but there
could be different options there for you.
I'll let these guys speak to the voice recognition,
the automation technology.
We've played around with it.
Where we get tripped up is the British accents.
Computers just, for some reason, really,
really struggle, especially with the diversity of accents
that our programming has.
You'll have Scottish, Welsh, Northern Irish,
and accents from various neighbourhoods and regions of London.
It becomes quite confusing very quickly for the computer.
But I'll let these guys speak to that.
[INAUDIBLE]
DANIEL PETERSCHMIDT: Again, we need high quality, pretty fast
turnaround transcripts.
But for the digital team, we have found that
if we need to transcribe an interview that would later
become an article on our site, Trint is not a bad option.
It's one of those machine-generated transcription services.
You do have to put in punctuation
and who's speaking and things like that,
but it's a slightly easier way to get into things.
Again, at scale--
once you have something transcribed through them,
it still requires a decent amount of time
to make it usable.
JOHN LUTHER: You should just hire Josh.
It shortens it.
JOSH MILLER: Yeah, use case and resources available internally
will really dictate what makes sense for you guys, for sure.
JOHN LUTHER: It's getting better much faster,
but there's the notion of the two-pass transcription.
You run it through the machine, the computer,
and what you get back is mostly OK.
The accuracy rates used to be abysmal.
They're getting much, much better.
And then you have a human being make sure
that everything is correct and sort of tweak it.
And that can really help reduce the cost tremendously,
because again, as a preface to later, all the major cloud computing
companies are doing this.
I'd like to say that their motives are
purely accessibility, but they're not.
They're all doing it for their little speakers.
But that has really helped this stuff get much better much
more quickly, because they're investing millions and millions
of dollars in that.
Because they want you talking to every device in your house.
MORGEN DYE: There's always a middle ground, too,
where you can send it off and let the automation,
the computers do the work.
And then it comes back to you, and perhaps someone internally
fine-tunes it, if you will.
Again, don't ever-- my advice to folks
out there doing this is to not be afraid to bring some of it
in-house and to learn to do some basic things.
It is very, very helpful to be able to just--
and don't be overly reliant on your vendors.
Your vendors are great.
They're lovely, but you need to be
able to do some of the things yourself.
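In outline, the two-pass workflow John and Morgen both describe is simple to operationalise. A minimal sketch follows; machine_transcribe is a placeholder for whichever ASR vendor or service you call, not a real API.

```python
# Sketch of the two-pass workflow: machine draft first, human pass second.
# machine_transcribe() is a placeholder, not a real API; wire it to
# whichever ASR vendor you use.
from pathlib import Path

def machine_transcribe(audio_path: str) -> str:
    """Placeholder: call your ASR service and return a draft transcript."""
    raise NotImplementedError("connect this to your ASR vendor of choice")

def two_pass_draft(audio_path: str, draft_dir: str = "drafts") -> Path:
    draft = Path(draft_dir) / (Path(audio_path).stem + ".txt")
    draft.parent.mkdir(parents=True, exist_ok=True)
    # First pass: the machine produces a rough transcript.
    draft.write_text(machine_transcribe(audio_path), encoding="utf-8")
    # Second pass is human: an editor fixes names, punctuation,
    # speaker labels, and mishears before anything is published.
    print(f"Draft written to {draft}; queue it for human review.")
    return draft
```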
JOSH MILLER: A question over there?
AUDIENCE: Someone just spoke about Trint.
My question is, these companies produce a transcript,
but in terms of the captions which are produced,
I'm not sure the quality is that great.
Because, I mean, there is clean-up required
of the transcript, but there are also things
like segmentation and all those things.
And very often we see that the output is not
even there-- it just [? chokes. ?] Companies
that actually do ASR, I'm saying, don't produce good captions.
JOSH MILLER: Yeah, so that's an important point.
So he's saying, just to make sure everyone understands,
that these applications of speech technology
aren't quite positioned for captioning.
And John just alluded to that as well.
Dan, do you want to clarify the use case?
Because it's a little bit different.
DANIEL PETERSCHMIDT: Yeah, absolutely.
So when I spoke of Trint, we use it purely
for in-development content.
It might do this--
I don't know-- but we don't use it
for captioning or anything like that.
It's for the writer's own notes,
to refer back to really easily and to be
able to play that part of the audio
file really easily alongside the text.
So that can be super useful for like [INAUDIBLE] stuff.
JOSH MILLER: So a good example of the use case
is think about a reality television show.
For that one-hour show that we see,
there's probably 40 hours of content,
maybe 50 hours of content that they
have to whittle down and make interesting.
It's pretty boring otherwise.
So one of the ways they do that is actually
have someone, like a production assistant, essentially,
making notes or transcribing, if they're really unlucky,
all of that content to help the writers figure out
what segments they want to have in there.
So that's one way to think about it.
It's the same idea, just done a little bit differently.
But in the reality TV case, they don't want
that content going anywhere.
Because especially if it's a competition-based show,
that content gets out, the show's over.
Yeah?
AUDIENCE: Just to follow on from the woman from New Jersey,
I work for a public school district in Connecticut.
And I know you guys have some skin
in the game with your services.
But a neighbouring district got cited by OCR
for inaccessibility.
And through their conversations with OCR, for videos
in particular, YouTube became an option for them.
We're struggling with whether or not
that's a real viable option, because, again, it's
just speech-to-text.
It's maybe 80% accurate.
There's plenty of errors, like you
showed in your opening with the New England Aquarium
not coming out right.
Have you had conversations with OCR about where
their level of tolerance is?
Because, again, as a public district,
we have to argue for every dollar.
And to make an argument on one side or the other,
we're going to have to have some more background.
So I was just curious on your thoughts
on OCR's level of tolerance.
And where do you think the trends are heading?
JOSH MILLER: So I'll say something,
and then you guys should add.
So I think they've all touched on, I think,
really important points, that there
are some things you can use those tools for and then
do a little bit yourself, if that's
one way to keep costs down.
There is a pretty big lawsuit right now that cites YouTube
captions not being good enough.
That's the MIT/Harvard lawsuit.
So they actually do cite that the YouTube captions that are
there are not acceptable.
So based on what we've seen, we would say don't rely on that.
But YouTube also does have some do-it-yourself tools
that are very good, so that you could start with that
and then, if you have a couple of resources in-house, help
clean it up yourself to keep the cost down.
So that is a viable approach.
It's just a matter of understanding
that that is necessary still.
So I think we've heard a little bit of that here as well.
I don't know if anyone wants to add to--
DANIEL PETERSCHMIDT: Yeah, we just heard from teachers
that they're really reluctant to show videos
with the YouTube-generated captions,
not only because of the errors,
but because the algorithm learns nasty words
and puts them up on the screen.
JOSH MILLER: John alluded to something very important,
that the engines are not there for educational content.
That's for sure.
They have ulterior motives, and I
don't think they would really hide that if you really
pushed them on it.
But it's very real, and it does affect the vocabulary.
JOHN LUTHER: I actually know the guy who
manages all that stuff at YouTube,
and they are getting very--
it's getting better.
And he's deaf.
He does it for accessibility.
He's a great guy.
I tried to convince him--
Ken, yeah-- to come, but he couldn't make it.
But we have a caption [? centre ?] as well
on our platform.
So again, you could just run it through YouTube,
not publish it, and take their output.
If being on YouTube is what concerns you--
that you don't manage your own ads,
or whatever reasons people don't want to go onto YouTube--
you could use them as your transcription service
for a first pass.
Upload the videos, but don't publish them.
Let it generate the captions.
Download those.
Then you can export them in--
I don't even know what formats there are.
And then tweak them and then put them on your own platform.
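As a hedged sketch of John's "YouTube as first pass" idea: the open-source yt-dlp tool can fetch a video's auto-generated caption track without downloading the video itself. The option names below are from yt-dlp's Python API at the time of writing, the video ID is a placeholder, and private or unlisted videos will also need authentication.

```python
# Sketch: download YouTube's auto-generated captions for your own video,
# then hand-correct them before hosting elsewhere.
import yt_dlp

opts = {
    "skip_download": True,        # fetch captions only, not the video
    "writeautomaticsub": True,    # the auto-generated track
    "subtitleslangs": ["en"],
    "subtitlesformat": "vtt",     # WebVTT; convert or tweak afterwards
}

with yt_dlp.YoutubeDL(opts) as ydl:
    ydl.download(["https://www.youtube.com/watch?v=VIDEO_ID"])  # placeholder
```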
JOSH MILLER: And the person he's referring to,
the engineer at Google, he started
the auto-captioning project.
He's fantastic, and he will admit they're not quite there.
I mean, he's deaf.
He knows very well what they can do.
But also, the point is, it's better than nothing.
It's a start, and it should be recognised as a start.
AUDIENCE: I have a question--
may I-- about audio description.
I apologise if this is redundant.
I came in only at the end of the last session.
I wanted to know if any of you are doing
audio description at scale.
I'm in a position where I'm consulting for some very,
very large corporations that have
hundreds, even thousands, of videos--
and the cost is getting out of hand for doing it
in the old-fashioned way.
You have a writer who writes scripts.
And if they're just for employees--
we don't have broadcast standards.
We can edit the video to work with the description.
But it's getting unwieldy, so I was wondering if any of you
have experience of producing at scale.
DANIEL PETERSCHMIDT: Can you clarify audio description
for video?
Are you referring to captions or--
AUDIENCE: No, for just sight-impaired,
so describing what's happening on screen
for people who can't see the screen.
DANIEL PETERSCHMIDT: Yes, we have.
JOSH MILLER: Yeah, [? very helpful. ?]
The radio's back in.
[LAUGHTER]
JOHN LUTHER: Yet again, later this afternoon--
just to build the suspense some more--
we are starting to explore doing this with machine learning
and OCR, meaning object and character recognition,
not the OCR the gentleman alluded to earlier.
It's extremely difficult. I mean, we can recognise--
and I'll show this later.
You can get very basic strings, meaning text,
about objects and verbs, what's going on.
And there are other companies--
the big companies, Google and Amazon--
they're all starting to try to do this as well.
I don't want to be pessimistic, but it's very, very difficult.
I think the transcription, like the very basic speech-to-text,
the transcription of what's said--
I'm a lot more optimistic than most people
about how soon we will have 100% machine-generated
transcription.
I think it's within the next two years.
It's just going to happen.
It's getting so much better, so much faster.
The description-- it's very difficult to do with a machine
unless you have just endless computing resources.
Some companies do.
So I wish I had a better answer for you.
Right now, it's just very hard to do with a machine.
MORGEN DYE: Yeah, we're in the same boat, I'd say.
If you find out the answer, please let us know.
Yeah, I don't know anyone who's using the automation for AD
right now.
And right now, we're really just trying
to wrap our heads around the different styles
that we're seeing out there.
Our parent company, the BBC Public Service,
has a lot of audio description, and we're
looking at what they're doing.
We're seeing a lot of people out in the digital space--
actually, not a lot-- but those who
are engaging in audio description,
it's very heavily scripted.
It's a lot of fun.
It's very engaging.
It's very much in the tone and tenor of the rest
of the programme.
And then we're seeing other services
where it's much more of a monotone experience,
and we're trying to figure out, for those
who rely upon audio description, what sort of experience
they would prefer.
So we're still very much in the discovery phase.
I don't have any advice or tips or tricks of how
you manage this at scale.
I think we're all in the same boat,
and I think we're all trying to figure that out
and seeing what sort of technology
comes around the corner.
JOSH MILLER: So I'll say we launched
the service a year ago--
well, just under a year ago.
And one of the reasons we actually got into it
was, one, certainly demand and the requirement for it.
But we actually recognised that there were no scalable
solutions out there.
And then you wonder why.
So is it because the market's not there?
Well, no, the market's coming.
So what's happening?
Well, a lot of it is kind of like what we saw with
captioning 10 years ago, when it was very expensive.
People put up a big fight, saying,
you can't make us spend all this money;
otherwise we can't publish our content.
So there's been that tension for a long time,
that it's been so expensive--
I would say, in this case, even more so than captioning ever
was.
It's so hard that it really--
it really is hard to bring in enough efficiency
to get the cost down.
And that is totally what we're focused on right now,
and trying to do that.
But it's hard, and there's only so much you can automate.
And so what we're seeing is that a lot of networks, which
is where it often starts in the broadcast world,
have been very successful lobbying the government to say,
you can't force us to do this because it's so expensive.
And this will blow up budgets.
And so that's still there a little bit.
And I think a lot of companies are
scared to get into it because there isn't a way to--
it's hard to see how you make it more efficient.
MORGEN DYE: Just real quick.
There's an interesting thing with entertainment content
that we're also trying to figure out,
which is very, very similar to foreign language dubbing, where
you have to have a very specific voice actor.
Some are contracted with certain programmes.
Some are always working with the exact same actor.
They're always the voice for actor A.
But something I'd love to see more of,
and I think actually AD would very much benefit from,
is there being a push to move some
of the responsibility for this upstream,
to move it up towards production.
That's something I'm pushing for.
It's definitely very much a fight.
But think about audio description.
The production teams are the ones that actually might be best
placed to create the best experience there.
That's not really a piece of advice.
I don't know if you can apply it in your situation,
but it's just another thing to consider.
JOSH MILLER: When we started, one
of the things we also noticed in the AD space,
with blind and low-vision users, is that the user
experience is probably even harder to understand
than captioning.
And captioning, the nice thing is you write what you hear.
It's a little more straightforward,
whereas with audio description, there
is a lot more nuance to it.
So the idea of the same voice actor for a certain type
of show--
Disney is famous for this.
They are very, very careful about the voice
actor for every single show or for every movie.
They're very, very careful.
But then if you go out and survey users,
you'll hear things like, well, I prefer synthesised voice
because now I know it's not the dialogue.
Well, that changes everything.
And you're not going to get the same answer, by any means,
across the board, but it's not as clear.
JOHN LUTHER: Has anybody tried Mechanical Turk for this?
Or do you know?
I don't know.
Amazon has this service called Mechanical Turk
that basically--
I don't know how they do it, but they pay human beings
to do tasks, repetitive tasks.
Some people used to use them for transcription.
I don't know if they've tried them for audio description yet.
Does anyone--
AUDIENCE: Mechanical Turk, in our experience, [INAUDIBLE].
JOHN LUTHER: Not good?
Yeah.
AUDIENCE: [INAUDIBLE] [? very variable, ?]
and it's [INAUDIBLE].
JOSH MILLER: So we'll take one more question.
AUDIENCE: [INAUDIBLE] with transcription, though.
JOSH MILLER: Yeah, fair enough.
AUDIENCE: We have a question on the side.
JOSH MILLER: Yeah, question on the side here,
and we'll wrap up.
AUDIENCE: It used to be that--
I'm sorry.
[INAUDIBLE]
It used to be that IBM dominated this technology,
and they were the first ones out with it.
And they used the model of Noam Chomsky,
who's the father of American linguistics up at MIT.
And that was their model before they abandoned
it and they sold it off.
Have you ever had access to those old programs that they
had?
Because even Google now, they've gone back with the AI,
and they released Google Assistant.
I worked on part of it for a while.
And they worked on that, and Google went back
to Chomsky's transformational grammar, which he had abandoned
when it became political.
So do you go back to see those programs?
Do you know those engineers who are still
alive with IEEE who also go between Google, Apple,
and the French systems?
And you're talking about OCR.
That's a little easier to do than the audio and whatnot.
Or you're having a problem doing Spanish,
but if you speak many languages, like you speak in England,
which over there-- so you have access to a fantastic
polyglot of languages.
So you have access to also Cambridge, Oxford, all this.
And you go over to Polytech in France,
who always work together in order
to compete in the European markets
against the American markets.
And they're also very good at language,
and they just established a law.
My question is kind of three-tier.
I'm sorry about that.
They also passed a law that you have to inform someone
when they're speaking to a machine, because the technology is
really good now.
You have to inform someone that they're not
speaking to a human.
JOSH MILLER: Do you guys want to take a shot?
JOHN LUTHER: What's the question?
JOSH MILLER: So, one, do we have access
to the original research, and do we go back to that?
Is that fair?
For the IBM, the core research--
AUDIENCE: To Noam Chomsky's transformational grammar,
which is very popular.
JOSH MILLER: Right.
So I can say that there's been a big push towards what's
called deep neural networks, which
is kind of the next version--
AUDIENCE: Now, you can't just use it.
The person has to know they're not speaking to a human,
because the technology is there.
JOSH MILLER: Right.
JOHN LUTHER: Well, yeah, I mean, I don't know what model IBM--
so we've tested Watson.
We've tested Google's services, Microsoft's.
We haven't tested Amazon's yet.
If I think-- if this is your question--
which of them is the most accurate or--
AUDIENCE: Google's [INAUDIBLE] [? is. ?]
JOHN LUTHER: Yeah, we've found Google's--
AUDIENCE: They just released a beta [INAUDIBLE].
Google [INAUDIBLE].
They released it like two years ago.
So when you were talking about YouTube,
which Google [INAUDIBLE], both Google and Google Assistant,
which is more of a [? modern ?] technology--
I don't know if you're familiar [INAUDIBLE]
Google [INAUDIBLE] access.
[INAUDIBLE]
JOHN LUTHER: Oh, yeah, I used to work there.
There's plenty of resources at Google.
But the Watson-- all these guys--
AUDIENCE: [INAUDIBLE] Watson, [INAUDIBLE].
JOHN LUTHER: Yeah, to be honest with you,
Watson has not impressed us much.
I don't know why.
Google has certainly been the most accurate and fastest.
And again, I think, to your point,
because they place a very high priority on--
for a number of reasons, but also
to compete with the other cloud services.
I mean, two years ago, what they had developed previously
for YouTube--
I talked to Ken about this two years ago or something.
I said, well, when are you going to release
a public API for this?
And he was like, good luck, never,
because we wanted to be the best for YouTube.
That's a whole different world now.
There is a very public API in Google Cloud Platform
for doing this.
But yeah, I was just playing around with an app
the other day on my phone that does real-time transcription.
I can't remember what it's called.
AUDIENCE: Google?
JOHN LUTHER: No, no, not a Google product.
It was a startup.
Basically real-time live transcription--
voice-to-text-- for people to chat with each other.
AUDIENCE: [INAUDIBLE] Is it Google [INAUDIBLE]?
JOHN LUTHER: No, it's not a Google product.
JOSH MILLER: So I can say, to the point about Noam Chomsky,
a lot of the core research and the models
that are being developed are still in the universities--
no question-- and that you've got professors leading research
on--
and that's where it's all coming out of still.
Now, what the cloud services are doing,
they're layering on top of that and tuning it
to a specific application.
But the core research is still largely coming out
of the universities, so we're seeing that.
But unfortunately we need to--
AUDIENCE: [INAUDIBLE] Are you also planning
to get resources from a disability app
like [INAUDIBLE], or these companies
that the government requires to do
[INAUDIBLE]?
JOSH MILLER: It would be nice.
So I want to quickly wrap up.
If there's anything you want to close with--
what piece of advice would you give
someone getting started?
What's the first thing to think about?
And then we'll wrap it up.
MORGEN DYE: Again, I think for me,
it's just spending a lot of time paying attention and watching
programmes with accessibility features enabled.
That's where everyone should start.
You don't understand it until you actually experience it.
So go out there and turn those features on on your favourite
programmes, and define for yourself and discover
for yourself what makes a good accessibility experience.
And then only from there can you actually
make probably the best decisions.
Go for it.
JOHN LUTHER: Yeah, I would second that.
Install JAWS. Just start playing around with this stuff.
Talk to people.
You know what I mean?
Actual surveys, or just direct conversations
with people about what the best experience for them is.
Also, I'd just say not to get overwhelmed, because it
can be very overwhelming.
Especially if you have a very large catalogue of content
and you're like, god, how are we ever even going
to begin to tackle this, just take it one bird at a time.
Is that what the old adage is?
MORGEN DYE: Real quick, sorry.
I'd say just come to events like this.
You're off to a great start if you're
coming to events like this.
Ask questions and meet people who
are better at doing this stuff.
DANIEL PETERSCHMIDT: Yeah, thirding all that.
And if you're on the side of having
to convince other people, start with
the cost benefit, but then the empathy side of it, too.
It can be easy to get frustrated sometimes,
but try to set that aside and just
go for understanding.
JOSH MILLER: Great.
Well, please join me in thanking everyone here.
[APPLAUSE]
Really appreciate it.