All Tech Considered
4:11 pm
Mon February 3, 2014

Wikipedia Archiving Voices So You'll Always Know How Celebs Sound

Originally published on Mon February 3, 2014 11:30 pm

What's in a voice? To the folks at Wikipedia, the online encyclopedia, a voice means a lot. They've begun a project to archive the voices of famous people.

Actor Dustin Hoffman, writer John Updike, scientist Jane Goodall and Nobel Peace Prize winner Aung San Suu Kyi are among the audio clips now found in Wikipedia's biographical entries. Formally called Wikipedia's Voice Introduction Project, it aims to collect 10-second sound recordings of anyone with a bio page on Wikipedia. The BBC is helping, by providing a lot of its archival audio through an open license.

Wikipedia expert Andy Mabbett is one of the people helming the project and spoke with NPR's Audie Cornish.

On the project's beginnings

I wrote a blog post suggesting that we get some people to record their voices for Wikipedia, and asked everybody I knew who is the subject of a Wikipedia article to provide a short sample where they tell us their name and a little bit about their background so we know what their voices sound like.

As a result of that the BBC heard the project and they asked me if I would get involved in a similar project that they were about to run, where they've open licensed or released with permission to reuse up to 1,000 40-second clips of programs from their archive, because they want to run software against them to do voice recognition, but they realized those clips are also useful for us on Wikipedia to exemplify how people sound, particularly people we can't reach with our recording equipment.

On why audio instead of video

There's nothing wrong with video. We use video on Wikipedia as well, but by using audio, it's a very small ask of the person recording their voice for us. They can do it in the morning when they are in their dressing gown and curlers. Some people are shy of appearing in front of a camera, but it's also a small file to transmit over the Internet. It's a lot cheaper to download the audio than the video.

On what you learn from hearing someone's voice

It's a very personal thing. If you think about the people in your own life, you know their voice the moment you hear it, as much as or sometimes even more than a photograph ... With a voice, you know instantly. And I don't know about you, but personally, if I hear a voice from the dim and distant past, from the days of wax cylinder recordings, somebody like the nurse Florence Nightingale, it's so exciting to have that connection back to them. So we're doing the same for people today.

Who's on your audio recording wish list?

I happen to know there's a Wikipedia article about you [Audie], so I'm looking forward to receiving your recording ... No fear or favor. I'm happy for any subject of a Wikipedia article, whether they sound like a Shakespearean actor or a guy on the street selling a newspaper. We want their voice.

Copyright 2014 NPR. To see more, visit http://www.npr.org/.

Transcript

MELISSA BLOCK, HOST:

From NPR News, this is ALL THINGS CONSIDERED. I'm Melissa Block.

AUDIE CORNISH, HOST:

I'm Audie Cornish and time now for All Tech Considered.

(SOUNDBITE OF THEME MUSIC)

CORNISH: And we start today with this question: What's in a voice? To the folks at Wikipedia, the online encyclopedia, a voice apparently means a lot. They've begun a project to archive the voices of famous people.

DUSTIN HOFFMAN: I said gee, I like this part. No, that's the lead. You're a character juvenile, which meant...

JOHN UPDIKE: The writer must face the fact that ordinary lives are what most people live most of the time...

JANE GOODALL: One of my favorite symbols of hope is a very long feather from the wing of a...

AUNG SAN SUU KYI: We had to go through the line of soldiers. And many of them, well, their hands were shaking. I don't think they particularly wanted to shoot us down.

CORNISH: Those voices: actor Dustin Hoffman, writer John Updike, scientist Jane Goodall, and Nobel Peace Prize winner Aung San Suu Kyi are among the audio clips now found in Wikipedia's biographical entries.

Here to explain more is the project's brainchild, Wikipedia expert Andy Mabbett. Welcome to the program.

ANDY MABBETT: Hello. And good afternoon, America.

CORNISH: So, first off, where did the idea come from? Why do this?

MABBETT: Well, being English, it happened in a pub.

(LAUGHTER)

MABBETT: I was talking to a friend after a Wikipedia conference in London. And he mentioned that he wanted to get more sound files onto Wikipedia. And he was meaning things like engines running and the sound of running water - it's a waterfall and so on. And I said it would be a good idea to get people's voices, too. I wrote a blog post, suggesting that we get some people to record their voices for Wikipedia and asked everybody I knew who is the subject of a Wikipedia article to provide a short sample where they tell us their name and a little bit about their background so we know what their voices sound like.

As a result of that, the BBC heard the project. And they asked me if I would get involved in a similar project that they were about to run, where they've open licensed or released with permission for reuse up to a thousand 40-second clips of programs from their archive. They realized those clips are also useful for us on Wikipedia, to exemplify how people sound.

CORNISH: And this project you describe at first is essentially the Voice Introduction Project. Here's a sample of one of those.

STEPHEN FRY: Hello, my name is Stephen Fry. I was born in London and I've been in the entertainment business since, well, I suppose about 1981.

CORNISH: So, British actor Stephen Fry, and he and others basically just introduce themselves. What do we learn from someone, you know, other than their name and what they do, from introductions and the way they go about introducing themselves?

MABBETT: Well, the name Stephen - it's fairly obvious, most of us know how to pronounce it. But imagine you are not from a Western culture and you see that written down with a P-H in the middle - Step-hen - you wouldn't know how to pronounce it. So if we get people to tell us their name, we know how they pronounce it, which is pretty much the definitive guide.

CORNISH: And then, also, why the choice of audio versus video?

MABBETT: There's nothing wrong with video. And we use video on Wikipedia as well. But by using audio, it's a very small ask of the person who's recording their voice for us. They can do it in the morning when they're in their dressing gown and curlers. They don't have to dress up smart for it, like they would with video. Some people are shy of appearing in front of a camera. But it's also a small file to transmit over the Internet. So if you're on a low bandwidth connection or you pay by the megabyte, it's a lot cheaper to download the audio than the video.

CORNISH: Now, when it comes to the human voice, I mean, what is it that you think people are learning here from hearing someone's voice? I don't know if there's a person in particular that struck you in the past, but what are we getting here besides simply what they sound like?

MABBETT: It's a very personal thing. If you think about the people in your own life, you know their voice the moment you hear it, as much as or sometimes even more so than a photograph. Sometimes you can look at a photograph from 20 years ago and think, is that my partner or neighbor or friend or is that just somebody who looks like them? With the voice, you know instantly.

And, I don't know about you. But personally, if I hear a voice from the dim and distant past, from the days of wax cylinder recordings, somebody like the nurse Florence Nightingale, it's so exciting to have that connection back to them. And so we're doing the same for people today.

CORNISH: Now, how do you verify that the audio is real, essentially since folks will upload the audio themselves?

MABBETT: We have a series of processes for checking things like that. I'm not going to go into too much detail about some of them because I don't want to tempt forgery. But we ask people to verify their ownership of the audio recording by email. And we can obviously trace back who that email belongs to. For a notable person, you can cross-reference it with their agent or something like that.

CORNISH: Before I let you go, do you have a wish list? Is there a voice or two that you really want to hear that you've never heard before, that you really just want to have part of the project?

MABBETT: Well, I happen to know there's a Wikipedia article about you.

(LAUGHTER)

MABBETT: So I'm looking forward to receiving your recording, Audie, (unintelligible) right? I'm sure you know how to do it by now.

CORNISH: I'll work on that. I can say my name at least.

MABBETT: No fear or favor. I'm pleased for anybody who's a subject of a Wikipedia article, whether they sound like a Shakespearean actor or a guy on the corner of the street selling a newspaper. We want their voice.

CORNISH: Andy Mabbett, thanks so much for talking with us.

MABBETT: Thank you very much for your time. It's been a pleasure.

CORNISH: Andy Mabbett, a Wikipedia expert, he spoke to us about Wikipedia's Voice Introduction Project.

Transcript provided by NPR, Copyright NPR.