A couple of months ago I attended the London chapter of the global TADHack 2015.
As I had failed to produce a hack at the mini-hack event a few weeks prior (I was only able to attend the very end of it), I committed myself to making something interesting for this one.
My hack, the voice anonymiser, won a prize. I couldn’t have done it without the kind help of the Dialogic folks (and in particular Vince Puglia - @vfpuglia), who I would like to thank for the support they gave me in getting my idea working. Thanks also to all the other participants at the London event, who were all a pleasure to talk to.
When I started this hack, I had a think about how journalists’ sources often need to protect their identities when they provide voice interviews on TV and radio. I’m sure you’ve heard the robotic, mutated voices on various broadcast programmes. This is a tricky problem, as these filters can sometimes be reversed, and unique aspects of someone’s speech can still be detected. There’s an area of research looking into solving these problems, and some of my ex-colleagues at BBC R&D are working on exactly this kind of source protection.
Then I started thinking of another problem: what if the sources don’t trust the journalists themselves? Or, witnessing the increasingly sophisticated hacking capability of competent state actors, perhaps the sources fear the journalists may have their data compromised. In that case, it would make more sense for a source to provide their information in a way that is anonymised from the start: a phone call where the journalist can record the facts without recording anything that could personally identify the source.
The system I put together had three parts:
- A SIP audio recorder and player
- A PSTN to SIP bridge
- A voice changer, based on a phase vocoder
Part 1: SIP audio recorder and player
To perform the tasks I wanted to on the voice, I needed a server that would pick up a call coming in through the Public Switched Telephone Network (PSTN), make a new call to the intended recipient, apply a filter on the voice and forward that data to the recipient.
By a lucky coincidence, the main sponsor of the London branch of the TADHack event, Dialogic, has a product called PowerMedia XMS, which seemed perfect for this requirement. With the help of Vince Puglia, I launched an Amazon Web Services EC2 instance provisioned with the Dialogic XMS software.
After having a look at some example code, I wrote a basic Java application which made and received queries through the RESTful interface.
I tested this out by making a SIP call with a standard desktop SIP client. With this, I managed to record a call, and make the Dialogic instance call me back and play that same sound back to me.
Part 2: A PSTN to SIP bridge
The next challenge was getting my voice from a regular phone call on my mobile phone into the Dialogic instance, where I could manipulate it and make another call. This can be done quite easily with SIP trunking, which is nothing more than a service that links VoIP calls to the Public Switched Telephone Network (PSTN). Twilio has recently released something they call Elastic SIP Trunking, which makes processing received calls and making voice calls from a server incredibly easy.
I created a SIP trunk in Twilio’s interface and assigned it a new phone number. I then set up the origination settings to point to the IP address of the Dialogic XMS instance. That meant that any phone call to the Twilio number would trigger a SIP call to the Dialogic XMS instance, which would be ready to answer the call.
I then gave the SIP trunk a termination URI, allowing me to make outbound calls to any number by making an HTTP query to that particular URI from my Dialogic instance.
I was originally aiming to have the filter applied in real time, to allow a two-way call where one user is filtered during the conversation. I realised I would be unlikely to achieve this in the time available over the weekend, so I instead focused on recording the incoming call, applying the filter to the recording, then making a call to a pre-determined number to play back the modified recording. It is more like a voicemail message anonymiser. Although a little incomplete, I’d like to think that as a demo it gets the idea across. I also think it still has potential use in the scenario proposed: having a source leave an anonymised tip for a journalist.
The number called in this setup is fixed. This is intended. The aim of the service is not to provide a tool for anyone to anonymise a call to anyone else (for prank calls, say). Instead, it would be easy to map a Twilio number to a given journalist’s phone number.
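To make the record, filter and play back flow a bit more concrete, here is a minimal sketch in Python. It is not the code from the hack: the routing table, the phone numbers and the function names are all made up for illustration, and it assumes the recording is available as a mono 16-bit PCM WAV file, which may not match what the media server actually produces.

```python
import array
import sys
import wave

# Hypothetical routing table: inbound trunk number -> journalist's number.
# Both numbers are placeholders, not real ones.
ROUTES = {"+442000000001": "+447000000001"}


def destination_for(dialled_number):
    """Return the fixed recipient for an inbound number, or None if unmapped."""
    return ROUTES.get(dialled_number)


def anonymise_recording(in_path, out_path, voice_filter):
    """Read a mono 16-bit WAV, run voice_filter over it, write the result.

    voice_filter is any callable taking (samples, sample_rate), where
    samples is a list of floats in [-1, 1), and returning filtered samples.
    """
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 1 or src.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = src.getframerate()
        samples = array.array("h", src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian

    # Scale to floats so the filter does not need to know about PCM.
    floats = [s / 32768.0 for s in samples]
    filtered = voice_filter(floats, rate)

    # Clip and convert back to 16-bit integers.
    pcm = array.array(
        "h",
        (max(-32768, min(32767, int(round(v * 32768.0)))) for v in filtered),
    )
    if sys.byteorder == "big":
        pcm.byteswap()
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(pcm.tobytes())
```

The caller would look up `destination_for(...)` for the number that was dialled, run `anonymise_recording(...)` with whichever voice filter is in use, and only then ask the media server to place the outbound call and play the filtered file.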
Part 3: Voice changer
I quickly put something together with Scientific Python to change the voice of the caller. I found some sample code for a phase vocoder in Python, which allows the speed of the voice to change without changing the pitch. I then modified this to allow the pitch to change without changing the speed.
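For the curious, the general trick looks something like the sketch below. This is not the code I used, just a minimal NumPy illustration of the idea: time-stretch the signal with a phase vocoder, then resample it back to its original length, which shifts the pitch while keeping the duration. The function names, the FFT size and the hop size are my own assumptions.

```python
import numpy as np


def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch. rate > 1 shortens, rate < 1 lengthens."""
    window = np.hanning(n_fft)
    x = np.concatenate([np.asarray(x, dtype=float), np.zeros(n_fft)])
    n_frames = 1 + (len(x) - n_fft) // hop
    # Analysis: short-time Fourier transform, one row per frame.
    stft = np.array([
        np.fft.rfft(window * x[i * hop:i * hop + n_fft])
        for i in range(n_frames)
    ])
    # Phase advance expected per hop at each bin's centre frequency.
    omega = 2.0 * np.pi * hop * np.arange(stft.shape[1]) / n_fft
    # Fractional analysis positions to read, one per synthesis frame.
    steps = np.arange(0, n_frames - 1, rate)
    phase = np.angle(stft[0])
    out = np.zeros(len(steps) * hop + n_fft)
    for k, t in enumerate(steps):
        i = int(t)
        frac = t - i
        # Interpolate the magnitude between the two neighbouring frames.
        mag = (1.0 - frac) * np.abs(stft[i]) + frac * np.abs(stft[i + 1])
        frame = np.fft.irfft(mag * np.exp(1j * phase), n_fft)
        out[k * hop:k * hop + n_fft] += window * frame
        # Measured phase advance, wrapped to [-pi, pi] around the expected one.
        dphi = np.angle(stft[i + 1]) - np.angle(stft[i]) - omega
        dphi -= 2.0 * np.pi * np.round(dphi / (2.0 * np.pi))
        phase = phase + omega + dphi
    # Compensate for the gain of the overlapping squared windows.
    return out * hop / np.sum(window ** 2)


def pitch_shift(x, semitones, n_fft=1024, hop=256):
    """Shift pitch by the given number of semitones, keeping the duration."""
    x = np.asarray(x, dtype=float)
    factor = 2.0 ** (semitones / 12.0)
    # Stretch the duration by `factor` without touching the pitch...
    stretched = time_stretch(x, 1.0 / factor, n_fft, hop)
    # ...then read it back `factor` times faster, restoring the duration
    # and scaling every frequency by `factor`.
    positions = np.arange(len(x)) * factor
    return np.interp(positions, np.arange(len(stretched)), stretched)
```

Calling `pitch_shift(samples, -4)` would lower the voice by four semitones. A plain phase vocoder like this shifts the formants along with the pitch, so larger shifts quickly start to sound unnatural.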
I then experimented with a ring modulator in Python, but it had a tendency to make the speech unintelligible with voice recorded from a phone’s microphone. Hearing a phone call from myself in the voice of a Dalek did entertain me for a good few minutes though.
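A ring modulator is about as simple as audio effects get: multiply the signal by a sine wave. The sketch below is a minimal NumPy version, not the code from the hack. The function name is mine, and the 30 Hz default carrier is an assumption based on the figure commonly quoted for the Dalek effect.

```python
import numpy as np


def ring_modulate(samples, sample_rate, carrier_hz=30.0):
    """Multiply the signal by a sine carrier (classic ring modulation)."""
    samples = np.asarray(samples, dtype=float)
    t = np.arange(len(samples)) / sample_rate
    return samples * np.sin(2.0 * np.pi * carrier_hz * t)
```

Each frequency in the input is replaced by its sum and difference with the carrier, so a 440 Hz tone with a 30 Hz carrier comes out as 410 Hz and 470 Hz. That smearing of every harmonic is probably part of why narrowband phone audio became so hard to understand.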
With the right parameters, I was able to make a call to my system, and have it call me back with a slightly modified tone of voice (deeper). It was quite an odd experience, hearing something I said in someone else’s voice.
In the end, it was a very fun hack to work on. You can see the demo and my presentation on YouTube.
If you’re interested in any of the above, you can take a look at the code on GitHub. Finally, I would also like to thank the organisers of TADHack, who put together a fantastic event with an awe-inspiring level of coordination across all sites. I hope to attend next year!