TL;DR: A logic vulnerability, dubbed ReBreakCaptcha, lets you easily bypass Google’s ReCaptcha v2 anywhere on the web.
Back in 2016, I started poking around to see how hard it would be for a threat actor to find a new method that bypasses Google’s ReCaptcha v2. It would be ideal if it worked in any environment, rather than being tailored to fit a specific use case.
I would like to introduce you to ReBreakCaptcha – a brand new bypassing technique for Google’s ReCaptcha v2.
ReBreakCaptcha works in three stages:
- Audio Challenge – Getting the correct challenge type.
- Recognition – Converting the challenge audio and sending it to Google’s Speech Recognition API.
- Verification – Verifying the Speech Recognition result and bypassing the ReCaptcha.
As of the time of posting, it is confirmed that this vulnerability still works.
ReBreakCaptcha Stage 1: Audio Challenge
There are three types of ReCaptcha v2 challenges:
- Image Challenge – The challenge contains a description and an image which consists of 9 sub-images. The user is requested to select those sub-images that best match the given description.
- Audio Challenge – The challenge contains an audio recording. The user is requested to enter the digits that are heard.
- Text Challenge – The challenge contains a category and 5 candidate phrases. The user is requested to select those phrases which best match the given category.
ReBreakCaptcha knows how to solve ReCaptcha v2 audio challenges. Therefore, we need a reliable way to get an audio challenge every time.
When clicking the “I’m not a robot” checkbox of ReCaptcha v2, we are often presented with the following challenge type:
To get an audio challenge we need to click the following button:
Then we are presented with an audio challenge that can be easily bypassed:
Some of you may notice that instead of an audio challenge, sometimes you get a text challenge like so:
To bypass it and get an audio challenge, you simply click the ‘Reload Challenge’ button until you get the correct type. The Reload-Challenge button:
What was our goal? To bypass the ReCaptcha. Can we do this? Yes. How? Google Speech Recognition API!
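The reload loop from this stage can be sketched in Python as follows. This is a minimal sketch, not the POC's code: `get_challenge_type` and `click_reload` are hypothetical callables standing in for whatever browser automation reads the widget and clicks the Reload-Challenge button, and `max_reloads` is an assumed safety cap. `FakeWidget` exists only to demonstrate the loop without a browser.

```python
from typing import Callable, Sequence


def reload_until_audio(get_challenge_type: Callable[[], str],
                       click_reload: Callable[[], None],
                       max_reloads: int = 10) -> bool:
    """Reload the challenge until it is an audio challenge, or give up.

    Returns True if an audio challenge is showing, False if the cap
    (max_reloads, an assumed value) was reached first.
    """
    for attempt in range(max_reloads + 1):
        if get_challenge_type() == "audio":
            return True
        if attempt < max_reloads:
            click_reload()
    return False


class FakeWidget:
    """Hypothetical stand-in for the ReCaptcha widget, for demonstration only."""

    def __init__(self, sequence: Sequence[str]) -> None:
        self._sequence = list(sequence)
        self._index = 0
        self.reloads = 0

    def challenge_type(self) -> str:
        return self._sequence[self._index]

    def reload(self) -> None:
        # Each reload advances to the next challenge type in the sequence.
        self.reloads += 1
        self._index = min(self._index + 1, len(self._sequence) - 1)
```

For example, a widget that serves two text challenges before an audio one needs two reloads: `reload_until_audio(w.challenge_type, w.reload)` with `w = FakeWidget(["text", "text", "audio"])`.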
ReBreakCaptcha Stage 2: Recognition
Now comes the fun part: taking advantage of one Google service to beat another Google service!
Let’s get back to the audio challenge (Figure 3).
As you can see, the controls on this challenge page are:
1. A play button – to hear the challenge.
2. A textbox – for user input.
3. A download button – to download the audio challenge.
Let’s download the audio file and send it to Google Speech Recognition API. Before doing so, we will convert it to ‘wav’ format, which is required for the Speech Recognition step.
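One way to do the conversion is to call the `ffmpeg` command-line tool from Python. This is a sketch under assumptions: it presumes `ffmpeg` is installed and on the PATH, and the mono, 16 kHz output settings are my own choice rather than anything stated here. The POC may convert the file differently.

```python
import subprocess
from typing import List


def build_ffmpeg_command(src_path: str, wav_path: str) -> List[str]:
    """Build the ffmpeg argument list that converts src_path to a WAV file."""
    return [
        "ffmpeg",
        "-y",             # overwrite wav_path if it already exists
        "-i", src_path,   # the downloaded audio challenge
        "-ac", "1",       # mono output (assumed setting)
        "-ar", "16000",   # 16 kHz sample rate (assumed setting)
        wav_path,
    ]


def convert_to_wav(src_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_ffmpeg_command(src_path, wav_path), check=True)
```

Calling `convert_to_wav("audio_challenge.mp3", "audio_challenge.wav")` would then produce the ‘wav’ file; both filenames are placeholders.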
Now we have the audio challenge file and are ready to send it to Google Speech Recognition.
How can this be done? Using their API.
There is a great Python library named SpeechRecognition for performing speech recognition, with support for several engines and APIs, online and offline.
We will use this library’s implementation of the Google Speech Recognition API.
We will send the ‘wav’ audio file, and Speech Recognition will send us back the result as a string (e.g. ‘25143’).
This result will be the solution to our audio challenge.
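The recognition step might look roughly like this with the SpeechRecognition library. Treat it as a sketch rather than the POC's actual code: `recognize_wav` and `transcript_to_digits` are names I made up, and the digit clean-up is an added assumption, in case the transcript comes back as words or with spaces instead of a bare digit string.

```python
import re

# Spoken forms that may appear in a transcript instead of digit characters.
_WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7", "eight": "8",
    "nine": "9",
}


def transcript_to_digits(transcript: str) -> str:
    """Keep only the digits in a transcript, mapping number words to digits."""
    digits = []
    for token in re.findall(r"[a-z]+|[0-9]", transcript.lower()):
        if token.isdigit():
            digits.append(token)
        elif token in _WORD_TO_DIGIT:
            digits.append(_WORD_TO_DIGIT[token])
    return "".join(digits)


def recognize_wav(wav_path: str) -> str:
    """Send a WAV file to Google Speech Recognition and return the digits.

    Needs the third-party SpeechRecognition package and network access.
    recognize_google raises sr.UnknownValueError if the audio cannot be
    understood and sr.RequestError if the request fails.
    """
    import speech_recognition as sr  # imported lazily; third-party package

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file
    return transcript_to_digits(recognizer.recognize_google(audio))
```

The value returned by `recognize_wav` is the string that gets entered in the next stage.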
ReBreakCaptcha Stage 3: Verification
This stage is fairly short. All we need to do now is to copy-paste the output string from Stage 2 into the textbox, and click ‘Verify’ on the ReCaptcha widget.
That’s right, we have now semi-automatically used one of Google’s services to bypass another of its own.
ReBreakCaptcha Complete Proof-Of-Concept
I have written a complete POC script in Python.
It utilizes all of the presented stages of the technique for a fully-automated bypass of ReCaptcha v2.
Link to the GitHub repository: https://github.com/eastee/rebreakcaptcha
It has come to my attention that a lot of people encounter a harder version of the audio challenge. Therefore, I have committed a workaround to the GitHub repo that should overcome this situation, though at a lower success rate compared to the original, easier audio challenges.
It is still not fully clear how this harder version is triggered, but the leading suspected cause is that your IP looks suspicious to Google.
This is usually the case when one uses a public proxy / VPN, as their IPs are flagged in the Google system as suspicious (harder ReCaptchas and more ReCaptcha encounters).
3/3/2017 – Update #2:
It seems that Google has fully patched this by raising the number of digits from 4-5 to 10-12, introducing new digit recordings that are harder for speech recognition to transcribe, and adding background noise. The POC has stopped working as a result. It was fun while it lasted 🙂