Re-ReBreakCaptcha: Breaking Google’s ReCaptcha v2 using.. Google.. Again

TL;DR: A logic vulnerability, dubbed ReBreakCaptcha, that still works 5 years later and lets you easily bypass Google’s ReCaptcha v2 anywhere on the web.

ReCaptcha Overview

Many of us know of ReCaptcha, Google’s Human Recognition Program.
There are two versions of it: v2 and v3.

v3 is not our focus in this post, as it has no user interaction at all and only returns a score, without a CAPTCHA challenge.
v2 has two types: the “I’m not a robot” checkbox and the invisible reCAPTCHA badge.
We’ll focus on the first type, as it presents all the challenges.
(https://developers.google.com/recaptcha/docs/versions)

There are two types of ReCaptcha v2 challenges:

  • Image Challenge – The challenge contains a description and an image which consists of 16 sub-images. The user is requested to select those sub-images that best match the given description.
  • Audio Challenge – The challenge contains an audio recording; the user is requested to enter the words that are heard.

Re-ReBreakCaptcha knows how to solve ReCaptcha v2 audio challenges, using Google’s own services!
Therefore, we need a way to get an audio challenge every time.

When clicking the “I’m not a robot” checkbox of ReCaptcha v2, we are often presented with the following challenge type:

Figure 1: Image Challenge

To get an audio challenge we need to click the following button:

Figure 2: The Audio Challenge Button

Then we are presented with an audio challenge that can be easily bypassed:

Figure 3: Audio Challenge

Sometimes, instead of an audio challenge, an error message is presented, as Google has automation detection:

Figure 4: Automation Detected Error

We’ll try our best to avoid it, and to get past it when it does appear: a simple cooldown of a few minutes should suffice.

In February 2019, ‘The Verge’ posted an article about CAPTCHAs:
https://www.theverge.com/2019/2/1/18205610/google-captcha-ai-robot-human-difficult-artificial-intelligence

It argues that CAPTCHAs are getting harder and harder for humans to solve, while algorithms are getting better at solving them. It seems Google itself is part of the problem.
Also, thanks Josh for indirectly mentioning ReBreakCaptcha!

2017 ReCaptcha Bypass

Back in 2017, I published ReBreakCaptcha, a method that bypasses Google’s ReCaptcha v2 with a 93% success rate.
See the post here: https://east-ee.com/2017/02/28/rebreakcaptcha-breaking-googles-recaptcha-v2-using-google/

Re-ReBreakCaptcha works in three stages:

  1. Audio Challenge – Getting the correct challenge type.
  2. Recognition – Converting the challenge audio and sending it to Google’s Speech Recognition API.
  3. Verification – Verifying the Speech Recognition result and bypassing the ReCaptcha.

The previous post prompted Google to respond quickly, and heavy countermeasures were put in place in the short term.
It’s been 5 years, so I decided to revisit this project and check it out.

As of the time of posting (28/02/2022), I have confirmed that this vulnerability still works, with some minor changes to the code, at a 98% success rate – better than the original!

Backstory

A few days after I published the original post, it got a lot of traffic and made headlines.
Google became aware that the PoC was live on GitHub, so they took action.

After only a few audio solves, they replaced the easy-to-solve audio challenges (4-5 digits) with a much harder variant.
The harder audio challenges were longer (10-12 digits).
They also contained background noise so bad that they were sometimes impossible to solve manually.

On 3/3/2017, I declared the PoC non-operational.
I started fiddling around with splitting the audio into individual digits using the silence between them, but it had lower success rates (below the original 97%), so I decided to leave it at that.
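For the curious, this is roughly what "splitting on silence" means. It is a minimal sketch and not the code I experimented with. It works on raw amplitude samples that are assumed to be already normalized to [-1, 1], and the `threshold` and `min_silence` values are illustrative assumptions. Real recordings, especially noisy ones, would typically need energy computed per window and carefully tuned thresholds.

```python
from typing import List, Optional, Sequence, Tuple


def split_on_silence(
    samples: Sequence[float],
    threshold: float = 0.1,   # assumed: |amplitude| below this counts as silence
    min_silence: int = 3,     # assumed: silent samples needed to end a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs (end exclusive) of non-silent segments."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # index where the current segment began
    silent_run = 0               # consecutive silent samples inside a segment
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i        # sound begins: open a new segment
            silent_run = 0       # short dips inside a digit are tolerated
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # The segment ends at the first silent sample of this run.
                segments.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Close a segment that runs to the end, minus trailing silence.
        segments.append((start, len(samples) - silent_run))
    return segments
```

For example, `split_on_silence([0, 0, 0.5, 0.6, 0, 0, 0, 0, 0.7, 0.8, 0.9, 0, 0, 0])` returns `[(2, 4), (8, 11)]`, which is two "digits" separated by silence.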

Over the following years, a lot of researchers based their research on mine (one team even wrote their thesis on this concept).
Some made little tweaks, others introduced new mechanisms.
Thank you, I’m honored.

These include:
10/2017 https://www.reddit.com/r/netsec/comments/78nbmu/code_release_defeating_googles_recaptcha_with/ (after this publication, Google switched the audio challenges from spoken digits to spoken phrases)
12/2018 https://www.reddit.com/r/netsec/comments/ab94o0/code_release_uncaptcha2_defeating_googles/
08/2019 https://www.digitalwhisper.co.il/files/Zines/0x6D/DW109-3-reCAPTCHA.pdf
05/2020 https://www.reddit.com/r/netsec/comments/gpcic1/bypassing_captcha_with_visuallyimpaired_robots/
01/2021 https://www.reddit.com/r/netsec/comments/kp7p79/breaking_the_google_audio_recaptcha_with_googles/

So what has changed?

I figured enough time had passed, so I had another jab at it.
Google has added some noise at the beginning and end of the audio challenge, but it seems they don’t use it as a fingerprint to block this bypass technique (it works even without splitting the words!).
I made a few small tweaks to the original PoC:

  1. Updated to Python 3 (specifically 3.7+).
  2. Updated to support Selenium 4.
  3. Small tweaks to the logic to circumvent ReCaptcha’s automation detection.

Below is the link to the updated PoC, a fully-automated bypass of ReCaptcha v2:
https://github.com/eastee/re-rebreakcaptcha

A video of the PoC solving 100 audio challenges with a 98% success rate (x30 speed):

Re-ReBreakCaptcha PoC – x30 Speed

ReBreakCaptcha: Breaking Google’s ReCaptcha v2 using.. Google

TL;DR: A logic vulnerability, dubbed ReBreakCaptcha, that lets you easily bypass Google’s ReCaptcha v2 anywhere on the web.

Overview

Back in 2016, I started poking around to see how hard it would be for a threat actor to find a new method that bypasses Google’s ReCaptcha v2. It would be ideal if it worked in any environment, rather than being tailored to fit a specific use case.

I would like to introduce you to ReBreakCaptcha – a brand new bypassing technique for Google’s ReCaptcha v2.

ReBreakCaptcha works in three stages:

  1. Audio Challenge – Getting the correct challenge type.
  2. Recognition – Converting the challenge audio and sending it to Google’s Speech Recognition API.
  3. Verification – Verifying the Speech Recognition result and bypassing the ReCaptcha.

As of the time of posting, it is confirmed that this vulnerability still works.

ReBreakCaptcha Stage 1: Audio Challenge

There are three types of ReCaptcha v2 challenges:

  • Image Challenge – The challenge contains a description and an image which consists of 9 sub-images. The user is requested to select those sub-images that best match the given description.
  • Audio Challenge – The challenge contains an audio recording; the user is requested to enter the digits that are heard.
  • Text Challenge – The challenge contains a category and 5 candidate phrases. The user is requested to select those phrases which best match the given category.

ReBreakCaptcha knows how to solve ReCaptcha v2 audio challenges. Therefore, we need a way to get an audio challenge every time.

When clicking the “I’m not a robot” checkbox of ReCaptcha v2, we are often presented with the following challenge type:

Figure 1: Image Challenge

To get an audio challenge we need to click the following button:

Figure 2: The Audio Challenge Button

Then we are presented with an audio challenge that can be easily bypassed:

Figure 3: Audio Challenge

Some of you may notice that sometimes you get a text challenge instead of an audio challenge, like so:

Figure 4: Text Challenge

To bypass it and get an audio challenge, simply click the ‘Reload Challenge’ button until you get the correct type. This is the ‘Reload Challenge’ button:

Figure 5: Get New Challenge Button
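The reload step boils down to a bounded loop. This is a minimal sketch that assumes two hypothetical hooks, `get_challenge_type()` and `reload_challenge()`, which you would implement on top of your own browser session. They are not part of Selenium or any other library. The cap keeps the loop from reloading forever if an audio challenge never shows up.

```python
from typing import Callable, Optional


def reload_until(
    get_challenge_type: Callable[[], str],   # hypothetical hook
    reload_challenge: Callable[[], None],    # hypothetical hook
    wanted: str = "audio",
    max_reloads: int = 10,                   # assumed cap, not from the PoC
) -> Optional[int]:
    """Reload the challenge until its type equals `wanted`.

    Returns how many reloads were needed, or None if the type never
    matched within `max_reloads` reloads.
    """
    for reloads in range(max_reloads + 1):
        if get_challenge_type() == wanted:
            return reloads
        if reloads < max_reloads:
            reload_challenge()  # ask for a new challenge, then check again
    return None
```

For example, if the fake challenge sequence is `text, text, audio`, the function returns `2` because two reloads were needed before the audio challenge appeared.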

What was our goal? To bypass the ReCaptcha. Can we do this? Yes. How? Google Speech Recognition API!

ReBreakCaptcha Stage 2: Recognition

Now comes the fun part: using one Google service to beat another Google service!
Let’s get back to the audio challenge (Figure 3).
As you can see, the controls on this challenge page are:
1. A play button – to hear the challenge.
2. A textbox – for user input.
3. A download button – to download the audio challenge.

Let’s download the audio file and send it to the Google Speech Recognition API. Before doing so, we convert it to ‘wav’ format, as required by Google’s Speech Recognition API.
Now we have the audio challenge file and are ready to send it to Google Speech Recognition.
How can this be done? Using their API.
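How the conversion works depends on the format of the downloaded file, and decoding that file needs an audio decoder, which is outside the scope of this sketch. If we assume the audio has already been decoded to 16-bit mono PCM samples, Python’s standard `wave` module is enough to wrap them in a WAV container. The SpeechRecognition library’s `AudioFile` reader accepts this kind of file. `pcm16_to_wav_bytes` is a hypothetical helper and not part of the PoC. The 16 kHz default sample rate is also an assumption.

```python
import io
import wave
from array import array
from typing import Sequence


def pcm16_to_wav_bytes(samples: Sequence[int], sample_rate: int = 16000) -> bytes:
    """Wrap signed 16-bit mono PCM samples in an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)  # samples per second (assumed default)
        # array("h") packs signed 16-bit integers in native byte order,
        # which is what the wave module expects for writeframes().
        wav.writeframes(array("h", samples).tobytes())
    return buf.getvalue()
```

To get a file on disk, write the returned bytes out, e.g. `open("challenge.wav", "wb").write(pcm16_to_wav_bytes(samples))`.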

There is a great Python library named SpeechRecognition for performing speech recognition, with support for several engines and APIs, online and offline.
We will use this library’s implementation of the Google Speech Recognition API.

We send the ‘wav’ audio file, and the Speech Recognition API sends us back the result as a string (e.g. ‘25143’).

This result will be the solution to our audio challenge.
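Before typing that string in, it may be worth sanity-checking it. This is a minimal sketch that assumes the recognizer could return numerals, spaced-out digits, or spelled-out digit words (the post only shows a clean result like ‘25143’). It uses the 4-5 digit length of the original easy challenges as a plausibility check. `to_answer` is a hypothetical helper and not the PoC’s code.

```python
import re
from typing import Optional

# Spelled-out digits a recognizer might return instead of numerals
# (an assumption: the post only shows a clean result such as '25143').
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def to_answer(transcript: str, min_len: int = 4, max_len: int = 5) -> Optional[str]:
    """Normalize a transcript into a digit answer, or return None if implausible."""
    digits = []
    # Split into runs of letters or runs of ASCII digits; spaces,
    # hyphens and other punctuation are dropped.
    for token in re.findall(r"[a-z]+|[0-9]+", transcript.lower()):
        if token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        elif token.isdigit():
            digits.append(token)
        else:
            return None  # a non-digit word: probably a misrecognition
    answer = "".join(digits)
    # The original (easy) challenges were 4-5 digits long.
    return answer if min_len <= len(answer) <= max_len else None
```

For example, `to_answer("2 5 1 4 3")`, `to_answer("25 143")` and `to_answer("two five one four three")` all return `'25143'`. When the helper returns `None`, the caller can fetch a new challenge instead of submitting an answer that is likely wrong.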

ReBreakCaptcha Stage 3: Verification

This stage is fairly short. All we need to do now is to copy-paste the output string from Stage 2 into the textbox, and click ‘Verify’ on the ReCaptcha widget.

That’s right: we have just semi-automatically used Google’s own services to bypass another of its services.

ReBreakCaptcha Complete Proof-Of-Concept

I went ahead and made a complete PoC script in Python.

It utilizes all of the presented stages of the technique for a fully-automated bypass of ReCaptcha v2.

Link to the GitHub repository: https://github.com/eastee/rebreakcaptcha

3/2/2017 – Update:

It has come to my attention that a lot of people encounter a harder version of the audio challenge. Therefore, I have committed a workaround to the GitHub repo that should overcome this, though at a lower success rate than with the original, easier audio challenges.
It is still not fully clear how this harder version is triggered, but the main suspect is Google considering your IP suspicious.
This is usually the case when using a public proxy / VPN, as their IPs are flagged as suspicious in Google’s systems (resulting in harder ReCaptchas and more frequent ReCaptcha encounters).

3/3/2017 – Update #2:

It seems that Google has fully patched this: raising the minimum number of digits from 4-5 to 10-12 and introducing new digit recordings that are harder to speech-recognize, as well as background noise. The PoC has stopped working as a result. It was fun while it lasted 🙂