Blog: How Tos

London Councils & pirate books. Google dorking for subdomain takeovers

Adam Bromiley 11 Apr 2023


  • Google dorks found me an exploited DigitalOcean subdomain takeover on London Councils’ domain
  • It used a meta refresh to redirect to a site hosting unprovenanced PDFs
  • London Councils had a security.txt file which made disclosure a doddle
  • Their security team were awesome and fixed it quicker than I can make a coffee

What did I do?

Google dorking is a technique that uses operators in search queries to find public files based on words in their URLs or content. It’s a great way to scan the internet for applications using specific web technologies or hosting juicy files. There’s a great Google dorks primer here.

In my spare time I’ll sometimes go on a Google Dork Drive (you heard it here first folks!). It’s like a Shodan Safari but with a less catchy name.

Note that if your query uses just a few dorks or keywords, you’ll get a lot of false positives. Google Search also ignores most special characters or treats them as spaces, which can lead to a similar outcome. So a degree of refinement is needed. As an example, here’s a broad and unhelpful search for servers hosting a phpinfo page:


Instead, try this:

intitle:phpinfo ext:php "Virtual Directory Support"

The quoted words say to Google, “all results must contain this exact phrase”, and it’s a distinctive enough phrase that it shouldn’t throw up loads of results.

If you run into the following on your Google Dork Drive:

…just tick the box like the good little non-robot you are :)

What did I find?

Sensibly, Google Search doesn’t index server-side redirects (HTTP response codes 300–399). It does, however, index pages with client-side redirection mechanisms, such as a meta refresh containing the following HTML:

<meta http-equiv="refresh" content="2; url=">

This example redirects the user to after two seconds. Google also indexes sites with JavaScript redirects, and you can dork them if they redirect based on a query parameter.
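Meta refreshes are also easy to spot in HTML you’ve crawled yourself. Here’s a minimal Python sketch that pulls the delay and target URL out of a page. The regex assumes the common attribute order shown above, and the function name and the example domain are my own inventions, so treat it as illustrative rather than a robust parser:

```python
import re

# Matches e.g. <meta http-equiv="refresh" content="2; url=https://example.com/">
META_REFRESH = re.compile(
    r'<meta\s+http-equiv=["\']refresh["\']\s+'
    r'content=["\'](?P<delay>\d+)\s*;\s*url=(?P<url>[^"\']*)["\']',
    re.IGNORECASE,
)

def extract_meta_refresh(html: str):
    """Return (delay_seconds, target_url) if the page has a meta refresh, else None."""
    m = META_REFRESH.search(html)
    if not m:
        return None
    return int(m.group("delay")), m.group("url")

# Hypothetical scam page snippet:
extract_meta_refresh('<meta http-equiv="refresh" content="1; url=https://scam.example/">')
# → (1, 'https://scam.example/')
```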

This means you can find search results where attackers have abused this functionality: legitimate sites get indexed with redirects to the attackers’ scam sites. This happened in the case of the Environment Agency’s river conditions site, documented in our previous post. It then happened to London Councils’ site, another domain…

The results reek of a fishy redirect. Scam sites like this stuff their pages with tonnes of keywords to try to boost their search rankings. Arguably it doesn’t actually work, but hey ho. Here I’ve just searched for “qwerty”, but any word would get results. Remove the dork and they’d only start appearing in the latter pages of the search, though a more specific phrase would bring them back up. Here’s an example result:

The robots.txt file contained the following, allowing all search engines and other bots to crawl the site:

User-agent: *



I counted around one million URLs across 238 sitemap files. Each one used different keywords so that search engines would surface them regardless of your query. Here’s a snippet from one of them:

<url><loc> on the door the complete guide for door super.pdf?sitesec=reviews&amp;context=L</loc></url>



You can get The Complete Guide for Door Supervisors, Wuthering Heights, or a Bear Grylls survival handbook. So much variety for just one website!
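Counting a haul like this is a quick job with the Python standard library. This sketch counts the `<url>` entries in one sitemap document (the example XML below is made up); summing it over all 238 downloaded files gives the total:

```python
import xml.etree.ElementTree as ET

# The standard sitemap namespace; real sitemaps declare it on <urlset>.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def count_urls(sitemap_xml: str) -> int:
    """Count <url> entries in a single sitemap document."""
    root = ET.fromstring(sitemap_xml)
    # Handle both namespaced and bare <url> elements.
    return len(root.findall(f"{SITEMAP_NS}url")) + len(root.findall("url"))

# Fabricated two-entry sitemap for illustration:
example = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    '<url><loc>https://example.com/a.pdf</loc></url>'
    '<url><loc>https://example.com/b.pdf</loc></url>'
    '</urlset>'
)
count_urls(example)  # → 2
```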

Clicking on any one of these links will take you to a redirection page:

Which redirects to

Dubious indeed, and clearly not the site’s intended behaviour. It turns out it redirected for any URL with two path segments:

The page used a meta refresh to redirect the user after one second:

<meta http-equiv="refresh" content="1; url=">

Another observation: navigating to the homepage sent you to London Councils’ main domain, a legitimate site.

What does it all mean?

Use of a meta refresh with a hard-coded redirect URL indicates the attackers could modify the HTML and thus probably had access to the server.

This is different to a query parameter redirect, like the one on the Environment Agency’s site:

…where the redirect target is specified in the URL itself. That’s often intended functionality; it just gets abused.

The maintenance site was clearly used legitimately by London Councils in the past. We can look at its Internet Archive capture from last year:

So, at some point since May 2022, attackers gained access to the maintenance subdomain but not the main domain. This is indicative of a subdomain takeover.

HackerOne has an article on the topic: A Guide to Subdomain Takeovers. Put simply, the vulnerability often arises when an organisation deprovisions one of its subdomains but doesn’t remove the subdomain’s DNS record, leaving it “hanging”. An attacker can then spin up their own host with the same hosting provider, claim the resource the hanging record still points at, and serve their own content under the subdomain.
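From the defender’s side, hunting for hanging records boils down to checking where each of your subdomains resolves versus what you actually control. This is an illustrative Python sketch, not a complete tool: `find_candidates`, `resolve_a`, and the `live_ips` inventory are all names I’ve made up, and a real audit would also need to check CNAMEs:

```python
import socket

def resolve_a(host: str):
    """Return the A-record IP for host, or None if it doesn't resolve."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None

def find_candidates(subdomains, live_ips, resolve=resolve_a):
    """Flag subdomains whose DNS record points at an IP we no longer control.

    `live_ips` is the set of addresses our hosts currently hold; anything
    resolving outside that set is a potential hanging record worth scrutiny.
    """
    flagged = []
    for sub in subdomains:
        ip = resolve(sub)
        if ip is not None and ip not in live_ips:
            flagged.append((sub, ip))
    return flagged
```

The `resolve` parameter is injectable so the loop can be tested (or swapped for a proper DNS library) without live lookups.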

Looking at a site’s current DNS record is the key to identifying a subdomain takeover possibility. Since it was already exploited, however, I looked at the historical records. First, for the compromised subdomain:

And then for the parent domain,

It should be noted that these were all A records, which map a domain to an IP address; takeovers, however, more commonly involve hanging CNAME records. CNAMEs alias one domain to another, often your site’s domain to a domain owned by the hosting provider. Some providers make these trivial to compromise, since they’ll let you pick the provider-owned domain name when you commission a new host.

Compromising a specific A record is usually more time-consuming. Instead of claiming a domain name, A record takeovers involve claiming the IP address, and hosting providers will rarely let you pick that. Instead, it’s down to random chance: you must continuously spawn new hosts until you get allocated the target IP address.

Take a look at EdOverflow’s Can I takeover XYZ? for a list of hosting providers vulnerable to takeovers. London Councils’ provider, shown in the DNS records as DigitalOcean, is vulnerable to the A record variant.

I believe the following happened:

  1. London Councils hosted and via Cloudflare.
  2. They deprovisioned both sites on 2022-12-22.
  3. On 2022-12-23, they relaunched the sites with DigitalOcean.
  4. They then deprovisioned the maintenance site, for an unknown reason, leaving it with a hanging A record pointing to an unallocated DigitalOcean IP address.
  5. An attacker spun up a DigitalOcean host and was assigned that IP address by chance, claiming the subdomain. They began serving deceptive content on it.

Since there would be hundreds of thousands of unallocated DigitalOcean IP addresses to sift through before hitting the one in the target DNS record, this was unlikely to be a targeted attack. I’ll get onto that later on.

What can someone do with it?

Redirect(ion) vulnerabilities are often used in phishing and social engineering attacks. Government domains and other similarly ‘trusted’ sites are attractive to scammers because their target demographics tend to be more comfortable supplying PII to them.

Let’s say the masterpdf site used the Design System, an open-source collection of webpage styles, components, and patterns to help create a standard site. The attackers could mimic Government Gateway and trick users into entering their National Insurance number or other personal information.

It relies on duped users not verifying the redirected URL and trusting the initial URL they clicked. Rather than masterpdf[dot]pro, the scammers would use a more convincing domain name like

I’ve laid out a sample attack below, which uses smishing (SMS phishing) to trick a vulnerable demographic into registering for a fake government grant. It looks like a legitimate government site and prompts the user for their NI number, which gets sent straight to the attackers. These details would be used later for fraud.

Apart from the URL, the scam site looks convincing. And although it acted like an open redirect vulnerability, this was a subdomain takeover: the attackers didn’t even need to redirect victims to their own site, they could have served malicious content directly on London Councils’ site. The URL would then be on the legitimate domain, so it’s unlikely the average user would spot something amiss without doing some digging.

So why didn’t the attackers do that?

As mentioned, an A record takeover relies on the hosting provider allocating the unclaimed IP address to the attacker by chance. Because there are hundreds of thousands of unclaimed DigitalOcean IP addresses, we believe the exploit was automated, e.g. the attacker had a bot that spins up tonnes of DigitalOcean hosts and waits until it’s allocated an address that already has an A record pointing to it.
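That bot logic can be sketched in a few lines. Everything here is hypothetical: `provision_host` and `destroy_host` stand in for a provider API (they are not real DigitalOcean calls), and the sketch just models the spin-up, check, release loop described above:

```python
import random  # stands in for the provider's address allocator in this sketch

# Hypothetical provider API, NOT real DigitalOcean calls.
def provision_host():
    """Pretend to create a host and return its randomly allocated IP."""
    return f"203.0.113.{random.randrange(256)}"

def destroy_host(ip):
    """Pretend to release the address back to the provider's pool."""
    pass

def cycle_until_hit(target_ips, max_attempts=10_000):
    """Keep provisioning hosts until one lands on an IP that some dangling
    A record still points to (the behaviour the attackers likely automated)."""
    for _ in range(max_attempts):
        ip = provision_host()
        if ip in target_ips:
            return ip          # a subdomain now resolves to the attacker's host
        destroy_host(ip)       # miss: release it and try again
    return None
```

With a real provider pool the hit rate is tiny, which is why this only pays off as an indiscriminate, automated scam rather than a targeted attack.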

Nick Janetakis documented a similar scenario in “A Recycled IP Address Caused Me to Pirate 390,000 Books by Accident” (2018). It’s highly probable this was the same scam, as both appear to host pirated books.

So they were in no way targeting a domain; it was just luck.

Instead, they were going for a low effort but wide-reaching scam in which scores of random subdomains across the globe get compromised. It’s possible the attackers could later review their rewards and sell off the interesting domains to hackers with more nefarious and precise motives, but fortunately this didn’t happen.

We can actually prove the existence of other victims with a Shodan search.

I noticed the favicon (the small image on the browser tab) of the vulnerable site had changed from a London Councils logo to a little blue magnifying glass. Shodan has a filter for favicons: it identifies them with a MurmurHash3 value, so you can search for websites sharing the same one. I used FavFreak to get the hash, additionally filtered Shodan to DigitalOcean hosts, and got a handful of other websites that had been taken over.

shodan search \
    --fields ip_str,hostnames \
    'http.favicon.hash:XXXXXXXXXX org:"DigitalOcean, LLC"'
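If you’d rather not install FavFreak, the hash is straightforward to compute yourself. To my understanding, Shodan hashes the newline-wrapped base64 of the favicon bytes with 32-bit MurmurHash3 and reports it as a signed integer; the exact encoding is an assumption on my part, so verify against a known favicon. A stdlib-only Python sketch:

```python
import base64

def murmur3_32(data: bytes, seed: int = 0) -> int:
    """Unsigned 32-bit MurmurHash3 (x86 variant), implemented from the
    reference algorithm so no third-party library is needed."""
    c1, c2, mask = 0xCC9E2D51, 0x1B873593, 0xFFFFFFFF
    h = seed & mask
    rounded = len(data) & ~3
    for i in range(0, rounded, 4):       # 4-byte body blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
        h = ((h << 13) | (h >> 19)) & mask
        h = (h * 5 + 0xE6546B64) & mask
    k = 0                                # 1-3 byte tail
    tail = data[rounded:]
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if tail:
        k ^= tail[0]
        k = (k * c1) & mask
        k = ((k << 15) | (k >> 17)) & mask
        k = (k * c2) & mask
        h ^= k
    h ^= len(data)                       # finalisation mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & mask
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & mask
    h ^= h >> 16
    return h

def shodan_favicon_hash(favicon: bytes) -> int:
    # Assumption: Shodan hashes the newline-wrapped base64 encoding that
    # Python's base64.encodebytes() produces, and reports a signed value.
    h = murmur3_32(base64.encodebytes(favicon))
    return h - (1 << 32) if h & (1 << 31) else h
```

Feed `shodan_favicon_hash` the raw bytes of a downloaded favicon and drop the result into the `http.favicon.hash:` filter above.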

A quick Google dork search yielded similar results.

7-minute Solution

What’s awesome about this vulnerability isn’t the fact it appeared in Google searches, nor the fact it was a domain and the potential impact it could’ve had. The coolest part of the tale is the remediation.

The London Councils site has a security.txt file here:

It’s a file standardised in RFC 9116: A File Format to Aid in Security Vulnerability Disclosure. It does what it says on the tin: it lets web developers detail how users should report vulnerabilities found on their sites. It was conceived out of security researchers’ frustration at finding many sites without proper reporting channels, leaving vulnerabilities unreported or unacknowledged by the sites’ developers.

We were pleasantly surprised to find a security.txt because, like the rest of the Internet, their adoption of the standard is currently pretty unsatisfactory: as we found in 2022, only 2% of their subdomains have a valid security.txt. Fortunately, the remaining subdomains appear to be covered by a backup policy. The backup policy has a couple of issues though:

  1. Disclosure must be made via HackerOne, but not every security researcher has an account (it’s not the only disclosure platform out there, plus some researchers like to remain anonymous).
  2. Follow the instructions and you’re taken to a report submission page, which states: “Not every * domain or sub domain is in scope of this VDP. Please only submit a vulnerability report to this VDP if the security.txt for the domain or sub domain specifically refers you to this submission form.”

Point 2 was why reporting to the Environment Agency was a hassle.

The London Councils’ security.txt contained the following:

# London Councils - reporting security vulnerabilities.

# Please report any security vulnerabilities to our website agency via the contact method(s) below, in accordance with the NCSC's Vulnerability Disclosure Policy.

# Please do not include any sensitive information in your initial message, we'll provide a secure communication method in our reply to you.


Contact: mailto:[email protected]

# Our disclosure policy. By submitting a potential security incident to us, you are implicitly accepting these terms - please read this before submitting:


Last-Updated: 2023-03-21 12:38:29+0000

Expires: 2023-06-19 13:38:28+0100

A dedicated security@ contact email address, instead of the general disclosure channel. It’s recently updated too!

Big Blue Door are the site’s developers, contracted by London Councils. The security.txt was on the non-compromised domain, though I still did a brief background check to confirm my email wasn’t going straight to scammers. Either way, it would only have contained public information.

I sent the email on Thursday, 23 March 2023, at 11:06 AM, went to make a coffee, and came back to a response:

Received at 11:13 AM. Big Blue Door fixed the issue in just seven minutes, quicker than the office coffee machine can fill a mug. Awesome! Props to their security team, especially since they only recently took over development of the site.

They took the correct action to clear out the attackers; there’s not much more to thwarting a subdomain takeover than deleting the offending DNS record. Hopefully, though, this incident will prompt London Councils and Big Blue Door to find and scrutinise any other unrecorded or unused subdomains they own. For effective mitigation, follow this guidance:

  • Avoid creating a DNS record before launching a host.
  • Remove DNS records before deprovisioning a host.
  • Maintain a catalogue of all of your organisation’s domains.
  • Check if your hosting provider has steps in place to verify that someone claiming a subdomain actually owns the domain.

To assist researchers who want to help secure your site, create and maintain a security.txt file (see for further information). If possible, try establishing a dedicated security contact.


  • Google Dork Drives are great for finding open redirects and already-exploited subdomain takeovers.
  • Vulnerabilities that bolster social engineering attacks should be taken seriously, especially on government sites.
  • security.txt files are awesome for security researchers.
  • Proactive security teams are even awesomer!