Scanning for security.txt files

Shane Bourne & David Lodge 12 Jul 2020

Introduction

RFC 9116 was written by E. Foudil and Y. Shafranovich and left draft status in April 2022. It formally defines the security.txt file, which has been an unofficial standard for many years: it was initially created back in 2017 and is documented at https://securitytxt.org/.

The security.txt file is a simple file at a known path that security researchers can check to find an endpoint for disclosing vulnerabilities, without having to email random contacts, tweet at the organisation, phone the sales number or hunt down the CIO on LinkedIn (all tactics we have used in the past).

This is a positive step and takes virtually zero development time to implement. Every organisation, whether e-commerce, government or security provider, should have a security.txt file, though in our experience many don’t.
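To give a sense of how little is involved, here is a minimal sketch that generates such a file in Python. RFC 9116 requires the Contact and Expires fields; everything else here is an assumption for illustration: the function name is hypothetical, the mailto address is a placeholder, and the 180-day lifetime is an arbitrary choice.

```python
from datetime import datetime, timedelta, timezone


def render_security_txt(contact: str, valid_days: int = 180) -> str:
    """Return the text of a minimal security.txt file.

    `contact` is a URI such as "mailto:security@example.com" (placeholder).
    `valid_days` is an arbitrary lifetime chosen for this sketch.
    """
    expires = datetime.now(timezone.utc) + timedelta(days=valid_days)
    lines = [
        f"Contact: {contact}",
        # Expires is written as a UTC timestamp, e.g. 2030-01-01T00:00:00Z.
        f"Expires: {expires.strftime('%Y-%m-%dT%H:%M:%SZ')}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the file; in practice it would be served at /.well-known/security.txt.
    print(render_security_txt("mailto:security@example.com"), end="")
```

The output is a two-line text file; a real deployment would usually add further optional fields, which are documented at https://securitytxt.org/.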

How many don’t have one?

It’s easy to identify whether a site has a security.txt file, as it is intended to live in …/.well-known/, like ours does, though putting it in the root directory is also valid. The standard doesn’t disallow the use of a redirect either, so the file could technically live anywhere, provided one of the paths below returns something valid.

The expected paths for the file are:

  • /.well-known/security.txt
  • /security.txt 

This is trivial to test: a wget or curl script can loop through a list of domains and request both URLs to determine whether the file exists. We ended up writing a custom Python script so that we could thread it for some extra speed.
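As an illustration only (this is not the script used for the results in this post), a threaded check along these lines might look like the sketch below. It assumes Python 3.9+ and the standard library only, tries HTTPS only, and uses hypothetical names (`fetch`, `scan_domain`, `scan`); a status of 0 is this sketch's own convention for "no HTTP response".

```python
from concurrent.futures import ThreadPoolExecutor
from http.client import HTTPException
from urllib.error import HTTPError, URLError
from urllib.request import urlopen

# The two expected locations for the file.
PATHS = ("/.well-known/security.txt", "/security.txt")


def fetch(url: str, timeout: float = 10.0) -> tuple[int, str]:
    """Return (status, body) for a URL; status 0 means no HTTP response."""
    try:
        # urlopen follows redirects by default.
        with urlopen(url, timeout=timeout) as resp:
            # Read at most 64 KiB; a security.txt file should be small.
            return resp.status, resp.read(65536).decode("utf-8", "replace")
    except HTTPError as err:
        # 4xx and 5xx responses still carry a status code.
        return err.code, ""
    except (URLError, HTTPException, OSError, ValueError):
        # DNS failure, timeout, TLS error, malformed response or bad URL.
        return 0, ""


def scan_domain(domain: str, fetcher=fetch) -> dict[str, tuple[int, str]]:
    """Fetch both candidate paths for one domain."""
    return {path: fetcher(f"https://{domain}{path}") for path in PATHS}


def scan(domains: list[str], workers: int = 32, fetcher=fetch) -> dict:
    """Scan many domains concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda d: scan_domain(d, fetcher), domains)
        return dict(zip(domains, results))


if __name__ == "__main__":
    for domain, by_path in scan(["example.com"]).items():
        for path, (status, _body) in by_path.items():
            print(domain, path, status)
```

Threads suit this job because almost all of the time is spent waiting on the network. The `fetcher` parameter exists so the scan logic can be exercised without making real requests.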

But then, what targets do we look for? There used to be the Alexa top million websites. Alexa is now owned by Amazon, which means that they have two products called Alexa. Unfortunately, Amazon solved the naming problem by doing a Google-like move and randomly killing off the Alexa top million service. Fortunately, there is now the Majestic million.

One million sites is a lot to go through, so we wanted to begin with a much more targeted approach. We could make our own list of targets, though that would expose the innate bias within our team. After a bit of head scratching, we decided to see how well the NCSC has pushed security.txt to UK Government sites.

The Cabinet Office provide a list of all UK .gov.uk domains as of 29 October 2019 here. This is obviously outdated and incomplete but is the best we have now.

Initial results

We have a script and a list of 3026 domains (one more than the official list, as we added gov.uk to it). An initial run took approximately 3 minutes. After some tuning to remove false positives (caused by some sites incorrectly returning an HTTP 200 status rather than a 404), we ended up with the following data:

gov.uk results

The results for .well-known/security.txt account for many of the valid files, with a search for just security.txt finding four extra valid files.

Just to note, we’re assuming here that a valid security.txt file has at least one line matching the regular expression “[Cc]ontact:”. If we receive a 200 response that does not match this pattern, we treat it as invalid. These are generally sites which ignore the RFCs and return a 200 for every request.
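The check described above can be sketched in a few lines of Python. The function name is hypothetical, and this mirrors only the rule stated here (a 200 status plus at least one line matching the pattern), not a full RFC 9116 parser.

```python
import re

# At least one line must contain "Contact:" or "contact:".
CONTACT_RE = re.compile(r"[Cc]ontact:")


def is_valid_security_txt(status: int, body: str) -> bool:
    """True for a 200 response whose body has a line matching the pattern."""
    if status != 200:
        return False
    return any(CONTACT_RE.search(line) for line in body.splitlines())
```

A rule this loose will still accept an HTML page that happens to contain the text "Contact:", so a stricter version might also check the Content-Type header or require the match at the start of a line.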

Other responses include multiple weird and wonderful HTTP response codes including 410 (Gone), 400 (Bad Request), 300 (Multiple Choices), 503 (Service Unavailable) and some specific Cloudflare codes.
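With this many different status codes in play, it helps to bucket the responses before counting them. The sketch below is one way to do that in Python; the bucket labels and function names are illustrative assumptions, and a status of 0 is used here to mean that no HTTP response was received.

```python
from collections import Counter


def classify(status: int, has_contact: bool) -> str:
    """Map one response to a bucket label (labels are illustrative)."""
    if status == 200:
        return "valid" if has_contact else "200 without Contact"
    if status == 404:
        return "404 not found"
    if status == 0:
        return "no response"
    # Everything else (410, 400, 300, 503, CDN-specific codes, ...).
    return f"other ({status})"


def tally(responses: list[tuple[int, bool]]) -> Counter:
    """Count responses per bucket; each item is (status, has_contact)."""
    return Counter(classify(status, ok) for status, ok in responses)
```

Keeping the unusual codes in their own buckets, rather than lumping them in with 404, makes it easier to spot sites that are misconfigured rather than simply missing the file.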

So, overall, we found that only 2% of the sites we reviewed had a valid security.txt file.

The wider internet

A slice of the top 10,000 sites from the Majestic Million, taken on 22 June 2022, shows a slightly different picture. For this test we were only looking at .well-known/security.txt:

And in graphical form:

So out of the top 10,000 websites, only 6% have a valid security.txt file.

Who has it and who doesn’t?

The obvious question is: which companies have one and which don’t? Of the 635 valid domains, nearly a third (195) are Google domains. We spotted other well-known names that have one:

  • Facebook
  • Walmart
  • GitHub
  • Amazon
  • Rapid7

Amongst those missing a security.txt we also have some more household names:

  • Wikipedia
  • Microsoft
  • Tesla
  • Slashdot
  • Cisco

Conclusion

This isn’t great. Adding a security.txt file is trivial to do and helps anyone trying to disclose a vulnerability. It’s especially disappointing that organisations in the security industry still don’t have one.

For those thinking “but we have a bug bounty program”: congratulations, but that doesn’t cover the issue. The aim of the file is to make it easy for people to disclose to you. They may not know which bug bounty program you have. Additionally, there are researchers who don’t want to go through a bug bounty program simply because they may lose autonomy over disclosure.

Having a security.txt file is easy and should be a part of your web development strategy. As security researchers who have had their fair share of terrible vulnerability disclosures, we know it just makes sense to have one.

We’re not alone in this research BTW. Trickest have been at it as well.