Download a public website via git errors

May 11, 2020

Are you using Git? Do you publicly expose .git files in your webroot? If developers clone directly into the webroot when deploying an application or website, the metadata that Git leaves behind can be abused to download all of the application’s source code. This kind of issue can be identified by browsing to http://[website]/.git/config. If this returns any information at all, it’s likely that your application can be downloaded, and it may already have been.

How

There are three main kinds of objects in a git repository (annotated tags are a fourth kind, which we don’t need here):

  • Blob - The actual data (e.g. source code)
  • Tree - Grouping blobs together
  • Commit - A specific state of a tree with more meta information (e.g. author/date/message)

All of these are used by git under the hood to maintain the repository. The problem we face is that these objects are stored as .git/objects/[first-2-chars]/[last-38-chars] files, where [first-2-chars][last-38-chars] is the 40-character hexadecimal SHA-1 hash of the object. When directory listing is disabled on the server, we need to be smart and guess or extract the filenames of all objects to completely restore the repository, because brute-forcing the SHA-1 keyspace would be far too time-consuming.
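To make the naming scheme concrete, here is a minimal Python sketch. The function names are my own and purely illustrative. It shows how git derives an object id (the SHA-1 of a small header plus the content) and how that id maps to a path under .git/objects/.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 id git assigns to an object of this type and content."""
    # Git hashes "<type> <size>\0" followed by the raw content.
    header = f"{obj_type} {len(content)}".encode() + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def object_path(sha1_hex: str) -> str:
    """Map a 40-character hex object id to its loose-object path."""
    if len(sha1_hex) != 40:
        raise ValueError("expected a 40-character hex SHA-1")
    # The first two hex characters are the directory, the other 38 the file name.
    return f".git/objects/{sha1_hex[:2]}/{sha1_hex[2:]}"


print(object_path("6916ae52c0b20b04569c262275d27422fc4fcd34"))
print(git_object_id("blob", b""))  # id of the empty blob
```

Since the id depends on the content, there is no way to predict it without already having the file, which is why the hashes have to be harvested from other files.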

What helps us a lot is the fact that there are some standard files in a git repository:

  • HEAD
  • objects/info/packs
  • description
  • config
  • COMMIT_EDITMSG
  • index
  • packed-refs
  • refs/heads/master
  • refs/remotes/origin/HEAD
  • refs/stash
  • logs/HEAD
  • logs/refs/heads/master
  • logs/refs/remotes/origin/HEAD
  • info/refs
  • info/exclude

Each of these files either refers to an object by its hash or points to another file that references an object, and so on. Thus the easiest way to start is to download these files and parse them for hashes, which tell us which object files to download next.
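A rough way to do that parsing, sketched in Python: treat every 40-character lowercase hex string as a candidate object hash, and follow "ref: ..." lines (as found in HEAD) to other files. The helper names are hypothetical, and the pattern will also match things that are not object ids, such as the hash embedded in a pack file name, so treat the results as candidates only.

```python
import re

HASH_RE = re.compile(r"\b[0-9a-f]{40}\b")
REF_RE = re.compile(r"^ref: (\S+)$", re.MULTILINE)
NULL_HASH = "0" * 40  # placeholder used in logs for "no previous commit"


def find_hashes(text: str) -> list[str]:
    """Return candidate object hashes in order of first appearance."""
    seen: list[str] = []
    for candidate in HASH_RE.findall(text):
        if candidate != NULL_HASH and candidate not in seen:
            seen.append(candidate)
    return seen


def find_refs(text: str) -> list[str]:
    """Return the targets of symbolic refs such as 'ref: refs/heads/master'."""
    return REF_RE.findall(text)


print(find_refs("ref: refs/heads/master\n"))
print(find_hashes("6916ae52c0b20b04569c262275d27422fc4fcd34\n"))
```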

So for example, we have downloaded the refs/heads/master file:

> cat .git/refs/heads/master 
6916ae52c0b20b04569c262275d27422fc4fcd34

The reference master points to a commit with the hash 6916ae52c0b20b04569c262275d27422fc4fcd34. After downloading the commit object from the server (note that the URL should be .git/objects/69/16ae52c0b20b04569c262275d27422fc4fcd34), we can analyse it further:

> git cat-file -t 6916ae52c0b20b04569c262275d27422fc4fcd34 
commit

This tells us that the downloaded object is indeed a commit. Let’s get some details about it:

> git cat-file -p 6916ae52c0b20b04569c262275d27422fc4fcd34 
tree fa3887a0b798346c122afdd7c5ecc605bf3c18c0
parent 9264d57c621f66208d689ef653ce8a62c3bccfae

Okay, now we know the hashes of the related tree and parent objects. The full output also contains information about the author, the committer and the commit message, which is omitted here.
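What git cat-file does here can be approximated without git. A loose object is a zlib-compressed blob of the form "type size\0body". The Python sketch below (function names are my own) decodes such an object and pulls the tree and parent hashes out of a commit. The demo builds a stand-in object locally rather than downloading one, and its commit message is a placeholder.

```python
import zlib


def decode_loose_object(raw: bytes) -> tuple[str, bytes]:
    """Decompress a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


def commit_links(body: bytes) -> dict[str, list[str]]:
    """Collect the tree and parent hashes from a commit's header lines."""
    links: dict[str, list[str]] = {"tree": [], "parent": []}
    for line in body.decode(errors="replace").split("\n"):
        if not line:
            break  # a blank line ends the header; the message follows
        key, _, value = line.partition(" ")
        if key in links:
            links[key].append(value)
    return links


# Demo with a locally built object instead of a real download.
demo_body = (
    b"tree fa3887a0b798346c122afdd7c5ecc605bf3c18c0\n"
    b"parent 9264d57c621f66208d689ef653ce8a62c3bccfae\n"
    b"\n"
    b"placeholder commit message\n"
)
demo_raw = zlib.compress(
    b"commit " + str(len(demo_body)).encode() + b"\x00" + demo_body
)
obj_type, body = decode_loose_object(demo_raw)
print(obj_type, commit_links(body))
```

A merge commit has more than one parent line, which is why the values are kept as lists.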

We download the tree-object and analyse it:

> git cat-file -p fa3887a0b798346c122afdd7c5ecc605bf3c18c0
040000 tree 532fc6055e09e0a2d5602f4b84c0dbadce1b5f3e        Dumper
040000 tree 077ce769dedcf19d0f063246256e8ae0394fd8df        Extractor
040000 tree d6e1bd4677a256e760cce5ddaa7db7ea6f9a8900        Finder
100644 blob 9670cf17dfeec351c395493058044b9f9dadbe2a        README.md

This tells us which files are stored in that tree. Note that Dumper, Extractor and Finder are also trees (directories). The final step is to download the README.md blob object and cat its content:
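Under the hood a tree is stored in a binary form: each entry is "mode name\0" followed by the 20 raw bytes of the hash. The following Python sketch (parse_tree is a name I made up) walks over a decompressed tree body. The demo input is assembled by hand from two of the entries shown above. Note that the stored mode for a directory is 40000, which git cat-file -p displays as 040000.

```python
def parse_tree(body: bytes) -> list[tuple[str, str, str]]:
    """Return (mode, name, sha1_hex) for every entry of a tree body."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)      # end of the mode
        nul = body.index(b"\x00", space)   # end of the file name
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode(errors="replace")
        sha1_hex = body[nul + 1:nul + 21].hex()  # 20 raw bytes -> 40 hex chars
        entries.append((mode, name, sha1_hex))
        pos = nul + 21
    return entries


# Hand-built tree body with two of the entries from the listing above.
demo_tree = (
    b"40000 Dumper\x00"
    + bytes.fromhex("532fc6055e09e0a2d5602f4b84c0dbadce1b5f3e")
    + b"100644 README.md\x00"
    + bytes.fromhex("9670cf17dfeec351c395493058044b9f9dadbe2a")
)
for mode, name, sha1_hex in parse_tree(demo_tree):
    kind = "tree" if mode == "40000" else "blob"
    print(mode, kind, sha1_hex, name)
```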

> git cat-file -p 9670cf17dfeec351c395493058044b9f9dadbe2a
Git Tools
=============
[...]

We need to take special care of packed files. We can find a list of all packs in .git/objects/info/packs.

> cat .git/objects/info/packs 
P pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack

The appropriate pack file is stored in .git/objects/pack/:

> /usr/bin/ls .git/objects/pack/
pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.idx
pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack

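In that case, we need to download both files and then run the following command to extract the packed data. Note that git unpack-objects skips objects that already exist in the repository, so if nothing is unpacked, move the pack file out of .git/objects/pack/ first and feed it to the command from its new location: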

> git unpack-objects -r < .git/objects/pack/pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack
Unpacking objects: 100% (15/15), done.
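Working out which pack files to fetch can be sketched in a few lines of Python. This assumes that each relevant line of objects/info/packs looks like the one shown above ("P" followed by the pack name) and that the index sits next to the pack under the same base name. The function name is hypothetical.

```python
def pack_files(packs_text: str) -> list[str]:
    """List the .pack and .idx paths referenced by objects/info/packs."""
    paths = []
    for line in packs_text.splitlines():
        line = line.strip()
        if line.startswith("P ") and line.endswith(".pack"):
            name = line[2:].strip()
            base = name[: -len(".pack")]
            paths.append(f".git/objects/pack/{name}")
            paths.append(f".git/objects/pack/{base}.idx")
    return paths


listing = "P pack-e38660e6be24bb79d8d929ddea3d194e0dd3cd13.pack\n\n"
for path in pack_files(listing):
    print(path)
```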

As you can see, by repeating this procedure recursively for every hash we find in the files already downloaded, we can gradually restore the repository and extract its contents.
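The recursive procedure boils down to a queue of hashes. The Python sketch below is one possible shape for it, not the way any particular tool does it. The fetch callable is an assumption: it takes a path like .git/objects/69/16ae... and returns the raw bytes, or None if the download failed. In the demo it is backed by an in-memory dict so that nothing touches the network.

```python
import re
import zlib
from collections import deque
from typing import Callable, Optional

HASH_RE = re.compile(rb"\b[0-9a-f]{40}\b")


def linked_hashes(obj_type: str, body: bytes) -> list[str]:
    """Return the hashes that an object of this type points to."""
    if obj_type == "commit":
        header = body.split(b"\n\n", 1)[0]  # tree/parent lines come first
        return [h.decode() for h in HASH_RE.findall(header)]
    if obj_type == "tree":
        hashes, pos = [], 0
        while pos < len(body):
            nul = body.index(b"\x00", pos)
            hashes.append(body[nul + 1:nul + 21].hex())
            pos = nul + 21
        return hashes
    return []  # blobs (and anything else) are treated as leaves here


def crawl(start: list[str],
          fetch: Callable[[str], Optional[bytes]]) -> dict[str, str]:
    """Follow hashes breadth-first; map each hash to its type or a status."""
    found: dict[str, str] = {}
    queue = deque(start)
    while queue:
        sha = queue.popleft()
        if sha in found:
            continue
        raw = fetch(f".git/objects/{sha[:2]}/{sha[2:]}")
        if raw is None:
            found[sha] = "missing"
            continue
        try:
            data = zlib.decompress(raw)
        except zlib.error:
            found[sha] = "corrupt"
            continue
        header, _, body = data.partition(b"\x00")
        obj_type = header.split(b" ")[0].decode()
        found[sha] = obj_type
        queue.extend(linked_hashes(obj_type, body))
    return found


def loose(obj_type: str, body: bytes) -> bytes:
    """Build a compressed loose object for the in-memory demo."""
    return zlib.compress(f"{obj_type} {len(body)}".encode() + b"\x00" + body)


COMMIT = "6916ae52c0b20b04569c262275d27422fc4fcd34"
TREE = "fa3887a0b798346c122afdd7c5ecc605bf3c18c0"
PARENT = "9264d57c621f66208d689ef653ce8a62c3bccfae"  # not in the demo store
BLOB = "9670cf17dfeec351c395493058044b9f9dadbe2a"

store = {
    f".git/objects/{COMMIT[:2]}/{COMMIT[2:]}": loose(
        "commit", f"tree {TREE}\nparent {PARENT}\n\nplaceholder\n".encode()),
    f".git/objects/{TREE[:2]}/{TREE[2:]}": loose(
        "tree", b"100644 README.md\x00" + bytes.fromhex(BLOB)),
    f".git/objects/{BLOB[:2]}/{BLOB[2:]}": loose("blob", b"Git Tools\n"),
}
result = crawl([COMMIT], store.get)
print(result)
```

In a real run the start list would come from the standard files listed earlier, and anything marked as missing may still turn up inside a pack file.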

Sometimes downloading a specific object will fail, leaving us with an incomplete repository. In that case, we can use the git fsck command to find the missing or broken object files.
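If you want to feed the result back into a download queue, the fsck report can be scanned for hashes. The Python sketch below assumes that missing objects are reported on lines of the form "missing <type> <hash>"; check this against the output of your git version. The sample text is made up for illustration, and run_fsck is a hypothetical helper that is not called here.

```python
import re
import subprocess

MISSING_RE = re.compile(r"^missing (\w+) ([0-9a-f]{40})$", re.MULTILINE)


def missing_objects(fsck_output: str) -> list[tuple[str, str]]:
    """Return (type, hash) pairs for every 'missing' line in the report."""
    return MISSING_RE.findall(fsck_output)


def run_fsck(repo_dir: str) -> str:
    """Run git fsck in repo_dir and return its combined output."""
    proc = subprocess.run(
        ["git", "fsck"], cwd=repo_dir, capture_output=True, text=True
    )
    return proc.stdout + proc.stderr


# Illustrative sample, not captured from a real repository.
sample = (
    "missing blob 9670cf17dfeec351c395493058044b9f9dadbe2a\n"
    "missing tree 077ce769dedcf19d0f063246256e8ae0394fd8df\n"
)
print(missing_objects(sample))
```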

Testing

Enter ‘http://site.com/project-path/.git/config’ in your browser URL bar, where ‘project-path’ is the path to your version controlled directory. If you see something like this:

[core]
   repositoryformatversion = 0
   filemode = true
   bare = false
   logallrefupdates = true
[remote "origin"]
   url = git@bitbucket.org:UserName/your-repo.git
   fetch = +refs/heads/*:refs/remotes/origin/*

…you need to take action!
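The same check is easy to script for the sites you are responsible for. A minimal Python sketch, with function names of my own choosing: it requests /.git/config and looks for the markers visible in the sample above. A custom error page served with a 200 status will not match, and a site that blocks the config file but not the rest of .git would slip through, so this is a quick first test rather than proof of safety.

```python
import urllib.error
import urllib.request


def looks_like_git_config(body: str) -> bool:
    """Heuristic: does this text resemble a .git/config file?"""
    return "[core]" in body and "repositoryformatversion" in body


def git_config_exposed(base_url: str, timeout: float = 10.0) -> bool:
    """Fetch <base_url>/.git/config and report whether it looks exposed."""
    url = base_url.rstrip("/") + "/.git/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read(4096).decode(errors="replace")
    except (urllib.error.URLError, OSError):
        return False  # 404, 403, connection errors and the like
    return looks_like_git_config(body)


sample_config = "[core]\n   repositoryformatversion = 0\n   bare = false\n"
print(looks_like_git_config(sample_config))
```

Calling git_config_exposed("http://site.com/project-path") would perform the actual request.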

Solution

You should never commit configuration files that contain secrets, or other sensitive files, to version control; .gitignore is there for a reason.

Even if you keep sensitive files outside your git repo, it’s still important to restrict access to the .git directory for public-facing projects. You could move your .git directory outside the document root, so that it is not publicly accessible. This is quite a good solution, though I have found it a bit fiddly when a project uses git submodules.

Another alternative is to configure your web server to deny public access to everything under the .git directory.