It has been a while since I posted, but I’ve made some headway with my pet project that I wanted to write about. Most modern forensics tools follow a single-system, sequential paradigm when it comes to analysis. These tools, while very powerful, were not designed to scale. Another market is dealing with similar issues of scale, and forensics can learn from it.
Since 2006 there has been enormous growth in the Electronic Discovery, or eDiscovery, market (also called electronic data discovery, or EDD). In a nutshell, eDiscovery is the legal concept and practice of discovery but with electronic documents instead of paper ones. A lot of the industry and its practices are smoke and mirrors in my opinion. There are a lot of acronyms, jargon, and puffed-up resumes complicating a fairly simple process: documents are (sometimes) indexed, (usually) converted to a PDF or TIFF format for printing, and then reviewed to determine whether they are relevant and whether they have to be turned over. The conversion to PDF or TIFF is actually done via a print driver, which is horrendously time consuming because it spools to the hard drive, resulting in massive I/O bottlenecks. Some of the EDD software tries to get around this by using primitive clustering setups, mostly involving network shares, to speed up the process.
My primary objective for Black Friar is to develop a tool better suited to investigating multiple data sources, such as those present in corporate environments where data may be located on any number of workstations, servers, and other storage devices. The first step towards this long-term goal is to adapt the clustering mentality from the EDD market into a form usable in forensics (and a bit less primitive).
Goals for Iteration 1:
- Perform initial triage / indexing of the data source
- Catalogue the data as it is being processed
- Store the data for later retrieval during analysis
- Allow robust search and reporting functionality
- Implement the above using a distributed processing architecture
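To make the goals above a little more concrete, here is a minimal sketch of cataloguing items in parallel. It is only an illustration: worker threads stand in for cluster nodes, the catalogue is an in-memory map rather than real storage, and the names `TriageSketch` and `catalogue` are hypothetical and not taken from Black Friar.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: threads stand in for the nodes of a processing cluster.
class TriageSketch {
    // Records the size of every item, spreading the work over `workers` threads.
    static Map<String, Integer> catalogue(Map<String, byte[]> items, int workers) {
        Map<String, Integer> result = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (Map.Entry<String, byte[]> item : items.entrySet()) {
            // Each task catalogues one item independently of the others.
            pool.execute(() -> result.put(item.getKey(), item.getValue().length));
        }
        pool.shutdown();
        try {
            // Wait for all queued tasks to finish before returning the catalogue.
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("cataloguing timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("cataloguing interrupted", e);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, byte[]> items = new HashMap<>();
        items.put("report.doc", new byte[2048]);
        items.put("notes.txt", new byte[120]);
        System.out.println(catalogue(items, 2));
    }
}
```

In a real cluster the tasks would be handed to other machines rather than local threads, but the shape is the same: independent units of work feeding a shared catalogue.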
Development focuses on using existing open source projects to do the “heavy lifting” for general concepts like indexing so the majority of development efforts in future iterations can be focused on the forensics aspects of the analysis. Currently I am integrating the following projects into Black Friar:
Lucene (http://lucene.apache.org/) from the Apache Project is a very powerful, mature indexing technology. Together with a customized Java implementation of strings, it is used to perform full-text indexing of all files in the disk image, including deleted files where they can be recovered.
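For readers unfamiliar with strings, the idea is to pull runs of printable characters out of arbitrary binary data so they can be fed to the indexer. The following is a rough sketch of that idea only, assuming plain ASCII and a minimum run length; the class `StringsSketch` and its method are hypothetical and are not the customized implementation used in Black Friar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class StringsSketch {
    // Returns every run of at least minLength printable ASCII bytes (0x20 to 0x7e).
    static List<String> extract(byte[] data, int minLength) {
        List<String> found = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        for (byte b : data) {
            if (b >= 0x20 && b <= 0x7e) {
                run.append((char) b);
            } else {
                // A non-printable byte ends the current run.
                if (run.length() >= minLength) {
                    found.add(run.toString());
                }
                run.setLength(0);
            }
        }
        // Keep a run that reaches the end of the data.
        if (run.length() >= minLength) {
            found.add(run.toString());
        }
        return found;
    }

    public static void main(String[] args) {
        byte[] sample = {0x00, 'h', 'e', 'l', 'l', 'o', 0x01, 'a', 'b',
                         (byte) 0xff, 'w', 'o', 'r', 'l', 'd'};
        System.out.println(extract(sample, 4)); // prints [hello, world]
    }
}
```

The extracted strings would then be handed to the indexer as the text content of the file; other encodings such as UTF-16 are deliberately left out of this sketch.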
Gluster (http://www.gluster.org/) is a distributed file system operating in FUSE. This is the most experimental part of the project in terms of performance boosting. I am evaluating distributed file systems in various configurations (striped storage, mirrored storage, etc.) for performance gains, hosting the raw image in the communal storage space during processing. The aim is to mitigate I/O bottlenecks at the file system level rather than embedding load-balancing logic in the application itself. Initial results are positive, but more work is needed to confirm them. If successful, this would allow much more flexibility in customizing storage allocation based on existing resources or project makeup without encumbering the processing portion of the project with the details.
The Sleuth Kit (http://www.sleuthkit.org/) is the excellent package of tools from Brian Carrier which almost every open source forensics tool uses for handling disk images.
Libmagic (http://www.darwinsys.com/file/) is the library behind the file command. It uses magic numbers to guess file types.
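As a rough illustration of what guessing by magic number means, the sketch below compares the first bytes of a file against a handful of well-known signatures. It is a hand-rolled toy, not libmagic's API: libmagic works from a much larger rule database, and the class `MagicSketch` and its methods are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; this is not libmagic's API.
class MagicSketch {
    // Well-known leading signatures ("magic numbers") for a few formats.
    static final byte[] PDF = {'%', 'P', 'D', 'F'};
    static final byte[] PNG = {(byte) 0x89, 'P', 'N', 'G'};
    static final byte[] ZIP = {'P', 'K', 0x03, 0x04};
    static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};

    // True if header begins with the given signature.
    static boolean startsWith(byte[] header, byte[] magic) {
        return header.length >= magic.length
                && Arrays.equals(Arrays.copyOf(header, magic.length), magic);
    }

    // Guesses a MIME type from the leading bytes, ignoring the file name.
    static String guessType(byte[] header) {
        if (startsWith(header, PDF)) return "application/pdf";
        if (startsWith(header, PNG)) return "image/png";
        if (startsWith(header, ZIP)) return "application/zip";
        if (startsWith(header, JPEG)) return "image/jpeg";
        return "application/octet-stream"; // unknown
    }

    public static void main(String[] args) {
        byte[] header = {'%', 'P', 'D', 'F', '-', '1', '.', '4'};
        System.out.println(guessType(header)); // prints application/pdf
    }
}
```

Because the check looks at content rather than the extension, a renamed file is still identified by what it actually contains, which is the property that matters during triage.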
I have a list of other projects I want to consolidate into a single tool as the project progresses, but the above four provide a sufficient base of functionality to get the project going with minimal fuss. The project itself is implemented in Java for cross-platform support. Ideally I’d like to make Black Friar’s base functionality easy enough to use that Windows-only users can get it going, while still supporting all the functionality I need with minimal C hacking.
Here is a screenshot of the very alpha search tool which will be part of Black Friar. Eventually the tool will also support archive / export of selected files from the raw disk image.