I am in the middle of a dissertation chapter on performance results from Discere, currently the section on the impact TIFF has on file sizes compared with PDF. It boggles me that TIFF is still so widely used, and I cannot help but think this is for legacy reasons rather than rational ones. In my test results comparing the Enron corpus in PDF versus in TIFF, I’m seeing an order-of-magnitude difference between the two file types. On average, TIFFing 1GB of native data produces 10-15GB of TIFF. This unnecessary increase has troubling implications for storage as data set sizes keep growing.

