Bert's Blog

Jan 16

Troublesome Timestamps

A few years ago I was working for a company that distributed a simple data file to its client app. The client app would scan the data file and construct an index of keywords together with their offsets into the file for quick searching. As the data file grew, the time taken to index it also grew, so it made sense to index the file up front and distribute both files together in a zip file. I wrote and unit tested a short routine to do this, and a second to unpack and verify the files at the client side.

All was fine until we shipped the new software, and then complaints started coming in that the new functionality ‘wasn’t working’. Baffled, I turned to the code - no more than a few lines - and the tests - again no more than a few lines - and couldn’t see the error. Being unable to replicate the problem, I asked for some sample zip files, which all worked fine!

It wasn’t until I spoke to an engineer looking at a different problem that I got a lead: he mentioned the client app had begun to index a large file and was taking ages. I immediately directed him to where the zip file was kept, so he could get me a copy to replicate the problem with. After a lot of digging I discovered this little gem in the Java ZipEntry source code.

private static long javaToDosTime(long time) {
  Date d = new Date(time);
  int year = d.getYear() + 1900;
  if (year < 1980) {
    return (1 << 21) | (1 << 16);
  }
  return (year - 1980) << 25 | 
    ( d.getMonth() + 1) << 21 |
    d.getDate() << 16 |
    d.getHours() << 11 | 
    d.getMinutes() << 5 |
    d.getSeconds() >> 1;
}

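The index file stored the timestamp of the data file, so the client would know when to re-index. However, when you zip a file in Java, it’ll lose precision when storing seconds: they only get 5 bits, which can hold the values 0 to 31, so the seconds are halved before being stored (which explains the >> 1). An odd number of seconds therefore comes back out of the zip as the even number just below it. The unpacked data file no longer matched the timestamp recorded in the index, and the client re-indexed it.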
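To make the rounding concrete, here is a minimal sketch of just the seconds field, separated out from the rest of the date packing. The class and method names (DosSecondsDemo, packSeconds, unpackSeconds) are made up for illustration and are not part of the JDK; only the `>> 1` comes from the snippet above.

```java
class DosSecondsDemo {
    // Pack a seconds value (0-59) into the 5-bit field,
    // the same way `d.getSeconds() >> 1` does in the snippet above.
    static int packSeconds(int seconds) {
        return (seconds >> 1) & 0x1f;
    }

    // Recover seconds from the 5-bit field. The low bit that was
    // shifted away cannot be restored, so the result is always even.
    static int unpackSeconds(int field) {
        return (field & 0x1f) << 1;
    }

    public static void main(String[] args) {
        for (int s : new int[] {30, 31, 59}) {
            System.out.println(s + " -> " + unpackSeconds(packSeconds(s)));
        }
    }
}
```

Even values survive the round trip, while 31 comes back as 30 and 59 as 58.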

My solution was to force all data file timestamps to an even number of seconds before they’re indexed and zipped!
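A rough sketch of what that kind of fix can look like is below. It is not the code that shipped: the class and method names (EvenTimestamp, toEvenSeconds, normalise) are hypothetical, and rounding down rather than up is an assumption. It also assumes the local offset from UTC is a whole number of minutes, so that an even second in epoch time is still an even second in the local time the zip format stores.

```java
import java.io.File;

class EvenTimestamp {
    // Round a millisecond timestamp down to a whole, even number of seconds,
    // i.e. to a multiple of 2000 ms. floorMod keeps this correct for
    // negative (pre-1970) values too.
    static long toEvenSeconds(long millis) {
        return millis - Math.floorMod(millis, 2000L);
    }

    // Hypothetical helper: rewrite the file's modification time so it
    // survives being stored at two-second resolution.
    // Returns false if the file system refused the change.
    static boolean normalise(File dataFile) {
        return dataFile.setLastModified(toEvenSeconds(dataFile.lastModified()));
    }
}
```

Calling normalise on the data file before building the index means the timestamp written into the index is one the zip can store exactly.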