The Long Now Photography Project Phase 2

From crispyneurons

Jason Wells Jason Wells crispy neurons crispyneurons photography photo photograph long now project

the visual archeology of kin




Post-processing

While you can see that a lot of fuss goes into the scanning process, the cleanup process is also fussy, but in a different way. Once I've completed scanning a set of photos, I move them to a directory called 'raw'. There I add '.tiff' to the end of the filename, since VistaScan doesn't do this for me, and I like to know at a glance what type of file it is. At this point I use a naming convention that I've developed to help document and organize the pictures.
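The renaming step can be scripted. What follows is only a minimal sketch, assuming the scanner output has no extension at all and sits directly inside the 'raw' directory; the function name `add_tiff_extension` is hypothetical.

```python
from pathlib import Path


def add_tiff_extension(raw_dir):
    """Append '.tiff' to every extensionless file directly inside raw_dir.

    Assumption: scanner output carries no extension at all. Files that
    already have one, hidden files, and subdirectories are left alone.
    Returns the list of new file names.
    """
    renamed = []
    for path in sorted(Path(raw_dir).iterdir()):
        if not path.is_file() or path.name.startswith(".") or path.suffix:
            continue
        target = path.with_name(path.name + ".tiff")
        if target.exists():
            # Never overwrite an existing scan; leave it for manual review.
            continue
        path.rename(target)
        renamed.append(target.name)
    return renamed


if __name__ == "__main__":
    raw = Path("raw")
    if raw.is_dir():
        for name in add_tiff_extension(raw):
            print("renamed to", name)
```

Running it a second time changes nothing, because every file it renamed now has an extension.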

Naming

Even before I started this project, I knew I would eventually end up with thousands of digital photos. Most digital cameras give meaningless filenames to the photos; for example, my camera names them 'dsc0001.jpg', 'dsc0002.jpg', etc. The camera obviously has no understanding of the meaning or content of the picture, so it uses a simple enumeration strategy. But there's no way I'd be satisfied with this. This is what I use instead:

[subject]_[sequence#]_[date].jpg

So for example, the 6th picture of a set of (say) 50 from a trip to Arizona on March 18, 2001 would look like this:

arizona_trip_06_20010318.jpg

I use underscores instead of spaces because different operating systems and tools handle spaces differently, and I wanted to eliminate the variable. The sequence number is preceded by a zero because the set contains 10 or more pictures, and without the zero, the operating system will sort them as text (1, 10, 2, 20, etc.) rather than sequentially (1, 2, 3, etc.). I keep the date in YYYYMMDD format for the same reason. (A tidbit for the time geeks: I learned years later that this is identical to the basic, or compact, form of the ISO 8601 calendar date format.)
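The convention is mechanical enough to express in a few lines. This is a sketch under stated assumptions, not a tool from the project: the function name `photo_filename` and its parameters are hypothetical, and lowercasing the subject is my assumption based on the example above.

```python
from datetime import date


def photo_filename(subject, sequence, taken, set_size, extension="jpg"):
    """Build '[subject]_[sequence#]_[date].[extension]'.

    The sequence number is zero-padded to the width of the largest
    number in the set, so names sort correctly as plain text.
    """
    if not 1 <= sequence <= set_size:
        raise ValueError("sequence must be between 1 and set_size")
    width = len(str(set_size))
    # Assumption: subjects are lowercased; spaces become underscores.
    slug = "_".join(subject.lower().split())
    return f"{slug}_{sequence:0{width}d}_{taken:%Y%m%d}.{extension}"


if __name__ == "__main__":
    print(photo_filename("Arizona trip", 6, date(2001, 3, 18), set_size=50))
    # prints: arizona_trip_06_20010318.jpg

    # Why the padding matters: text sorting compares character by character.
    print(sorted(["1", "10", "2"]))     # ['1', '10', '2']
    print(sorted(["01", "10", "02"]))   # ['01', '02', '10']
```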

Yes, it all seems incredibly anal. But there's no choice if you want to be able to find the picture again from a mountain of tens of thousands. And there's an additional concern that this naming convention addresses. In this model, the filename provides a small but crucial amount of metadata for the photograph. Any external party can see these filenames and know what the file contains. This matters when considering how the files may be copied around in the decades to come. Some solutions to this problem involve a separate database with a special proprietary data structure. (This is the route consumer-grade photography programs like iPhoto choose to go.) This is not acceptable, because the photos (the files) will definitely outlast any conceivable viewing or organizing application, and will surely be copied to and from innumerable machines. Therefore all metadata for a given photo must be packaged with the file itself.
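To illustrate that the filename alone carries the metadata, here is a small sketch that recovers it with nothing but the name. The pattern and the function name `parse_photo_filename` are my own assumptions about how one might read the convention back; they are not part of the project.

```python
import re
from datetime import datetime

# Assumption: the subject may itself contain underscores, the sequence is
# all digits, and the date is exactly eight digits (YYYYMMDD).
FILENAME_PATTERN = re.compile(
    r"(?P<subject>.+)_(?P<sequence>\d+)_(?P<date>\d{8})\.(?P<extension>jpg|tiff)"
)


def parse_photo_filename(name):
    """Return the metadata encoded in a filename, or None if it does not fit."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    try:
        taken = datetime.strptime(match["date"], "%Y%m%d").date()
    except ValueError:
        # Eight digits that are not a real calendar date.
        return None
    return {
        "subject": match["subject"],
        "sequence": int(match["sequence"]),
        "date": taken,
        "extension": match["extension"],
    }


if __name__ == "__main__":
    print(parse_photo_filename("arizona_trip_06_20010318.jpg"))
    print(parse_photo_filename("dsc0001.jpg"))  # prints: None
```

No database or application is involved, which is the point: anything that can list a directory can read the metadata.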

I originally designed this filename convention for photographs taken on my digital camera, so the filenames end in '.jpg', as the camera shoots JPEG images. For photographs that were scanned (as opposed to taken on a digital camera), the raw image is stored in uncompressed TIFF format, so I leave the filename extension as '.tiff' for now.

Format

Next comes the question of archival file format for each photograph. Is TIFF the best long-term archival format for this project? Is JPEG? Perhaps something else? I gave this problem considerable thought, since it's a more or less permanent decision. There are a variety of image file formats to choose from:

  • GIF. This is a very successful and time-honored image format, commonly used for web pages. However, it's a terrible choice for this project. Color depth, for example, is very important for this project, but GIF89a only supports a maximum of 256 colors. By comparison, the raw TIFF images that are produced by the scanner contain millions of colors. So this is a no-go. Another strike against GIF has been the patents on the LZW compression algorithm that GIF encoding and decoding rely on. Those patents have since expired, but for years they kept some non-proprietary software from supporting GIF.
  • PNG. From a technical perspective, I could easily have gone with PNG, which handily supports millions of colors, is completely open, and even has really nice image metadata storage. In principle, it's a great way to go. One downside is that it takes a lot of disk space to store a photograph. A 2592 x 1944 image (as produced by my digital camera) takes roughly 6MB of disk space as a PNG, whereas the JPEG images from my camera only require 1.5MB. Another downside is that the format has limited software support outside of the open-source community. (For example, my camera can't produce PNG images directly, and converting from JPEG makes no sense.) So PNG remains a partial solution for the future, but not a complete solution now.
  • TIFF. A high-quality format (lossless, millions of colors, platform independent) that (without adding lossy compression) consumes too much disk space. A single scanned photograph in TIFF format can easily be 50MB or larger. In a hundred years, this will be no big deal, but I need the pictures to be practical today as well. They need to fit (by the tens of thousands) on contemporary storage media like CDs and DVDs, as well as hard drives. Compression is important for the foreseeable future. (Note: this has become less of a problem than it was when I first began the project, making TIFF an increasingly suitable choice.)
  • JPEG2000. An excellent graphics file format, perhaps someday. The format is not popular and virtually nothing supports it.
  • JPEG. Platform independent, open, standard, ubiquitous, employs efficient compression, and capable of very high image quality. While it employs a lossy compression scheme, I compensate by saving photos at a very high quality. Purists might condemn this move, but considering the quality of the original prints, algorithmic lossiness is unlikely to be the limiting factor for image quality anyway. JPEG is the best fit for this project.
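The disk-space argument in the list above can be checked with simple arithmetic. The sketch below uses the file sizes quoted in the list; the 3 bytes per pixel (24-bit color) and the nominal disc capacities of 700MB for a CD and 4.7GB for a single-layer DVD are my assumptions, not figures from the project.

```python
WIDTH, HEIGHT = 2592, 1944   # camera resolution quoted above
BYTES_PER_PIXEL = 3          # assumption: 24-bit RGB ("millions of colors")

# Approximate per-photo sizes quoted in the list above, in bytes.
FILE_SIZES = {
    "JPEG": 1_500_000,
    "PNG": 6_000_000,
    "TIFF scan": 50_000_000,
}

# Assumption: nominal capacities, decimal units.
MEDIA = {
    "CD": 700_000_000,
    "DVD": 4_700_000_000,
}


def uncompressed_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size of the raw pixel data, ignoring headers and compression."""
    return width * height * bytes_per_pixel


def photos_per_disc(media_bytes, file_bytes):
    """Whole photos that fit, ignoring filesystem overhead."""
    return media_bytes // file_bytes


if __name__ == "__main__":
    raw = uncompressed_bytes(WIDTH, HEIGHT)
    print(f"uncompressed {WIDTH} x {HEIGHT}: {raw / 1_000_000:.1f}MB")
    for medium, capacity in MEDIA.items():
        for fmt, size in FILE_SIZES.items():
            print(f"{medium}: {photos_per_disc(capacity, size)} photos as {fmt}")
```

Under these assumptions a CD holds about 466 photos as JPEG but only 14 as 50MB TIFF scans, which is the practical gap the list describes.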

Processing

Now that the JPEG file format has been selected as the end result, we're ready to get into the nuts and bolts of taking a raw scanned TIFF image, cleaning it up, and saving it as a JPEG image. This is best done using Adobe Photoshop. Photoshop is a complex tool requiring a fair amount of skill to use well. This isn't a Photoshop tutorial, but I will list the basic steps I go through to process the raw images:

  • Open the raw image. (If multiple photos were scanned together in a single image, copy one out to work on in a separate window.)
  • Rotate the image slightly (if needed). A scanned photograph may be a few degrees out of true when placed on the scanner window. The Rotate feature will correct for this. Use trial and error to get it right.
  • Crop the unwanted cruft from edges of the scanned image.
  • Crank up the image contrast and brightness. Be careful not to sacrifice too much image detail.
  • Adjust the color balance (if it's a color image).
  • Adjust the hue/saturation/brightness (if it's a color image).
  • Apply the Sharpen filter (if it seems helpful).
  • Use the Healing Brush to deal with wrinkles or holes in the print.
  • Resize to a resolution of 2592 x 1944, being sure to constrain proportions. If the proportions of the photograph do not match the target resolution, adjust the resolution values so that the shorter side of the photograph is at least 1944 pixels.
  • Save as JPEG ("Maximum" quality) using the filename convention outlined above.
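The resize rule in the steps above involves a little arithmetic when a print is not 4:3. The sketch below is one reading of that rule, not the Photoshop procedure itself: it keeps proportions and picks the smallest scale at which the longer side reaches 2592 pixels and the shorter side reaches 1944. The function name `target_size` is hypothetical.

```python
import math
from fractions import Fraction

TARGET_LONG, TARGET_SHORT = 2592, 1944


def target_size(width, height):
    """Return (new_width, new_height) with proportions constrained.

    Assumed interpretation: scale so the longer side is at least 2592
    and the shorter side is at least 1944, whichever demands more.
    Fractions avoid floating-point rounding; results are rounded up so
    the "at least" guarantee holds.
    """
    long_side, short_side = max(width, height), min(width, height)
    scale = max(Fraction(TARGET_LONG, long_side),
                Fraction(TARGET_SHORT, short_side))
    return math.ceil(width * scale), math.ceil(height * scale)


if __name__ == "__main__":
    print(target_size(5184, 3888))  # 4:3 scan      -> (2592, 1944)
    print(target_size(3000, 2000))  # 3:2 scan      -> (2916, 1944)
    print(target_size(2000, 3000))  # portrait 2:3  -> (1944, 2916)
```

The resulting numbers are what would go into the resize dialog; the resampling itself is still done in the image editor.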

At this point the image is final. Naturally this process is repeated for every scanned image. Each photograph file may be stored in a directory structure. This directory tree may start very simple and become more elaborate as it scales to contain hundreds and thousands of photographs.

This is a time-consuming methodology, but the results speak for themselves: a digital photographic archive, of a quality as high as or higher than that of a contemporary digital camera, that can effortlessly be shared with friends and kin.
