Digital Asset Management (DAM) with digiKam

Build a system to organize and find your photographs

Protect your authorship and copyright/left

Protect your images from data corruption and loss

A typical DAM workflow

Introduction

...in the end, photographs need a lot of care. I hope it's you who said this.

Can you find your digital photographs when you need them? Or do you spend more time sifting through your hard drive and file cabinets than you would like? Do you have a systematic approach for assigning and tracking content data on your photos? If you make a living as a photographer, do your images bear your copyright and contact information, or do they circulate in the marketplace unprotected? Do you want your future grandchildren to admire the photographs you took yesterday? How do you ensure the backup and the correctness of your data? How do you prepare to change your computer, your hard disk, the software or the operating system and still manage to find your pictures or movies?

What is digital asset management - apart from a buzz word? Digital Asset Management (DAM) refers to every part of the process that follows the taking of the picture, all the way through the final output and permanent storage. Anyone who shoots, scans or stores digital photographs is practicing some form of DAM, but most of us are not doing so in a systematic or efficient way.

We present a tool, a plan and practical advice on how to file, find, protect and re-use photographs, focusing on best practices for digital photographers using digiKam. We cover downloading, renaming, culling, converting, grouping, backing-up, rating, tagging, archiving, optimizing, maintaining and exporting image files.

A generic definition:

"Digital Asset Management ingests, indexes, categorizes, secures, searches, transforms, assembles and exports content that has monetary or cultural value."

And since we're at it, another important one:

Metadata is defined as data about data. Metadata is definitional data that provides information about or documentation of other data managed within an application or environment.

In our context here it stands for all information about a photograph.

digiKam, with its libraries and plugins, is a unique and comprehensive tool that covers most DAM tasks, and it does so quickly and transparently. Because it is based on open standards on all fronts, it will not confine you to a platform or application. Rather, it puts you on a fast track to manage and find your photographs and, if you so please, to move on to any other platform, application or system without losing any of your work, be it as an occasional user, enthusiast or professional.

The one thing that differentiates the archiving capabilities of film vs digital is that with digital you can make as many new originals as you want. With film you only have one original. All copies will have a slightly lower quality, and both originals and copies are more or less slowly aging and disappearing. The only way to keep it "forever fresh" is to make a digital copy of it. And that is also the only way to protect it from all hazards.

Even if digital media today may not last as long as film, it is up to you to make new copies every year, every 5 or 10 years, or whenever necessary, and to always keep at least 2-3 copies of the files, preferably in different physical locations. You never had that opportunity with film. It could always be damaged in a fire or flood, or even be stolen. The good and bad news then is this: if you lose digital images or data, it is only due to your own laxity.

Build a system to organize and find your photographs

Themes: hierarchy, tags, rating, captions, geolocation, date, albums, filenames, versioning, exporting

I dare say that if you have more than 1000 photographs on your computer without any DAM, it takes you too long to find any particular image. And if you don't know how many images are in your files, you're surely not using digiKam. The dual approach of storing metadata both in a database and in the image files guarantees ultra-fast searching and secure archiving that remains freely accessible to other applications, platforms and formats.

But as much as there is no such thing as a free lunch, there is no free cataloging or DAM - those who spend the initial time building a systematic method of their own will be better off as time passes and the number of photographs multiplies. The ROI (return on investment) of DAM has been estimated in different studies to be better than 10. Keep in mind: be concise, plan for the future (30-50 years), and do it once. The upcoming semantic web will totally integrate into and add value to a DAM environment.

A case for doing DAM with digiKam

digiKam provides a number of methods to classify photographs: filenames, albums, collections, date + time, tags, rating, GPS position and captions. As if this were not enough, in the KDE4 version of digiKam you can search many standard metadata items like camera model, lens, coordinates, image size and many more. The metadata categories listed here are in fact different 'views' of your photo library. Combining these views is a very powerful method to narrow down the search for a file and to find it quickly. Imagine having 800 photos of your loved one. Searching for 'Mary', having more than '***' rating, shot in 'France' will surely leave you with very few candidates. In terms of selection criteria for a DAM system, digiKam fares very well on completeness, versatility, speed, scalability, accuracy and openness.
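To illustrate how combining views narrows a search, here is a minimal Python sketch. It does not use digiKam's database or any digiKam API; the `Photo` class, the `search` function and the sample library are hypothetical and only mimic the 'Mary', more than three stars, 'France' example.

```python
from dataclasses import dataclass, field

@dataclass
class Photo:
    name: str
    rating: int = 0                            # 0 to 5 stars
    tags: set = field(default_factory=set)     # e.g. {"People/Mary", "Places/France"}

def search(photos, required_tags=(), min_rating=0):
    """Return the photos that carry every required tag and at least min_rating stars."""
    required = set(required_tags)
    return [p for p in photos if p.rating >= min_rating and required <= p.tags]

# Hypothetical sample library (assumption, for illustration only)
library = [
    Photo("img_0001.jpg", 4, {"People/Mary", "Places/France"}),
    Photo("img_0002.jpg", 2, {"People/Mary", "Places/France"}),
    Photo("img_0003.jpg", 5, {"People/Mary", "Places/Italy"}),
    Photo("img_0004.jpg", 5, {"People/Peter", "Places/France"}),
]

# "More than '***'" means four stars or better
hits = search(library, {"People/Mary", "Places/France"}, min_rating=4)
print([p.name for p in hits])
```

Each additional criterion can only shrink the result list, which is why a few coarse attributes combined are usually enough to find one image.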

The key thing to remember is that you don't know how you or somebody else will try to find an image 2 years from now. You will remember past events in a different context; it's a fact of life. So if you can narrow down your search by remembering place or time or camera or theme or rating or owner, you stand an infinitely better chance of finding it quickly than with just one of those criteria or none. At the beginning, at the time of taking a photograph, all metadata is in your head (except for the EXIF data). If you do not transcribe some of it into your DAM system, it will eventually be lost, just as every event fades into oblivion over time.

One distinction has to be interjected here between private and public metadata. One could say that all file-embedded attributes are potentially public since the images may be exported, sold, and copied to other places and people. On the other hand all non-embedded metadata in the database can be considered private as they stay in the database and go nowhere else. By adjusting digiKam's settings accordingly you can control what kind of data remains private and what will be embedded and eventually become public.

Build the archive: Folder organization, physical layout as information

The first thing to do and to know before you put anything onto your system is to build an information structure (as opposed to a data structure). Your image files have to be organized somehow within the computer, and you have to decide whether others should have access to your photographs (sharing), whether you put them on a dedicated drive, on a network drive, etc. Keep in mind that one day you will have to migrate to some bigger volume.

The organization should be simple, unified and scalable, and it should be independent of the storage medium on which you host it. Do not make the folders too small; several thousand images in one folder is not too much to ask for. But keep them small enough to fit onto a backup medium like a DVD (4.7 GB, or 9.4 GB for double-sided ones). Remember that the archive will grow all the time! The concrete type of structure depends on your use case, of course. Let's take a simple yet frequent example: you are a casual photographer taking pictures of your private life, your family, holidays and so on. It could be efficient to create a structure based on years plus some holiday and export containers. It could look like this:

2006
2007
2008
Holidays
  - A
  - B
  - C
Export
Fun stuff

Maybe you'll be happy with this structure. Holiday pictures can be quickly found by their location (unless you go to the same place every year); the rest will be organized by date. If you shoot enough pictures you may want to create sub folders below the years for months, e.g. 2008-01, 2008-02 etc. 'Export' would be a container for images to print or to put onto a website.
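If you prefer to create such a layout in one go rather than by hand, a short script will do. The following is only a sketch of the example structure above: the `LAYOUT` table and the `build_archive` helper are hypothetical names, and the month sub folders are added for 2008 only to show the idea.

```python
from pathlib import Path

# Folder names taken from the example layout; adapt them to your own archive.
LAYOUT = {
    "2006": [],
    "2007": [],
    "2008": [f"2008-{month:02d}" for month in range(1, 13)],  # optional month sub folders
    "Holidays": ["A", "B", "C"],
    "Export": [],
    "Fun stuff": [],
}

def build_archive(root):
    """Create the example folder tree under root and return the created paths."""
    root = Path(root)
    created = []
    for top, subs in LAYOUT.items():
        for path in [root / top] + [root / top / sub for sub in subs]:
            path.mkdir(parents=True, exist_ok=True)   # safe to run repeatedly
            created.append(path)
    return created
```

Calling `build_archive("photo_archive")` (a hypothetical root folder) creates the tree. Because existing folders are left untouched, the script can be re-run each year after adding a new entry to `LAYOUT`.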

The more professional photographer will have very different needs, as there will be versions of photographs, archives, workflows, a constant influx of images of diverging themes, and a large quantity of everything. Within 10 years you'll have 95% archives and 5% work space files, and you don't want to organize your structure around content!

The considerations are these:

  • What kind of files go together? Segregation of file types makes batch processing easier. Keep new and old files separate.

  • How can you make that structure scalable?

  • Segregation of original and working files makes it easier to allocate the backup strategy and migration. You will always know if you look for an original or a derivative.

TBC

Automatic metadata generation

How to go about all this metadata business? Firstly, a lot of metadata is already generated automatically: EXIF data and Makernotes. If you have configured the identity section in digiKam, all imported images will be imprinted with this data set, which includes copyrights, all automatically. If you have a GPS track recorded in parallel to your taking the photographs, you can geolocate those images in a single action using the Geolocation plugin. Even if you brought back 1000 images from a shooting session, so far you'll not have spent more than 10 minutes to do all that. And by now you have all camera settings of every shot, lens data like zoom, focus, aperture etc., date and time, shooting location, copyrights, authorship, program used, and more. Not bad, is it?

But we could have done more during the import: we could have changed the file names to include the date, place or theme, we could have changed the format to a lossless 16 bit per channel format, and we could have automatically separated JPEG and RAW files into their own folders. I actually recommend auto-renaming to match an event, a place or a theme. digiKam provides all date/calendar related grouping, so there's hardly a need for coding the date into the file name, unless you'd like to browse your albums with another application that is not calendar savvy.

Sooner than you believe, you will buy a new camera, or you have a second one already. The numbering scheme of that new camera will typically start over at IMG_0001.JPG, creating file names identical to the ones you already have if you do not rename them. By renaming you lessen the chance of inadvertently overwriting files at a later date. Keep the new names clean: use alphanumerics, dashes, underscores and a single period before the file extension.
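The naming rule just described (alphanumerics, dashes, underscores and a single period before the extension) can be sketched in a few lines of Python. This is not digiKam's renaming feature; `clean_name` and its parameters are hypothetical, and the scheme 'event label plus four-digit sequence number' is just one possible convention.

```python
import re
from pathlib import Path

def clean_name(event, sequence, original_name):
    """Build a name such as 'Paris-trip-2008_0042.jpg' from an event label.

    Only ASCII letters, digits, dashes and underscores are kept in the label,
    and the extension is separated by a single period.
    """
    suffix = Path(original_name).suffix.lower()                  # '.JPG' -> '.jpg'
    label = re.sub(r"[^A-Za-z0-9_-]+", "-", event).strip("-")    # runs of other characters become one dash
    return f"{label}_{sequence:04d}{suffix}"

print(clean_name("Paris trip 2008", 42, "IMG_0001.JPG"))   # Paris-trip-2008_0042.jpg
```

Because the event label replaces the camera's own counter prefix, two cameras that both produce IMG_0001.JPG no longer collide as long as the label or the sequence number differs.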

I also recommend switching on the 'save metadata' options in the digiKam settings page for metadata. This will ensure that EXIF and IPTC data is written into the file. If you forgot to do that, you can always catch up by copying the metadata in the database to the files in one go (from the album menu).

Now we have a lot of stuff already in our database, but what if I need to change some of it? digiKam provides a metadata editor for a selected number of attributes, the most important ones of course.

The real work begins here, as we will apply tags, captions and a rating to every photograph. Of course, all images requiring the same attribute can be treated as a selection in one action. Let's start with rating or ranking. It's best to start with ranking because for further work you can concentrate on the good shots.

Rating/Ranking

A ranking system is implemented in digiKam by the 5 star rating tool. In fact there are 6 levels: zero through five stars (*) can be attributed (when saving them into IPTC metadata, a translation of levels ensures compatibility with other programs). Rating is rapidly applied in digiKam using keyboard shortcuts or the mouse, on single photographs or whole selections. The rating can then be entered as a search criterion or used directly from the status bar quick filters.

However, before you start attributing stars everywhere, take a moment to establish personal criteria for ranking. Best practice is to write down your personal match of stars to some qualitative expression that defines what you actually mean when giving 5 stars. Generally there should be far fewer images at each higher star level. A ratio of 3-10 between levels has proven useful. That will get you quite far in distinguishing your rating pyramid. Say you choose a ratio of 7 between levels. For every 5 star image you'll then have 7 with 4 stars, 49 with 3 stars and so on, resulting in almost 20000 pictures. Amazing? Yes, and 16807 of them you didn't have to rate at all!

You can even define a different rating scheme depending on the kind of use: 2 stars for commercial use may mean something else than 2 stars for holiday photos. It is also good practice to define a neutral rating; everything below it is actually a negative rating. This will help you cull and thin your collection very efficiently. Or you could assign purposes to ratings, say 0 stars for 'can throw away', 1 star for images in quarantine (decide later), 2 stars for gallery export, 3 stars for printing, 4 stars for selling, 5 stars for 'have to work on', as you please. It must suit your needs.

The following table illustrates a possible evolution for a professional photographer using a ranking ratio of roughly 7 over the next 12 years. It is evident that the good shots can be easily found, even within millions of photos.

[Table: Ranking]
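The arithmetic behind the rating pyramid is easy to check. The sketch below computes the number of images per star level for a fixed ratio between levels; `rating_pyramid` is a hypothetical helper, not part of digiKam.

```python
def rating_pyramid(ratio, top_level=5, images_at_top=1):
    """Return {stars: image count} for a pyramid with a fixed ratio between levels."""
    return {stars: images_at_top * ratio ** (top_level - stars)
            for stars in range(top_level, -1, -1)}

pyramid = rating_pyramid(7)
total = sum(pyramid.values())

for stars, count in pyramid.items():
    print(f"{stars} stars: {count:>6}")
print(f"total:   {total:>6}")   # 19608, i.e. "almost 20000"
```

With a ratio of 7, the zero star level alone holds 16807 of the 19608 images, which is why only a small fraction of a collection ever needs an explicit rating.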

Let's continue with tags (also called keywords or categories by other applications; they are all synonymous).

Tagging, Keyword assignment

Tags are a hierarchical labeling system that you create as you add to it. The important thing is to create a system that suits your needs and habits. Are you a (semi)professional who wants to sell photographs to agencies, do you want to publish on a web gallery, or are you just the occasional amateur managing the visual family memory? For each of these use cases you want to design a tag structure that is adapted to it. If you configure it so, digiKam will write the whole hierarchy into IPTC fields so that it can be used by your photographic agency with a different application, or to automatically create title and caption for web exports. In any case it will serve you well to quickly find a specific picture again.

The hierarchy will provide you with automatic groupings. For example, if you start a typical private use hierarchy with 'Activities', 'People', 'Places', 'Themes' and 'Projects' on the top level, everything you tag with a sub-tag of these will be grouped together into a virtual album. digiKam has a dedicated view in the left sidebar for these virtual albums. But it gets even better! As you continue adding sub-tags into the hierarchies, not only will you be able to search and quick-filter for them, the right sidebar tag filter also allows you to select combinations of tag groups. Let's say you select the virtual album 'People' in the left sidebar tag panel and you have 12 different tags for people in there; you can then combine it with the right sidebar and choose just 'Peter', 'Paul' and 'Mary' out of the 12.

In the long run you will not remember the details of your pictures and their subject (essentially the metadata in your brain will break down). It is therefore paramount that you choose general and generic categories. You will always remember that a particular shot was set at a river bank in a country or continent (-> river, continent), but you'll have forgotten which river it was. Instead of only tagging it with 'Okavango', tag it with river/Africa or river/Southern Africa. The details you can put either into a tag as well or into the captions. A trick may help you: how would you search for that river with an Internet search engine? That's the way to go!

Another categorization might be task-oriented as in 'print jobs', 'web export', 'personal', 'galleryXYZ', 'clients', 'slideshow' etc. Create groups as you need them but not more; you should be able to remember at least the top level tags by heart, otherwise the differentiation will become useless. Don't forget that you have all the other attributes to narrow down the search. The right sidebar tag filter combines with any view of the left sidebar (albums, calendar, timeline, tag and search).

When you import cataloged images from other sources that already have embedded tags, digiKam will automatically create the trees for you or insert the tags into the right place. Rearranging the hierarchy within the tree is no problem; you can do that easily by dragging and dropping a sub-tree to another place in the hierarchy. The changed tags will be updated as digiKam ripples down the branches.

The graphic here shows how different metadata overlap. This is a very coarse representation, as each block of metadata will in itself be subdivided into many sections. File names and calendar data are properties of all images.

[Figure: overlap of the different kinds of metadata]

Enough of tags - let's move on to captions or comments, the third major tool for metadata cataloging.

Captions/Comments

This is already the 4th kind of metadata we present here. What distinguishes captions from tags or keywords? (Comments can be used synonymously, but the IPTC vocabulary stipulates the term 'caption'.) Where tags provide a hierarchical and generalized description, captions are the opposite: prose description, details, anecdotal stuff. Tags primarily serve the finding, retrieval and grouping of assets, whereas captions should entertain, inform and touch the beholder. Naturally they can also be used to filter the catalog, but this is just a byproduct. Captions are there to remember the story, the event, the emotions. They are what makes photographs much more interesting to look at; captions give photographs context and meaning. If the pictures are an aesthetic statement, the caption should be the emotional and informational complement.

You rarely want nobody to see your photographs. You rather want to share them with friends, your family, other photographers, agencies, put them onto the Internet. And don't tell me you're not interested as to how your photos are being received!

So you might have the most beautiful portrait, sunset or landscape and nobody seems to care. Why is that? Look at some good photographs yourself without reading the title, comment or background information. How many of you are interested in depth of field, exposure time, white balance etc.? Some, of course. But anybody will be interested in the story the pictures tell. You want a photograph to be remembered; meaningless images bombard us too much anyway. You have to give the viewer something that explains it all.

Let's look at this panorama. From afar it is not even a nice beach panorama. If you go closer you start to see some details, people, the space.

[Figure: beach panorama]

And now I tell you that this is the Allied landing site "Omaha Beach" in Normandy, France, 60 years after the disembarkation. Wow! One starts to dream, have associations, memories; the historical time span is present, you may hear the silence. The caption has totally reframed the perception of this panorama.

For others to appreciate your photographs, the title is probably more important than the image itself for the interest it creates. When you show pictures, tell a story. Remember that the key is to convey the meaning to viewers, to help them understand what you understand about the subject and what moved you.

  • let people know what you understand about the subject, why you love it

  • create a common thread between the photographs

  • oppose or relate them to different epochs

  • take notes shortly after shooting to remember

  • contemplate, research, watch, and talk - but mostly listen.

  • it's okay if the image is less than perfect because it has the strength to stand on its own merit described in the caption.

With digiKam you can enter unlimited amounts of text as caption, using an internationalized alphabet (UTF-8). You can enter it for a selection of photos at the same time. KDE even provides a spell checker. When you export images to web galleries, the captions will be exported, as you choose, into the caption, the title, or both fields of the web gallery system, so there is no need to re-write the story for publishing.

Geolocation (geo-tagging)

Do you still remember the times before GPS? When you would find your way to another city without navigation system? Wasn't the earth a dull blue ball before GoogleEarth? Well then, with images, the train of spatial representation is running at cruising speed alright.

A few cameras have a GPS receiver built in, so the images come tagged with 3-dimensional coordinates. With almost any GPS device you can also record a trace and save it onto a computer (of course the receiver needs to be switched on and carried with you whilst taking the photographs, and for good matching the camera time must be set accurately). You have to store it in GPX format, which is easily done with gpsbabel, gpsman and other tools. You can then automatically match a whole bunch of photos with that track using digiKam. The coordinates are written into the EXIF part of JPG files (settings choice) and into the database. The KDE4 version will enable searches based on locations and coordinates, so you can create virtual albums of geographical areas! In the right sidebar under the metadata tab you'll find your image located on a local zoom of the world map. A further click brings up any one of several mapping services on the web, zooming in on details. Even if you don't have a GPS trace you can geo-tag multiple images with a geo-editor. Just navigate on the map to the spot of shooting and click to fix it as a geo-tag.

e.g. conversion of a Garmin track with file name 'xyz':

$ gpsbabel -t -i mapsource -f xyz.mps -o gpx -F xyz.gpx
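To give an idea of what matching photos against a track involves, here is a minimal Python sketch. It is not digiKam's implementation: `load_track`, `locate` and the `max_gap_s` threshold are hypothetical, positions are interpolated linearly between the two neighbouring track points, and the shooting time is assumed to be available as a timezone-aware timestamp (track times in GPX files are in UTC).

```python
from bisect import bisect_left
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

def load_track(gpx_text):
    """Return a time-sorted list of (datetime, lat, lon) from GPX text.

    The '{*}' wildcard (Python 3.8+) matches both GPX 1.0 and 1.1 namespaces.
    """
    points = []
    for trkpt in ET.fromstring(gpx_text).iterfind(".//{*}trkpt"):
        stamp = trkpt.find("{*}time").text.strip().replace("Z", "+00:00")
        points.append((datetime.fromisoformat(stamp),
                       float(trkpt.get("lat")), float(trkpt.get("lon"))))
    return sorted(points)

def locate(track, shot_time, max_gap_s=300):
    """Interpolate (lat, lon) at shot_time, or return None if no track point is close enough."""
    times = [t for t, _, _ in track]
    i = bisect_left(times, shot_time)
    if i < len(times) and times[i] == shot_time:
        return track[i][1], track[i][2]           # exact hit on a track point
    if i == 0 or i == len(track):
        return None                               # shot before the track started or after it ended
    (t0, lat0, lon0), (t1, lat1, lon1) = track[i - 1], track[i]
    span = (t1 - t0).total_seconds()
    if span > max_gap_s:
        return None                               # gap too long to interpolate reliably
    w = (shot_time - t0).total_seconds() / span
    return lat0 + w * (lat1 - lat0), lon0 + w * (lon1 - lon0)
```

A photo taken at `datetime(2008, 6, 1, 10, 0, 30, tzinfo=timezone.utc)` would be placed a quarter of the way between track points recorded at 10:00:00 and 10:02:00. This also shows why an accurately set camera clock matters: a clock that is off by a few minutes shifts every photo along the track.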

The possibilities of exploiting this geolocation are already innumerable and will become pervasive in the future. I'm sure one day not too far away we can revisit in a virtual reality our travels through geo-tagged pictures. The digiKam features include exporting to kml files that can be opened by GoogleEarth (which in turn will show the photos on their shooting site), exporting to gallery2, picasaweb, flickr etc. with GoogleMaps viewer and more.

Protect your authorship and copyleft/right

Themes: watermarking, IPTC and XMP authorship data, export size

This will be the last chapter and step to mark your digital library with authorship, ownership and copyright or -left information. More than in 'the good(?) old days' of paper copies, the ubiquitous Internet makes it just too easy to 'steal' a picture from a web site. At the very least, for all images that will be exported and/or published in any form, the authorship and copyright information should be part of their metadata. Nothing is simpler with digiKam: you can set up a default identity, and any images ingested by digiKam will automatically be stamped with it. I put copyleft in the title for a reason (citation from Wikipedia):

"Copyleft is a play on the word copyright and is the practice of using copyright law to remove restrictions on distributing copies and modified versions of a work for others and requiring that the same freedoms be preserved in modified versions.

Copyleft is a form of licensing and may be used to modify copyrights for works such as ... music, and art. In general, copyright law allows an author to prohibit others from reproducing, adapting, or distributing copies of the author's work. In contrast, an author may, through a copyleft licensing scheme, give every person who receives a copy of a work permission to reproduce, adapt or distribute the work as long as any resulting copies or adaptations are also bound by the same copyleft licensing scheme. A widely used and originating copyleft license is the GNU General Public License. Similar licenses are available through Creative Commons - called Share-alike."

And here follows a description of what should be supplied to digiKam's setup page as information:

Author (synonymous with Creator and By-line): This field should contain your name, or the name of the person who created the photograph. If it is not appropriate to add the name of the photographer (for example, if the identity of the photographer needs to be protected) the name of a company or organization can also be used. Once saved, this field should not be changed by anyone. This field does not support the use of commas or semi-colons as separator.

Author title (synonymous with By-line title): Linked to Author. This field should contain the job title of the photographer. Examples might include titles such as: Staff Photographer, Freelance Photographer, or Independent Commercial Photographer. Since this is a qualifier for the Author field, the Author field must also be filled out.

Credit (synonymous with Provider): Use the Provider field to identify who is providing the photograph. This does not necessarily have to be the author. If a photographer is working for a news agency such as Reuters or the Associated Press, these organizations could be listed here as they are 'providing' the image for use by others. If the image is a stock photograph, then the group (agency) involved in supplying the image should be listed here.

Source: The Source field should be used to identify the original owner or copyright holder of the photograph. The value of this field should never be changed after the information is entered following the image's creation. You should consider this to be a write-once field. The source could be an individual, an agency, or a member of an agency. To aid in later searches, I suggest surrounding any slash '/' with blank spaces. Use the form 'photographer / agency' rather than 'photographer/agency.' Source may also be different from Creator and from the names listed in the Copyright Notice.

Copyright Notice: The Copyright Notice should contain any necessary copyright notice for claiming the intellectual property, and should identify the current owner(s) of the copyright for the photograph. Usually, this would be the photographer, but if the image was done by an employee or as work-for-hire, then the agency or company should be listed. Use the form appropriate to your country. For the United States you would typically follow the form of © {date of first publication} name of copyright owner, as in 'copr 2005 John Doe.' The word 'copyright' or the abbreviation 'copr' shall be used in place of the (c) symbol, as only ASCII characters are allowed. In some foreign countries only the copyright symbol is recognized and the abbreviation does not work. Using something like (c) where the parentheses form a partial circle is not sufficient. For additional protection worldwide, use of the phrase 'all rights reserved' following the notice above is encouraged. In Europe you would use: Copyright {Year} {copyright owner}, all rights reserved. In Japan, for maximum protection, the following three items should appear in the copyright field of the IPTC Core: (a) the word, Copyright; (b) year of the first publication; and (c) name of the author. You may also wish to include the phrase 'all rights reserved.'

While it is paramount to fill in the author and copyright sections, they represent no protection against fraud. Anyone with a bit more than basic computer knowledge is able to delete or modify image embedded metadata. The solution to this problem is called 'digital watermarking'. To private persons this might be of little interest for the majority of photographs, but for professionals and semi-professionals this protection is really important.

Digital Watermarking (DW)

Digital Watermarking refers to an invisible digital watermark that is being impressed on photographs as an element of digital rights management (DRM). The watermark contains the same information of authorship and copyright as described above, but the metadata is encrypted and saved in the actual image data (as opposed to the metadata section which is a separate section within the image file). This invisible imprint has holographic properties so that modifications done to an image (size, color, crop, up to a certain limit) will not destroy the copyright information. Only when an image is resized to a very small fraction like a thumbnail will the embedded information be lost, but then the image is of no value anymore to the copyright infringing party.

The digital watermark will be unique per image. digiKam will provide a plugin for DW in the near future that features batch processing.

Protect your images from data corruption and loss

Themes: disk errors, disk failures, power surges, ECC, transmission errors, storage media deterioration, recovery, redundancy, disaster prevention, lifetime, temperature, data size, common myths

What are then the main factors of digital data loss?

Of course we're not talking about losing CDs on the road or in a fire - that kind of loss is just the same as traditional paper copies or negatives. We are talking about problems with the so called "New Media".

Problems with digital data can roughly be categorized into the following areas of concern:

  1. the physical deterioration of the media (all media deteriorate at different time scales)

  2. undetected transmission errors during data transfer

  3. the lack of support for long-outdated, undoubtedly proprietary, digital formats

  4. ancient hardware.

Kroll Ontrack, the world's largest data recovery firm, has some interesting statistics on what actually causes data loss.

Cause of data loss               Perception   Reality
Hardware or system problem       78%          56%
Human error                      11%          26%
Software corruption or problem   7%           9%
Computer viruses                 2%           4%
Disaster                         1-2%         1-2%

So let us analyze those cases step by step!

Physical deterioration

CD, DVD, optical drives

Physical deterioration of the media happens more rapidly with paper and CD-Rs than, on average, with film. Yet while film lasts longer (sometimes decades longer) than other forms of media, the right kind of backup of digital media never loses anything. Film decays - digital 1's and 0's do not, and film starts to decay the moment it's created and developed. It will never have the same color, contrast, etc. that it had a moment before. Digital doesn't do that. However, digital is susceptible to corruption! And yes, physical media such as floppies and magnetic hard drives are also susceptible to the decay of the medium, just like CDs are. They just last longer.

To combat the problem with CDs/DVDs, they need to be properly cared for and not trusted for more than a few years. Thankfully you can purchase archive-quality CDs and DVDs which last longer, though they are much more difficult to obtain and much more expensive. There are offers out there for gold-plated DVDs at $2 apiece, claiming 100 years of storage life (if you care to believe it).

CD/DVD disks may become unreadable, but you can reduce the risk using good disks and a good recorder, and storing them in a correct way. The best DVD recorders are not much more expensive than the cheapest, but they write in a much more reliable way. It's a matter of choosing the right one.

Essentially, CDs and DVDs are very prone to errors, even in a freshly written state. That's why they are heavily protected with a checksum mechanism (75% of data are effective data, the rest is formatting and checksum overhead). But even with that massive amount of protection they will suffer deterioration from chemical aging, ultra-violet exposure, scratches, dust, etc.

For damaged CDs and DVDs, there is an inexpensive program called IsoBuster which will do seeming miracles on CDs and DVDs. It runs on Windows and Linux® but not (yet) on Macs. Similarly, there are applications designed to get data from damaged floppies, hard drives, flash media such as camera memory and USB drives, and so forth.

Optical media: Blu-ray disks seem to win the format war against 'HD DVD'. A dual-layer Blu-ray disc can store 50 GB, almost six times the capacity of a dual layer DVD at 8.5 GB. Everything that has been said about CDs/DVDs applies to Blu-ray disks as well.

Best practice:

Burn them slowly with a good recorder on archive quality media in an open, non-proprietary format, read the data back to verify, label them with some descriptive text + date & author, lock them away where it is clean, dark, animal safe and dry. And do not forget to copy them over to the next generation of media before you throw away your last piece of hardware or software able to read them.
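The "read the data back to verify" step can be sketched with a checksum manifest. This is only a minimal illustration, assuming the sha256sum tool from GNU coreutils is installed; the function names make_manifest and verify_manifest are made up for this example and are not part of digiKam.

```shell
#!/bin/sh
# Sketch: create and verify a checksum manifest for a folder of images.
# make_manifest / verify_manifest are illustrative names (assumptions).

# make_manifest DIR MANIFEST
# Writes the SHA-256 checksum of every file under DIR into MANIFEST.
# Keep MANIFEST outside DIR so that it does not list itself.
make_manifest() {
    dir=$1; manifest=$2
    ( cd "$dir" && find . -type f -exec sha256sum {} + ) > "$manifest"
}

# verify_manifest DIR MANIFEST
# Re-reads every listed file under DIR and compares it with MANIFEST.
# Returns non-zero if a file is missing or its content differs.
verify_manifest() {
    dir=$1; manifest=$2
    ( cd "$dir" && sha256sum -c - ) < "$manifest"
}
```

A possible use: run make_manifest on the folder before burning, store the manifest next to the disc label, and run verify_manifest against the mounted disc after burning and again whenever you copy the data to the next generation of media.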

Hard disks (hard drives, HDD)

Disk manufacturers keep their statistics to themselves. A manufacturer's warranty buys you a new disk, but not your data. Google, for one, has done a large-scale study on HDD failure mechanisms: Disk Failures study

In a nutshell: disks run longest when operating between 35°C and 45°C; at lower temperatures the error rate increases dramatically. Controller parts (electronics) are the foremost source of failure, and SMART does not diagnose any of this. Some SMART errors are indicative of imminent failure, in particular scan errors and relocation counts. Life expectancy is 4-5 years.

But it all depends very much on the real use case and some luck. For example, I have a Fujitsu notebook that has been running 24/7 since 1998, almost ten years without the slightest hiccup. Just luck? In general, and contrary to intuition or ecological considerations, running a hard drive permanently results in a longer lifetime than switching it on and off all the time. It has even been reported that aggressive power management that spins down the drive can harm it quickly. Making it work hard shortens the lifetime somewhat. The worst factors for HDDs are probably vibrations, shocks, and cold temperatures.

If your disk is making weird noises, normal file recovery software isn’t going to work. If that happens to you, make a backup quickly. (Use the dd utility if possible rather than a normal file backup: dd reads the disk in one smooth stream from beginning to end and doesn't stress the mechanics.) There are specialist companies that can recover data from an otherwise destroyed drive, but they are costly: plan for a minimum charge of $2000.
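As a rough sketch of such a dd rescue copy, the function below reads a source block by block into an image file. The function name image_drive, the device /dev/sdb and the target path are assumptions for illustration only; check with dmesg which device is really the failing one, and write the image to a different, healthy disk.

```shell
#!/bin/sh
# image_drive SOURCE IMAGE
# Copies SOURCE (a device or file) sequentially into the file IMAGE.
#   conv=noerror  continue after read errors instead of aborting
#   conv=sync     pad short or failed reads with zeros so offsets stay aligned
#   bs=65536      read in 64 KiB blocks
image_drive() {
    dd if="$1" of="$2" bs=65536 conv=noerror,sync
}

# Example call (hypothetical device and path, usually needs root):
#   image_drive /dev/sdb /mnt/rescue/sdb.img
```

Because of conv=sync the image can be slightly larger than the source when the last block is short. Recovery tools can then work on the image instead of the dying drive.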

Power surges

As much as 1% of all computers are affected by lightning and power surges every year.

(This is about total data loss due to power surges. Of course you can have the occasional data loss due to power loss before saving files. But those losses can normally be restored without major difficulty.)

You don’t have to wait for the next thunderstorm to be concerned about how a sudden fluctuation in electric power may affect your computer system. Recent statistics have shown that as much as 63 percent of all electronics casualties are due to power problems, and most computers are subject to two or more power anomalies a day. Since power surges or blackouts can occur anywhere and at any time, it only makes sense to protect your computer by investing in some sort of surge protection device.

How surges happen

A power surge occurs when the power line voltage increases over nominal values for more than 10 milliseconds. Sixty percent of all power surges are caused from within the home or office, generally when a device with a motor (such as a hair dryer, refrigerator, or water pump) shuts off and the power it was using is diverted elsewhere as excess voltage. The remaining 40 percent of power surges are generated by factors such as lightning, utility grid switching, line slapping, poor wiring, and so on.

While most average electricity-using devices are not affected by power surges, devices relying on computer chips and high-speed microprocessors are susceptible to serious damage. For your computer, power anomalies can result in keyboard lockup, complete data loss, hardware degradation, damaged motherboards, and more. Failure to protect yourself from the inevitable can result in a loss of both time and money.

Surge protectors

The most common defense against power surges is a surge protector or suppressor, a device that works by absorbing some of the excess energy and diverting the rest of it to the ground. These are usually found in the form of a power strip (one of those long devices that have six or so outlets and a single, grounded plug). Bear in mind, however, that not every power strip serves as a surge protector.

When selecting your surge protector, you want to be sure it is listed as meeting the UL 1449 standard, which guarantees a certain minimum of protection. You should also look for one that offers protection against lightning (not every one does) and provides insurance for equipment that is properly attached.

Because a power surge can follow any path to your computer, be sure that each peripheral connected to your system is protected. This includes your phone line or cable modem, as power can surge through these routes as well. A number of manufacturers are now producing surge suppressors that feature a phone jack for your modem along with the electrical outlets, while others have coaxial cable jacks for those who use a cable modem or TV tuner card.

If you have a notebook computer, you will want to carry a surge suppressor as well. A variety of suppressors designed specifically for notebooks are available, small in size and possessing both electric and phone outlets that make them ideal for use on the road.

Uninterruptible power supply (UPS)

While a surge suppressor will protect your system from minor fluctuations in the power lines, it won’t help you if the power should black out completely. Even an outage of just a few seconds can result in the loss of valuable data, so you might find it worthwhile to invest in an uninterruptible power supply.

Besides serving as surge suppressors, these devices automatically switch to battery power when a power outage occurs, giving you the opportunity to save data and shut down your system. Some models will even allow you to keep working until power is restored. When purchasing a UPS, be sure that it has the same qualities that you would seek in a surge suppressor, but also check out the battery life and included software.

Considering the potential risk to your computing system, ensuring its safety from power disturbances is a worthwhile investment. A quality surge suppressor will cost you upward of €20, a 500W UPS can be had for less than €40. It’s a small cost to pay for the peace of mind you’ll gain knowing your computer is well protected. At the very least, unplug all lines to your computer when you go on holiday.

Solid state drives: USB sticks, memory cards, flash disks

SSDs are mechanically more robust than HDDs and suffer much less on that front when they are plugged into the computer. But since they are mostly mobile devices, their exposure to drops, accidents and electrostatic discharges is much higher. So, for different reasons, SSDs are at least as likely to fail as hard drives. Add the danger of theft, limited longevity and limited capacity, and SSDs become unsuitable as permanent data storage devices.

One major cause of data loss (often recoverable) is the unsafe removal of SSDs from a computer. Before data is saved from computer memory to any attached device, it remains for some time in buffers. With hard drives this means seconds at most, whereas with SSDs it can be tens of minutes. Therefore, before you disconnect a flash device, always activate data flushing through software (often called "safely remove device").

There is a new technology trend coming up to replace hard drives with SSD flash drives. By 2010 they may be competitive in price with HDDs. Data retention is an issue with SSDs: flash cells cannot be overwritten an unlimited number of times, so SSDs wear with use. Wear depends much on where data is written and how often. Linux® has developed a special driver that avoids writing to the same spot too often. But this is all premature information. Keep your eyes and ears open.

Magnetic media

Magnetic tapes are used in backup systems, much more in professional environments than in home use. Tapes have issues with data retention and changing technology, but they are safer in one aspect than CDs and DVDs: they are less exposed to scratches, dirt and writing deficiencies. On the other hand they are susceptible to magnetic fields. Throw a magnet next to a tape and it's gone! Tapes should be re-copied every 5-8 years, otherwise too many bits will fail and escape the checksum protection. The downsides of magnetic tapes are often the recorder price and the restore time (20x longer than from HDD). Tape backup systems have seen their best days.

Safeguarding against logical errors

Web storage services

Amazon Web Services includes S3 - Simple Storage Service. With appropriate configuration, you can mount S3 as a drive on Linux®, Mac, and Windows systems, allowing you to use it as a backup destination for your favorite software. Google Shared Storage is another popular offer where one can store an unlimited amount of data.

It is expensive compared to hard drives at home - 40 GB cost $75 a year, 400 GB cost $500. And you have to transfer the images over a comparatively slow Internet connection.

I think as a safeguard against localized data loss of the most essential images it's not a bad idea at all, but it is not a general backup solution, much too slow for that.

Picasaweb (Google), Flickr (Yahoo) and the photo community 23hq.com provide online storage services specialized in photography. Their free space is limited to 1 GB and you don't want to have full resolution images online. But the pro accounts offer more, in the case of Flickr, dramatically more. For a mere $25 a year you get unlimited (sic! reality check needed here) space.

In terms of data retention the web space solution is probably pretty safe. Transmission errors are corrected (thanks to the TCP protocol) and the big companies usually have backup included plus distributed storage so that they are disaster proof within themselves.

Transmission Errors

Data does not only get lost from storage devices, it also gets lost when traveling inside the computer or across networks (although the network traffic itself is error protected via TCP). Errors occur on buses and in memory. Consumer hardware has no protection against those bit errors, although such protection is worth looking into. You can buy ECC (error-correcting code) protected memory (which is expensive, granted). With ECC RAM at least the memory will be scrubbed for single bit errors and corrected. Double bit errors would escape that scheme, but they occur too infrequently to matter.

Transmission errors

This diagram depicts the elements of the transmission chain in a computer; all transitions are susceptible to transmission errors. The ZFS and btrfs file systems at least ensure data integrity on the path from the OS to the disk.

The bit error rate (BER) for memory and transmission channels is in the order of 1 in 10 million (1E-7 per bit). That just means that 1 in 3000 images has an error due to transmission problems alone. How dramatic that is for an image is left to chance: it could mean that the image is destroyed or that a pixel somewhere changed its value. Because of the compression used on almost all images, one cannot predict the gravity of a single bit error. Often one sees a partial image instead of the full image.

The worst of it all is that nobody tells you when a transmission error occurs, not even your hardware. All those glitches go unnoticed until one day you open the photograph, and to your surprise it's broken. It is quite worrisome that there should be no protection within a computer; nobody seems to have thought of it. The Internet (TCP protocol) is much safer as a data path than the inside of a computer.

Flaky power supplies are another source of transmission losses because they create interference with the data streams. With normal file systems those errors go unnoticed.

Expected error rate increasing with complexity

Even if you are not overly concerned today with transmission problems, have a look into the future in the illustration. Already in 2010 we'll see thousands of errors per year!

'Oracle' or 'Rising Sun' at the file system horizon?

ZFS from Sun Microsystems seems to be one of two candidates to deal with disk errors at a low level, and it is highly scalable. It is Open Source, heavily patented, comes with a GPL-incompatible license, and is available on Solaris and Leopard. Let us hope that it will soon be available for Linux® and Windows (article).

This is for the courageous ones. Fuse ZFS

Oracle has also started an initiative with its btrfs file system, which still is in an alpha stage. It employs the same protection technique as zfs does, and it's available on Linux®, although it is not yet part of the stock kernel.

Human errors

Theft and accidents

Do not underestimate it! Those two factors account for 86% of notebook and 46% of desktop system data losses. For notebooks, theft alone accounts for 50%.

Malware

Data loss due to viruses is less grave than common wisdom would have you believe. It accounts for less damage than theft or re-installations, for example. And it is largely limited to Microsoft OS users. Apple users experience very few viruses, and under Linux® they haven't been around for quite some time now.

Panic is a factor in data loss

Human error, as in everything, is a major problem in data loss. Take a deep breath and stop! Panic is a common reaction, and people do really stupid things. Even experienced users will pull the wrong drive from a RAID array or reformat a drive, destroying all their information. Acting without thinking is dangerous to your data. Stop stressing about the loss and don’t do anything to the disk. Better yet, stop using the computer until you have a plan. Sit down and explain your plan to a layperson. You will be amazed how many stupid ideas you'll discover yourself in such an exercise.

If your disk is making weird noises, normal file recovery software isn’t going to work. If that happens to you, make a backup quickly. If the drive is still spinning and you can’t find your data, look for a data recovery utility and back up to another computer or drive. (Non-Linux® users: Google for "free data recovery software" for some options, including one from Ontrack.) The important thing is to download the utility onto another drive, either on another computer or on a USB thumb drive or hard disk, and to save the recovered data to another disk as well. dd is your friend on *nix systems.

Common myths dispelled

I'd like to dispel some common myths:

  • Open Source file systems are less prone to data loss than proprietary systems: Wrong, NTFS is rather a tiny notch better than ext3, ReiserFs, JFS, XFS, to name just the most popular file systems that often come as default FS with distributions. A brilliant article about it is here: link

  • Journaling file systems prevent data corruption/loss: Wrong, they only speed up the scan process in case of a sudden interrupt during operation and prevent ambiguous states. But if a file was not entirely saved before the mishap, it'll be lost.

  • RAID systems prevent data corruption/loss: Mostly wrong. RAID 0 protects you from nothing, and while RAID 1 and RAID 5 can prevent data loss due to a single disk failure, they do not protect against disk or file system errors. Many low-end RAID controllers (most motherboard controllers are) don’t report problems, figuring you’ll never notice. If you do notice, months later, what is the chance that you’ll know it was the controller’s fault? One insidious problem is corruption of RAID 5 parity data. It is pretty simple to check a file by reading it and matching the metadata. Checking parity data is much more difficult, so you typically won’t see parity errors until a rebuild. Then, of course, it is too late.

  • Viruses are the biggest threat to digital data: Wrong. Theft and human errors are the primary causes of data loss.

Make your budget: Data size, required storage volume estimation

Digital camera sensors are 1-2 aperture stops away from fundamental physical limitations. What I mean is this: as technology evolves, there is a natural limit to its progress. Sensitivity and noise characteristics for any kind of light sensor are not far from that limit.

Today's cameras tend towards 10 megapixel sensors, although this resolution is already too high for compact cameras and deteriorates the end result. Given the sensor size and quality of optics, 6 megapixels are optimum for compact cameras. Even DSLR cameras run into their limits at 10-12 megapixels; for higher resolutions one has to go for full frame sensors (24x36mm) or even bigger formats.

So, taking into account the manufacturers' megapixel propaganda, it seems safe to say that the bulk of future cameras will have fewer than 20 megapixels. This gives us an estimate of the necessary storage space per photograph in the long run: <15 MB per image. Even if file versioning is introduced (grouping of variations of a photograph under one file reference), the trend is to implement scripting of changes so that only a small overhead will be recorded and not a whole different image per version. With faster hardware this concept will reach its maturity quite soon.

In order to estimate the amount of storage space you have to plan for, simply determine the number of photographs you take per year (easy with digiKam's timeline sidebar) and multiply it by 15 MB. Most users will keep less than 2000 pictures per year which requires less than 30 GB/year. Assuming that you will change your hard disk (or whatever media in the future) every 4-5 years, the natural increase in storage capacity will suffice to keep you afloat.
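The estimate above is simple enough to script. The following sketch uses plain shell integer arithmetic; the function name yearly_storage_gb is made up for this example, and 1 GB is taken as 1000 MB.

```shell
#!/bin/sh
# yearly_storage_gb PHOTOS_PER_YEAR [MB_PER_PHOTO]
# Prints the storage needed per year in GB (1 GB = 1000 MB, rounded down).
# MB_PER_PHOTO defaults to the 15 MB upper bound used in the text.
yearly_storage_gb() {
    photos=$1
    mb_each=${2:-15}
    echo $(( photos * mb_each / 1000 ))
}

yearly_storage_gb 2000    # 2000 photos at 15 MB each, prints 30
```

Replace 2000 with the count you read off digiKam's timeline, and remember that every backup copy you keep needs the same space again.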

The more ambitious ones out there will need more space, much more maybe. Think of buying a file server, Giga-Ethernet comes integrated into motherboards today and it's a flick to fetch the files over the local network. Speaking about modern mobos: they now have external SATA connectors. This makes it really a trifle to buy an external SATA drive and hook it up to your machine. 1000 GB drives will hit the market this year (2008). These are terrific compact storage containers for backup swapping: keep one drive at home and one somewhere else.

Back it up, backup, backup, recover!

A 750GB HD costs €100 today. Do not blame anybody else for data loss! 6% of all PCs will suffer an episode of data loss in any given year. Backup your data often according to a plan, and back it up and test the backup before you do anything dramatic like re-installing your OS, changing disks, resizing partitions and so on.

Disaster prevention

Say you religiously do your backups every day on an external SATA drive. Then comes the day when lightning strikes. Lucky you if the external drive was not connected at that moment!

Disasters strike locally and destroy a lot. Forget about airplane crashes: fire, water, electricity, kids and theft are dangerous enough to our data. They usually cover a whole room or house.

Therefore disaster control means de-localized storage. Move your backups upstairs, to the house next door, to your office (and vice versa), whatever.

There is another good aspect to the physical separation: as said above, panic is often the cause of destroying data, even the backup data. Having a backup not at hand right away may save your ass one day.

Some backup technicalities explained for laymen.
  • Full Backup: A complete backup of all the files being backed up. It is a snapshot without history, it represents a full copy at one point in time.

  • Differential Backup: A backup of only the files that have changed since the last full backup. Constitutes a full snapshot of two points in time: the full backup and the last differential one.

  • Incremental Backup: A backup of only the files that have changed since last whatever backup. Constitutes multiple snapshots. You can recreate the original state at any point in time such a backup was made. This comes closest to a versioning system except that it is only sampled and not continuous.
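The difference between the three backup types can be illustrated with marker files whose modification time records when a backup ran. This is only a sketch of the selection logic, not a backup tool; the function name changed_since and the marker file names are assumptions made for this example.

```shell
#!/bin/sh
# changed_since DIR MARKER
# Lists the files under DIR that were modified after the MARKER file.
changed_since() {
    find "$1" -type f -newer "$2" | sort
}

# Full backup:         every file, i.e.  find photos -type f
# Differential backup: changed_since photos full.stamp
#                      (full.stamp is touched only when a FULL backup runs)
# Incremental backup:  changed_since photos last.stamp
#                      (last.stamp is touched after EVERY backup run)
```

The differential list grows until the next full backup, whereas each incremental list contains only what changed since the previous run. That is why a restore from incremental backups needs the full backup plus every increment since.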

Best practice: The IT-layman's backup cookbook
  1. do a full backup on an external storage device.

  2. verify its data integrity and put it away (disaster control)

  3. have another storage device for frequent backups

  4. swap the devices every other month after having verified data integrity

A useful rsync recipe for backups

Rsync is a wonderful little utility that's amazingly easy to set up on your machines. Rather than relying on a scripted FTP session or some other form of file transfer script, rsync copies only the differences of files that have actually changed, compressed and, if you want the security, through ssh. That's a mouthful.
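A minimal rsync call along these lines might look as follows. The function name mirror_photos and the example paths are assumptions for illustration; the options -a, -z and -e are standard rsync options.

```shell
#!/bin/sh
# mirror_photos SOURCE_DIR DEST
#   -a  archive mode: recurse and keep timestamps, permissions and links
#   -z  compress data in transit (mainly useful over a network)
# Files deleted in SOURCE_DIR are NOT removed from DEST by this call.
mirror_photos() {
    rsync -az "$1"/ "$2"/
}

# Local example (hypothetical paths):
#   mirror_photos "$HOME/Pictures" /mnt/backup/Pictures
# Remote example through ssh (hypothetical host):
#   rsync -az -e ssh "$HOME/Pictures"/ user@backuphost:/backup/Pictures/
```

The trailing slash on the source matters: it tells rsync to copy the contents of the folder rather than the folder itself. A second run transfers only what has changed since the first.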

A reasonable backup approach for images could be this one:

  1. backup important images right away (after dumping them to a computer) to DVD/optical media

  2. do daily incremental backup of the work space

  3. do a weekly differential backup and delete integral backups of week-2 (two weeks ago)

  4. do a monthly differential backup and delete backup of month-2

  5. if not physically separated already, separate it now (swapping-in another backup drive)

This protocol tries to leave you enough time to spot losses and to recover fully at the same time keeping the backup volume at <130% of the working space. You end up with a daily version of the last 7-14 days, a weekly snapshot for at least one month, and a snapshot of every month. Any more thinning should be done by hand after a full verification.
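One common way to approximate such a rotation with rsync is dated snapshot folders built with the --link-dest option. Note that this is a different technique from the differential scheme listed above: every snapshot looks like a full copy, but unchanged files are hard links to the previous snapshot and take no extra space. The function name snapshot and the folder layout are assumptions for this sketch, and the backup disk needs a file system that supports hard links.

```shell
#!/bin/sh
# snapshot SOURCE_DIR BACKUP_ROOT NAME
# Creates BACKUP_ROOT/NAME as a complete-looking copy of SOURCE_DIR.
# Files unchanged since the previous snapshot (BACKUP_ROOT/latest) are
# hard-linked instead of copied. BACKUP_ROOT should be an absolute path.
snapshot() {
    src=$1; root=$2; name=$3
    mkdir -p "$root"
    if [ -d "$root/latest" ]; then
        rsync -a --link-dest="$root/latest" "$src"/ "$root/$name"/
    else
        rsync -a "$src"/ "$root/$name"/    # first run: plain full copy
    fi
    ln -sfn "$name" "$root/latest"         # remember the newest snapshot
}

# Example (hypothetical paths), one snapshot per day:
#   snapshot "$HOME/Pictures" /mnt/backup/snapshots "$(date +%Y-%m-%d)"
```

Thinning then means deleting whole snapshot folders you no longer need, for example all daily folders older than two weeks except one per week. Files still referenced by a remaining snapshot stay on disk.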

For German speaking *nix users: link

Preserve your images through the changes of technology and owners

Themes: metadata, IPTC stored in image files, XMP files associated, keep the originals, storage, scalability, media, retrieval of images and metadata, copying image data over to the next generation of media, applications, operating systems, virtualization, viewing device... use of the www.

In order for your valuable images to survive the next 40 years or so (because that's about the time that you will become really interested to revisit those nice old photographs of you as a child, adolescent etc.) there are two strategies to be observed:

  1. Keep up with technology, don't lag behind more than a couple of years.

  2. Save your photos in an open, non-proprietary standard.

How to keep up with technology?

As the future is unforeseeable by nature, everything said today is to be taken with caution, and to be reviewed as we advance. Unfortunately there is no shortcut around some basic vigilance. Every 5-8 years at least, one should ask oneself about the backwards compatibility of current systems. The fewer variants we used in the past, the fewer questions have to be answered in the future.

Of course every time you change your computer system (machine, operating system, applications, DRM) you have to ask yourself the same questions. Today, if you want to switch to Windows Vista, you have to ask yourself three times whether you can still import your pictures and, more importantly, whether you will ever be able to move them onto some other system or machine. Chances are good that you cannot. I see many people struggling around me, because Vista enforces a strict DRM regime. How can you prove to Vista that you are actually the owner of your pictures' copyright?

Basically the questions should be answered along the line explained in this document: use and change to open standards supported by a manifold of applications.

Virtualization becomes available now for everybody. So if you have an old system that is important for reading your images, keep it, install it as a virtual machine for later.

Otherwise the advice is quite simple: every time you change your computer architecture, your storage and backup technology, your file format, check it out, go through your library and convert to a newer standard if necessary. And keep to open standards.

Scalability

Scalability is the tech-geek expression of the (easy) capability of a system to be resized, which always means up-sized.

EMVS /LVM Todo

Let's assume you planned for scalability and dedicated the container you want to increase to a separate disk or partition. On *nix systems like Linux® you can then copy and resize the container to the new disk:

Check with dmesg if your new disk is recognized by the system, but don't mount it.

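$ dd if=/dev/sdb of=/dev/sdc bs=4M  # clone the whole source disk /dev/sdb onto the new disk /dev/sdc
$ parted /dev/sdc resizepart 1 100%  # grow partition 1 to the end of the new disk (needs parted 3.2 or later)
$ e2fsck -f /dev/sdc1  # resize2fs expects a freshly checked file system
$ resize2fs /dev/sdc1  # grow ext2/3/4 to fill the partition; use resize_reiserfs for reiserfs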

Todo

Use open, non-proprietary standards as file formats

The short history of the digital era in the past 20 years has proven over and over again that proprietary formats are not the way to go when you want your data to be intelligible 10 years into the future. Microsoft is certainly the best-known culprit of that sort because of its dominant market share. But other companies are actually (if inadvertently) worse since they may not stay in the market long enough or have only a small user/contributor base. In the case of Microsoft one has at least the advantage of many people sharing the same problems. Finding a solution is therefore much more likely to succeed. Still, in some cases Microsoft is using Open Source documentation to understand its own systems, so badly maintained has its own documentation been. Usually with any given MS Office suite one cannot properly read a document created with the same application two major versions earlier.

Image formats have had a longer lifetime than office documents and are a bit less affected by obsolescence.

Open Source standards have the huge advantage of having an open specification. Even if one day in the future there'll be no software to read it anymore, one can recreate such software, a task becoming simpler every year.

JPEG has been around for a while now, and whilst it's a lossy format losing a bit every time you make a modification and save it, it is ubiquitous, supports JFIF, EXIF, IPTC and XMP metadata, has good compression ratios and can be read by all imaging software. Because of its metadata limitation, lossy nature, absence of transparency and 8 bit color channel depth, we do not recommend it. JPEG2000 is better, can be employed lossless, but lacks in user base.

GIF is a proprietary, patented format and slowly disappearing from the market. Don't use it.

PNG was invented as an Open Source standard to replace GIF, but it does much more. It is lossless, supports XMP, EXIF and IPTC metadata, 16 bit color encoding and full transparency. PNG can store gamma and chromaticity data for improved color matching on heterogeneous platforms. Its drawbacks are a relatively big footprint (but smaller than TIFF) and slow compression. We recommend it.

TIFF has been widely accepted as an image format. TIFF can exist in uncompressed form or in a container using a lossless compression algorithm (Deflate). It maintains high image quality but at the expense of much larger file sizes. Some cameras let you save your images in this format. The problem is that the format has been altered by so many people that there are now 50 or more flavors and not all are recognizable by all applications.

PGF "Progressive Graphics File" is another not so known but open file image format. Wavelet-based, it allows lossless and lossy data compression. PGF compares well with JPEG 2000 but it was developed for speed (compression/decompression) rather than to be the best at compression ratio. At the same file size a PGF file looks significantly better than a JPEG one, while remaining very good at progressive display too. Thus it should be well-suited to the web but at the moment few browsers can display it. For more information about the PGF format see the libPGF homepage.

RAW format. Some, typically more expensive, cameras support RAW format shooting. The RAW format is not really an image standard at all, it is a container format which is different for every brand and camera model. RAW format images contain minimally processed data from the image sensor of a digital camera or image scanner. Raw image files are sometimes called digital negatives, as they fulfill the same role as film negatives in traditional chemical photography: that is, the negative is not directly usable as an image, but has all of the information needed to create an image. Storing photographs in a camera's RAW format provides for higher dynamic range and allows you to alter settings, such as white balance, after the photograph has been taken. Most professional photographers use RAW format, because it offers them maximum flexibility. The downside is that RAW image files can be very large indeed.

My recommendation is clearly to abstain from archiving in RAW format (as opposed to shooting in RAW format, which I recommend). It has all bad ingredients: many varieties and proprietary nature. It is clear that in a few years time you cannot use your old RAW files anymore. I have already seen people changing camera, losing their color profiles and having great difficulty to treat their old RAW files correctly. Better change to DNG format!

DNG Digital Negative file format is a royalty free and open RAW image format designed by Adobe Systems. DNG was a response to demand for a unifying camera raw file format. It is based on the TIFF/EP format, and mandates use of metadata. A handful of camera manufacturers have adopted DNG already, let's hope that the main contenders Canon and Nikon will use it one day.

I strongly recommend converting RAW files to DNG for archiving. Despite the fact that DNG was created by Adobe, it is an open standard and widely embraced by the Open Source community (which is usually a good indicator of perennial properties). Some manufacturers have already adopted DNG as their RAW format. And last but not least, Adobe is the most important source of graphical software today, and they of course support their own invention. It is an ideal archival format: the raw sensor data will be preserved as such in TIFF format inside DNG, so that the risk associated with proprietary RAW formats is alleviated. All of this makes migration to another operating system a no-brainer. In the near future we'll see 'non-destructive editing', where files are not changed anymore but rather all editing steps are recorded (into the DNG as it were). When you open such a file again, the editing script will be replayed. This takes computation power, but it is promising as it leaves the original intact and computing power increases all the time.

XML (Extensible Mark-up Language) or RDF (Resource Description Framework). XML is like HTML, but where HTML is mostly concerned with the presentation of data, XML is concerned with the "representation" of data. On top of that, XML is non-proprietary, operating-system-independent, fairly simple to interpret, text-based and cheap. RDF is the W3C's solution to integrate a variety of different applications such as library catalogs, world-wide directories, news feeds, software, as well as collections of music, images, and events using XML as an interchange syntax. Together the specifications provide a method that uses a lightweight ontology based on the Dublin Core which also supports the "Semantic Web" (easy exchange of knowledge on the Web).

IPTC goes XMP

That's probably one of the reasons why, around 2001, Adobe introduced its XML-based XMP technology to replace the "Image resource block" technology of the nineties. XMP stands for "Extensible Metadata Platform", a mixture of XML and RDF. It is a labeling technology that lets users embed data about a file in the file itself. The file info is saved using the extension ".xmp" (signifying the use of XML/RDF).

XMP. Just as ODF will be readable forever (since its contents are written in clear text), XMP will preserve your metadata in a clearly understandable XML format. No danger here of not being able to read it later. It can be embedded into the image file or kept as a separate accompanying file (sidecar concept). XMP can be used in PDF, JPEG, JPEG2000, GIF, PNG, HTML, TIFF, Adobe Illustrator, PSD, PostScript, and Encapsulated PostScript. In a typical edited JPEG file, XMP information is included alongside Exif and IPTC data.

Embedding metadata in files allows easy sharing and transfer of files across products, vendors, platforms and customers without metadata getting lost. The most common metadata tags recorded in XMP data are those from the Dublin Core Metadata Initiative, which include things like title, description, creator, and so on. The standard is designed to be extensible, allowing users to add their own custom types of metadata to the XMP data. XMP generally does not allow binary data types to be embedded. This means that any binary data one wants to carry in XMP, such as thumbnail images, must be encoded in some XML-friendly format, such as Base64.
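A minimal sketch of such a record, using only the Python standard library, may make this concrete. It writes two Dublin Core properties (title and creator) and carries a few binary bytes as Base64 text. The demo namespace and its thumbnail property are hypothetical, chosen only to illustrate a custom extension; real XMP packets are also wrapped in an xpacket processing instruction, which is omitted here.

```python
import base64
import xml.etree.ElementTree as ET

NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    # Hypothetical custom namespace, for illustration only.
    "demo": "http://example.com/ns/demo/1.0/",
}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Ask ElementTree to write readable prefixes instead of ns0, ns1, ...
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def qname(prefix, local):
    """Build the {uri}local form that ElementTree uses for qualified names."""
    return "{%s}%s" % (NS[prefix], local)

def build_xmp(title, creator, thumbnail_bytes):
    xmpmeta = ET.Element(qname("x", "xmpmeta"))
    rdf = ET.SubElement(xmpmeta, qname("rdf", "RDF"))
    desc = ET.SubElement(rdf, qname("rdf", "Description"),
                         {qname("rdf", "about"): ""})

    # dc:title is a language alternative (rdf:Alt).
    alt = ET.SubElement(ET.SubElement(desc, qname("dc", "title")),
                        qname("rdf", "Alt"))
    ET.SubElement(alt, qname("rdf", "li"), {XML_LANG: "x-default"}).text = title

    # dc:creator is an ordered list of names (rdf:Seq).
    seq = ET.SubElement(ET.SubElement(desc, qname("dc", "creator")),
                        qname("rdf", "Seq"))
    ET.SubElement(seq, qname("rdf", "li")).text = creator

    # Binary data cannot be stored raw in XML, so it travels as Base64 text.
    thumb = ET.SubElement(desc, qname("demo", "thumbnail"))
    thumb.text = base64.b64encode(thumbnail_bytes).decode("ascii")

    return ET.tostring(xmpmeta, encoding="unicode")

def read_thumbnail(xmp_text):
    """Decode the Base64 payload back into the original bytes."""
    node = ET.fromstring(xmp_text).find(".//" + qname("demo", "thumbnail"))
    return base64.b64decode(node.text)

xmp_text = build_xmp("Sunset", "Jane Doe", b"\x00\x01\xff")
print(xmp_text)
```

The resulting text can be written to a sidecar file or handed to a tool that embeds it in the image; either way it stays readable with any XML parser.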

Many photographers prefer keeping an original of their shots (mostly RAW) for the archive. XMP sidecar files suit that approach, as they keep metadata separate from the image file. I do not share this point of view. Linking the metadata file and the image file could cause problems, and as said above, RAW formats will become obsolete. I recommend using DNG as a container and putting everything inside.

The Dublin Core Metadata Initiative is an open organization engaged in the development of interoperable online metadata standards that support a broad range of purposes and business models. DCMI's activities include work on architecture and modeling, discussions and collaborative work in DCMI Communities and DCMI Task Groups, annual conferences and workshops, standards liaison, and educational efforts to promote widespread acceptance of metadata standards and practices.

Best practice: Data protection

  • Use surge protectors (UL 1449 standard), possibly combined with a UPS

  • Use ECC memory to verify correct data transmission (even when just saving files)

  • Watch your hard drives (temperature, noise...), make backups

  • Keep backups at another location, locked up, and use web storage space

  • Use archival media and burners

  • Don't panic in case of data loss; explain your recovery plan to a layperson

  • Choose your file system, partitions and folders to cater for easy scalability

  • Use open, non-proprietary standards to manage and save photographs

  • Do a technology/migration review at least every 5 years
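One way to notice silent corruption in an archive or a backup copy is to record a checksum for every file and compare it again later. The following is a minimal sketch using only the Python standard library; the function names and the choice of SHA-256 are assumptions for illustration, not a digiKam feature.

```python
import hashlib
from pathlib import Path

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }

def verify_manifest(root, manifest):
    """Return (relative path, problem) pairs for missing or changed files."""
    root = Path(root)
    problems = []
    for relative, expected in manifest.items():
        target = root / relative
        if not target.is_file():
            problems.append((relative, "missing"))
        elif sha256_of(target) != expected:
            problems.append((relative, "changed"))
    return problems
```

A manifest built right after archiving can be stored with the backup (for example as a JSON file) and checked with verify_manifest during each routine backup. An empty result means every recorded file is still present and unchanged; files added later are simply not covered until the manifest is rebuilt.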

A typical DAM workflow with digiKam

  1. Import images from camera, card reader or scanner. As long as the images are stored on the camera media, you can use that as a temporary backup.

  2. RAW files are converted to DNG and stored in a RAW archive (not yet implemented)

  3. Rate and cull, write metadata back to the DNG archive

  4. Make a backup, e.g. on DVD, optical drive or tape

  5. Tag, comment, geo-locate

  6. Edit and improve photographs

  7. For layered editing use external applications. Back in digiKam, re-apply the metadata, which was probably lost or curtailed by the other applications.

  8. Run the routine backup, followed by data-integrity checks

  9. Protect the copyright of processed images with digital watermarking. Export to web galleries, slide shows, MPEG encoding, contact sheets, printing etc.

Workflow