
Linux – PhotoRec: File recovery


When a system administrative error re-formats a hard drive containing professional and personal data that has no backup, time gives way to despair. A speaker at a recent professional development course suggested you have a better chance of a person listening if you take your lesson and wrap it in a story.

So if you don’t make regular backups of your data, this story on recovering data may be something you’ll need. Replacing the OS hard drive with a spare hard drive made it easy to start the install from the self-booting disc. With admin impatience born of numerous system installs, the hunts and pecks were completed to fire up the installation.

As the installation’s progress streamed a green line across the screen, we reflected on how the Mint installation questions compared to other Linux distro installs. Why, we mused, was the Mint installation occurring on a hard drive with an 80GB capacity when the drive to be used was only 40GB?



Then the brain synapses fired, producing a response just at the moment when the screen indicated it was ‘formatting’… Ah… NOOO! “Houston, we have a problem,” to steal a famous line from Swigert on Apollo 13, when he reported a fault in the electrical system of the Service Module.

The progress bar displayed 16% by the time our conscious mind understood what had happened, and the author’s index finger eventually lost colour from holding down the off button. The installer’s progress bar pulsed for another five seconds before the whirring fans went silent.

The drive formatting had started on an 80GB hard drive that was a data drive: scribblings, thoughts, ideas, published articles and unpublished writings were on that sole repository – three years of content had been erased.

The nature of data

Having technical knowledge of what happens during a format did little to discourage me from whispering to the computer deities and Saint Stallman, “Maybe it’s OK, I turned the power off before it finished.” However, the file manager application coldly displayed a pristine directory structure of a Linux OS.

The Mint Linux installer had completed the task of formatting the hard drive and gone on to create the file system before it was interrupted. But was the data still on the hard drive or had it all gone?

Data written to a hard drive has some similarities to written words on a page. In a book, words are chained together to form sentences and paragraphs.

On a hard drive, blocks of data are chained together to form files. The index in a book is how the reader locates a chapter. In Linux, the equivalent is the file system’s on-disk metadata, which the kernel presents through its Virtual File System (VFS) layer and which records the directories, names and block structure of the files. If the index is removed from a book, the reader knows the words are still on the pages.

When a drive is formatted, a new file system is laid down, but the data blocks on the drive that aren’t used in building that file system are untouched. In order to restore files, the recovery process must find all the data blocks associated with a file and chain them together.


As the police pound on the door attempting to enter the gang’s hideout, Books Malone, nicknamed The Accountant, is on his laptop issuing commands to delete entire directories of files.

“You will never take my data alive, copper!” A nefarious person’s undoing can lie in thinking that files have been removed from a hard drive simply by using the Delete key, unaware that the file data still exists and is recoverable.

A process that minimizes work on the evidence hard drive is important. Preserving the integrity of the data, and ensuring that no changes occur as part of the recovery, is critical to a successful forensic investigation.


The recovery process reconstructs each file without the assistance of a directory. When working on restoring a hard drive, it’s advisable to minimize the amount of hard drive use. Powering up and manipulating a hard drive that contains the last known link to the lost data introduces the risk of doing additional damage.


An industry best practice is to make a copy of the troubled hard drive and use the copy for examination. Instead of copying the damaged hard drive to another hard drive, you can elect to make an image file of the damaged drive. Holding the drive’s data in one file creates a flexible container for managing and manipulating the data. Executing the following command from a terminal will copy the contents of a hard drive to a file:

dcfldd if=/dev/<identifier> of=/<path>/recovery_hope.dd conv=noerror,sync hashwindow=0 hashlog=recovery_page.txt

Replace <identifier> with the drive designation of your test station. Replace <path> with the path name you are using to hold the file.

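The dd command line program has a complex command structure. The dcfldd command is an enhanced version of the old dd program, maintained by Nicholas Harbour, and can be downloaded from http://dcfldd.sourceforge.net. The if= attribute names the input file (i.e. the device node of the data hard drive) and the of= attribute names the output file (i.e. the image file). The conv=noerror,sync setting tells the program to carry on past read errors and pad any unreadable blocks with zeros, so that the rest of the image stays aligned.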

It’s important to have sufficient storage capacity where the output file is written. The output file of the command will be the same size as the capacity of the hard drive.
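A quick pre-flight check along the following lines can confirm there is room before the copy starts. This is only a sketch, assuming GNU coreutils df; the helper name has_room and the example paths are hypothetical and are not part of dcfldd or TestDisk.

```shell
# has_room NEEDED_BYTES DEST_DIR
# Succeeds when the file system holding DEST_DIR has at least NEEDED_BYTES free.
# (Illustrative helper; the name is made up for this example.)
has_room() {
    needed=$1
    dest=$2
    # GNU df: print only the available space, in bytes, and drop the header line
    avail=$(df --output=avail -B1 "$dest" | tail -n 1 | tr -d ' ')
    [ "$avail" -ge "$needed" ]
}

# Example use, run as root (/dev/sdX and /mnt/recovery are placeholders):
#   size=$(blockdev --getsize64 /dev/sdX)   # capacity of the source drive in bytes
#   has_room "$size" /mnt/recovery || echo "Not enough space for the image" >&2
```

Comparing the byte counts up front costs a second or two, which is cheap next to an imaging run that fails hours in.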

The time it takes for the command to complete depends on the drive size. It can be frustrating to wait hours and then discover a failure because the output device didn’t have the capacity to handle the file being produced. With the source, destination and other command attributes defined, the command can be issued. After this process is complete, the troubled data hard drive can be removed and physically secured. From this point forward, only the data file representing the hard drive contents is used in the process.
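Since everything now depends on that one image file, it may be worth guarding it against accidental changes. The sketch below is one possible approach, assuming GNU coreutils sha256sum; the helper names seal_image and verify_image are hypothetical. It marks the image read-only and records a checksum that can be re-checked at any point during the recovery.

```shell
# seal_image IMAGE: make the image read-only and store its SHA-256 beside it.
# Note: read-only permissions guard against accidents, not against root.
seal_image() {
    chmod a-w "$1"
    sha256sum "$1" > "$1.sha256"
}

# verify_image IMAGE: succeed only if the image still matches the stored checksum.
# The checksum file records the path as it was given to seal_image, so use the
# same path (ideally an absolute one) for both calls.
verify_image() {
    sha256sum --check --quiet "$1.sha256"
}

# Example use (the path is a placeholder):
#   seal_image /mnt/recovery/recovery_hope.dd
#   verify_image /mnt/recovery/recovery_hope.dd && echo "image unchanged"
```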

Recovery begins

Using the data drive file, we’re going to execute the photorec command to start the file recovery process. The program runs in a terminal and is distributed as part of the TestDisk package.

If it’s not already installed on your distro, use the usual sudo apt-get install testdisk for Debian/Ubuntu or sudo yum install testdisk for Fedora/CentOS. PhotoRec uses a number of menu screens to establish the parameters it requires to start the recovery.

The program will produce a series of directories containing recovered files. Each file is named with a generated number followed by a file extension.

The official wiki (http://www.cgsecurity.org/wiki) has a list of the file formats that PhotoRec is capable of recovering.

photorec <path to hard drive dd file>

The PhotoRec data recovery software is designed to examine the raw contents of a drive. It inspects the data blocks themselves, looking for the signatures of known file formats to determine whether blocks can be chained to form a file, so it doesn’t need the VFS. The program will work even if the file system has been severely damaged or, as in this case, reformatted. The location selected to store the recovered files should have at least as much free space as the capacity of the drive you are recovering.
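The idea that data can be found without any file system can be tried safely on a scratch file instead of a real disk. The sketch below is a toy and not PhotoRec’s implementation: it builds a blank 1 MiB "drive", writes a fake document at block 100 with no file system at all, then finds it again by searching the raw bytes for the PK signature that ZIP-based formats such as .odt begin with. The block number and contents are made up for illustration, and GNU dd and grep are assumed.

```shell
# Toy demonstration: locate data by its signature, with no file system involved.
img=$(mktemp)

# A blank 1 MiB "drive": 2048 blocks of 512 bytes, all zeros.
dd if=/dev/zero of="$img" bs=512 count=2048 status=none

# Write a fake document at block 100 without truncating the rest of the image.
# ZIP containers (including .odt) start with the bytes "PK\003\004".
printf 'PK\003\004 pretend manuscript' |
    dd of="$img" bs=512 seek=100 conv=notrunc status=none

# Scan the raw bytes for the signature.
# -a: treat binary data as text, -b: print byte offsets, -o: print only the match.
offset=$(grep -abo 'PK' "$img" | head -n 1 | cut -d: -f1)
echo "signature found at byte $offset (block $((offset / 512)))"

rm -f "$img"
```

The reported offset should be byte 51200, which is block 100 (100 × 512). A real recovery tool also has to work out where each file ends and cope with fragmentation, which this sketch ignores.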

Since normal file deletion only removes the pointer in the VFS and not the actual data blocks, PhotoRec may recover files long forgotten. The program is distributed by the developers at CG Security (www.cgsecurity.org) under the GNU General Public Licence. The program provides a progress field, and chewed on our 80GB file for close to three hours before it completed. In the process, it produced about 180 separate directories with well over 10k files.

However, in our case the joy of recovering the lost data was short-lived: over 10k files, each named with a number sequence followed by a file extension, are more than this human can handle.

We were actually keen to find one file to meet a typically short publishing deadline. From the root of the recovered data directories you can execute the following command to find every file with an .odt extension and copy it into a single directory.

find . -name \*.odt -exec cp {} <path to store openoffice files> \;

The output delivered 51 files with an .odt extension, and examining the individual files we discovered the required publisher’s manuscript. The recovery process didn’t restore the drive to its original condition, however.
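For a broader view of what came back, the recovered files can be tallied by extension before any single type is hunted down. This is a minimal sketch using standard find, sed, sort and uniq; the helper name count_by_ext is hypothetical.

```shell
# count_by_ext DIR: tally the files under DIR by extension, most common first.
count_by_ext() {
    find "$1" -type f -name '*.*' |   # only files whose name contains a dot
        sed 's/.*\.//' |              # keep just the text after the last dot
        sort | uniq -c | sort -rn     # count each extension, largest count first
}

# Example use, from the directory holding PhotoRec's recup_dir.* output
# (the path is a placeholder):
#   count_by_ext /path/to/recovered
```

Each output line is a count followed by an extension, which makes it easier to decide which file types deserve a closer look.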

Some files were lost because the data had already been overwritten. Before we started the recovery process, we got an estimate from a professional data recovery service, which came in at £450-£680.

Unfortunately, the service couldn’t guarantee the recovery of the files we needed, and could only estimate the percentage of recoverable files after analysing the drive. I now have a cron’d rsync script and a new one-terabyte USB drive as a constant reminder of the reason for regular backups. Lesson learned.
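The author’s own script isn’t shown, so the following is only a guess at what a minimal version might look like. The helper name backup_dir, the paths and the schedule are all assumptions made for illustration.

```shell
# backup_dir SRC DEST: mirror SRC into DEST with rsync.
# -a preserves permissions, timestamps and symlinks; --delete removes files from
# DEST that no longer exist in SRC, so DEST stays an exact mirror. That also means
# an accidental deletion in SRC is copied across on the next run.
backup_dir() {
    rsync -a --delete "$1"/ "$2"/
}

# Example use (placeholder paths):
#   backup_dir /home/writer/Documents /media/usb-backup/Documents
#
# Example crontab entry to run a script containing that call nightly at 02:00
# (the script path is hypothetical):
#   0 2 * * * /usr/local/bin/backup.sh
```

Dropping --delete keeps old files on the backup drive at the cost of extra space.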

