Lecture 6 (January 27, 2011)

Hard Drives

Hard drives do the bulk of the heavy lifting in data storage. One day soon, they may be replaced by a solid state solution. And, I do believe that. But, I've also been hearing that my entire adult life. And, well, so have people older than me. We're getting there...but hard drives are still the top of the game when we need volume, speed, and persistence over time.

Hard drives are also the most repairable and recoverable of the storage systems we've discussed. The only trick is that data recovery specialists can do a dramatically better job than forensics specialists. Having them repair a hard drive is expensive -- but not against the cost of any type of significant legal exposure. Typically repair or recovery costs range from about $650 - $3,000, with most repairs probably in the lower half of that range. Data recovery is a booming industry.

How Hard Drives Work, An Overview

Hard drives are stacks of two-sided disks called platters. The disks rotate at a constant rate of anywhere between 3600 RPM and 10,000 RPM. Unlike CLV CDs, whose rate of rotation changes with the position of the head, hard drives keep the same rate of rotation everywhere on the disk. They are said to be Constant Angular Velocity (CAV). This leads to an organization of tracks that are concentric circles, rather than a single spiral.

Instead of with pits and lands, the bits are encoded using magnetic polarity. For the moment, you can imagine a north pole facing upward as a 1-bit and a south pole facing upward as a 0-bit. But, in reality, as was the case with CDs, it gets more complex than that. A head is positioned over a track and senses the flux resulting from the transition of a north pole to a south pole or vice versa. This small electrical signal is amplified, cleaned up, and interpreted to result in the stream of 0s and 1s that represent the data. Because coils of wire can sense changes in magnetic fields, not constant magnetic fields, the bits are modulated, similar to the way they were with EFM. In this case, it is the transition from a north to a south, or vice versa, that carries the information. And, as with CDs, the encoding schemes are designed to force transitions. Modern hard drives use variants of a scheme called Run-Length Limited (RLL) encoding.
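To make the idea of "forcing transitions" concrete, here is a minimal sketch of Modified Frequency Modulation (MFM), an older and much simpler scheme than what modern drives use. It is often described as a run-length limited code, because it guarantees that there are never more than three channel bits in a row without a transition. This is only an illustration of the principle, not the encoding used by any particular drive, and the function names are made up for this example.

```python
def mfm_encode(data_bits, prev_bit=0):
    """Encode data bits as MFM channel bits.

    In the output, a 1 means "flux transition here" and a 0 means
    "no transition". Each data bit becomes two channel bits: a clock
    bit followed by the data bit itself.
    """
    channel = []
    for bit in data_bits:
        # The clock bit adds a transition only between two consecutive
        # 0 data bits, which is exactly where the signal would otherwise
        # go quiet for too long.
        clock = 1 if (prev_bit == 0 and bit == 0) else 0
        channel.extend([clock, bit])
        prev_bit = bit
    return channel


def longest_zero_run(bits):
    """Length of the longest stretch of channel bits with no transition."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == 0 else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    for data in ([0, 0, 0, 0], [1, 0, 1], [1, 1, 1]):
        encoded = mfm_encode(data)
        print(data, "->", encoded, "longest gap:", longest_zero_run(encoded))
```

Notice that a long run of 0 data bits, which would produce no signal at all under the naive "north is 1, south is 0" model, still produces a transition in every other channel bit.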

The heads are actually stacked, so that there is one head for each surface, or side, of each platter. For reasons of simplicity, cost, and efficiency, the heads are not independent; they move together. Specifically, they seek from track to track by moving in and out on the drive. Technically speaking, they pivot in and out on an arc, rather than moving straight in and out. But, you'll see that for yourself soon enough.

As was the case with CDs, and is the case for most any other storage device or communication system, disks aren't organized as endless streams of bits. They need to have small, manageable parts so that they can be easily addressed, and so that the data can easily be found, edited, and checked for errors. Tracks, themselves, vary in size, with the smallest at the inside, where the circumference is small, to the largest at the outside, where the circumference is large. And, depending on one's perspective, an outside track might be viewed as too large for convenient management. So, these tracks are broken down into sectors.

In addition to the data, itself, sectors contain metadata, not dissimilar to what we saw on CDs. This includes the sector number, synchronization fields, ECCs, and status bits. A status bit might indicate if the sector is in use or defective.

There are small gaps between the sectors. These, of course, allow for tolerances, but they also allow for one sector to be processed by the electronics, before the next one shows up. There is a similar gap between tracks, also to allow for tolerances.

In the end, hard drives are much better at random access than are CDs. The delay as they wait for a sector to fly by from another place on the track is relatively small, because the disks are spinning quickly. And, although seeking is time consuming, it is far better than the guessing game spent to stabilize within a single CLV track.
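The "wait for a sector to fly by" delay is easy to estimate. The sketch below assumes that the sector we want is equally likely to be anywhere on the track when the head arrives, so, on average, we wait half of a revolution. The function names are just for illustration.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000.0 / rpm


def average_rotational_latency_ms(rpm):
    """Average wait for a sector, assuming it is equally likely to be
    anywhere on the track: half of one revolution."""
    return rotation_time_ms(rpm) / 2.0


if __name__ == "__main__":
    for rpm in (3600, 5400, 10_000):
        print(f"{rpm:>6} RPM: {rotation_time_ms(rpm):6.2f} ms per revolution, "
              f"{average_rotational_latency_ms(rpm):5.2f} ms average wait")
```

Even at the slow end of the range, the average wait is under ten milliseconds.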

A Historical Model of Sectors, Tracks, and Cylinders

Back in "The Day", when things were simple, it was easy to sketch pictures of the organization of the data on a hard drive. The picture below shows the old school organization, where each surface contains concentric circle tracks. And, because the disk spins at a constant rotational velocity, there are an equal number of sectors per track on the inside and the outside. The bit-density is higher on the inside and lower on the outside. But, since the bits pass under the heads at the same rate, the electronics didn't know that, as they timed it out.

Beyond this, since the heads moved together, they moved across corresponding tracks on each surface. There was a negligible delay to switch from one head to another. As a result, data was often written from head-to-head, then sector-to-sector. And, only then would the head move for a seek.

It is important to note that moving the heads was not, in the early days, nor is it now, a fast thing to do. The head needs to be pushed past the moment of inertia, speed up, coast, stop, and become stable. As it stops, it oscillates. So, waiting for it to become stable, basically means waiting for this to dampen out, and tuning it, if necessary. Reading takes less stability than writing, because some error can be managed without degrading future access. But, writes need tighter tolerances, to ensure future reads will be doable.

The basic picture, in the old days, looked like this:

Modern Disks

In the early days of disk storage, the electronics were the limiting factor. We could store bits as quickly as we could clock them flying by. This meant that there was no loss in having a lower bit density on the outside than on the inside, since the speed, not the density, was the enemy.

Over time, disk heads, and especially the electronics for signal processing, got dramatically better. We were now able to clock data flying by faster than we could squeeze the bits together, at least on the inside of the disk. As a result, it became inefficient to lose storage by keeping the same number and size of bits on the inside and outside.

One can't make a flat, circular disk where all of the tracks are the same size and store bits linearly. Varying the size of the sector is impractical, because less-than-maximum size sectors would waste space in hardware and system buffers, never mind force a lot of physical knowledge to organize simple reads and writes.

So, modern disks vary what they can -- the number of sectors per track. Outer tracks have more sectors than inner tracks. An adjacent group of tracks that all have the same number of sectors is called a zone. So, we say that the surface is divided into zones, each of which has the same number of sectors per track.
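To see why zones are worth the trouble, here is a small sketch that compares the capacity of one surface with and without them. The zone table and the 512-byte sector size are invented for illustration; they do not describe any real drive. The "old school" layout is modeled by giving every track the sector count of the innermost zone, since that is the track where the bits are packed most tightly.

```python
SECTOR_BYTES = 512  # assumed sector size, for illustration only

# Hypothetical zone table, outermost zone first:
# (number of tracks in the zone, sectors per track in the zone)
ZONES = [(1000, 120), (1000, 100), (1000, 80)]


def zoned_capacity_bytes(zones, sector_bytes=SECTOR_BYTES):
    """Capacity of one surface when each zone has its own sectors per track."""
    return sum(tracks * spt for tracks, spt in zones) * sector_bytes


def unzoned_capacity_bytes(zones, sector_bytes=SECTOR_BYTES):
    """Capacity of the same surface if every track were limited to the
    sector count of the innermost (smallest) zone."""
    total_tracks = sum(tracks for tracks, _ in zones)
    innermost_spt = min(spt for _, spt in zones)
    return total_tracks * innermost_spt * sector_bytes


if __name__ == "__main__":
    zoned = zoned_capacity_bytes(ZONES)
    unzoned = unzoned_capacity_bytes(ZONES)
    print("zoned:  ", zoned, "bytes")
    print("unzoned:", unzoned, "bytes")
    print("gain:    %.0f%%" % (100.0 * (zoned - unzoned) / unzoned))
```

With these made-up numbers, zoning buys 25% more capacity from the same surface.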

So, we now have a picture that looks like this:

Source: http://www.pcguide.com

Because, in the push for increased capacity, tolerances have gotten smaller, there is a delay associated with switching from one head to the next. It is smaller than a seek delay, because head movement is usually not needed. But, it does take time to tune the electronics to the signal from a different head.

Rather than forming a cylinder for corresponding sectors, it is actually formed as more of a spiral, with a skew from one platter to the next. This allows the head time to get tuned to the track on the next surface, before the logically sequential sector comes under the head. For example, surfaces might be stacked as follows:

Source: http://www.pcguide.com
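The size of the skew can be estimated from the head switch time. The sketch below assumes a hypothetical switch time, rotation rate, and sectors per track; none of these values come from a real drive's specifications. The idea is simply to count how many sectors pass under the head while the electronics are retuning, and to start the next surface's numbering that many sectors later.

```python
import math


def skew_sectors(switch_time_ms, rpm, sectors_per_track):
    """Sectors that pass under the head during a head switch, rounded up
    so that the next logical sector has not already gone by."""
    rotation_ms = 60_000.0 / rpm
    sector_time_ms = rotation_ms / sectors_per_track
    return math.ceil(switch_time_ms / sector_time_ms)


def first_sector_offsets(num_surfaces, skew, sectors_per_track):
    """Physical position of logical sector 0 on each surface of a cylinder,
    when each surface is shifted by `skew` sectors from the one before."""
    return [(surface * skew) % sectors_per_track
            for surface in range(num_surfaces)]


if __name__ == "__main__":
    # Hypothetical numbers: 0.5 ms head switch, 5400 RPM, 100 sectors per track.
    skew = skew_sectors(0.5, 5400, 100)
    print("skew:", skew, "sectors")
    print("start of each surface:", first_sector_offsets(4, skew, 100))
```

Without the skew, the head would finish retuning just after the next logical sector had passed, and would have to wait nearly a full revolution for it to come around again.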

Disks and Latency

Historically, disks are said to encounter three types of latency, in order of significance:

In modern drives, things get somewhat more complex:

Logical vs. Physical Geometry

Back in "The Day", drive manufacturers used to tell us the actual number of surfaces, tracks per surface, and sectors per track. This information could then be used by operating systems to tune disk access.

These days, the actual configuration of the drive is too complicated, e.g. zones, and proprietary. Instead, drives hide their actual geometry and we can basically pretend that they have any logical geometry that adds up to the capacity of the drive.

This, of course, means that the system software's assumptions about the adjacency of sectors might be incorrect. But, in general, the logical sectors are ordered in the same way as the physical sectors. So, even if the system software can't predict exactly when a seek might occur, nearby sectors remain, in the average case, much faster to "seek" to than more distant logical sectors.
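As an illustration of what a "logical geometry" amounts to, here is a sketch of the conventional mapping between a cylinder/head/sector (CHS) triple and a single logical sector number. The geometries in the example are arbitrary; the point is that two different pretend geometries can describe the same number of sectors, and neither needs to match what is physically inside the drive. By convention, sectors are numbered from 1, while cylinders and heads are numbered from 0.

```python
def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Conventional CHS to logical sector number mapping.
    Sectors are numbered from 1; cylinders and heads from 0."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


def lba_to_chs(lba, heads, sectors_per_track):
    """Inverse mapping: logical sector number back to (cylinder, head, sector)."""
    cylinder, remainder = divmod(lba, heads * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1


if __name__ == "__main__":
    # Two arbitrary logical geometries with the same total sector count.
    geometry_a = dict(cylinders=1024, heads=16, sectors_per_track=63)
    geometry_b = dict(cylinders=512, heads=32, sectors_per_track=63)
    for g in (geometry_a, geometry_b):
        total = g["cylinders"] * g["heads"] * g["sectors_per_track"]
        print(g, "->", total, "sectors")

    # The same logical sector has a different "address" in each geometry.
    print(lba_to_chs(5000, 16, 63))
    print(lba_to_chs(5000, 32, 63))
```

Either way, logical sector 5000 is still the neighbor of logical sector 5001, which is the property that system software ends up relying on.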

Pictures From Class

In class we looked at the inside of a disk, specifically the Seagate Medalist 3221 (ST33221A) that you'll attempt to rebuild for the lab. It is an old, circa 1993, 3.2G drive. By today's standards, it is junk. But, back in the day, it had top specs: 5400 RPM, 11.5ms average access time, 3.23GB (not TB).

Here are a few photos:

top/bottom (top, left and right, respectively) and side (bottom) views

SMART Hard Drives

Manufacturers have included a bunch of monitoring within hard drives. Drives with this capability are said to have Self-Monitoring, Analysis, and Reporting Technology (SMART). Although this monitoring might be able to predict some failures, this should not be viewed as its primary purpose. Instead it collects information that, in the event the drive fails and is returned to the manufacturer, can be used by the manufacturer to better understand field conditions and drive failure.

The reality is that most hard drive failures are not predicted by SMART drives, and some SMART warnings may be innocuous (but don't bet on it). Google has had ample opportunity to study hard drive failure, and the information collected and reported by SMART drives. They found a few strong indicators, but not many. They published a paper on the topic in 2007.

Wikipedia has a good article on SMART drives. But, for forensics professionals, here's the important thing. It doesn't matter if a drive has reported SMART errors. As long as you were able to copy off the data, you were able to copy off the data.

If you weren't able to read sectors, that's one thing. If you were, it doesn't matter what SMART data the drive reports. Errors are squelched by the ECCs. Don't let other experts hide behind these errors. The litmus test is reading the data -- not SMART chatter.

Bad Sector Relocation

Hard drives are not perfect. They have bad sectors leaving the factory. And, they can accumulate (or discover) bad sectors over time.

Prospects for Repair or Recovery

With hard drives, regardless of the failure mode, it is almost always possible for a professional data recovery firm to recover significant amounts of data. And, beyond the ridiculous, e.g., smashed with a sledgehammer, data recovery is usually good, even from significant failures.

Do-it-yourself repairs sometimes work -- but can also make things worse. The four biggest reasons, beyond the need for great care, in no particular order, are

  1. The dramatic damage that can be caused by contamination of the disk area with dust
  2. The inability, without special equipment, to access the firmware area of the disk
  3. The need for very detailed drive information from the manufacturer, or by reverse engineering (or simply bit stealing), such as the correct firmware to copy onto the disk, or which board and drive versions are compatible
  4. The tremendous benefit of specialized tools, such as head rakes and platter jigs

So, if you are in the forensics business, and you are attempting data recovery on a hard drive without all of the above, plus a track record of experience, you might well be doing your clients a great disservice. Unlike other media, where it is either (a) fairly straightforward for those with the skill, or (b) really challenging for anyone -- hard drives are cost-effectively recovered or repaired by professionals.

I repair and recover my own drives, because I have the data backed up elsewhere -- getting it off the drive is a convenience. I might do it for a client, but only if time, expense, or data sensitivity somehow rules out every other solution.

For the most part, the role of the forensics professional is to educate the client about the condition of the drive and about data recovery services, and possibly to help procure those services.

Symptoms, Types of Failure

Warning to all Readers

These are unrefined notes. They are not published documents. They are not citable. They should not be relied upon for forensics practice. They do not define any legal process or strategy, standard of care, evidentiary standard, or process for conducting investigations or analysis. Instead, they are designed for, and serve, a single purpose, to help students to jog their memory of classroom discussions and assist them in thinking critically about the issues presented. The author is certainly not an attorney and is absolutely not giving any legal advice.