Dates
Group Size
This lab is to be done in groups of three to five people; it is a dramatic amount of work. Everyone needs to do a significant amount of research and programming. The exam will cover the internals that you'll learn through the lab. There will also be a peer review, and possibly individual or team interviews, at the conclusion of the lab.
Overview
This lab asks you to use what you've learned about the internals of file systems to recover data from FAT, NTFS, Ext2, and Ext3 file system images. We give you the images, which may be corrupted, have deleted files, etc., and you give us all of the whole files (not fragments) you can recover.
It is a programming lab, not a scripting lab. So, you should be writing your own recovery tools in a language such as C, C++, or Java, rather than using the tools that others have written. Having said that, if you'd like to use 3rd party read-only tools, or more powerful 3rd party tools in read-only mode, to get oriented to the file system images that you've been given, that's okay.
Hints and Caveats
A really good start to writing recovery tools is writing read-only tools. Consider writing simple tools that can view inodes, journal file entries, MFT entries, change log entries, FAT tables, boot sectors, etc. Check your results against 3rd party tools -- then try to use the information you can collect to extrapolate and collect more.
The lecture notes provide some good pointers. But, the Web has all of the details. You'll have to do a lot of research. And, you will almost certainly want to leverage the structures and other definitions from header files that are already out there, such as those that are used by Linux. This isn't cheating. This is expected.
As presented, the file systems will not mount. But, everything has been damaged in place, as by deleting files, deleting inodes, or overwriting useful bits with corruption. Nothing has been bit shifted, etc.
Pay attention to the comments I make in class while the details are fresh in my mind. In this business, small details can make a huge difference in the effort required by limiting the search space. In a month, I might simply remember less about how the images were constructed. Ask questions earlier, rather than later.
Work from the simple to the complex: FAT, Ext2, NTFS, Ext3.
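As an illustration of the kind of read-only tool suggested above, here is a minimal sketch in C that decodes a few fields of a FAT16 boot sector from a 512-byte buffer. The struct and function names (fat16_bpb, fat16_parse_bpb, and so on) are hypothetical and chosen only for this example; the byte offsets follow the commonly documented FAT boot sector layout, and you should check them against your own references. Because the lab images are damaged, the sketch still decodes the fields when the 0x55 0xAA signature is missing and merely reports that fact.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct for this sketch; the field names are our own. */
struct fat16_bpb {
    uint16_t bytes_per_sector;    /* offset 11 */
    uint8_t  sectors_per_cluster; /* offset 13 */
    uint16_t reserved_sectors;    /* offset 14 */
    uint8_t  num_fats;            /* offset 16 */
    uint16_t root_entries;        /* offset 17 */
    uint32_t total_sectors;       /* offset 19, or offset 32 if that is 0 */
    uint16_t sectors_per_fat;     /* offset 22 */
};

/* On-disk FAT fields are little-endian; decode byte by byte so the
 * code does not depend on host endianness or alignment. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns -1 on bad arguments, 0 if the 0x55 0xAA signature is present,
 * and 1 if the fields were decoded but the signature is missing. */
int fat16_parse_bpb(const uint8_t *sector, size_t len, struct fat16_bpb *out)
{
    uint16_t total16;

    if (sector == NULL || out == NULL || len < 512)
        return -1;

    out->bytes_per_sector    = le16(sector + 11);
    out->sectors_per_cluster = sector[13];
    out->reserved_sectors    = le16(sector + 14);
    out->num_fats            = sector[16];
    out->root_entries        = le16(sector + 17);
    total16                  = le16(sector + 19);
    out->total_sectors       = total16 ? total16 : le32(sector + 32);
    out->sectors_per_fat     = le16(sector + 22);

    return (sector[510] == 0x55 && sector[511] == 0xAA) ? 0 : 1;
}

/* First sector of the fixed-size root directory: it follows the
 * reserved area and all copies of the FAT. */
uint32_t fat16_root_dir_sector(const struct fat16_bpb *b)
{
    return (uint32_t)b->reserved_sectors +
           (uint32_t)b->num_fats * b->sectors_per_fat;
}

/* First sector of the data area (cluster 2). Returns 0 if the sector
 * size is 0, which would otherwise cause a division by zero. */
uint32_t fat16_first_data_sector(const struct fat16_bpb *b)
{
    uint32_t root_bytes, root_sectors;

    if (b->bytes_per_sector == 0)
        return 0;
    root_bytes   = (uint32_t)b->root_entries * 32u;
    root_sectors = (root_bytes + b->bytes_per_sector - 1) / b->bytes_per_sector;
    return fat16_root_dir_sector(b) + root_sectors;
}
```

To use it, read the first 512 bytes of the image into a buffer (for example with fread), call fat16_parse_bpb, and compare the printed fields with what a 3rd party tool reports for the same image.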
What's Given
Download the tarball, which has four image files: FAT16, NTFS, Ext2, and Ext3. We created them by creating empty files with dd, and then using the Linux loopback file system to create the file systems. We then mounted them and populated them. Sometimes we damaged them while mounted, e.g. by deleting files. And, sometimes we damaged them by overwriting selected bits using dd after-the-fact.
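To make "damaged by deleting files" concrete for the FAT16 image, here is a small sketch in C that decodes one 32-byte short-name directory entry and flags it as deleted. It relies on the commonly documented FAT convention that a deleted entry has its first name byte replaced with 0xE5 while the rest of the entry is left in place. The names fat_dirent_info and fat_decode_dirent are hypothetical, long-file-name entries are simply skipped, and nothing here is specific to how the lab images were built.

```c
#include <stddef.h>
#include <stdint.h>

#define FAT_DIRENT_SIZE  32
#define FAT_END_MARK     0x00 /* first byte: no more entries follow */
#define FAT_DELETED_MARK 0xE5 /* first byte: entry has been deleted */
#define FAT_ATTR_LFN     0x0F /* attribute value of long-name entries */

/* Hypothetical result struct for this sketch. */
struct fat_dirent_info {
    char     name[13];      /* "NAME.EXT" plus NUL: 8 + 1 + 3 + 1 */
    int      deleted;       /* 1 if the first byte was 0xE5 */
    uint8_t  attr;          /* offset 11 */
    uint16_t first_cluster; /* offset 26, little-endian */
    uint32_t size;          /* offset 28, little-endian */
};

/* Decodes the 32-byte entry at e.
 * Returns -1 on bad arguments, 0 if the slot is the end-of-directory
 * marker or a long-file-name entry, and 1 if out was filled in. */
int fat_decode_dirent(const uint8_t *e, struct fat_dirent_info *out)
{
    size_t n = 0;
    int i;

    if (e == NULL || out == NULL)
        return -1;
    if (e[0] == FAT_END_MARK)
        return 0;
    if ((e[11] & 0x3F) == FAT_ATTR_LFN)
        return 0;

    out->deleted       = (e[0] == FAT_DELETED_MARK);
    out->attr          = e[11];
    out->first_cluster = (uint16_t)(e[26] | (e[27] << 8));
    out->size          = (uint32_t)e[28] | ((uint32_t)e[29] << 8) |
                         ((uint32_t)e[30] << 16) | ((uint32_t)e[31] << 24);

    /* Names are space-padded: 8 bytes of base name, 3 of extension.
     * The first character of a deleted name is lost, so show '?'. */
    for (i = 0; i < 8 && e[i] != ' '; i++)
        out->name[n++] = (i == 0 && out->deleted) ? '?' : (char)e[i];
    if (e[8] != ' ') {
        out->name[n++] = '.';
        for (i = 8; i < 11 && e[i] != ' '; i++)
            out->name[n++] = (char)e[i];
    }
    out->name[n] = '\0';
    return 1;
}
```

Walking a directory then amounts to calling fat_decode_dirent on every FAT_DIRENT_SIZE-byte slot until it returns 0 for the end marker. Whether the clusters that a deleted entry points to still hold the file's data is a separate question that you need to check against the FAT and the data area.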
What You Deliver to Us
We want a few things from you:
- A Report
  - In .pdf
  - Who is in your group
  - Organized by image
  - How you approached the problem
  - Who contributed and how
  - What surprised you
  - What percentage of the files do you believe you recovered?
  - Any surprises? Major hurdles?
  - What do you most wish you knew in advance?
- The tools you wrote, which will be:
  - General purpose to a point, keeping in mind that your immediate goal is to solve the puzzles you've been given, not every possible problem.
  - Neatly organized by file system
  - Buildable with "make" or whatever build tool is appropriate for the tools you used
  - Readable, clean, and internally documented
  - Attributed with internal comments highlighting who was -primarily- responsible for each -major- code section
- The whole files you recovered
  - We aren't interested in fragments
  - Separated by file system image
  - Recover the directory structure as much as you can
  - Provide a listing, e.g. fat.txt, that shows what you recovered and where in the file system it belongs, or that it was recovered as an orphan
Plan of Attack
In our estimation, which might, or might not, be accurate, the relative difficulty of the images is as follows, from easiest to hardest: FAT16, Ext2, NTFS, Ext3. Plan to solve them in this order, one per week. But, to even out the work over time, overlap the research and the coding.
Basically, FAT16 is pretty easy. Have everyone collect the best references they can find on the Web and get started Day One! Work together on this one.
In the meantime, divide up the research on the other file systems. Research means (a) collecting information, (b) getting a working image somewhere (if you are really stuck on this, ask for help), and (c) using 3rd party tools like hex dumps and fs-specific tools to test your understanding of the on-disk structures, and maybe even writing simple tools to get at these structures.
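As one example of a simple tool for getting at on-disk structures during research, here is a sketch in C that reads a few fields of an Ext2/Ext3 superblock from an in-memory copy of the start of an image. It assumes the commonly documented layout: the primary superblock starts at byte 1024, the 0xEF53 magic is at offset 56 within it, and the "has journal" compatibility flag (0x0004, at offset 92) is what distinguishes Ext3 from Ext2. The names ext_sb_info and ext_read_superblock are hypothetical; verify the offsets against the Linux headers before relying on them.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_SB_OFFSET           1024u  /* primary superblock location */
#define EXT_SB_SIZE             1024u
#define EXT_MAGIC               0xEF53u
#define EXT3_COMPAT_HAS_JOURNAL 0x0004u

/* Hypothetical summary struct for this sketch. */
struct ext_sb_info {
    uint32_t inodes_count;     /* offset 0 */
    uint32_t blocks_count;     /* offset 4 */
    uint32_t block_size;       /* 1024 << value at offset 24; 0 if implausible */
    uint32_t blocks_per_group; /* offset 32 */
    uint32_t inodes_per_group; /* offset 40 */
    int      has_journal;      /* 1 if the Ext3 journal flag is set */
};

/* Superblock fields are little-endian on disk. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* img points at the first len bytes of the image.
 * Returns -1 on bad arguments or if fewer than 2048 bytes are given,
 * 0 if the magic matches, and 1 if the fields were decoded but the
 * magic does not match (possible in a damaged image). */
int ext_read_superblock(const uint8_t *img, size_t len, struct ext_sb_info *out)
{
    const uint8_t *sb;
    uint32_t log_bs, rev;

    if (img == NULL || out == NULL || len < EXT_SB_OFFSET + EXT_SB_SIZE)
        return -1;
    sb = img + EXT_SB_OFFSET;

    out->inodes_count     = le32(sb + 0);
    out->blocks_count     = le32(sb + 4);
    log_bs                = le32(sb + 24);
    /* Reject absurd shift counts that corruption could produce. */
    out->block_size       = (log_bs <= 6) ? (1024u << log_bs) : 0;
    out->blocks_per_group = le32(sb + 32);
    out->inodes_per_group = le32(sb + 40);

    /* Feature flags are only meaningful from revision 1 onward. */
    rev = le32(sb + 76);
    out->has_journal = (rev >= 1) &&
                       (le32(sb + 92) & EXT3_COMPAT_HAS_JOURNAL) != 0;

    return (le16(sb + 56) == EXT_MAGIC) ? 0 : 1;
}
```

Running something like this against a healthy image you built yourself, and comparing the numbers with a hex dump and an fs-specific 3rd party tool, is a quick way to confirm that you are reading the structure correctly before you turn to the damaged image.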
Then, as you get the bits off of the FAT image, which should just take a day to a few days (but less than a week), depending on your schedules, move to tool-writing and recovery for the Ext2 file system, while continuing the research on NTFS and Ext3. Then, in about a week, once done with Ext2, move to NTFS, etc, etc, etc. As you finish up with the file system you have researched, research another or code.
Keep close tabs on your teammates. Don't let them fall behind in research. If you approach week 4 and, collectively, aren't experts in the Ext3 file system and journal, there is no way to become an expert and code in a week when everyone is climbing the learning curve.
We're Here to Help!
This lab is designed to be a team exercise, where you lean hard on each other. But, if your -group- becomes -stuck-, or gets -behind-, please do get help from the course staff. We want you to be resourceful -- but we want you to be productive. We're here to help!