First authored June 2022. By the time you read this article, a lot of time may have passed, and the software that was tested may have been updated and might now pass the tests. You should conduct tests of your own to see if the current version passes your tests and meets your needs.
After reading this article, read these next ones in forensic order.
Start here:
This is the article you are currently on Inventory/Catalog files Creating an inventory of evidentiary files
Forensic Hashing Article tests over 30 "forensic" hash programs.
Forensic file copying Article tests over 40 "forensic" file copiers
ZIP-IT for forensic retention Article test a few zipping programs and
ZIP_IT_TAKE2 More tests for your zipping capabilities.
ZIP FILE/container/container Hashing your zip container reliably
MATCH FILE HASHES Demonstrates hash matches using Maresware.
A HASH software buffet How-to use Maresware hash software
Preliminary case information which determines why I chose the items to test.
First, if you have a situation where you can seize the entire computer, or make a full bit image of the drive, then some of these
test requirements will be easily met using a suite. See Suite stuff below. However, there are
situations which will be a little more restrictive, and which will cause you (or rather your software) to be more restrictive in
what and how you process the evidence. That situation will be explained here, and again below, just so you get the idea behind
the topics I chose to perform the tests around. I think (I know thinking is bad) that testing software under these more
restrictive scenarios will show that it can perform not only in a more restrictive environment, but also in one in
which you have complete control.
So let's begin:
The tests were performed on an NTFS file system because I believe that is the most common file system used by corporations today.
It also offers more of the items with which we will perform the tests.
So number one is that the software must be able to find Unicode file names. Not necessarily display them in full Unicode
format, but merely find and process those items.
Then, second, because we are on an NTFS file system, we must be able to find and process long filenames. Those are filename paths
greater than 255 characters. You will be surprised at how many programs can't do that. I have seen cataloging software turn the
long filenames into traditional 8.3 path/filenames. Try to explain that to the opposition.
Third, again because of NTFS, we will assume that the owner of the computer system (usually a corporation) has last access
update turned on. The last access update may or may not be important to your investigations, but if it is turned on, your program
should be able to NOT tamper with the file last access date. Wouldn't you like to keep a record of all the original file MAC
date/times? I would. This is part of the inventory you should take of everything you seize regardless of the type of case.
And fourth and final: again because of NTFS, we should be able to find, identify, and process where necessary any alternate data
streams (ADS). Consider a porn investigation where the user downloads porn from various sites. Did you know that some browsers (I'm
not telling you which, that's for you to find out) actually store the original URL and other information about the download in an ADS?
That might be very interesting in porn or other internet investigations.
Also, when copying from the suspect machine to a work drive for transport to your office, you must make sure that the copy program retains ALL
original file dates so as not to corrupt or influence the analysis. Does yours maintain all the dates?
If you perform a bit-image of the drive using a suite, most of these items above will easily be identified and located as
evidence. However, in our test scenario, we are sitting at a corporate server where we can ONLY process/examine/image/copy (call
it what you will) that directory tree belonging to the suspect. So this fine line refinement and restriction must be considered
when testing our software. Period....
Do you create an inventory of the (physical) items you seize during an investigation?
Then why don't you
routinely create an inventory (full listing, catalog) of ALL the files within the drive, tree, directory
of the suspect computer you just seized?
Read this article and raise your forensic intelligence level a few points. 😄
This article will discuss the idea and processes that might be considered when listing, or producing a catalog of files
- A: contained within the entire tree/directory on the suspect's machine while at the original physical location or
- B: within a specific evidence directory that you have restored on your forensic/analysis computer from the suspect or seized
computer. (hopefully A and B match)
- C: the work or evidence files you are reviewing on your forensic workstation, or
- D: the entirety of the evidence you are turning over to the reviewer or prosecutor, which is most likely NOT the entire file list.
After all, if you can't create a catalog of the files within this subject tree, or a list of files within your evidence presentation, how will you or others know what files might be of interest and should be highlighted or captured for evidentiary purposes?
We also discuss some of the possible shortcomings, problems, and/or restrictions you may encounter when using the more traditional file listing software which may have been recommended through one of your forensic listservs.
I tested a number of file listing programs of several types: suite, installable, and stand-alone. When tested against my simple evidence tree, most of them lacked the capability to meet all the requirements I set up.
A definition: Let's use the term catalog to also mean a list or listing of the files within the subject location. This catalog listing should lend itself to being easily imported into or further massaged by a database, spreadsheet, or simple text editor. Technically and legally, what we are talking about here is actually an "inventory" of the files contained within the specific evidence location (i.e., computer hard drive, server or suspect folder, forensic work drive files). So for practical purposes and the purpose of this article, the following all mean the same: catalog, list, inventory, whole bag of ....
Table of Contents:
BASIC explanation of why you should create a catalog.
Overview of my test requirements.
Suite stuff Suites don't do it all.
Other considerations of format of the data.
Programs Tested
Before we start:
A challenge
(6/2020) for you to test your forensic LIST_IT/copy/zip software for forensic and evidentiary reliability.
List or create an inventory of the files within the evidence source, or your forensic working folder.
A lot of circular references. That way you become a big wheel. 😄
There are some situations where you might want or need to create a catalog of the files within the
specific evidentiary tree you are working with.
1. Create a true and accurate catalog of ALL the files
contained within the seized evidence available (ie: a suspect tree on the company server, or a tree on the
suspect computer).
2. Create a catalog of the files produced or mentioned in
your forensic report so the reviewer will have a clean succinct easy to review list of ALL the files you
are working with.
3. Create a catalog of any and all files within other key directories which are either
part of the original evidence collection, or the final product that is going to long term storage.
4. Create a catalog of those files seized, which are NOT part of the final report.
Situation 1: Installation of cataloging software is restricted due to corporate or legal restriction.
The source of your evidence is located on a corporate server or on a stand-alone (user's) computer at a
corporation. The major problem is that the corporation, or the search warrant, restricts you from installing
any software on that system, so the cataloging software must run from a thumb drive.
Situation 2: Suspect system has Last Access Date update turned on.
Once you determine that you can only run the cataloging software from the thumb drive you have to consider
if the software will also perform some sort of hashing of the files. This is not generally a requirement, but
some of the cataloging software has this capability. Since the suspect computer has last access
update turned on, you MUST make sure any and all of your processes do NOT update or alter any of the MAC dates. You must
also make sure that your software, when capturing the file list also captures the true and accurate MAC dates. Else you
could alter/corrupt the original evidence. Capturing the MAC times in GMT format is a plus.
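One way to test the concern above on your own test data (never on original evidence) is to compare a file's timestamps before and after a hashing pass. The following is a minimal Python sketch, not any tested product's method; the function name `hash_and_check_times` and the returned field names are made up for illustration. It only detects a change, it does not prevent one, so treat it as a starting point for your own tests.

```python
import hashlib
import os

def hash_and_check_times(path, chunk_size=1024 * 1024):
    """Hash a file with MD5 and report whether its timestamps changed while doing so."""
    before = os.stat(path)              # metadata read only, taken before the content is opened
    digest = hashlib.md5()
    with open(path, "rb") as handle:    # read-only open, content is read in chunks
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    after = os.stat(path)               # metadata read again after hashing
    return {
        "md5": digest.hexdigest().upper(),
        "access_changed": before.st_atime_ns != after.st_atime_ns,
        "write_changed": before.st_mtime_ns != after.st_mtime_ns,
    }
```

Calling `hash_and_check_times(r"F:\SUBJECT1\filename.jpg")` on a copy of your test tree returns the hash together with two flags. If `access_changed` comes back true, the read itself touched the last access date on that system, which is exactly the behavior you want to know about before you go near real evidence.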
Situation 3: Final data production/catalog for reviewer.
Now you have completed your examination on your forensic computer and it's time to prepare a final report,
provide evidence files and a file list to the attorney. You may also wish to create a separate catalog
of all the files which make up your final report. This list might be created for future reference.
You have hundreds or thousands of evidence files extracted from your forensic process which are on your forensic
computer. Your production process produces the selected files to the reviewer. However, to make things understandable for
the attorney, you wish to create a catalog of ALL the evidence files which are being provided. This catalog should be a clean and
succinct listing, because it will most likely be imported and possibly further massaged/sorted/selected by the reviewer. So
the final catalog for presentation to the reviewer must be clean and easily manipulated and re-processed by the reviewer.
You would be surprised at the kludgey (that's a technical term) formats which a lot of these recommended packages
produce, and which are almost impossible to reform into a clean format. But don't take my word for it.
See the sample outputs below. Notice that some of the output formats create completely separate segments for each folder. This would be almost impossible to reprocess logically for hundreds of folders in the tree.
There are other output formats found (see the "other consideration" section below), all of which are problematic when taking the output to the next step.
For instance, prior to 2022 the NIST NSRL data sets were produced in a clean flat file format. This meant that those files could be processed/reprocessed by
almost any program capable of manipulating "flat" files. So 10 years from now, any program worth its weight in bits could process that data. Then in 2022 NIST
decided in their infinite "wisdom" that they would now produce the data sets in SQL format. Which means that basically only one type of software can process
the data for the next step. Maybe you have SQL knowledge and maybe you don't. But you will need to obtain SQL software. Then, what happens years down the road
when SQL becomes extinct, and you have little capability of processing this ancient data? "Flat" data formats, on the other hand, will probably never go out of style,
however you decide to process that data.
Finally, the easiest output format might be a fixed length or properly delimited file with complete information
in each record that would allow for easy processing or import to a
spreadsheet or database. So research and practice generating different formats which would best suit your needs. DAH!
Something like this pipe delimited record. No added overhead or "STUFF" to bloat the size.
NAME | EXT | SIZE | WRITE | WR_TIME | CREATE | CR TIME | ACCESS | ACC TIME| MD5 | FULL PATH |DR SER NO
filename.jpg | JPG | 176,626| 2020/03/03| 07:34:56| 2020/03/03| 07:34:56|2021-12-31| 11:38:00| C06BA...| F:\SUBJECT1\filename.jpg | ABC909
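To show how little it takes to reprocess a record like the one above, here is a minimal Python sketch that reads the pipe delimited sample into named fields. The function name `read_catalog` is hypothetical, the header and record are copied from the sample, and the MD5 value stays truncated as it appears there.

```python
import csv
import io

# Header and record copied from the sample above (MD5 is truncated in the source).
SAMPLE = (
    "NAME | EXT | SIZE | WRITE | WR_TIME | CREATE | CR TIME | ACCESS | ACC TIME| MD5 | FULL PATH |DR SER NO\n"
    "filename.jpg | JPG | 176,626| 2020/03/03| 07:34:56| 2020/03/03| 07:34:56|2021-12-31| 11:38:00| C06BA...| F:\\SUBJECT1\\filename.jpg | ABC909\n"
)

def read_catalog(text):
    """Parse pipe delimited catalog text into a list of dicts keyed by column name."""
    # QUOTE_NONE: quotes and backslashes in a field are kept as ordinary characters.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    header = [name.strip() for name in next(reader)]
    rows = []
    for fields in reader:
        if not fields:          # skip blank lines
            continue
        rows.append(dict(zip(header, (field.strip() for field in fields))))
    return rows

records = read_catalog(SAMPLE)
first = records[0]
size_bytes = int(first["SIZE"].replace(",", ""))   # "176,626" -> 176626
print(first["FULL PATH"], size_bytes)
```

Because every record carries the complete information, including the full path, the same few lines work whether the catalog has one folder or hundreds.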
Which of the following sample output formats would you rather create and/or have available to load into a spreadsheet or
perform additional analysis on? (Some fields are truncated or removed for legibility.) Notice that some of the output formats
create completely separate segments for each folder. This would be almost impossible to reprocess logically for hundreds
of folders in the tree.
=====================================================
notice a separate set of records/lines identifying the new folder name. difficult to ingest.

 Volume in drive Y is Y_2T
 Volume Serial Number is 7C1E-81A3

 Directory of Y:\TMP\TEST_USB\SOURCE2

05/23/2022  11:21 AM    "DIRECTORY"     CYRILLIC_COPY
05/23/2022  11:21 AM    "DIRECTORY"     CYRILLIC_NAMES
01/01/2019  08:34 AM                 48 ALTERNATE_STREAM_FILE.TXT
                                     34 ALTERNATE_STREAM_FILE.TXT:ALTERNATE.TXT:$DATA

 Directory of Y:\TMP\TEST_USB\SOURCE2\CYRILLIC_NAMES

01/01/2019  08:34 AM             12,889 Cyrillic.7z
                                 47,814 Cyrillic.7z:LFN_HASHES.TXT:$DATA
                                     34 Cyrillic.7z:signature.txt:$DATA
01/01/2019  08:34 AM             25,894 CYRILLIC_NAMES_W_ADS.7z
======================================================
OR reasonable output as long as it's properly delimited.

FOLDER C:\TMP\TEST_USB\D1\ ------- 2 15 772,744 772,744
FILE ---A---X 1/1/2019 07:34 1/1/2019 07:34 1/1/2019 07:34 54 _RESET_D1.BAT
FILE ---A---- 1/1/2019 07:34 1/1/2019 07:34 1/1/2019 07:34 48 ALTERNATE_STREAM_FILE.TXT
FOLDER C:\TMP\TEST_USB\D1\CYRILLIC_NAMES\ ------- 0 5 226,341 226,341
FILE -------- 1/1/2019 07:34 1/1/2019 07:34 1/1/2019 07:34 12,889 Cyrillic.7z
FILE -------- 1/1/2019 07:34 1/1/2019 07:34 1/1/2019 07:34 93,971 Cyrillic_NAMES_W_ADS_PK.zip
======================================================
OR Next 3 as long as properly delimited

Path,File,Size,Created,Modified
Y:\TMP\TEST_USB\SOURCE2\,ALTERNATE_STREAM_FILE.TXT,48,1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
Y:\TMP\TEST_USB\SOURCE2\CYRILLIC_COPY\,Cyrillic.7z,12889,1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
Y:\TMP\TEST_USB\SOURCE2\,Lec 11.htm,52219,1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
Y:\TMP\TEST_USB\SOURCE2\,ZERO_BYTE.TXT,"",1/1/2019 7:34:56 AM -05:00,1/1/2019 7:34:56 AM -05:00
====================================================
OR

PATH | SIZE| ATTR | MDATE | MTIME | TZ| SERIAL #| DISK LABEL
F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z | 12889|.......|01/01/2019|07:34:56:789w|EST| BA0E-5287| 1G_CRUZER
F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z:LFN_HASHES.TXT | 47814|.adata.|01/01/2019|07:34:56:789w|EST| BA0E-5287| 1G_CRUZER
F:\SOURCE2\CYRILLIC_COPY\Cyrillic.7z:signature.txt | 34|.adata.|01/01/2019|07:34:56:789w|EST| BA0E-5287| 1G_CRUZER
F:\...\fifth_folder_starting_at_188_characters_of_longfilenames\ads.htm | 8550|.......|01/01/2019|07:34:56:789w|EST| BA0E-5287| 1G_CRUZER
F:\...\fifth_folder_starting_at_188_characters_of_longfilenames\ads.htm:ads_hash.txt | 388|.adata.|01/01/2019|07:34:56:789w|EST| BA0E-5287| 1G_CRUZER
=====================================================
OR

Name                       Format  Size         Modified             Created              Accessed             MD5                               Path
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
ALTERNATE_STREAM_FILE.TXT  TXT     48 Byte      2020/01/01 07:34:56  2020/01/01 07:34:56  2021-12-31 11:38:00  844FB10A6494E10139AD5D91661B5D29  F:\SUBJECT1\ALTERNATE_STREAM_FILE.TXT
CHESS_20180226A_sml.jpg    JPG     176.626 KB   2020/03/03 07:34:56  2020/03/03 07:34:56  2021-12-31 11:38:00  C0668A5AC70243D30EE3C4DD35B0678B  F:\SUBJECT1\CHESS_20180226A_sml.jpg
As far as the actual format of the final data file, a lot of responses have been received. Some suggest XML, since that is how many of the suites can export the data. Others say JSON (SON of Java 😇 ) might be better. And still others might say a Mongo database (yeh right!!!), or any number of other formats.
But consider this. First, what I was explaining in the last section was the initial output of the more popular cataloging software. In most cases shown above, all the formats are little less than cludgey (not sure of the correct spelling) when attempting to import them into the next step, which may be a data base or a spreadsheet. Also, let me mention that I have seen Excel choke on incorrectly formatted CSV data where a field may have an unusual format, and the CSV totally confuses the program.
Try importing these two data records (separately) into Excel, one CSV and the other pipe delimited, and see what happens to the quoted \"Roswell\" city name.
"dan mares","1234 lakeway,\"rosell\", ga","12345"
dan mares|1234 lakeway,\"rosell\", ga|12345
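If you don't have Excel handy, the same experiment can be sketched with Python's standard `csv` module standing in for the import step. This is only an illustration of how one parser reacts to the two records above; Excel and other importers may behave differently, so run your own test. The helper name `parse` is made up for this sketch.

```python
import csv

# The two records from above, exactly as written (backslash before each inner quote).
csv_line = '"dan mares","1234 lakeway,\\"rosell\\", ga","12345"'
pipe_line = 'dan mares|1234 lakeway,\\"rosell\\", ga|12345'

def parse(line, **fmt):
    """Parse a single record with the given csv format options."""
    return next(csv.reader([line], **fmt))

# 1. CSV read with default settings: the parser does not know that a
#    backslash is meant to escape the inner quotes.
naive_csv = parse(csv_line)

# 2. CSV read after telling the parser about the backslash convention.
escaped_csv = parse(csv_line, escapechar="\\", doublequote=False)

# 3. Pipe delimited read with quoting turned off: every character is data.
pipe = parse(pipe_line, delimiter="|", quoting=csv.QUOTE_NONE)

for label, fields in (("naive csv", naive_csv),
                      ("escaped csv", escaped_csv),
                      ("pipe", pipe)):
    print(label, len(fields), fields)
```

The record has three fields. The pipe delimited read returns three without any special settings, while the CSV read only gets there once the importer is told which quoting convention the producer used. That is one more thing the next person in line has to know about your data.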
Also, the added volume of data that comes from formatting each record as JSON, multiplied over hundreds of thousands of records, may be a problem either for data loading or just for handling the size. Bigger isn't always better. Just ask .... (well you know).
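A rough way to see the size difference for yourself is to write the same record both ways and count the bytes. The sketch below is illustrative only: the field names are invented, the values are borrowed from the earlier sample record, and the record count of 500,000 is an assumption, not a figure from my tests.

```python
import json

# One catalog record; field names are hypothetical, values are illustrative.
record = {
    "name": "filename.jpg",
    "ext": "JPG",
    "size": 176626,
    "write": "2020/03/03 07:34:56",
    "create": "2020/03/03 07:34:56",
    "access": "2021-12-31 11:38:00",
    "path": "F:\\SUBJECT1\\filename.jpg",
}

# Pipe delimited: values only, the field names live once in a header line.
pipe_line = "|".join(str(value) for value in record.values())

# JSON: every record repeats every field name plus quotes and braces.
json_line = json.dumps(record)

pipe_bytes = len(pipe_line.encode("utf-8"))
json_bytes = len(json_line.encode("utf-8"))
assumed_records = 500_000   # assumption for illustration only

print("per record:", pipe_bytes, "vs", json_bytes, "bytes")
print("extra bytes over all records:", (json_bytes - pipe_bytes) * assumed_records)
```

The exact numbers depend on your field names and data, so measure with a sample of your own catalog before deciding.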
XML and JSON might be alternative processing formats, but again, how many of the usual cataloging software packages (except suites) output that way? And for what reason? They say, do it my way, not how you wish to further process the data.
Then the most important consideration comes to light. In what format can your customer (yes, the prosecutor or manager) handle the data? Do they have the knowledge and expertise, and maybe even the software, to efficiently handle the XML, JSON or database format you provide them? Do they even want to learn another language format? Or maybe all they want in the report is a clean set of records that they can open with a word processor, or text editor, to get a quick view of the data. Maybe they have their own program which they wish to use, and converting from your format to theirs might be somewhat difficult. So, it's not so much how you process or handle the data, it's how this initial cataloging software outputs the data so the next step is manageable for ALL.
So, KISS it when considering the next person's needs and capabilities to analyze or review the data. And keep in mind: years down the road, will your format be usable?
The requirements I set up when I performed my tests were a few simple items as described here.
These are simple requirements, but most of the tested software failed one or all of them. So test your own
software against similar
requirements. Remember, your requirements may not be those which the defense attorney will challenge.
Consider that today, I would expect to find most corporations are using Windows operating systems, with NTFS file
systems on their main drives. For this reason, I concentrated a lot of my testing on requirements of the NTFS system.
My "definition" and explanation of what good cataloging software should be capable of and should accomplish,
as a minimum set of test requirements, is the following:
- ★ NTFS Long Filename/path identification/process: Able to find, articulate and list all files found within any long filename paths.
- ★ NTFS Alternate Data Streams: Able to find, articulate and list all Alternate data streams, whether in LFN's or normal file lengths.
- ★ Report Generation: Able to provide output easily imported into a spreadsheet or data base for next step process, see above sample outputs.
- ★ Time display/retention: Able to find/display and include in report all three MAC times. GMT time listing might also be nice.
- An added plus might be to produce a log of the "cataloging" process for the final report. But not part of the testing.
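To make the requirements above concrete, here is a minimal Python sketch of a cataloger that walks a tree and emits one pipe delimited record per file with all three dates in GMT/UTC. It is not one of the tested programs and is not offered as a forensic tool. The names `FIELDS`, `long_path`, `utc` and `catalog` are made up for this illustration, and the sketch does NOT enumerate alternate data streams, so by itself it would fail the ADS requirement.

```python
import os
import sys
from datetime import datetime, timezone

FIELDS = ["PATH", "SIZE", "MODIFIED_UTC", "CREATED_UTC", "ACCESSED_UTC"]

def long_path(path):
    # On Windows, prefix the absolute path with \\?\ so long paths can be
    # walked. UNC paths (\\server\share) need the \\?\UNC\ form, which this
    # sketch does not handle.
    path = os.path.abspath(path)
    if sys.platform == "win32" and not path.startswith("\\\\?\\"):
        return "\\\\?\\" + path
    return path

def utc(timestamp):
    """Format a file timestamp as a GMT/UTC date and time string."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%d %H:%M:%S")

def catalog(root):
    """Yield a header line, then one pipe delimited line per file under root."""
    yield "|".join(FIELDS)
    for folder, _dirs, files in os.walk(long_path(root)):
        for name in sorted(files):
            full = os.path.join(folder, name)
            # stat() reads metadata only; the file content is never opened.
            info = os.stat(full, follow_symlinks=False)
            # On Windows st_ctime is the creation time (st_birthtime on newer
            # Pythons); on Unix st_ctime is the metadata change time instead.
            created = getattr(info, "st_birthtime", info.st_ctime)
            yield "|".join([
                full,
                str(info.st_size),
                utc(info.st_mtime),
                utc(created),
                utc(info.st_atime),
            ])
```

A typical use would be `open("catalog.txt", "w", encoding="utf-8").write("\n".join(catalog(r"F:\SUBJECT1")))`, with the output going to your own media and never to the evidence drive. The pipe works as a delimiter here because NTFS does not allow that character in file names; on other file systems you would have to check.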
A suite digression here.
Sometimes you can do a bit image of the entire drive with a suitable suite. Let's hope you aren't imaging a multi-terabyte server where you only need a single suspect's directory. When you do the bit image, the suites generally can create reasonably understandable catalog outputs which can be further manipulated as needed. This section does not deal with processing full bit images of the data. It deals with using a suite to process data less than the bit image. It may also digress a little to explain why, if you obtain my test data, you should not use a suite to perform the tests. Again, suites on full bit images work fine, but we are not talking about that here. We are talking about a single top level tree/directory of a single suspect, possibly contained on a large corporate server located who knows where.
Remember, the situation we are talking about here is twofold. First, you may be at a location which has a large server farm, and you can only obtain a single tree/folder belonging to the suspect. A full bit image of the server is not possible. Second, you may be prohibited either by the corporation or by court order from installing the suite on the computer, or you just aren't in a position to do a full bit image of the drive and must ultimately rely on a logical processing of the tree.
In the cases mentioned above you must therefore run the suite against a single tree/directory. Now if the suite can capture a low level bit image at this point, good for you. But in most instances where you can only operate at the directory level, most of the suite software can at best operate at the logical level. Which means you will have to capture your data at the tree level. Therefore no bit captures allowed.
So when testing your suite's capability to create a reasonable catalog of files, and to make sure the test results are evenly evaluated, make sure you are doing so at the logical folder level and not the bit image level. You will find a significant difference in the output capability.
Also, remember, your final output is something the reviewer can see, feel, and massage for their use. So the output of the suite may not be totally compatible with their needs.
================================================================
Want to see how bad some software is at creating a full catalog of a tree? Try creating a full tree catalog using Windows Explorer. HA HA
Some preliminary information: I want to remind you that all the testing I have done and reference in this and any other testing-related article was done using Windows 10 on an NTFS file system on a desktop computer. The NTFS file system was used as the test environment because I believe that a significant number of corporate and other forensic investigations take place on the NTFS file system. Also, the test environment's ability to alter a file's last access date, and its use of long filenames and alternate data streams, adds to the forensic and evidentiary complexity.
Everyone who deals with “digital evidence” should be aware that no matter what or when you obtain the evidence, your ultimate goal or expected end point is to present this evidence in court. So treat all electronic evidence as if it will end up as court evidence. If you don’t do it from the beginning, it will be hard to backtrack later.
When you first encounter the suspect system, don't you think it might be wise to obtain a full listing of all the files that are visible within the suspect tree/directory? Don't forget, this is possible evidence, and you want to catalog all the evidence you seize. Yes/NO?
This suspect tree might be the entire drive of the suspect, or a single tree on a large server that belongs to the suspect. In any case, you may have hundreds of thousands of files within this evidence location. Wouldn't you, and your later reviewers/attorneys, etc. like to have a full and complete list of those files?
A catalog or list should be created of the original evidence, of any important intermediate product, and above all of the final product that will be sent to attorneys or produced in court.
Cataloging original evidence files is justified and almost mandatory. But let the attorneys argue that one. What about any report provided to outsiders? Are you certain the recipient will not alter the content (add or delete files) and present the alteration as original? Yes, you will say, the recipient has integrity. DAH!
And to put the icing on the cake, so to speak: what if this catalog made sure to include any Alternate Data Stream of each file? If you have reviewed some of my other articles, you will remember that when downloading files from the internet, some browsers add the source of the item and place it in an alternate data stream. Wouldn't it be nice to be able to see, in the download folder, that there might be good source evidence in alternate data streams which might lead you to valuable evidence?
For those adventurous souls who wish to test their forensic suites or stand-alone software, I
have created a software testing challenge
to see if your cataloging/listing, copy, zip software passes the test.
Also available is a self-extracting executable which
contains about 50 files and which must be run on an NTFS file system to get all the benefits. Email or call 678-427-3275 (leave a message) for the file and its password.
DIR_CMD               CMD
DIRLISTER             GUI
DISKCAT               CMD
FORENSIC EXPLORER     GUI
FILELIST_CREATOR      GUI
FILELIST_v2           GUI
FTK_IMAGER            GUI - Tested on both Drive and Folder
KARENWARE             GUI (must be installed)
PARABEN_E3            GUI
POWERSHELL            CMD
SEARCHMYFILES         GUI
SLEUTHKIT_FLS         CMD
TREESIZE_FREE         GUI (must be installed)
Windows - Powershell  CMD

Below are the results sorted in best to not the best. The order listed in no way corresponds to the alphabetic listing above. And some of the above, like DIR actually have two lines within the results.
LFN | ADS | Write Date | Create Date | Access Date
YES YES YES YES YES YES NO YES YES YES YES NO YES YES YES YES 1/2 YES YES YES YES NO YES YES YES NO YES NO NO YES NO NO NO NO YES 1/2 1/2 1/2 1/2 FAIL FAIL YES YES YES NO NO YES YES YES NO NO YES YES NO Image Image Image Image Image only 1/2 NO YES YES 1/2 NO NO NO NO NO 1/2
I would appreciate any comment or input you have regarding this article. Thank you. dan at dmares dot com