First authored Feb 2019.
By the time you read this article, a lot of time may have passed, and the software that was tested may have been
updated and might now pass the tests. You should conduct tests of your own to see whether the current version
passes your tests and meets your needs.
Here are a few articles you might like to read, in the order listed. But before reading them, think about this small
distinction: the difference between "processing the evidence" and "conducting the forensic investigation". I think these
articles are targeted more at processing the evidence than at the direction you take in conducting the
investigation. The two may be very similar, but close is no cigar.
Before you get into this article, you might read this associated sequence of articles.
Start here:
Inventory/Catalog files Creating an inventory of evidentiary files
the one you are on: Forensic Hashing Article tests over 30 "forensic" hash programs.
Forensic file copying Article tests over 40 "forensic" file copiers
ZIP-IT for forensic retention Article tests a few zipping programs.
ZIP_IT_TAKE2 More tests for your zipping capabilities.
ZIP FILE/container Hashing your zip container reliably
MATCH FILE HASHES Demonstrates hash matches using Maresware.
A HASH software buffet How-to use Maresware hash software
HASHING article Contains reference to the NSRL 180 Million MD5 data set.
Preliminary case information, which explains why I chose the items to test.
First: if you have a situation where you can seize the entire computer, or make a full bit image of the drive, then some of these
test requirements will be easily met using a suite. See Suite stuff below. However, there are
situations which will be a little more restrictive, and which will force you (or rather your software) to be more restrictive in
what and how you process the evidence. That situation will be explained here, and again below, just so you get the idea behind
the topics I chose to perform the tests around. I think (I know thinking is bad) that testing software under these more restrictive
scenarios will show that the software can perform not only in a more restrictive environment, but also in one over which you have
complete control.
So let's begin:
The tests were performed on an NTFS file system because I believe that is the most common file system used by corporations today.
It also offers more items with which we will perform the tests.
So, number one is the fact that the software must be able to find Unicode file names. Not necessarily display them in full Unicode
format, but merely find and process those items.
Second, because we are on an NTFS file system, it must be able to find and process long filenames: those filename paths
greater than 255 characters. You will be surprised at how many programs can't do that.
Third, again because of NTFS, we will assume that the owner of the computer system (usually a corporation) has last access
update turned on. The last access update may or may not be important to your investigations, but if it is turned on, your program
should be able to NOT tamper with the evidence's last access date.
And fourth and final: again because of NTFS, it should be able to find, identify, and process where necessary any alternate data
streams. Consider a porn investigation where the user downloads porn from various sites. Did you know that some browsers (I'm
not telling you which, that's for you to find out) actually store in ADSs the original URL and other information about the download?
That might be very interesting in porn or other internet investigations.
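Requirement three above (not tampering with the last access date) can be demonstrated in a few lines. The sketch below is my own illustration, not any vendor's method: it records a file's access and modification times before hashing, then restores them afterward, so reading the file for the hash doesn't corrupt the evidence's dates.

```python
import hashlib
import os

def hash_preserving_times(path, algorithm="md5"):
    """Hash a file, then restore its original access/modified times."""
    st = os.stat(path)                      # capture times BEFORE reading
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)                 # reading may bump the atime...
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))  # ...so put it back
    return h.hexdigest()
```

Note that on Windows the last access update may also be disabled system-wide (the `fsutil behavior query disablelastaccess` setting), which is worth checking before you draw conclusions from access dates.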
If you perform a bit image of the drive using a suite, most of the items above will easily be identified and located as
evidence. However, in our test scenario, we are sitting at a corporate server where we can ONLY process/examine/image/copy (call
it what you will) the directory tree belonging to the suspect. So this fine-line refinement and restriction must be considered
when testing our software. Period....
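Requirement two (paths longer than 255 characters) is also easy to reproduce for your own tests. The sketch below builds a deep directory tree and hashes a file inside it. The function names and the test data are my own; the Windows-specific hedge is the `\\?\` extended-length prefix, which bypasses the classic Win32 MAX_PATH (260 character) limit when long-path support isn't enabled.

```python
import hashlib
import os

def extended_path(path):
    """Return a Windows extended-length path; unchanged on other OSes."""
    abspath = os.path.abspath(path)
    if os.name == "nt" and not abspath.startswith("\\\\?\\"):
        return "\\\\?\\" + abspath
    return abspath

def make_long_path_file(root, depth=30, name="evidence.txt"):
    """Create a file whose full path exceeds 255 characters."""
    parts = [root] + ["subdir0123456789"] * depth
    d = os.path.join(*parts)
    os.makedirs(extended_path(d), exist_ok=True)
    target = os.path.join(d, name)
    with open(extended_path(target), "wb") as f:
        f.write(b"long filename test data")
    return target

def hash_long_path(path):
    """Hash a file that may live beyond the 255/260 character limit."""
    with open(extended_path(path), "rb") as f:
        return hashlib.md5(f.read()).hexdigest()
```

If a hashing program silently skips (or errors on) files created this way, it fails the LFN test.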
Read this article and raise your forensic intelligence level a few points. 😄
This article will discuss the idea of using hashes and hashing for investigative and evidentiary integrity. It points out the reasons you may want to perform hashes to show data integrity for administrative and legal adjudication. It also describes some simple NTFS testing requirements that may show your "forensic" hashing program isn't all it's cracked up to be. Over 90% of the hashing programs I tested failed one or more of my NTFS requirements. Do you want to testify about why your hashing program failed this test? Not me. Finally, it discusses some reasons why you may wish to use stand-alone hashing software as opposed to installed software that can't be easily moved/transported from one investigative machine to another.
Table of Contents:
Before we start:
A challenge
(6/2020) for you to test your forensic hash/copy/zip software for forensic and evidentiary reliability.
DON'T BOTHER USING A SUITE, AS MOST SUITES ARE DESIGNED TO PROCESS FULL BIT STREAM PHYSICAL "IMAGES", AND WHEN PROCESSING FULL PHYSICAL IMAGES THEY WILL GENERALLY PASS THE TESTS. THEREFORE THE TESTS ARE DESIGNED TO TEST THE SOFTWARE'S CAPABILITY AT THE FOLDER/FILE LEVEL, AND AS SUCH ASSUME YOU ARE PERFORMING THESE TESTS ON LIVE SUSPECT MACHINES WHICH NORMALLY WILL NOT ALLOW INSTALLATION OF SUITES.
AGAIN, IF YOU TEST PROGRAMS/SUITES THAT CAN PERFORM A FULL BIT IMAGE, A LOGICAL DRIVE IMAGE, AND/OR A TREE/PATH/FOLDER FILE PROCESS, THEN IN ORDER FOR THE TEST TO PROVIDE RESULTS SIMILAR TO STAND-ALONE SOFTWARE YOU MUST RUN THE TEST USING ONLY TREE/PATH/FOLDER LEVEL PROCESSING OF FILES. THIS SETS UP THE SCENARIO, CLOSEST TO MY TEST PARAMETERS, WHICH YOU WOULD ENCOUNTER AT A SUSPECT LOCATION WHERE YOU "CANNOT" IMAGE AN ENTIRE PHYSICAL OR LOGICAL DRIVE BUT MUST PROCESS ONLY AT THE FILE/TREE/FOLDER LEVEL.
Also, a word regarding the processing of the test files by suites. Remember, we are testing the ability of the program to calculate a
hash without altering the original evidence file or its meta-data. Most suites will be able to perform these tests. However, not
all suites or stand-alone programs can perform a hash of the source (original) evidence without altering some of the meta-data. As
there are many "recommended" hashing programs out there, I suggest you test the stand-alones first, then go on to the suite's
process/capability of performing the hash. Also, when conducting any of the tests, please ONLY perform the test at the
folder/file level. No physical bit/sector analysis allowed for these tests.
And finally: whether at a suspect/client site or on your own machine, there will be times when you will
be required to merely hash individual files from the source and will not have a suite available. Either hashing some single
piece of evidence on the suspect system, or merely hashing your evidence
zip file/container which will be placed in storage. In
these and many other forensic/evidentiary situations you will not be able to use your suite, which means you must hash using a
reliable stand-alone hashing program. So you'd better have one available that doesn't need installation.
Some preliminary information:
The scenario under which I chose to test the software, and why.
This scenario, which I will explain, is I think more generic and, if you think about it, will be one of the easiest situations
in which an opponent could challenge the accuracy and evidentiary reliability of your software. So here we go with my basic
assumptions.
First: we are in a corporate environment where the suspect/user has a single top-level directory on a rather large server. This
means you CANNOT image the entire drive. Either the corporation won't let you image the entire drive, or the search warrant
requires that you copy/image/process ONLY the data/evidence files under the suspect's control. Meaning their directory.
Second: The corporation has last access update turned on on its NTFS file system, as a matter of corporate security policy.
The last access date of a file might also help you determine when the
suspect may have copied the item they are accused of stealing to an external drive, or printed it. No more discussion.
Third: The case involves possibly two violations. First, the corporation feels the suspect may have copied or printed sensitive
information, and you are tasked to see if you can locate those items and their dates. Also, the criminal side of this
investigation indicated that the suspect may be dealing in some sort of pornography, and you are attempting to determine
if/when/where/how etc. any porn has been downloaded from the web.
Fourth: To accomplish the above, the first two things you attempt are these. First,
create a full inventory of all the
files within the suspect's folder(s); that's another topic of discussion. And second (the reason we are here), create a hash of all
files within, as an evidentiary checkpoint of what is there. Later on you will probably copy all the files to take back for
further analysis. Now, get to it!
I want to remind you that all the testing I have done and reference in this and any other testing-related article was done using Windows 10 on an NTFS file system on a desktop computer. The NTFS file system was used as the test environment because I believe that a significant number of corporate and other forensic investigations take place on the NTFS file system. Also, the test environment's ability to alter a file's last access date, and its use of long filenames and alternate data streams, adds to the forensic and evidentiary complexity.
One part will discuss the need for hash values when dealing with forensic and electronic evidence. The second part will discuss how to process hash values for your specific needs. And part three suggests a very interesting evidence verification technique.
Disclaimer: The mention of any program, website or algorithm in no way should be taken as an endorsement of same. And in some cases, I may even point out a flaw or limit to its actions.
Before we start, let us agree on what a hash value is. I’m in no way a mathematician. So any description used will hopefully be in plain English and layman’s terms. That being said let us examine some websites that attempt to define hash values, hashing, hash algorithms, etc.
Here are some sites I found which explain hash. There are many, both scientific and common definitions. So take these definitions with whatever grain of salt and rebuttal you wish. Many may seem redundant, but explain hash in their own way.
OMNISECU.com
A Hash Value (also called as Hashes or Checksum) is a string value (of specific length), which
is the result of calculation of a Hashing Algorithm. Hash Values have different uses. One of
the main uses of Hash Values is to determine the Integrity of any Data (which can be a file,
folder, email, attachments, downloads etc).
Trendmicro
Hash values can be thought of as fingerprints for files. The contents of a file are processed through a cryptographic algorithm, and a unique numerical value – the
hash value - is produced that identifies the contents of the file. If the contents are modified in any way, the value of the hash will also change significantly.
Two algorithms are currently widely used to produce hash values: the MD5 and SHA1 algorithms.
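The fingerprint idea from the definitions above is a one-liner per algorithm in most languages. A small Python illustration (using the standard-library hashlib module): note how adding a single byte produces a completely different digest.

```python
import hashlib

original = b"The quick brown fox jumps over the lazy dog"
tampered = b"The quick brown fox jumps over the lazy dog."  # one byte added

for data in (original, tampered):
    md5 = hashlib.md5(data).hexdigest()
    sha1 = hashlib.sha1(data).hexdigest()
    print(md5, sha1)
# The first MD5 printed is 9e107d9d372bb6826bd81d3542a419d6;
# the tampered line shares no resemblance to it.
```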
Microsoft website:
Google:
Wikipedia
Cryptographic hash functions have many information-security applications, notably in digital
signatures, message authentication codes (MACs), and other forms of authentication.
NIST articles/information
A NIST pdf file on the NSRL hash library and hashing.
NIST HASHING ARTICLE
NIST Cryptographic Attack article.
Now that you have a number of definitions to choose from, let us talk turkey or hash.
Everyone who deals with “digital evidence” should be aware that no matter what or when you obtain the evidence, your ultimate goal or expected end point is to present this evidence in court. So treat all electronic evidence as if it will end up as court evidence. If you don’t do it from the beginning, it will be hard to backtrack later.
Even when a company suspects an employee of wrongdoing involving their computer data, and the personnel department (yes, I said personnel department; I'm not politically correct) decides to secure the employee's data, they should always expect that down the road this data may turn out to be forensic evidence in court, and it should be treated as such. If we assume the worst, and it becomes evidence in court, we must ensure that the original evidence is not tampered with or altered in any way.
As one of the above definitions advised: verify the integrity of the data. Any alteration by the examiner could possibly lead to a plausible defense. So how do we ensure this "non-alteration" integrity and validity? We hash all the data/evidence collected by the examiners. Whether you encounter a network breach where you capture network traffic to files, a theft of the company's keys to the kingdom, improper email (pornography etc.), a virus, extortion, or anything else that could get a person fired or arrested, you should make sure the original evidence is not tampered with and that you can verify the integrity of the original data. The way to do this is to create a hash of any evidence from the get-go.
Hash the original data, any important intermediate product, and above all the final product that will be sent to attorneys or produced in court.
Hashing original evidence is justified and almost mandatory. But let the attorneys argue that one. What about any report provided to outsiders? Are you certain the recipient will not alter the content and present the alteration as original? Yes, you will say, the recipient has integrity. DUH!
A simple solution I have used to hopefully guarantee the integrity of your report/data:
1: Hash the report or appropriate data, which may be many files, or image files.
2: Take that value and put it in a file.
3: Then encrypt the file with a password only you have.
Send the evidence, and the file of encrypted hash value(s), to the opposition. Let them play with the data. When it comes time later to validate the integrity
of what they received, decrypt the encrypted hash value which you sent to them, and which they had in their possession during the entire process. Then compare your
original hash value with the one of the data they are working with. A simple three-step process to ensure the integrity of the original data, especially where
"images" are used.
And to put the icing on the cake, so to speak: what if the hashing process added an Alternate Data Stream to each file being hashed? Then for each file you would have a hidden piece of evidence which can later be "extracted" to confirm the original hash value. Don't worry. That capability already exists in the Maresware HASH.exe program (--ADS option). That, combined with copy_ads.exe (to later extract the ADS), can provide an additional layer of hash security.
Now that you have found a way and set up a process to verify and hopefully guarantee the integrity of the data you are working with, let's talk about the actual hashing process and/or programs.
TOP
You will need software that will produce a valid hash of the data. As I see it, there are two main types of data to hash. First is the entire physical device, most notably the hard drives being used by the subject of the investigation; cell phones also come into this mix. There are a number of hardware and software procedures available to "image" the entire hard drive. Any device/software capable of doing this should also provide you with the hash value of the entire drive. This value should confirm that, for the original data collected, what you see is what you got.
In some instances, you will decide to only copy or use specific files (i.e.: virus programs, documents, images, emails, etc.). In these cases find a reliable product that can hash individual data items. It's your responsibility to determine that the program you use actually does what it advertises. Don't always rely on replies from a list serve you belong to.
Now, proprietary devices/software often compress the final image "file" and place headers, footers, etc. into it, thus making it a little difficult to independently confirm the hash value of the evidence image. Personally, even though it takes up a lot more room, I prefer when possible to create and produce what is called a dd image of the drive. Think about it. A dd image can be processed/looked at by almost any "forensic" process, not only its originator. This way, the hash can be independently confirmed and validated by any software capable of calculating a hash of "raw" data. Proprietary packages are good in that they compress and make images manageable, but what happens 5 years down the road, when that company no longer supports the compressed image format you created 5 years before? Just my $.02.
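This is exactly why the dd format is convenient: any tool that can read raw bytes can verify it. A sketch of independent verification (the image path is hypothetical; chunked reading means a multi-gigabyte image never has to fit in memory, and computing several digests in one pass avoids reading the drive image twice):

```python
import hashlib

def multi_hash(image_path, algorithms=("md5", "sha256"), chunk_size=1024 * 1024):
    """One pass over a raw (dd) image, updating several digests at once."""
    digests = {name: hashlib.new(name) for name in algorithms}
    with open(image_path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for d in digests.values():
                d.update(chunk)
    return {name: d.hexdigest() for name, d in digests.items()}

# Usage (hypothetical filename):
# print(multi_hash("suspect_drive.dd"))
```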
A personal instance which I just experienced: during one of the "constant" Windows 10 updates, it said that my current version of the PGP encryption software wasn't compatible with WIN10, and WIN10 suggested that I remove that PGP package. Had I removed that version of PGP, what would have become of all my encrypted documents? A case of a program or OS no longer working with an older version of your software. Guess what I told the OS to do. I can't repeat it here. And my version of PGP still works fine.
Also, think about it. If you ask product 'A' to confirm its own calculation, isn't that like asking the fox if he raided the hen house? What if the product had an internal flaw that no one (except the defense) knew about? It (product A) will always confirm its own answer. Use products which produce images and values that can be independently verified/validated. (Another $.02.)
Once we have the image hashed, we come to hashing individual files. You may wish to hash files to confirm they are inconsequential (or important) to your investigation.
NIST STUFF
One way to do this is to obtain the NIST, NSRL (National Software Reference Library) data set.
NIST NSRL
The NIST data set (ver 2024_03) contains over 180 million hash values of "programs" which it considers "known" entities.
“The RDS is a collection of digital signatures of known, traceable software applications. There are application hash values in the hash set which may be considered
malicious, i.e. steganography tools and hacking scripts. There are no hash values of illicit data, i.e. child abuse images.”
Notice NIST didn’t say good, bad, or ugly. It is up to the user to determine a file’s provenance and usefulness in your investigation.
Your analysis, whether it be through a forensic software suite (which does everything but cook dinner) or individual packages that calculate file hashes, can use the NSRL data set to "hopefully" eliminate non-essential files, or identify important ones. Find and learn how to use appropriate software to calculate, confirm, massage and work with the hash data that is generated.
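The de-duplication idea itself is simple: a known-hash reference list is just a set of strings to look up. A minimal sketch (the file names and functions are my own, and reading each file whole is only sensible for small files; a real run over evidence would use chunked reading):

```python
import hashlib
from pathlib import Path

def load_known_hashes(path):
    """Load a plain list of known MD5 values (one per line) into a set."""
    with open(path) as f:
        return {line.strip().lower() for line in f if line.strip()}

def filter_unknown(files, known):
    """Return only the files whose MD5 is NOT in the known set."""
    unknown = []
    for f in files:
        digest = hashlib.md5(Path(f).read_bytes()).hexdigest()
        if digest not in known:
            unknown.append(f)
    return unknown
```

Everything that survives the filter is a file the reference set has never seen, which is usually where the interesting evidence lives.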
The problem with the NIST .db format is that it contains a lot of information, in database format, which may not be of any use to most people, and it is hard to manipulate because of its size. Also:
The problem with the forensic suites is that sometimes they require a specific format for the hash values you provide as reference. For instance, one suite's requirement, which may have changed since I last had a license, was that if you supplied a list of MD5 values, it required a header line of "MD5". It just seems that a program with the smarts to analyze hash values doesn't need the explicit MD5 as a column header. Again, my $.02. Another product I recently heard of requires another specific format to be ingested into its analysis. On my soapbox again: these packages are so good, why require any special format? A list of MD5 hashes is just that, a simple data list. (Now I'm up to $.04.)
So, when dealing with hashes, whether to confirm importance or irrelevance, find a process that works for you. And one you have thoroughly vetted, tested, and can testify to. That being said, I have tested over 20 stand-alone hashing programs which are very well known and respected. And I hate to say that about 90% of them fail on differing aspects of their operation (see the spreadsheet below for the items I tested).
For those of you who wish to obtain a clean subset of the NIST data, check out my website
DMARES.COM
SAMPLE NSRL DATA
HASHING article Contains explanation and link to the entire 180 million MD5's.
Regarding my own hashing programs, and the culled NIST data sets, which are clean, fixed-length records compatible with any piece of software worth its salt: I DID NOT FIND any collisions.
Below is part of a spreadsheet of tests I conducted on some recommended "forensic" hashing programs. Not all are listed, as I have added some software since the spreadsheet was first produced. You may be surprised at the results. I only tested those that were "free", as I'm not about to pay for a program that I don't plan to use. Anyone who wishes me to test a pay-for-play program, feel free to send me a license. Also tested were some programs that were close to suite operation which claimed to perform hashes. I did not test any fully functional suites, only stand-alone hashing packages, or when possible the hashing capability of a suite I had access to. BUT ONLY its hashing capability at the file/folder level. NEVER from a full bit image.
Any links from here down may not be alive, since I am constantly updating the tests and classes provided. However, if you wish the items, contact me with an email or phone call.
For those adventurous souls who wish to test their forensic suites or stand alone software I
have created a software testing
challenge
to see if your hash, copy, zip software passes the test.
Also available is
Test data containing about 30 files, in a self-extracting
executable which must be run on an NTFS file system to get all the benefits.
Let's suppose that during your investigation you develop a number of stand-alone files (not images) which you consider to be evidence, and at some point anticipate that these items (files) might be introduced in court or otherwise as evidence items.
At the point you determine the files contain or are evidence, you will want to preserve the integrity of each file and its data. Of course at some point you would probably hash the files, record the hash values in a separate stand-alone hash file, and zip the file(s) and the recorded hash values. This is probably a routine task for the preservation of evidence files.
Now let's assume that the process just mentioned includes hundreds of individual files. Part of the delivery process, meaning providing the files to the opposing party for review, would involve making a copy on a drive and providing the drive to them (remember, you still maintain the originals, the hash values, and the zipped container). So far so good. But let's figure a way to include a little extra security with the file copies. That is where my next suggestion comes in.
My suggestion is that before forensically copying the files to the delivery drive, you run a process which calculates the hash value of each file and then places the hash value, along with minimal meta-data, into an Alternate Data Stream of the file. This way, each file has its own small verification capability. All you have to do at any time is read the hash value contained in the Alternate Data Stream and compare it to the single file you happen to be concerned with. No need to review the larger hash file list, or even tell the recipient of the evidence drive that these data streams exist, as most people, and Explorer, can't see/find data streams. It will be your additional security check.
If you like this idea, check the Maresware hash program for the --ADDADS option, and the copy_ads or upcopy program for their capability of displaying Alternate Data Streams.
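To illustrate the concept (this is my own sketch, not Maresware's implementation): on NTFS, an alternate data stream is addressed as filename:streamname, so stashing a hash record is just a second write. The stream name and record format below are my own invention, and the ADS write itself only works on an NTFS volume.

```python
import hashlib
import os

def build_hash_record(path):
    """Build a small 'hash plus minimal meta-data' record for a file."""
    with open(path, "rb") as f:
        digest = hashlib.md5(f.read()).hexdigest()
    size = os.path.getsize(path)
    return f"md5={digest};bytes={size}"

def stash_in_ads(path, stream="evidence_hash"):
    """NTFS only: write the record into an alternate data stream.

    The stream is invisible to Explorer and to a normal directory
    listing; read it back later with open(f"{path}:{stream}").
    """
    record = build_hash_record(path)
    with open(f"{path}:{stream}", "w") as ads:
        ads.write(record)
    return record
```

On a FAT/exFAT delivery drive the stream would be silently lost, so this trick presumes the copies stay on NTFS.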
Associated articles and programs of interest:
hash program to calculate hash values.
COPY_THAT an article discussing forensic copying of evidence.
ZIP_IT an article regarding use of zipping software for forensics.
ZIP_IT_TAKE2 an article explaining the testing of zipping software.
To whet your curiosity, here are the stats from a study I performed on some "forensic" hashing programs, and the results.
The study was done using about 150 files contained in the images mentioned in the above challenge. Totally non-scientific,
but practical.
As you can see, only one passed all four tests, and only 2 passed the NTFS Alternate Data Stream test, while only about 9
passed the long filename test. For NTFS, if you can't pass the LFN test, you are severely lacking in evidentiary integrity. If you
are not interested in the stats, just jump over them. I'm sure that by the time you read this article the
software manufacturers who have software listed on this page may have already updated their software to work efficiently. But
what do I know.
The column headers are:
UNICODE (does the program properly identify and process Unicode filenames)
LFN (does it properly find and hash long filename files > 255 character path/names)
ADS (does it find and process NTFS Alternate Data Streams, great hiding places)
RESET LAST ACCESS (does it properly reset the original last access, and properly set all dates). Notice the surprising number of items
which failed to reset the last access date, thus leading to possible evidence corruption or, at the least, misinformation regarding
the original evidence's last access. (No, not Russian misinformation.)
Those showing 1/2 or 1/4 pass means they didn't always pass either the LFN or ADS test.
UNICODE |LFN |ADS |RESET LAST ACCESS
PASS |PASS |PASS |PASS
PASS |PASS |FAIL |PASS
PASS |PASS |PASS |FAIL
PASS |PASS |FAIL |FAIL
PASS |PASS |FAIL |FAIL
PASS |PASS |FAIL |FAIL
PASS |PASS |FAIL |FAIL
PASS |PASS |FAIL |FAIL
PASS |PASS |FAIL |1/2 PASS
PASS |FAIL |FAIL |PASS
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |1/2 PASS |FAIL |FAIL
PASS |1/2 PASS |FAIL |FAIL
PASS |1/2 PASS |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
PASS |FAIL |FAIL |FAIL
FAIL |FAIL |FAIL |FAIL
FAIL |FAIL |FAIL |FAIL
FAIL |FAIL |FAIL |FAIL
FAIL |FAIL |FAIL |FAIL
FAIL |1/4 PASS |FAIL |FAIL
FAIL |1/4 PASS |FAIL |FAIL
FAIL |1/4 PASS |FAIL |
FAIL |"---- |FAIL |FAIL

Just a little more intrigue. Here is a partial alphabetical list of the software tested, and version number (if I saved it). Some tested fully, others partial, some not at all because of installation problems or licensing. Remember, for the suites, ONLY the simplest hash process was performed, and not any hashing of the files within an image. The order here is NOT the same or as inclusive as the result table above. So don't try to match it up, one for one. Do your own testing before the defense does.
NAME | version used at time of testing
Advanced File Hash |2.02
Autopsy-Sleuthkit-HASHING |4.14
Belkasoft - evidence finder |9.7
Exact_File (10/06/2020) |1.0.0.15
FCIV - Microsoft |2.05
Forensic Explorer |5.x
FSUM |2.52
FTK_IMAGER |4.5
Get_Data Forensic Explorer |5.2.2.9680
HashConsole |
HASH |20.xx
HASHCAT |5.1.0
HASHDEEP64 |4.4
HASHER |1.9.3
HASHING |1.4
HASHING 2.1, by Deadmoon |2.1
HASHMYFILES |2.36
HASHTOOL |1.21
HASHTOOL#2 |N/A
IgorWare hasher |1.7.2
Karens Hasher (power tools) |2.3.1
MD5 - fourmilab.ch |2.2
MD5 - sanderson |--
MD5CHECKER |3.3
MD5DEEP64 |--
Paraben |E3P2C
OS_Forensics |7.0.1004
Quickhash |3.0.2
Rhash |1.3.8
SHA256DEEP64 |--
SigCheck_Russinovich |2.72
Sourceforge - ReHash |0.2
Sourceforge - Simple Hasher |1.2.0
ssdeep |2.14.1
TotalCommander (6/2020) |9.5.1
Toms_Hash_Explorer |1.2
WINMD5 |1.2
XL-File Tools |4.3
TOP
If you test your software against the suggested parameters, please let me know the results so I can add to the list.
If you get the same results as I have, please advise also.
For questions or answers (no flames please) regarding the hashing software, the NIST data records on my site, work007 (at) dmares.com.
One final note:
During the testing, one of the items I looked at, but which is not covered here, is the output format of the program. The output format
can be very confusing or annoying depending on what you plan on doing with the output. Some of the items tested produced such a
kludgey (again, that's not a scientific term) output format that I wouldn't want my worst forensic adversary to have to manipulate
it into a reasonable report inclusion. So you should consider not only the actual operation, but how hard it will be to include or
manipulate the output format for your next investigative stage. Just my $.02.
Before closing, you might want to read some of these other articles:
Inventory/Catalog files Creating an inventory of evidentiary files
Forensic file copying Article tests over 40 "forensic" file copiers
ZIP-IT for forensic retention Article tests a few zipping programs.
ZIP_IT_TAKE2 More tests for your zipping capabilities.
MATCH FILE HASHES Demonstrates hash matches using Maresware.
A HASH software buffet How-to use Maresware hash software
I would appreciate any comment or input you have regarding this article. Thank you. dan at dmares dot com,