Inventory/Catalog files Creating an inventory of evidentiary files
Forensic file copying Article tests over 40 "forensic" file copiers
Forensic Hashing Article tests over 30 "forensic" hash programs.
ZIP-IT for forensic retention Article test a few zipping programs and
ZIP_IT_TAKE2 More tests for your zipping capabilities.
ZIP FILE/container Hashing your zip container reliably
MATCH FILE HASHES Demonstrates hash matches using Maresware.
the one you are on: A HASH software buffet How-to use Maresware hash software
ABSTRACT
Original version: January 2021.
Often, outside the usual suite menu for processing forensic evidence, you may wish to perform various other processes using the hash
values of files. Such processes may include the calculation, comparison, and analysis of the hash values of files in a
directory or tree structure.
Check out the NIST NSRL data sets and their information regarding using "known" hash values in your investigation.
A. You may need to simply calculate hash values for future retention, or reprocess the data for some additional refinement.
hash.exe
B. In a forensic/evidentiary environment, you have (hopefully) forensically copied
files from subject SOURCE_A location, to work/examination DESTINATION_B and wish to make sure the files were copied
correctly, and the destination is an exact copy of the source location. To do this, you will most probably hash A, hash B,
and compare the two hash files for differences.
hashcmp.exe
C. In this same forensic environment, or even on your own personal computer, you have directory source, and directory destination, and you wish
to see what files may be on one location and not on the other. This may occur when you are doing simple backup from pointA to pointB.
hashcmp.exe
D. You have multiple directories/folders/trees on a drive, and want to see what files may be duplicated across the
multiple locations. I run into this situation routinely when I copy files from my cell phone to my permanent computer
storage. (I'm a terrible housekeeper.) I often place them in different locations, or rename the files, which causes
duplicates to show up based on hash.
hashdup.exe
E. You want to constantly be able to check to see if a file (executable or otherwise) has been changed.
A rudimentary process in a batch file could be devised to determine if files were added or changed. Create a reference
set of MD5 values, then periodically run MD5 using the --MATCH option with the "known" MD5's as an input.
F. You want to constantly be able to check to see if an NTFS file (executable or otherwise) has been changed.
Use the combination of --ADDADS and --ADS_COMP options. See below.
Check out the hash_matching article, which covers a number of processes and programs. They are contained in a simple (basic) batch file, located within this hash_test.zip file, which demonstrates the actions described here.
Please review these two articles before going further.
hash_software_tests
This hash_test_article covers testing your hashing software in a forensic/evidentiary environment, and how the hashing software will (notice I
didn't say may) fail strict cross examination.
The hash_matching article describes a number of processes to
accomplish hash matching of hash data files using various Maresware software.
Table of contents of this article:
Jump to this Section:                                            Help file and executable download
MD5       basic hash program. simple but efficient.              Manual   md5.exe
HASH      next level hash program.                               Manual   hash.exe
HASHCMP   compare hash files.                                    Manual   hashcmp.exe
HASHDUP   find duplicate hash values within a single hash run.   Manual   hash_dup.exe
HASH_VERIFY   Verify md5 integrity the easy way.
In short, the programs shown above are specifically designed to work with each other:
1. MD5 performs hash calculations (md5, sha) on files within a directory/tree structure. Provides basic output format.
2. Hash creates fixed length records of the hash (md5, sha) data that is calculated. Provides more verbose capabilities than the MD5 program.
3. Hashdup analyzes the data set to see which files are duplicates based on hash value.
(This can be used to deduplicate known good files or to identify multiple instances of evidence relevant files.)
4. Hashcmp compares two data sets to see which files are contained in SOURCE and not found in DESTINATION. (Your forensic copy didn't quite work. What a
surprise!)
In addition, other Maresware software can be used to further analyze the outputs of these software programs, and almost any fixed length data file
your software will provide. Other useful Maresware software includes diskcat, search, bsearch, compare, and filbreak, which can be used to
further analyze the data. All the respective help files and executables can be found on the Maresware home page.
Besides using these programs which are specifically designed to work with each other on Maresware hash related data, you may have access to
many other non-Maresware applications which can reprocess this data for forensic or evidentiary requirements. Don't limit your software menu.
The explanations and examples provided in this document are basic and not in any way inclusive of the capability and usefulness of the software. I only provide top level examples and explanations in order to give you a taste of the capability, and hope you will want to partake more of the hash buffet available.
Below are external links to the hash family of executables and help files.
md5 Help file, and
md5.exe download.
hash Help file, and
hash.exe download.
hashcmp Help file, and
hashcmp.exe download.
hashdup Help file, and
hashdup.exe download.
The md5.exe program is the most basic of the Maresware programs able to calculate the md5 hash value of files. The md5 program (as other Maresware programs) can also calculate various SHA values of the files.
In its default operation the md5 program displays basic information of a filename, file size and the md5
value, along with some "accounting" information.
Below is a sample of 4 lines of output from a larger run. The basic output format is just that, filename, md5 and file size and is adequate for
most uses.
grep.exe    BA67233FAAFB95316E6CCAD42438BBBC  160768
Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344
sed.exe     E26824B098033E9682850673AB548B7E   82944
Total.exe   7B29E04B436F3D581F0144DB0CA04FF3  159432
1 directory, 8 files, 1,111,076 bytes, 1.11 MB

grep.exe   | BA67233FAAFB95316E6CCAD42438BBBC | 160768 | 01/01/2019 | 11:30:01:000c | EST |
Search.exe | 4E8DC094BD055248C406D6A0814A9C4D | 198344 | 01/01/2019 | 11:30:01:000c | EST |
sed.exe    | E26824B098033E9682850673AB548B7E |  82944 | 01/01/2019 | 11:30:01:000c | EST |
Total.exe  | 7B29E04B436F3D581F0144DB0CA04FF3 | 159432 | 01/01/2019 | 11:30:01:000c | EST |
1 directory, 8 files, 1,111,076 bytes, 1.11 MB

BA67233FAAFB95316E6CCAD42438BBBC | 160768 | 01/01/2019 | 11:30:01:000c | EST | D:\...\grep.exe
4E8DC094BD055248C406D6A0814A9C4D | 198344 | 01/01/2019 | 11:30:01:000c | EST | D:\...\Search.exe
E26824B098033E9682850673AB548B7E |  82944 | 01/01/2019 | 11:30:01:000c | EST | D:\...\sed.exe
7B29E04B436F3D581F0144DB0CA04FF3 | 159432 | 01/01/2019 | 11:30:01:000c | EST | D:\...\Total.exe
1 directory, 8 files, 1,111,076 bytes, 1.11 MB

The choice of options and command line operation makes this a basic choice for batch files, using basic md5 calculations.
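To give a feel for what a basic listing like the ones above involves, here is a minimal Python sketch. It is not Maresware code and does not reproduce md5.exe; the function names (md5_of_file, list_directory, format_rows) and the column widths are my own assumptions, chosen only to imitate the filename, md5, size layout.

```python
import hashlib
import os

def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the uppercase hex MD5 of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()

def list_directory(directory):
    """Return (filename, md5, size) tuples for the regular files directly in `directory`."""
    rows = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            rows.append((name, md5_of_file(full_path), os.path.getsize(full_path)))
    return rows

def format_rows(rows):
    """Format rows as text lines, followed by a summary line with file and byte totals."""
    lines = [f"{name:<20} {md5} {size:>12}" for name, md5, size in rows]
    total_bytes = sum(size for _, _, size in rows)
    lines.append(f"{len(rows)} files, {total_bytes:,} bytes")
    return lines
```

Calling print("\n".join(format_rows(list_directory(".")))) lists the current directory. Unlike the real program, this sketch does not recurse and offers no delimiter or SHA options.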
The next item in our hash buffet is hash.exe. This is the next step up: a more complex and verbose Maresware program to calculate the md5 hash value of
files.
The hash program (as other Maresware programs) can also calculate the SHA values of the files.
In its default operation the hash program displays the path, hash, and size of a file, and its write date/time.
By default it recurses the directory tree to which it is pointed, so your output is more inclusive (with path, and dates) than the md5
program.
D:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\sed.exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\Total.exe   7B29E04B436F3D581F0144DB0CA04FF3  159432  01/01/2019  11:30w  EST
0 directories, 10 files, 1,746,356 bytes, 1.75 MB

D:\TMP\TEST_FILES\EXES\Search.exe | 4E8DC094BD055248C406D6A0814A9C4D | 198344 | 01/01/2019 | 11:30w | EST
D:\TMP\TEST_FILES\EXES\sed.exe    | E26824B098033E9682850673AB548B7E |  82944 | 01/01/2019 | 11:30w | EST
D:\TMP\TEST_FILES\EXES\Total.exe  | 7B29E04B436F3D581F0144DB0CA04FF3 | 159432 | 01/01/2019 | 11:30w | EST
0 directories, 10 files, 1,746,356 bytes, 1.75 MB
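As a rough illustration of a recursive run that writes one delimited record per file, here is a short Python sketch. It is an assumption-laden stand-in, not the hash.exe implementation: the function name hash_tree is hypothetical, the records are not fixed length, and the date format only approximates the samples above.

```python
import hashlib
import os
import time

def hash_tree(root, delimiter="|"):
    """Walk `root` recursively and return one 'path|md5|size|date time' record per file."""
    records = []
    for folder, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(folder, name)
            # Whole-file read keeps the sketch short; read in chunks for large evidence files.
            with open(path, "rb") as handle:
                md5 = hashlib.md5(handle.read()).hexdigest().upper()
            info = os.stat(path)
            # Last write (modification) time, rendered in local time.
            stamp = time.strftime("%m/%d/%Y %H:%M", time.localtime(info.st_mtime))
            records.append(delimiter.join([path, md5, str(info.st_size), stamp]))
    return records
```

Writing "\n".join(hash_tree(r"D:\TMP\TEST_FILES")) to a file gives a listing that can be sorted or compared on the hash field by other tools.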
When you have the output of two hash runs, from SOURCE and DESTINATION, and you wish to see what files may show up on SOURCE that are not on DESTINATION, or vice versa, the hashcmp program will do this easily.
The hashcmp program is the simplest way to compare two hash data sets to see which files either match or do not match from one hash run to the other. Hashcmp is the most basic of the processes to compare two hash outputs.
Check out these two sample records (one from source, the other from destination), with spaces truncated for legibility, comparing two files of identical length and format.
Although hashcmp was originally designed to operate on the output of the Maresware hash.exe program, with a little thought and understanding of its operation, it can be adapted to process/compare any two files of identical fixed length data that have a common sorted field such as the MD5 field (for instance, comparing two directory listings to see what might match or not).
Hashcmp takes two "fixed length record" files created with the hash.exe program, and compares them either on the entire record length, or just the hash value. In most cases you would want to compare ONLY on the hash value. The intent is to ensure that you have accurate copies of evidence files in both the SOURCE1 directory and a DESTINATION1 work location, that is, to make sure all the hashes in SOURCE and DESTINATION match.
NOTE: Again, a reminder: hashcmp is designed to compare two identically formatted files on a single field (i.e., hash value). This generic comparison is regulated by the appropriate -d and -l (ell) options, not described here.
Now that we have two hash runs available, we can run the hashcmp program.
The generic hashcmp command is:
C:>hashcmp SRCE.out DEST.out -o mismatch.out -h
What you will get from an actual run is an output file containing references like the item shown below where the program identifies the hash values
found in a SOURCEA file that don't match the DESTINATIONB file. Take notice of the different hash values. (spaces truncated for legibility)
found in SRCE.out not in DEST.out | C:\TMP\ZIP_IT.htm | C772D55C42A41B4E6F261F28B8DAA7FF | 12072 ....
found in DEST.out not in SRCE.out | D:\TMP\ZIP_IT.htm | D772D55C42A41B4E6F261F28B8DAA7FF | 12072 ....
Notice that because the same file has two different hash values, you actually get two records in the output. One references the hash value in
file1 not in file2, and conversely, one value found in file2 not in file1.
Check the help file for explanation on the appropriate hashcmp command option to only show those in file1 or file2 in the output mismatch.
The final three line batch file you might use to accomplish this process is shown here, and is contained in the hash_matching article. First, hash the source, then hash the destination, then run hashcmp against the two outputs.
C:>hash -p x:\source1_folder -f files_to_hash(usually *.*) -w 300 -d "|" -o SRCE.out -R -1 logfile1
C:>hash -p x:\destination1_folder -f files_to_hash(usually *.*) -w 300 -d "|" -o DEST.out -R -1 logfile1
C:>hashcmp SRCE.out DEST.out -o mismatch.out -h
There are some minor drawbacks or requirements to the hashcmp process. To see a more complete explanation on how to use hashcmp, check out the hashcmp section of the article.
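To make the comparison logic concrete, here is a small Python sketch of comparing two hash listings on the hash field only. This is not how hashcmp is implemented; the function names and the assumption that the hash is the second pipe-delimited field are mine, based on the sample records shown earlier.

```python
def load_hashes(lines, delimiter="|", hash_field=1):
    """Map each hash value to the list of full records that carry it."""
    by_hash = {}
    for line in lines:
        fields = [field.strip() for field in line.split(delimiter)]
        by_hash.setdefault(fields[hash_field], []).append(line.strip())
    return by_hash

def compare_hash_lists(source_lines, dest_lines):
    """Return (only_in_source, only_in_dest): records whose hash is missing from the other list."""
    source = load_hashes(source_lines)
    dest = load_hashes(dest_lines)
    only_source = [rec for h in sorted(source.keys() - dest.keys()) for rec in source[h]]
    only_dest = [rec for h in sorted(dest.keys() - source.keys()) for rec in dest[h]]
    return only_source, only_dest

# The two mismatching records from the sample output above, one per listing.
srce = [r"C:\TMP\ZIP_IT.htm | C772D55C42A41B4E6F261F28B8DAA7FF | 12072"]
dest = [r"D:\TMP\ZIP_IT.htm | D772D55C42A41B4E6F261F28B8DAA7FF | 12072"]

only_srce, only_dest = compare_hash_lists(srce, dest)
for record in only_srce:
    print("found in SRCE.out not in DEST.out |", record)
for record in only_dest:
    print("found in DEST.out not in SRCE.out |", record)
```

Because the comparison keys on the hash alone, a file copied to a different drive or renamed still counts as a match, while a single changed byte produces one record on each side, just as in the sample output.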
The fourth and final part of our hash buffet is the hashdup program.
Hashdup is designed to analyze a single fixed length record file on the hash field and produce an output file showing which files have duplicate
hash values.
This operation could be used to see how many copies of a suspect fileX the person may have in different locations on the drive
under different names or the same name. Or, when run against your own drive's hash list, it could point out how many copies you have of pictures of
your family in different locations. This is how I use it every time I download pictures from my cell phone to my main storage location. I rehash all the
photos and then run hashdup to see how many duplicates show up. I guarantee you will always find some.
Now, back to our hash buffet.
Below is a sample of a few hash records.
D:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\sed1.exe    E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\Total.exe   7B29E04B436F3D581F0144DB0CA04FF3  159432  01/01/2019  11:30w  EST
E:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
E:\TMP\TEST_FILES\EXES\sed2exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
G:\TMP\TEST_FILES\EXES\Search.exe  5E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
G:\TMP\TEST_FILES\EXES\sed3exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST

Now, let's sort on hash and find out if there are duplicates. (This sorting is done internally by the program.)

D:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
E:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
G:\TMP\TEST_FILES\EXES\Search.exe  5E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\Total.exe   7B29E04B436F3D581F0144DB0CA04FF3  159432  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\sed1.exe    E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
E:\TMP\TEST_FILES\EXES\sed2exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
G:\TMP\TEST_FILES\EXES\sed3exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST

Notice which hashes show up more than once. The search.exe shows up in two locations, while the sedX.exe shows up in three locations.
After running the program, the statistics below are shown on the screen and the output file that follows is created. (blank lines added for legibility)
Processed 7 (files)
There were 2 duplicate sets found

E:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
G:\TMP\TEST_FILES\EXES\sed3exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
E:\TMP\TEST_FILES\EXES\sed2exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\sed1.exe    E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST

The program indicates that two sets of files with duplicate hash values exist: one set of two copies and one set of three.
Now it's up to you to determine what to do with this list.
When I run it on my own volumes, I remove from the list those copies I wish to retain.
(In the display above, I would remove from the list the items on the E: drive, which I want to keep in place.)
That leaves me with a list like this (notice only the D: and G: drive items remain in the output file):

D:\TMP\TEST_FILES\EXES\Search.exe  4E8DC094BD055248C406D6A0814A9C4D  198344  01/01/2019  11:30w  EST
G:\TMP\TEST_FILES\EXES\sed3exe     E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST
D:\TMP\TEST_FILES\EXES\sed1.exe    E26824B098033E9682850673AB548B7E   82944  01/01/2019  11:30w  EST

Then I run the duplicate file list through the Maresware rm or rmd program, using the -S option, thereby removing (deleting) the duplicates. But for evidence processing, you might prefer a different approach: first isolating or copying the duplicates to another location and specifically identifying them as such. The upcopy program does an excellent job of performing such a copy operation.
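The duplicate detection itself can be sketched in a few lines of Python. This is only an illustration of grouping records on the hash field, not the hashdup implementation; the function name find_duplicate_sets is hypothetical, and the records are the seven sample paths and hashes from above with the size and date columns left out.

```python
from collections import defaultdict

def find_duplicate_sets(records):
    """Group (path, hash) pairs by hash and keep only the groups with two or more paths."""
    groups = defaultdict(list)
    for path, digest in records:
        groups[digest].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}

# The seven sample records shown earlier (path and hash only).
records = [
    (r"D:\TMP\TEST_FILES\EXES\Search.exe", "4E8DC094BD055248C406D6A0814A9C4D"),
    (r"D:\TMP\TEST_FILES\EXES\sed1.exe", "E26824B098033E9682850673AB548B7E"),
    (r"D:\TMP\TEST_FILES\EXES\Total.exe", "7B29E04B436F3D581F0144DB0CA04FF3"),
    (r"E:\TMP\TEST_FILES\EXES\Search.exe", "4E8DC094BD055248C406D6A0814A9C4D"),
    (r"E:\TMP\TEST_FILES\EXES\sed2exe", "E26824B098033E9682850673AB548B7E"),
    (r"G:\TMP\TEST_FILES\EXES\Search.exe", "5E8DC094BD055248C406D6A0814A9C4D"),
    (r"G:\TMP\TEST_FILES\EXES\sed3exe", "E26824B098033E9682850673AB548B7E"),
]

duplicates = find_duplicate_sets(records)
print(f"Processed {len(records)} (files)")
print(f"There were {len(duplicates)} duplicate sets found")
for digest, paths in duplicates.items():
    for path in paths:
        print(path, digest)
```

Note that the G: drive copy of Search.exe is not reported, because its hash differs in the first character even though the name and size match.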
Many people are concerned with the possibility of a virus or ransomware inserting itself or changing key
system files, such as those within the \WINDOWS tree, which on my machine contains thousands of files. Or any
nefarious actions which may cause what should be static trees to be altered.
It has been suggested to me that one way to determine if such corruption has occurred is to see if any of the key
system files have been changed, or if suspect programs have been added to these directories. That is not to say that
normal changes don't occasionally occur, but file content changes, or files added to the tree, might be something to
look at.
Take for instance a standard exe called MS.EXE which might be a very important system file. The virus or
ransomware renames MS.EXE to MS1.EXE and inserts the corrupt file called MS.EXE. So when you run MS.EXE you
actually initialize the bad file which does its thing, then calls MS1.EXE to accomplish what you asked for.
One thing that could be done is to create a static time checkpoint of the tree. Let's call it hash_test.txt.
Then periodically run another hash output of the same tree, hash_test2.txt. Compare the original hashes
with the new hashes, and the original number of files with the new number. Any difference might be cause to
take a look at why the change occurred. Possibly it was a simple install of a new program, or it was in fact
something quite bad which caused the hash and file count alteration.
This simple process is contained within this batch script and, with minimal alterations to set
the target tree, can be run periodically, or scheduled each night to see what changes are there.
NOTE: THIS PROCESS WILL NOT DETECT CURRENT REAL TIME PROBLEMS, BUT MAY HELP IN DETERMINING SOMETHING IS
AMISS
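The checkpoint comparison described above boils down to comparing two snapshots of the same tree. Here is a minimal Python sketch of that idea, using the MS.EXE scenario as sample data. It is not the batch script referred to above; the function name compare_checkpoints is hypothetical, and the short hash strings are made-up placeholders rather than real MD5 values.

```python
def compare_checkpoints(baseline, current):
    """Compare two {path: hash} snapshots and return (added, removed, changed) path lists."""
    added = sorted(current.keys() - baseline.keys())
    removed = sorted(baseline.keys() - current.keys())
    changed = sorted(
        path for path in baseline.keys() & current.keys()
        if baseline[path] != current[path]
    )
    return added, removed, changed

# Placeholder hashes: "GOOD" stands for the original MS.EXE content, "BAD" for the impostor.
baseline = {"MS.EXE": "GOOD", "OTHER.DLL": "SAME"}
current = {"MS.EXE": "BAD", "MS1.EXE": "GOOD", "OTHER.DLL": "SAME"}

added, removed, changed = compare_checkpoints(baseline, current)
print("file count:", len(baseline), "->", len(current))
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

In this scenario the renamed original shows up as an added file (MS1.EXE) and the impostor shows up as a changed hash under the old name, so both the file count and the hash comparison flag the tree.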
This is actually the fifth step in the hash_buffet. It shows how to quickly determine whether a file,
be it an executable, jpg, evidence or any other type of file, has been changed. The above sections like
hashcmp make use of the output files created by the hash program to create various output files, which can
then be used to test and compare various operations.
This process uses NTFS alternate data streams to store and check the MD5 of any ONE (or many) files as the
program is run. Let's say you have a directory of executables which you want to keep an eye on to confirm that none of
them have been altered (possibly virus infected), or maybe you have a single evidence file in storage and
you want to make certain, the next time you look at it, that the contents haven't been altered. This process can
confirm a file's alteration on either a single file, or any number of files the user chooses.
How is this accomplished? A simple two step process on the NTFS file system.
Step 1: Run hash on any of the files you choose using the option: --ADDADS. This option creates an
alternate data stream containing various information relating to the file. The piece of information we are
looking at for this purpose is the current MD5 and SHA value of the file, which the program adds to the
alternate data stream. You now end up with an alternate data stream with a name format of:
filename_hash.txt. See the (hash --ADDADS) example below:
ads.htm                 Parent name
ads.htm:ads_hash.txt    ADS name
Among other things, the ADS file contains a line with the current MD5.
Step 2: Run hash again with the --ADSONLY and --ADS_COMP options:
C:>hash -p ... -f ... --ADSONLY --ADS_COMP
This run produces output for files containing ADSs. It checks the parent's current MD5 value, compares it against the MD5 contained in the ADS, and if a mismatch is found, it produces a line in the output similar to:
ads.htm 11351CF9A1A93A7022223BFDC7578D70 ads.htm:ads_hash.txt HASH MISMATCH
The user should then examine why the current hash is not what was stored in the original alternate data stream. This process is easily implemented in a batch file to regularly check the original hash value of a file or files. A little practice and testing makes this a simple integrity test for a single file or for many.
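For the curious, the store-then-verify idea can be imitated in Python. This is a sketch under stated assumptions, not what hash --ADDADS writes: it stores only the MD5 (the real ADS holds more information), the function names are hypothetical, and the stream name merely follows the ads.htm:ads_hash.txt pattern shown above. On Windows with NTFS, opening a name of the form file:stream addresses an alternate data stream; on other systems the same name is just an ordinary file, so the sketch still runs there but no longer demonstrates an ADS.

```python
import hashlib
import os

def md5_of_file(path):
    """Return the uppercase hex MD5 of the file's main content."""
    with open(path, "rb") as handle:
        return hashlib.md5(handle.read()).hexdigest().upper()

def stream_path(path):
    """Build 'file:stem_hash.txt', imitating the ADS naming shown above."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return f"{path}:{stem}_hash.txt"

def store_hash(path):
    """Step 1: record the current MD5 in the stream attached to the file."""
    with open(stream_path(path), "w") as stream:
        stream.write(md5_of_file(path))

def verify_hash(path):
    """Step 2: recompute the MD5 and compare it with the stored value."""
    with open(stream_path(path)) as stream:
        stored = stream.read().strip()
    return "OK" if stored == md5_of_file(path) else "HASH MISMATCH"
```

Run store_hash once on each file you care about, then call verify_hash on a schedule. Keep in mind that an alternate data stream is lost when the file is copied to a file system that does not support streams.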
That concludes our discussion of the buffet of Maresware hash related software. Read and practice often. Hope you enjoyed the meal.