Maresware Hash software or: a buffet of Hash software


Before you get into this article, you might read this associated sequence of articles.

Start here:

Inventory/Catalog files  Creating an inventory of evidentiary files
Forensic file copying  Article tests over 40 "forensic" file copiers
Forensic Hashing  Article tests over 30 "forensic" hash programs.
ZIP-IT for forensic retention  Article test a few zipping programs and
ZIP_IT_TAKE2  More tests for your zipping capabilities.
ZIP FILE/container  Hashing your zip container reliably
MATCH FILE HASHES  Demonstrates hash matches using Maresware.
the one you are on: A HASH software buffet   How-to use Maresware hash software


 

ABSTRACT

Original version: January 2021.
Updated Feb. 2022

Often, outside of the usual suite menu for processing forensic evidence, you may wish to perform various other processes using the hash values of files. Such processes may include the calculation, comparison, and analysis of the hash values of files in a directory or tree structure.

Check out the NIST NSRL   data sets and their information regarding using "known" hash values in your investigation.

A. You may need to simply calculate hash values for future retention, or reprocess the data for some additional refinement. hash.exe

B. In a forensic/evidentiary environment, you have (hopefully) forensically copied  files from subject SOURCE_A location, to work/examination DESTINATION_B and wish to make sure the files were copied correctly, and the destination is an exact copy of the source location. To do this, you will most probably hash A, hash B, and compare the two hash files for differences. hashcmp.exe

C. In this same forensic environment, or even on your own personal computer, you have directory source, and directory destination, and you wish to see what files may be on one location and not on the other. This may occur when you are doing simple backup from pointA to pointB. hashcmp.exe

D. You have multiple directories/folders/trees on a drive, and want to see what files may be duplicated across the multiple locations. I run into this situation routinely when I copy files from my cell phone to my permanent computer storage. (I'm a terrible housekeeper.) I often place them in different locations, or rename the files, which causes duplicates to show up based on hash.
hashdup.exe

E: You want to constantly be able to check to see if a file (executable or otherwise) has been changed. A rudimentary process in a batch file could be devised to determine if files were added or changed. Create a reference set of MD5 values, then periodically run MD5 using the --MATCH option with the "known" MD5's as an input.

F: You want to constantly be able to check to see if an NTFS file (executable or otherwise) has been changed. Use the combination of --ADDADS and --ADS_COMP options. See below.

Check out the hash_matching article which contains a number of processes and programs contained in a simple (basic) batch file located within this hash_test.zip file which will demonstrate the actions described here.


Please review these two articles before going further.
hash_software_tests   This hash_test_article on testing your hashing software in a forensic/evidentiary environment, and how the hashing software will (notice I didn't say may) fail strict cross examination.
hash_matching  article describes a number of processes to accomplish hash matching of hash data files using various Maresware software


Here are the separate sections of this article for you to jump quickly to the section that interests you;
Table of contents of this article:
 Jump to this Section:                                        Help file and executable download

MD5     basic hash program. simple but efficient.               Manual       md5.exe
HASH    next level hash program.                                Manual       hash.exe
HASHCMP compare hash files.                                     Manual       hashcmp.exe
HASHDUP find duplicate hash values within a single hash run.    Manual       hash_dup.exe
HASH_VERIFY Verify md5 integrity the easy way.


There are probably numerous other situations to use the file hash values. One other that comes to mind is that you may wish to compare your hash data with the NIST NSRL data sets.

In short, the programs shown above are specifically designed to work with each other to:
1. MD5 performs hash calculations (md5, sha) on files within a directory/tree structure. Provides basic output format.
2. Hash creates fixed length records of the hash (md5, sha) data that is calculated. Provides more verbose capabilities than the MD5 program.
3. Hashdup performs calculation on the data set to see which files are duplicates based on hash value. (This can be used to deduplicate known good files or to identify multiple instances of evidence relevant files.)
4. Hashcmp performs calculation to see which files are contained in SOURCE and not found in DESTINATION. (Your forensic copy didn't quite work. What a surprise!)

In addition, other Maresware software can be used to further analyze the outputs of these software programs, and almost any fixed length data file your software will provide. Other useful Maresware software includes: diskcat, search, bsearch, compare, and filbreak; which can be used to further analyze the data. All the respective help files, and executables can be found on the Maresware home page.
Besides using these programs which are specifically designed to work with each other on Maresware hash related data, you may have access to many other non-Maresware applications which can reprocess this data for forensic or evidentiary requirements. Don't limit your software menu.

The explanations and examples provided in this document are basic and not in any way inclusive of the capability and usefulness of the software. I only provide top level examples and explanations in order to give you a taste of the capability, and hope you will want to partake more of the hash buffet available.

Below are external links to the hash family of executables and help files.

md5   Help file, and md5.exe download.
hash   Help file, and hash.exe download.
hashcmp   Help file, and hashcmp.exe download.
hashdup   Help file, and hashdup.exe download.


MD5

The md5.exe  program is the most basic of the Maresware programs able to calculate the md5 hash value of files. The md5 program (as other Maresware programs) can also calculate various SHA values of the files.

In its default operation the md5 program displays the basic information of filename, md5 value and file size, along with some "accounting" information.
Below is a sample of 4 lines of output from a larger run. The basic output format is just that: filename, md5 and file size. It is adequate for most uses.


command line:      C:>md5
grep.exe          BA67233FAAFB95316E6CCAD42438BBBC       160768
Search.exe        4E8DC094BD055248C406D6A0814A9C4D       198344
sed.exe           E26824B098033E9682850673AB548B7E        82944
Total.exe         7B29E04B436F3D581F0144DB0CA04FF3       159432

  1 directory, 8 files, 1,111,076 bytes, 1.11 MB

Notice the filename is first, then the md5, then the file size.
With appropriate options, the output will contain fixed length records which can be processed by other Maresware software. It can also output a pipe ( | ) delimited file for import to spreadsheets, which is easily reprocessed.
The output can also contain one or all three file dates, which include date and time in milliseconds.
And if necessary, the full path can be added to the output record.
With this next command, we ask for the create time to be included.
command line:      C:>md5 -tc
grep.exe         | BA67233FAAFB95316E6CCAD42438BBBC |  160768 | 01/01/2019 | 11:30:01:000c | EST |
Search.exe       | 4E8DC094BD055248C406D6A0814A9C4D |  198344 | 01/01/2019 | 11:30:01:000c | EST |
sed.exe          | E26824B098033E9682850673AB548B7E |   82944 | 01/01/2019 | 11:30:01:000c | EST |
Total.exe        | 7B29E04B436F3D581F0144DB0CA04FF3 |  159432 | 01/01/2019 | 11:30:01:000c | EST |

  1 directory, 8 files, 1,111,076 bytes, 1.11 MB

If necessary, we can place the md5 value at the beginning of the line.
(path truncated for display. default is variable length path output. With the -w xx option, a fixed length path is made. suggest using >255, such as -w 255)
command line:      C:>md5 -tc --nameafter
 BA67233FAAFB95316E6CCAD42438BBBC   |  160768 | 01/01/2019 | 11:30:01:000c | EST | D:\...\grep.exe
 4E8DC094BD055248C406D6A0814A9C4D   |  198344 | 01/01/2019 | 11:30:01:000c | EST | D:\...\Search.exe
 E26824B098033E9682850673AB548B7E   |   82944 | 01/01/2019 | 11:30:01:000c | EST | D:\...\sed.exe
 7B29E04B436F3D581F0144DB0CA04FF3   |  159432 | 01/01/2019 | 11:30:01:000c | EST | D:\...\Total.exe

  1 directory, 8 files, 1,111,076 bytes, 1.11 MB
The choice of options and command line operation makes this a basic choice for batch files, using basic md5 calculations.
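
For readers who want to cross-check a basic listing like the ones above with an independent tool, here is a minimal Python sketch. It is not how md5.exe works internally. The function names (md5_of_file, list_directory, format_rows) and the column widths are my own assumptions for illustration. It hashes only the files directly inside one directory and lays out filename, MD5, and size.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def list_directory(directory):
    """Return (name, md5, size) for each regular file directly in directory."""
    rows = []
    for entry in sorted(Path(directory).iterdir()):
        if entry.is_file():
            rows.append((entry.name, md5_of_file(entry), entry.stat().st_size))
    return rows


def format_rows(rows):
    """Lay out rows as filename, MD5, size (column widths are illustrative)."""
    return [f"{name:<17} {md5}   {size:>10}" for name, md5, size in rows]
```

Calling format_rows(list_directory(".")) and printing each line gives a listing in the same filename, hash, size order. Reading in chunks keeps memory use flat even for very large evidence files.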


HASH

The next item in our hash buffet is hash.exe. This is the next more complex and verbose Maresware program to calculate the md5 hash value of files.
The hash   program (as other Maresware programs) can also calculate the SHA values of the files.

In its default operation the hash program displays the path, hash, size of a file and write date/time.
By default it recurses the directory tree to which it is pointed, so your output is more inclusive (with path, and dates) than the md5 program.


command line:      C:>hash
D:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\sed.exe       E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\Total.exe     7B29E04B436F3D581F0144DB0CA04FF3   159432 01/01/2019 11:30w EST

  0 directories, 10 files, 1,746,356 bytes, 1.75 MB

As with all Maresware software, the output of fixed length (-w xx) output records is possible (but not default), and the records can be pipe ( | ) delimited (-d "|").
Also, with appropriate options, any or all three file times can be included. The main difference between hash and md5 is that hash defaults to recursing the directory, and prints the full path by default.
If fixed length output is desired, (for passing to hashcmp and hashdup) the -w max_width option should be used to make certain the first field (path) is wide enough to provide the full path without trun/../cations.
If using a default command line: C:>hash -p c:\tmp -o outputfile -d "|" -v -w 300 -AT3 --milli --GMT
You end up with an output file of substantially long (403 character) fixed length records, shown below with file path spaces truncated for legibility.

C:\TMP\JUNK_DEL\HASH_TEST\DEST\IN_DEST_NOT_IN_SRCE.htm | A5FFFF636E13C049F3E0ABB6A37A8D8A| 36687|2021/01/27|18:17:22:087c|2021/01/08|14:49:50:451w|2021/02/01|03:18:00:925a|GMT| A......

As with almost all Maresware, output files can be created for passing the output to additional batch processing.
The following command adds a pipe delimiter, and the last write time. (-w option is eliminated in this display to shorten the screen display)
command line:      C:>hash -tw -d "|"
D:\TMP\TEST_FILES\EXES\Search.exe  |  4E8DC094BD055248C406D6A0814A9C4D |  198344 | 01/01/2019 | 11:30w | EST
D:\TMP\TEST_FILES\EXES\sed.exe     |  E26824B098033E9682850673AB548B7E |   82944 | 01/01/2019 | 11:30w | EST
D:\TMP\TEST_FILES\EXES\Total.exe   |  7B29E04B436F3D581F0144DB0CA04FF3 |  159432 | 01/01/2019 | 11:30w | EST

  0 directories, 10 files, 1,746,356 bytes, 1.75 MB
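
To illustrate why a fixed width path field matters for later processing, here is a minimal Python sketch of building fixed length, pipe delimited records while recursing a tree. The record layout, field widths, and function names (format_record, hash_tree) are assumptions made for this example. They are not the exact layout produced by hash.exe.

```python
import hashlib
import os
from datetime import datetime, timezone


def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def format_record(path, md5, size, mtime, width=300):
    """Build one fixed-length, pipe-delimited record (illustrative layout)."""
    if len(path) > width:
        # Refuse to truncate: a cut-off path would make the record useless.
        raise ValueError(f"path longer than {width} characters: {path}")
    stamp = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return "|".join([
        path.ljust(width),                 # path padded to a fixed width
        md5,                               # 32 hex characters
        str(size).rjust(12),               # size right-aligned in 12 columns
        stamp.strftime("%Y/%m/%d"),        # write date in GMT
        stamp.strftime("%H:%M:%S") + "w",  # write time, 'w' marks last write
        "GMT",
    ])


def hash_tree(root, width=300):
    """Walk root recursively and return one record per file found."""
    records = []
    for folder, _subdirs, files in os.walk(root):
        for name in sorted(files):
            full = os.path.join(folder, name)
            info = os.stat(full)
            records.append(
                format_record(full, md5_of_file(full), info.st_size,
                              info.st_mtime, width)
            )
    return records
```

Because every record has the same length and the hash always starts at the same column, a later program can sort or compare on the hash field without parsing each line. That is the reason for choosing a path width large enough that no path is ever cut off.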
HASHCMP

When you have the output of two hash runs, from SOURCE and DESTINATION, and you wish to see what files may show up on SOURCE that are not on DESTINATION, or vice versa, the hashcmp program will do this easily.

The hashcmp  program is the simplest way to compare two hash data sets to see which files either match or do not match from one hash run to the other. Hashcmp is the most basic of the processes to compare two hash outputs.

Check out these two sample records (one from source, the other from destination), with spaces truncated for legibility, comparing two files of identical length and format.
Assume C:\TMP is the original data, and D:\TMP is the duplicated or copied data set that has a hash copy error.

C:\TMP\ZIP_IT.htm | C772D55C42A41B4E6F261F28B8DAA7FF | 12072 | 06/14/2019 09:02:22w EST

D:\TMP\ZIP_IT.htm | D772D55C42A41B4E6F261F28B8DAA7FF | 12072| 06/14/2019 09:02:22w EST

Although hashcmp was originally designed to operate on the output of the Maresware hash.exe program, with a little thought and understanding of its operation, the hashcmp program can be adapted to process/compare any two files of identical fixed length data that have a common sorted field such as the MD5 field. (for instance, compare two directory listings to see what might match or not).

Hashcmp takes two "fixed length record" files created with the hash.exe program, and compares them either on the entire record length, or just the hash value. In most cases you would want ONLY to compare on the hash value. The intent is to ensure that you have accurate copies of evidence files in both the SOURCE1 directory and a DESTINATION1 work location, that is, to make sure all the hashes in SOURCE and DESTINATION match.

NOTE: Again, a reminder, hashcmp is designed to compare two identically formatted files on a single field (ie: hash value). This generic comparison is regulated by the appropriate -d and -l (ell) options, not described here.

Now that we have two hash runs available, we can run the hashcmp program.
The generic hashcmp command is:

C:>hashcmp  SRCE.out  DEST.out  -o  mismatch.out  -h 
What you will get from an actual run is an output file containing references like the item shown below where the program identifies the hash values found in a SOURCEA file that don't match the DESTINATIONB file. Take notice of the different hash values. (spaces truncated for legibility)
found in SRCE.out not in DEST.out | C:\TMP\ZIP_IT.htm | C772D55C42A41B4E6F261F28B8DAA7FF  | 12072 ....
found in DEST.out not in SRCE.out | D:\TMP\ZIP_IT.htm | D772D55C42A41B4E6F261F28B8DAA7FF  | 12072 ....
Notice that because the same file has two different hash values, you actually get two records in the output. One references the hash value in file1 not in file2, and conversely, one value found in file2 not in file1.
Check the help file for explanation on the appropriate hashcmp command option to only show those in file1 or file2 in the output mismatch.

The final three line batch file you might use to accomplish this process is shown here, and contained in the hash_matching   article. First, hash the source, then hash the destination, then run hashcmp against the two outputs.

C:>hash     -p  x:\source1_folder        -f  files_to_hash(usually *.*)  -w  300  -d "|"  -o  SRCE.out   -R  -1 logfile1
C:>hash     -p  x:\destination1_folder   -f  files_to_hash(usually *.*)  -w  300  -d "|"  -o  DEST.out   -R  -1 logfile1  
C:>hashcmp  SRCE.out  DEST.out  -o  mismatch.out  -h 
There are some minor drawbacks or requirements to the hashcmp process. To see a more complete explanation on how to use hashcmp, check out the hashcmp section of the article.
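
The comparison logic described in this section can be sketched in a few lines of Python. This is only an illustration of comparing two record sets on the hash field. It uses in-memory sets rather than whatever method hashcmp uses internally. The function names (parse_record, compare_on_hash) and the assumption that the path is the first pipe delimited field and the hash is the second are mine.

```python
def parse_record(line):
    """Split a pipe-delimited record into (path, hash); other fields ignored."""
    fields = [field.strip() for field in line.split("|")]
    return fields[0], fields[1]


def compare_on_hash(source_lines, dest_lines,
                    source_name="SRCE.out", dest_name="DEST.out"):
    """Report records whose hash appears in one input but not the other."""
    source = [parse_record(line) for line in source_lines if line.strip()]
    dest = [parse_record(line) for line in dest_lines if line.strip()]
    source_hashes = {md5 for _path, md5 in source}
    dest_hashes = {md5 for _path, md5 in dest}

    report = []
    for path, md5 in source:
        if md5 not in dest_hashes:
            report.append(
                f"found in {source_name} not in {dest_name} | {path} | {md5}")
    for path, md5 in dest:
        if md5 not in source_hashes:
            report.append(
                f"found in {dest_name} not in {source_name} | {path} | {md5}")
    return report
```

Fed the two ZIP_IT.htm sample records shown earlier, this returns two report lines, one for each direction, which mirrors the two-record mismatch output described above. Note that comparing on the hash alone says nothing about file names: a renamed but otherwise identical file is reported as a match.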

HASH_DUP

The fourth and final part of our hash buffet is the hashdup program.

Hashdup is designed to analyze a single fixed length record file on the hash field and produce an output file showing which files have duplicate hash values.

This operation could be used to see how many copies of a suspect fileX the person may have in different locations on the drive under different, or the same name, or when run against your own drives hash list, it could point out how many copies you have of pictures of your family in different locations. This is how I use it every time I download pictures from my cell phone to my main storage location. I rehash all the photos and then run hashdup to see how many duplicates show up. I guarantee you will always find some.

Now, back to our hash buffet.
Below is a sample of a few hash records.

D:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\sed1.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\Total.exe     7B29E04B436F3D581F0144DB0CA04FF3   159432 01/01/2019 11:30w EST
E:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
E:\TMP\TEST_FILES\EXES\sed2.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
G:\TMP\TEST_FILES\EXES\Search.exe    5E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
G:\TMP\TEST_FILES\EXES\sed3.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST

Now, let's sort on hash and find out if there are duplicates. (this sorting is done internal by the program)

D:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
E:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
G:\TMP\TEST_FILES\EXES\Search.exe    5E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\Total.exe     7B29E04B436F3D581F0144DB0CA04FF3   159432 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\sed1.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
E:\TMP\TEST_FILES\EXES\sed2.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
G:\TMP\TEST_FILES\EXES\sed3.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
Notice which hashes show up more than once. Search.exe shows up with the same hash in two locations (the G: copy has a different hash, so it is not counted as a duplicate), while the sedX.exe file shows up in three locations.

After running the program, this is the output file that is created, along with the statistics shown on the screen. (blank lines added for legibility)
Processed 7 (files)
There were 2 duplicate sets found

E:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST

G:\TMP\TEST_FILES\EXES\sed3.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
E:\TMP\TEST_FILES\EXES\sed2.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\sed1.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST

The program indicates that two sets of files exist with two or more copies each.
Now it's up to you to determine what to do with this list.
When I run it on my own volumes, I remove from the list those copies I wish to retain.
(In the display above, I would remove from the list the items on the E: drive, which I want to keep in place.)
That leaves me with a list like this: (notice only the D: and G: drive items remain in the output file.)

 
D:\TMP\TEST_FILES\EXES\Search.exe    4E8DC094BD055248C406D6A0814A9C4D   198344 01/01/2019 11:30w EST
G:\TMP\TEST_FILES\EXES\sed3.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST
D:\TMP\TEST_FILES\EXES\sed1.exe      E26824B098033E9682850673AB548B7E    82944 01/01/2019 11:30w EST

Then I run the duplicate file list file thru the Maresware: rm or rmd program, using the -S option thereby removing (deleting) the duplicates. But for evidence processing, you might wish to have a different approach of first isolating or copying the duplicates to another location and specifically identifying them as such. The upcopy program does an excellent job of performing such a copy operation.
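
The duplicate detection walked through in this section boils down to grouping records by hash and keeping the groups with more than one member. Here is a minimal Python sketch of that idea. It is not the hashdup implementation, and the function name find_duplicate_sets and the (path, hash) input shape are assumptions for illustration.

```python
from collections import defaultdict


def find_duplicate_sets(records):
    """Group (path, hash) pairs by hash and keep groups with 2+ paths."""
    by_hash = defaultdict(list)
    for path, md5 in records:
        by_hash[md5].append(path)
    # Sorting by hash mirrors the sorted listing used in the walkthrough.
    return {md5: paths
            for md5, paths in sorted(by_hash.items())
            if len(paths) > 1}
```

Run against the seven sample records in this section, the result holds two duplicate sets: one with two paths and one with three. A file with a unique hash, or a same-named file whose content differs, drops out because its group has only one member.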



VERIFY WITH HASH

Many people are concerned with the possibility of a virus or ransomware inserting itself or changing key system files, such as those within the \WINDOWS tree, which on my machine contains thousands of files. Or any nefarious actions which may cause what should be static trees to be altered.

It has been suggested to me that one way to determine if such corruption has occurred is to see if any of the key system files have been changed, or if suspect programs have been added to these directories. That is not to say that normal changes don't occasionally occur, but file content changes, or files added to the tree, might be something to look at.

Take for instance a standard exe called MS.EXE which might be a very important system file. The virus or ransomware renames MS.EXE to MS1.EXE and inserts the corrupt file called MS.EXE. So when you run MS.EXE you actually initialize the bad file which does its thing, then calls MS1.EXE to accomplish what you asked for.

One thing that could be done is to create a static time checkpoint of the tree. Let's call it hash_test.txt. Then periodically run another hash output of the same tree, hash_test2.txt. Compare both the original hashes and the new hashes, and the number of original files with the new number. Any difference might be cause to take a look at why the change occurred. Possibly it was a simple install of a new program, or it was in fact something quite bad which caused the hash and file count alteration.

This simple process is contained within this batch script    and with minimal alterations to set the target tree, can be run periodically, or scheduled each night to see what changes are there.
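
Separately from that batch script, the checkpoint idea can be sketched in Python as follows. This is a minimal illustration under my own assumptions: the function names (snapshot, diff_snapshots) are hypothetical, and a snapshot is held as an in-memory dictionary rather than a hash output file.

```python
import hashlib
import os


def snapshot(root):
    """Map each file's path (relative to root) to its uppercase MD5."""
    result = {}
    for folder, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            digest = hashlib.md5()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    digest.update(chunk)
            result[os.path.relpath(full, root)] = digest.hexdigest().upper()
    return result


def diff_snapshots(old, new):
    """Return (changed, added, removed) path lists between two snapshots."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    return changed, added, removed
```

In the MS.EXE scenario described above, the second snapshot would report MS.EXE as changed (same name, new hash) and MS1.EXE as added, so both the hash comparison and the file count comparison flag the tree.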

NOTE: THIS PROCESS WILL NOT DETECT CURRENT REAL TIME PROBLEMS, BUT MAY HELP IN DETERMINING SOMETHING IS AMISS


VERIFY WITH ADS

This is actually the fifth step in the hash_buffet. It shows how to be able to quickly determine if a file, whether executable, jpg, evidence or any other type of file has not been changed. The above sections like hashcmp make use of the output files created by the hash program to create various output files, which can then be used to test and compare various operations.

This process uses the NTFS alternate data streams to store and check the MD5 of any ONE (or many) files as the program is run. Let's say you have a directory of executables which you want to keep an eye on to be sure that none of them have been altered (possibly virus infected), or maybe you have a single evidence file in storage and you want to make certain, the next time you look at it, that the contents haven't been altered. This process can confirm a file's alteration on either a single file, or any number of files the user chooses.

How is this accomplished? A simple two step process on the NTFS file system.
Step 1: Run hash on any of the files you choose using the option: --ADDADS. This option creates an alternate data stream containing various information relating to the file. The piece of information we are looking at for this purpose is that the program adds to the alternate data stream the current MD5 and SHA value of the file. You now end up with an alternate data stream with a data stream name format of: filename_hash.txt see (hash --ADDADS) below example:

ads.htm                    Parent name
ads.htm:ads_hash.txt       ADS name
Among other things in the ADS file is a line like this, which contains the current MD5:
HASH: 5ADB6FF13BD2905A03BDD14611A348AB

Now at a later date, or any time you wish, run the hash program with the following base command, using other options as appropriate:
C:>hash -p ...  -f  ...   --ADSONLY  --ADS_COMP
This run produces output of files containing ADS's and it checks the parent MD5 value, compares it against the MD5 contained in the ADS, and if a mismatch is found, it produces a line in the output, similar to:
ads.htm                        11351CF9A1A93A7022223BFDC7578D70
ads.htm:ads_hash.txt           HASH MISMATCH
The user then should examine the reason the current hash is not what was stored in the original alternate data stream. This process is easily implemented into a batch file to constantly check the original hash value of a file(s). A little practice and testing makes this a simple test for a lot of files, or a single file integrity.
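
For readers curious about the mechanics, the store-then-verify idea can be imitated in Python. This is a rough sketch, not the hash program's implementation. It assumes that opening "file:streamname" reaches an alternate data stream, which holds on Windows with NTFS; on other file systems the same name simply creates an ordinary sibling file. The stream naming (ads.htm:ads_hash.txt) and the "HASH:" line follow the example above, while the function names (ads_path, store_hash, verify_hash) are hypothetical.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the uppercase hex MD5 of the file's main data stream."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()


def ads_path(path):
    """Stream name for a file, e.g. ads.htm -> ads.htm:ads_hash.txt.

    Assumed convention, taken from the example in the text.
    """
    stem = os.path.splitext(os.path.basename(path))[0]
    return f"{path}:{stem}_hash.txt"


def store_hash(path):
    """Write the file's current MD5 into its hash stream."""
    with open(ads_path(path), "w") as stream:
        stream.write(f"HASH: {md5_of_file(path)}\n")


def verify_hash(path):
    """Return True if the stored MD5 still matches the file's content."""
    with open(ads_path(path)) as stream:
        for line in stream:
            if line.startswith("HASH:"):
                stored = line.split(":", 1)[1].strip()
                return stored == md5_of_file(path)
    raise ValueError("no HASH line found in stream")
```

Calling store_hash once and verify_hash on a schedule gives the same kind of "HASH MISMATCH" signal for a single file. One design point worth noting: an alternate data stream does not survive a copy to a non-NTFS volume, so this check only works while the file stays on NTFS.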


 

That concludes our discussion on the buffet of Maresware hash related software. Read and practice often. Hope you enjoyed the meal.

 


 


copyright © 2022-2024 by Dan Mares

I would appreciate any comment or input you have regarding this article. Thank you. dan at dmares dot com