HASHCMPV

Special modification of hashcmp.

PURPOSE   OPERATION   OPTIONS   COMMAND LINES   RELATED PROGRAMS


Author: Dan Mares, dm@dmares.com
Portions Copyright © 2005-2021 by Mares and Company, LLC
Phone: 770-242-6687 X 119


PURPOSE

The program HASHCMP is designed to display the differences between output files produced by the HASH program. HASHCMPV is a special modification that processes the output of MD5 and other programs that generate variable length data records containing an MD5 or other hash (SHA) field.

The MD5 and HASH programs create a listing of MD5, SHA or CRC32 totals for files. These output files can be analyzed by HASHCMPV to determine if there are any records in "file1" that are not in "file2".

This analysis can be used to show which files were altered between the times the 1st and 2nd outputs were created.

Under certain circumstances the original HASHCMP can also compare 2 outputs of any other program which produces fixed length records. Two such programs are DISKCAT and CRCKIT. Other programs which produce fixed length records (such as the MD5 program) can also create files which HASHCMP can use as inputs.

HASHCMP is designed to compare the contents, line by line, of two files with similar records. However, for HASHCMP to work properly, the records in the files MUST be of a fixed length. If they are not, they can be made fixed length using the fix_recl program from Mares and Company.
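As a rough illustration of what "fixed length" means here, the Python sketch below pads every record with trailing spaces to the length of the longest record. It is not the fix_recl program, whose actual behavior and options are not described in this document; the function name pad_records is hypothetical.

```python
def pad_records(lines):
    """Pad every record with trailing spaces to the longest record's length."""
    # Strip line endings first so they do not count toward the record length.
    stripped = [line.rstrip("\r\n") for line in lines]
    width = max((len(s) for s in stripped), default=0)
    return [s.ljust(width) for s in stripped]


# Two records of different lengths become two records of equal length.
records = pad_records(["abc  file1.txt\n", "abcdef  f2\n"])
```

Padding only equalizes the record length. It does not line up fields that start at different columns, so it helps only when the record layout is otherwise the same.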

When the program finds records in one file that do not have a match in the other file, it displays the mismatch on the screen (or in the output file if the -o option is used). Each line MUST be 100% identical. Except when the -d, -h or -l options are used, HASHCMP does not parse the record for comparison and uses the entire record.

The HASHCMPV program was designed to compare records of files whose records (rows) are not similar in length (variable length records), but have a common hash or MD5 field. There are limitations to the capability of HASHCMPV to determine which field is the hash field, and how to compare it to the other file. The limitations, or rather requirements of the data record layout are explained below.

In its simplest operation, we assume MD5 or HASH was run on a disk drive at two different times. To determine whether any files have changed, you would want to compare the specific HASH records for each file. On some systems this could mean 75000 or more files. HASHCMP is designed to take the outputs of two different runs, which have essentially identical information, and compare them record for record. If a record in one file has no 100% match in the other file, the record is printed to the screen (and, with the appropriate option, to an output file).

Because HASHCMP expects the records in both input files to be identical in format, it can be used to compare records in files that were created by different programs, provided the records are identical in size and content. HASHCMP will then compare the records and show which ones don't match.

Because it attempts to handle both files in memory, there is an arbitrary limit of 250,000 records in each input file. If you need alterations let me know.

Recently, users have been generating hash values and placing them in csv files, which produces records of variable length. For this reason, HASHCMPV was modified to allow for variable length records, provided the hash field is always in the same location.



OPERATION

HASHCMPV has the enhancement that it attempts to find the hash value in each record and key on it. This is because the enhanced MD5 program can produce output where the hash value is the first item on the line, or because you may be using Excel or another program to generate a file where the hash value is the 1st item in the record. When the hash value is NOT the first item (for example, it is the 2nd item, after the filename), HASHCMPV attempts to identify its location and work with it. It uses some artificial intelligence, which is often an oxymoron. So if the hash value is NOT the first item on the line, use caution when viewing the outputs.

When the hash is not the 1st item on the line, it is not in the default location and must be located by the program's internal tests. Alternatively, the user can explicitly specify the location with the -d and -l options, but only if the hash is in a fixed position relative to the first character of the record.
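The program's internal tests are not documented here. As a guess at what such a test could look like, the Python sketch below searches a record for a run of hexadecimal digits that is exactly 32, 40 or 64 characters long (the lengths of MD5, SHA-1 and SHA-256 values). The function name find_hash_field and the choice of lengths are assumptions for illustration, not HASHCMPV's actual method.

```python
import re

# Hex runs of exactly 32, 40, or 64 characters (MD5, SHA-1, SHA-256 lengths).
# The lookarounds reject longer hex runs that merely contain such a run.
HASH_RE = re.compile(
    r"(?<![0-9A-Fa-f])"
    r"(?:[0-9A-Fa-f]{64}|[0-9A-Fa-f]{40}|[0-9A-Fa-f]{32})"
    r"(?![0-9A-Fa-f])"
)


def find_hash_field(record):
    """Return (displacement, length) of the first hash-like field, or None."""
    match = HASH_RE.search(record)
    if match is None:
        return None
    return match.start(), match.end() - match.start()


# Hash is the 2nd item, after the filename (hypothetical record layout).
location = find_hash_field("report.doc,D41D8CD98F00B204E9800998ECF8427E,1024")
# location == (11, 32): displacement 11 counted from 0, length 32
```

A heuristic like this can be fooled, for example by a filename that is itself 32 hexadecimal characters, which is one reason to check the output carefully or to give the location explicitly.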

The default program operation is to show mismatches from both input files. Meaning it will show all lines in file 1 not found in file 2, and it will also show all lines in file 2 not found in file 1.

There is an option to cause it to show only records contained in file 1 and not in file 2. (the -1 option, that’s a one, not ell)

There is also an option to show only records contained in file 2 and not in file 1. (the -2 option)

Depending on the needs and reasons for running the HASH program, any of the three above comparisons could be used.

In operation, file one is first read into memory and sorted. The entire record length is used for the sort.

Then file two is read into memory and sorted.

Then the two files are compared: first file 1 is compared to file 2, and next file 2 is compared to file 1. This cross comparison produces output showing records that exist in either file and NOT in the other.

Don't forget: if the same filename appears in both files being compared, but its hash has changed, you get a mismatch on both passes. The hash value from the first run can't be found in the second run, and likewise the hash value associated with the filename in the second run can't be found in the first. So 2 mismatches are output, even though they reference the same filename.
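The two-pass comparison, and the way one altered file produces two mismatches, can be sketched in Python as follows. This is an illustration only: it uses set membership instead of the sort-and-compare approach described above, and the function name cross_compare and the "hash,path" record layout are assumptions.

```python
def cross_compare(file1_records, file2_records, key=lambda rec: rec):
    """Return (only_in_1, only_in_2), each sorted by the comparison key."""
    keys1 = {key(rec) for rec in file1_records}
    keys2 = {key(rec) for rec in file2_records}
    # Pass 1: records of file 1 whose key is not found in file 2.
    only_in_1 = sorted((r for r in file1_records if key(r) not in keys2), key=key)
    # Pass 2: records of file 2 whose key is not found in file 1.
    only_in_2 = sorted((r for r in file2_records if key(r) not in keys1), key=key)
    return only_in_1, only_in_2


def hash_of(record):
    """Assume the hash is the first comma-separated field (hypothetical)."""
    return record.split(",", 1)[0]


reference = ["aaaa,C:\\a.txt", "bbbb,C:\\b.txt"]
current = ["aaaa,C:\\a.txt", "cccc,C:\\b.txt"]  # b.txt was altered

only_ref, only_cur = cross_compare(reference, current, key=hash_of)
# only_ref == ["bbbb,C:\\b.txt"] and only_cur == ["cccc,C:\\b.txt"]:
# one changed file yields one mismatch in each direction.
```

Returning only the first list corresponds to the -1 behavior, and returning only the second corresponds to -2.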

By default the HASHCMPV program ONLY compares on the hash value field and not the full record length (which would include the filename). If you need to compare on the entire record, you need a fixed length record and you need to use either HASHCMP or the COMPARE program.

NOTE:

If it becomes necessary to hard code the location and length of the hash field, use the -d option, which sets the displacement of the hash field. When using the -d option, you start counting from 0, not 1. Set the value (-d X) to the number of characters you wish to skip. If you are using the -d option, you should also consider the -l option, which restricts the number of characters to check.
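In Python terms, the effect of a displacement and a length amounts to a string slice. The sketch below is a hedged illustration of that idea; the function name compare_field and the sample record are hypothetical, and the code is not taken from the program.

```python
def compare_field(record, displacement=0, length=None):
    """Return the part of the record used for comparison.

    displacement is counted from 0, like the -d option;
    length, if given, caps the number of characters, like -l.
    """
    end = None if length is None else displacement + length
    return record[displacement:end]


# A csv record whose first field is a quoted 32-character MD5 value.
record = '"D41D8CD98F00B204E9800998ECF8427E","C:\\docs\\empty.txt"'

# Skip the opening quote (displacement 1) and keep 32 characters.
field = compare_field(record, displacement=1, length=32)
# field == "D41D8CD98F00B204E9800998ECF8427E"
```

Without a length, everything from the displacement to the end of the record takes part in the comparison, which is why the two options are usually given together.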

A SIMPLE HASHCMP PROCEDURE

1. Run HASH on the system and create a "reference" file.

2. Run HASH on the system at a later time to create a "current" file. This reflects the state of the system at the current time. If any file has been altered in any way, the HASH value will change and show up in the output.

3. Run HASHCMPV to compare the reference file with the current file.

HASHCMPV shows its progress on the screen, and indicates which lines in file #1 (current) are not found in file #2 (reference), and vice versa.

Don't forget, the HASHCMPV program attempts to identify and compare only the hash field. If you wish to compare the entire record, use either HASHCMP or COMPARE. Both of these other programs require fixed length records, or an identical format (i.e. record layout structure).
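For readers who want to experiment with this procedure without the HASH program, the Python sketch below builds a comparable listing using the standard hashlib module. The "MD5,path" line layout and the function name hash_listing are assumptions for illustration; the real HASH output format is not reproduced here.

```python
import hashlib
import os


def hash_listing(root):
    """Return sorted 'MD5,relative path' lines for every file under root."""
    lines = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            # Whole-file read keeps the sketch short; large files would
            # normally be hashed in chunks.
            with open(path, "rb") as handle:
                digest = hashlib.md5(handle.read()).hexdigest().upper()
            lines.append(f"{digest},{os.path.relpath(path, root)}")
    return sorted(lines)
```

Running such a function once to produce a reference listing and again later to produce a current listing gives two files of the kind steps 1 and 2 describe, with the hash as the first item on each line.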


Processing Hints

Here are some scenarios that you might want to follow or adopt.

Scenario 1. Forensic analysis

The object here is to be able to testify that the contents of files were not altered from the time of seizure to court. You would run HASH on the suspect system as soon as possible after the seizure. This creates a reference file. (reference.fil). Then, at a later time you run HASH again on the system and create a current output file. (current.fil). This records the state of the files now. Then run HASHCMP against both files.

C:>HASHCMP  current.fil  reference.fil

You should not see any differences in the output.

Scenario 2. Your own system references (virus detection/file alteration)

Run HASH at some point to get a reference output.

At later times run HASH against either the entire system, or selected files. Then run HASHCMP with the -1 or -2 option (depending on which file you input first on the command line).

This will show you if the current files have been changed. If the files have been changed, the output will reflect a different hash value, and you might need to investigate the reason for alteration.

Scenario 3. General File Comparison

You must have two files with the exact same record layout (line structure).

Run HASHCMP on the two files (with or without the -12 option) to see which lines show up in one and not the other.



HASHCMP OPTIONS

-1  (that’s a one, not ELL) Only show output of lines in file 1 that do not appear in file 2.

-2  Only show output of lines in file 2 that do not appear in file 1. (note the option -12 is the same as the default of show both file mismatches)

-i:  'I'gnore the case of the records. This is useful if comparing two files created with Windows 95 or Windows NT, since both of these systems maintain case in their paths.

-[Hh]:  Compare ONLY on the hash field. With this option, if the hash field is NOT the first field, the hash field is best defined using the -d xx -l xx field location/width options.

-[oO] + outputfile:   Write output to "outputfile" creating a log of process. If uppercase (-O) is used then output file is also sent to default printer 75 characters per line with no formatting.

-d + #:  Replace # with a number from 1-xx, where # is the displacement in the record at which to start the comparison. The displacement is counted from 0, so -d 1 starts at the 2nd character. Use this if the two files have different drive letters as the 1st section of the path (i.e. D:\path\....., C:\path\.....).

-l + #:  (that’s an ELL, not one) Restrict the length of the compare field to this many characters. This can be used to restrict the field to the correct number of characters for the MD5 or SHA field. Otherwise, the entire record from the first character, or from the character identified by the -d option, is used. If the filesize, date and time are not important, and only the hash or SHA is needed, this is a very useful option. It is best used with -d pointing to the first character of the SHA or MD5 field (i.e. -d 80 -l 40). See the -h option above for a shortcut to the -d -l options.



COMMAND LINES

C:>HASHCMPV file1.new file2.ref
compare file1.new to file2.ref

The following is the only way to send the output to an output file. You MUST redirect the output with the -o option.
C:>HASHCMP file1.new file2.ref -o output.fle
compare file1.new to file2.ref and redirect output to output.fle. A log file with some statistics is also created, named output.fle.log

C:>HASHCMPV file1.new file2.ref -1 -o outputfile
compare same 2 files, and show only items in file1 not in file2

C:>HASHCMP file1.new file2.ref -2
compare same 2 files, and show only items in file2 not in file1

C:>HASHCMP file1.new file2.ref -2 -d 1 -o output.fle
compare same 2 files, and show only items in file2 not in file1, starting at the 2nd character of the record. This -d might be necessary if you have a csv file in which the leading hash field is quoted. "1234567890ABCDEF","etc"

C:>HASHCMP file1.ref file2.new -2 -d 1 -l 32
This starts the comparison at character 1 (assuming that is the start of the hash field) and compares only 32 characters, the length of the field. This forces the program to ignore all other parts of the record.



RELATED PROGRAMS

HASHCMP
CRCKIT
HASH
DISKCAT
MD5