Compare



Author: Dan Mares, dmares @ maresware . com
Portions Copyright © 1998-2021 by Dan Mares and Mares and Company, LLC
Phone: 678-427-3275
Last update: May 2012

One liner: compares two files on a sorted key field for matches.

Sample Maresware Batches: an executable with data that demonstrates various Maresware software. Download and run the appropriate _11_xx batch for the compare demo.

All programs are command line programs.
MUST be run within a command window as administrator.
If you downloaded the program from the web and find that it needs registering, I apologize; please contact me for a registered version.



Purpose

This program works only on fixed length files.

This program will compare two files that are sorted on the same sort field. It compares records based on the sorted field of each file.

The records can be of a different structure, but the sorted or compare field must be the same.

It will compare the records in a file designated as the “A” file with records in a file designated as the “B” file.
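
To make "fixed length" concrete, here is a minimal Python sketch (not part of the Maresware software) that reads a file as consecutive records of equal size. The function name `read_fixed_records` and the sample data are hypothetical and used only for illustration.

```python
import io

def read_fixed_records(stream, record_length):
    """Yield consecutive fixed-length records from a binary stream."""
    while True:
        record = stream.read(record_length)
        if not record:
            return  # clean end of file
        if len(record) != record_length:
            # A short final read means the file is not made of whole records.
            raise ValueError("file size is not a multiple of the record length")
        yield record

# Two 14-character records: a 7-character phone number plus a padded name.
data = io.BytesIO(b"5551234SMITH  5559876JONES  ")
records = list(read_fixed_records(data, 14))
```

Every record has the same length, so a field such as the phone number always sits at the same displacement in each record.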



Operation

Compare reads a parameter file. The parameter file provides to the program the following information:

1. The record length of each file. The files can have different record content, but each must be fixed length.
2. The location in each record of the sorted field. This is the field to compare on, e.g., a phone number.
3. The length of the sorted field.
4. The specific fields (or data) from each record that are to be written to the output record.

When the fields in “A” and “B” are equal, the program selects fields (based on the parameter file contents) from both the “A” and "B" file and places these fields into an output record. Thus, the format of the output record is designed by the user. The output record can contain any amount of information from either the A or B record (unless the -u option is chosen in which case only information from the B record is reliable).

The program advances through the “A” file more rapidly than the “B” file, so when a match is made, it extracts data only from the first “B” record that contains the key field.

It stays on that “B” record until all of the “A” records with a matching key field have been used to build successive output records. Then, when an “A” record with a new key field is found, the program advances through the “B” file until it finds a key that matches the new “A” key.

Thus, if subsequent "B" records contain the same key as was matched in the first "B" record, those records are not processed, and output records are not formed using the subsequent information.

If you need output records with all the "B" file matches, switch the input files around and make what was the "B" file the "A" file and vice versa.
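
The matching behavior described above can be sketched in Python. This is an illustration under stated assumptions, not the program's actual code: records are held in memory as strings, keys are compared exactly (case handling is not modeled), and the name `equal_compare` is hypothetical.

```python
def equal_compare(a_records, b_records, a_disp, b_disp, key_len):
    """Yield (a_record, b_record) pairs whose key fields are equal.

    Both inputs must already be sorted on the key field.
    """
    b_iter = iter(b_records)
    b_rec = next(b_iter, None)
    for a_rec in a_records:
        a_key = a_rec[a_disp:a_disp + key_len]
        # Advance B only while its key sorts below the current A key.
        while b_rec is not None and b_rec[b_disp:b_disp + key_len] < a_key:
            b_rec = next(b_iter, None)
        if b_rec is None:
            break  # B is exhausted, so no further matches are possible
        if b_rec[b_disp:b_disp + key_len] == a_key:
            # B is not advanced here, so every A record with this key
            # is paired with the first B record that carries the key.
            yield a_rec, b_rec

a_file = ["111a1", "111a2", "222a3", "444a4"]
b_file = ["111b1", "111b2", "333b3", "444b4"]
pairs = list(equal_compare(a_file, b_file, 0, 0, 3))
```

In this example both "A" records with key 111 are paired with `111b1`, and `111b2` is never used, which is the reason for swapping the input files when every "B" match is needed.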

A second option is the unequal (-u) compare. This option compares the “B” file with keys in the “A” file. The “A” file should contain only one record for each key value. When a record is found in the “B” file for which there is NO “A” key match, the output is taken from the “B” record and placed in the output file. Therefore, if you choose the unequal (-u) option, design your output using only fields from the “B” file.
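
A minimal Python sketch of the unequal compare follows. It is an illustration only: the name `unequal_compare` is hypothetical, keys are compared exactly, and it assumes that "B" records remaining after the "A" file is exhausted count as unmatched, which the description above does not spell out.

```python
def unequal_compare(a_records, b_records, a_disp, b_disp, key_len):
    """Yield B records whose key has no match in the A file.

    Both inputs must be sorted on the key; A should hold unique keys.
    """
    a_iter = iter(a_records)
    a_rec = next(a_iter, None)
    for b_rec in b_records:
        b_key = b_rec[b_disp:b_disp + key_len]
        # Skip A records whose key sorts below the current B key.
        while a_rec is not None and a_rec[a_disp:a_disp + key_len] < b_key:
            a_rec = next(a_iter, None)
        # Assumption: once A is exhausted, remaining B records are unmatched.
        if a_rec is None or a_rec[a_disp:a_disp + key_len] != b_key:
            yield b_rec

a_keys = ["111a1", "333a2"]
b_file = ["111b1", "222b2", "222b3", "333b4", "555b5"]
unmatched = list(unequal_compare(a_keys, b_file, 0, 0, 3))
```

Only the "B" record is available when a key is unmatched, which is why the output layout for -u should draw on "B" fields alone.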

The totals given for number of records read may not reflect the size of the entire input files. Depending on the last sort keys, it may not always be necessary to read both input files to conclusion.

Also, depending on the sort sequence and the option chosen (-= or -u), the total read may be one more record than was actually used. This will almost always happen if you are using blocked input files.

The format of the input files, output file, and the record structures are set up by the user in a parameter file.

NOTE on final record counts

Because the input files are sorted and the program ends as soon as one of the input files is exhausted, the final record count on the screen and in the accounting file may not reflect the total number of records in the files. It reflects only how many records had to be read before all comparisons were satisfied.



Command Line

You must provide the program with four (4) file names: Input A, Input B, the output, and the parameter file names.

C:> compare item2  item3 item4 item5 -[options]
C:> compare input_A input_B output param_file -[options]
C:> compare inputA.fle inputB.fle output.fle parameter.fle
C:> compare infileA infileB outfile param.fle -=   -r
C:> compare inA inB out param -u
C:> compare inA inB out param -i
  Item 1:    Program Name.
  Item 2:    Input file A.
  Item 3:    Input file B.
  Item 4:    Output file.
  Item 5:    Parameter file.
  Item 6:    Options [-][=,u,i,r,R,A].


Parameter File

Line 1:  Output blocking factor. This is the number of output records per output block, NOT the output blocksize (which has a maximum of 32768 characters). The program calculates your output record length and builds the output block automatically. The number only matters when writing output to blocked tape, but line 1 must still contain a number other than zero. Any small number will do; for most people it is merely a placeholder.

Line 2:  Record length of the “A” file. No leading zeros.
    (Use findrecl.exe if you don’t know what the record length is).
Line 3:  Blocksize of the “A” input file, [max of 32768].
    (A small multiple of the record length, or the record length itself).
  
Line 4:  Record length of the “B” file. No leading zeros. (Same logic as line 2 & 3)
Line 5:  Blocksize of the “B” input file, [max of 32768].
  
Line 6:  Displacement to first character of field to begin compare of “A” file.
  (Displacement starts counting at 0. First character of record is 0)
  Can have up to five (5) displacements, separated by spaces. But make sure the file is sorted on all 5.
Line 7:  Displacement to first character of field to begin compare of “B” file.
  Can have up to five (5) displacements. But make sure the file is sorted on all 5

Lines 6 & 7: You can compare on up to five (5) fields. But make sure each file is sorted on all of the fields used, and that lines 6 and 7 contain the same number of displacement items.

Line 8:  Number of characters to compare on. (for each field identified)
Lines 9 to end:  Output field definitions. Each line has 4 items, with no spaces between them.

Item1: Position 1:   an “A” or “B”, to designate which file to take the output from.
Item2: Position 2-5:  Displacement to first character of the field to put to output.
Item3: Position 6:  an “=” equal sign.
Item4: Position 7-9:  Number of characters to put to output record. (max 999)

After the last line of the parameter file, and two blank lines, comments may be included.
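
As an illustration of the four-item layout, the following Python sketch parses one output field line. It is not the program's own parser: the names `OutputField` and `parse_output_field` are hypothetical, and the strict zero-padded widths (4 digits, "=", 3 digits) are inferred from the positions listed above and from the sample files.

```python
import re
from typing import NamedTuple

class OutputField(NamedTuple):
    source: str        # "A" or "B"
    displacement: int  # zero-based offset into the source record
    length: int        # number of characters to copy (max 999)

# Position 1: A or B; positions 2-5: displacement; position 6: "=";
# positions 7-9: character count.
_SPEC = re.compile(r"([AB])(\d{4})=(\d{3})")

def parse_output_field(line):
    """Parse the first nine characters of an output field line."""
    match = _SPEC.match(line)
    if match is None:
        raise ValueError(f"not an output field line: {line!r}")
    source, displacement, length = match.groups()
    return OutputField(source, int(displacement), int(length))

field = parse_output_field("B0003=020   /* copy 20 characters from B */")
```

Anything after the ninth character, such as a trailing comment, is ignored by this sketch.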

Sample Parameter File:

5           /* 5, or number to represent output blocking factor */ 
90          /* file A record length */
900         /* file A block size, multiple of record length */
100         /* file B record length */
1000        /* file B block size, multiple of record length */ 
5           /* begin compare for file A at displacement 5 */
0           /* begin compare for file B at displacement 0 */
10          /* compare for 10, ten characters */ 
A0000=015   /*If a match, take the first output field from file A begin at position 0, and copy 15 characters.*/
B0003=020   /*Then begin at displacement 3 of file B and copy 20 characters to the output. The output record will be 35 characters long (15 + 20), and have fields that were obtained from both the A and B files.*/

Enhanced parameter file for a two-field compare:
5           /* 5, or number to represent output blocking factor */ 
90          /* file A record length */
900         /* file A block size, multiple of record length */
100         /* file B record length */
1000        /* file B block size, multiple of record length */ 
5  20       /* begin compare for file A at displacement 5, then 2nd field is displacement 20 */
0  30       /* begin compare for file B at displacement 0, then 2nd field is displacement 30 */
10  5       /* compare for 10, ten characters on field 1, and 5 characters on field 2 */ 
A0000=015   /*If a match, take the first output field from file A begin at position 0, and copy 15 characters.*/
B0003=020   /*Then begin at displacement 3 of file B and copy 20 characters to the output. The output record will be 35 characters long (15 + 20), and have fields that were obtained from both the A and B files.*/
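
To show how the two output lines in the samples shape the output record, here is a small Python sketch. It is an illustration only; `build_output` and the dummy record contents are hypothetical, and only the record lengths (90 and 100) and the field definitions come from the sample files.

```python
def build_output(a_rec, b_rec, fields):
    """Concatenate the requested fields from a matched A/B record pair."""
    parts = []
    for source, displacement, length in fields:
        record = a_rec if source == "A" else b_rec
        parts.append(record[displacement:displacement + length])
    return "".join(parts)

# A0000=015 and B0003=020 from the sample parameter files.
fields = [("A", 0, 15), ("B", 3, 20)]

a_rec = "0123456789" * 9    # dummy 90-character A record
b_rec = "abcdefghij" * 10   # dummy 100-character B record

output = build_output(a_rec, b_rec, fields)
```

The first 15 characters come from the "A" record and the next 20 from the "B" record, so the output record is 15 + 20 = 35 characters long.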

NOTE: if the record is longer than 999 characters, and you wish to copy the entire record, split up the lines like:

A0000=999
A0999=remaining character count
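
The split can be generated mechanically. The Python sketch below is an illustration, not a Maresware tool: `whole_record_lines` is a hypothetical helper, and the limit of 9999 on the displacement is an assumption drawn from the four positions (2-5) reserved for it.

```python
def whole_record_lines(source, record_length, max_chunk=999):
    """Return output field lines that copy an entire record in chunks."""
    lines = []
    displacement = 0
    while displacement < record_length:
        if displacement > 9999:
            # Assumption: the displacement must fit in positions 2-5.
            raise ValueError("displacement does not fit in four digits")
        count = min(max_chunk, record_length - displacement)
        lines.append(f"{source}{displacement:04d}={count:03d}")
        displacement += count
    return lines

lines = whole_record_lines("A", 2500)
```

For a 2500-character "A" record this produces three lines: two chunks of 999 characters and a final chunk of 502.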
Download sample files.


Options

-=     Is for equal compare (default is equal).

-u     Is for unequal compare.

-r     Add newlines to the output records. On UNIX the newline is 0x0a; on DOS it is 0x0d 0x0a.

-R    Add newlines using the opposite convention. On DOS the newline is 0x0a; on UNIX it is 0x0d 0x0a.

-A    Invoke the accounting file option. Create an accounting file in current directory called acct-ing.

-1 + logfile   (that's a one, not ell) Invoke the accounting file option. Create an accounting file identified by the logfile name. Use this in place of the -A.

-i     Make the comparison case sensitive. The default is to ignore case in the comparison.
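
The difference between the -r and -R options can be summarized in a short Python sketch. It only restates the byte values given above; the function name `record_terminator` and the platform labels "dos" and "unix" are hypothetical.

```python
def record_terminator(option, platform):
    """Return the newline bytes added to each output record.

    option is "r" or "R"; platform is "dos" or "unix".
    """
    native = {"dos": b"\x0d\x0a", "unix": b"\x0a"}
    swapped = {"dos": b"\x0a", "unix": b"\x0d\x0a"}
    if option == "r":
        return native[platform]   # the platform's own convention
    if option == "R":
        return swapped[platform]  # the other platform's convention
    raise ValueError(f"unknown newline option: {option!r}")
```

For example, `record_terminator("R", "dos")` gives the single byte 0x0a, which suits output created on DOS but destined for a UNIX system.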


Related Programs

Hashcmp
