Bsearch

PURPOSE   OPERATION   COMMAND LINES   PARAMETER FILE   OPTIONS   RELATED PROGRAMS


Author: Dan Mares, dmares @ maresware . com (you will be asked for e-mail address confirmation)
Portions Copyright © 1998-2021 by Dan Mares and Mares and Company, LLC
Phone: 678-427-3275

One liner: bsearch searches key field of a sorted file.

Sample Maresware Batches  an executable with data that demonstrates various Maresware software. Download and run the appropriate _07_xx batch for bsearch demo.

This is a command line program.
MUST be run within a command window as administrator.


Purpose

Bsearch will search a file using binary search routines. Binary searching of a file is extremely fast.

The file to search must be a fixed length record file. Since the input file MUST be of a fixed length record, carriage returns at the end of each record are unnecessary.

The input file MUST also be sorted on the field in which the key is located.

The program obtains from the user (on the command line) the name of an ascii text file (called a parameter file) containing the following: information about the structure of the input file; its record length; location of the key field to search; length of the key field (which must be fixed in size); and the keys to search for.

This program is mainly designed as a data file analysis tool to analyse data obtained from mainframe and Unix data base files.


top

Operation

When the input file is sorted on the key field, the program does a binary search for records that contain the search keys specified in the parameter file. If records are found, they are placed in the output file.

The number of keys permitted to search for is unlimited. Because the program only reads one search key at a time, there is theoretically no limit to the number of search keys the parameter file can contain. But if more than 50 are used, the program speed degrades to the point that the Search program would be more efficient.

The fact that the program does a binary search to find records with the search keys means that it is much faster than the Search program. However, the speed slows down in a linear fashion as the number of keys increases. If you have a very large sorted file, and a few search keys, this program is remarkably fast. The program can search for one key in about 2 seconds regardless of the size of the input file.

It will also allow you to search a second field of the record for a second key match. The binary search is done only on the first field. When the first field matches, it then goes through the keys in a second key file to determine if any of these keys match the one in the record.

There are two types of searches you might want to conduct. The first is "logical AND", meaning that one of the keys in the first key file must be present, AND one of the keys in the second key file must be present. The second type of search is "logical NOT", meaning that a key in the first parameter file must be present, and there are NOT any keys matching the second parameter key field(so, a key match will be found in field 1 but not in field 2).

The -k and -K options are used to designate that a second field is also to be searched. The second key parameter file is identical in structure to the first parameter key file. It must have the blocksize and record length the same as the parameter file. The displacement and length of the key field are naturally going to have to reflect the proper size and placement in the record of the second field to check. See parameter file descriptions and sample command lines in the Search program.


top

Command Line

C:>bsearch input output paramfile -[options]

Item 1: Program name[bsearch]
Item 2: Input file
Item 3: Output file
Item 4: Parameter file name
Item 5: 0ptions -[aAcilkKnrR1]

Samples:

C:>   bsearch    input.fle    output.fle    param.fle    -aA
/* append to output, and create accounting file */

top

Parameter File

The parameter file is an ascii text file created with a text editor, not word processor. It has the following structure:

LINE 1: Input File Blocksize (must be A multiple of record length, MAX 32000)
LINE 2: Input File Record Length
LINE 3: Displacment (from 0) to field to search
LINE 4: Length of field to search
LINE 5-X: Keys to search for, one per line, with a blank line at end

Sample:

32000
80
21
9
123456789
abcdefghi

CAUTION NOTE:
When building the parameter file for either bsearch or search it is very important that there is only one occurance of each search key. If by accident you duplicate a search key, ie:
123456789
123456789
Then the output will contain multiple instances of the records which are hit upon. And thus give eroneous output total counts. In testing, it took me two days to figure out what was the problem. And it turned out, a simple error in forming the parameter file.


top

Options

-a   (a)ppends output to existing output file

-A   (A)ppends/creates accounting file (ACCT-ING) containing accounting information/statistics

/A   Turns off auto-accounting set in environment

-c   (c)onverts all unprinting characters in input record to hex 7f == ~

-1 + #   Converts unprinting characters to the decimal character indicated by #. (i.e. upper case A is 64)

-1 + Z    Converts unprinting characters to a ZERO

-1 + B   Converts unprinting characters to a BLANK

-k + filename  Filename is name of file containing keys to search for in a second field. See note below on 2nd key.

-K + #  Indicates which logical search to perform on the second key field. They are:

1==   Field 1 and field 2 must be matched with keys in the respective key file. Logical (AND) (field 1 AND field 2)

2==   Field one matches and none of the keys in field 2 match. Logical (NOT) (field 1 and NOT 2)

-I    (i)gnore case of search keys

-n    Converting unprinting characters, leave carriage returns alone

-r or -R  Adds carriage returns at end of output records. The small ‘r’ means add the carriage return that is native to that computer system, and the ‘R’ means add the carriage return that is foreign to that system. The native carriage return for IBM is a carriage return line feed pair which is 0x0d 0x0a. The native return for UNIX is single carriage return 0x0a. The foreign return for IBM is 0x0a, and for UNIX is 0x0d 0x0a.


top

Related Programs

Disksort

Search

TOP