|
|||||
Software | Training | Services | |||
Services
About
Us
Order: online
|
DATABASE RECORD EXTRACTION
Detective Bill Moylan,Nassau County, NY Police Dept Computer Crime Section In December 1998, I posted a question to the IACIS list requesting help with sorting and extracting data from a large database. I received several suggestions that unfortunately did not work out, and I struggled with SQL queries until I was cross-eyed. The timing of my posting was faulty, however, because it was right around that time that Dan Mares was busy retiring from Civil Service and entering the private arena, and he missed my first request for assistance. When I had occasion to speak to Dan directly, he was able to describe a procedure using some Maresware programs that accomplished exactly what I wanted to do. I have used the procedure several times since, and decided to write it down because I cannot easily call it to mind when I need it (Alzheimer’s, I guess). The problem involved an 8-mb master database file, containing business records. I wanted to extract only the records of transactions in a set of 1100 zip codes. The database contained 7,000 records, and the zip codes were not contiguous, so that any kind of Excel sort was not helpful. I was able to export the database to a DBASE type file, which is a FIXED-LENGTH RECORD type database file, which is necessary for this procedure. The first step was to isolate the data, by removing the first part of the file, which contained the database header, leaving the actual data records in the file. This was accomplished using Maresware Filsplit. To extract the data, Filsplit is run using parameters for the offset value (-d) (number of bytes to pass before the start of data), the record length, (-r) and the number of bytes to copy (This parameter is not exact, the value in this case should exceed the total number of bytes in the file). In order to find the record length, I used Maresware Hexdump to display the data in HEX mode, and determined the structure of each record, including the record length, and also the position within the record of the zip code field Filsplit input.fil output.fil –r 1545 -d 286 –D 9000000 Where –r = record length–d = displacement to start of data -D = number of bytes to copy Once the data is isolated, it must be sorted on any one field. In this case I used the first field, which contained the customer number. I imported the data into Excel, and sorted on the first field, saving the file as a text file. This file MUST be properly sorted in order for the remaining processes to work correctly. In order to check that the file is properly sorted, I ran Maresware Sortchek, inputting the record length, the position in the record of the first character of the sort field and the number of characters to check. Sortchek input.fil –r 1545 –p 0 –l 6 Where –r = record length-p = pass this many characters before comparing (in this case, 0 bytes) -l = number of characters to compare (in this case, 6 character customer number) When executed this program will display the number of records being checked, and will show if the records are properly sorted. It will display the location of the first missorted record, if any. Once the file is properly sorted. The actual extraction can be done. The program used for the extraction is Maresware Search, and is executed using a parameter file to hold the option values for the extraction. In this case a parameter file "param.txt" was prepared using the following values. (They must be entered in this order) 15450 (number of bytes to work on at a time, it is a multiple of record length) |