Author: Dan Mares, dmares @ maresware . com (you will be asked for e-mail address confirmation)
Portions Copyright © 1998-2020 by Dan Mares and Mares and Company, LLC
Phone: 678-427-3275
The Catalog program is designed to be run on UNIX and LINUX systems. Currently it is only compiled for Linux i386. However, a port to other platforms should be rather quick and easy.
It uses the same algorithm to search directories as the LINUX hashl and strsrch programs.
It is designed to traverse the entire file system and produce a list of all files on the system, thereby making a "catalog" of the system.
The ls and find programs can accomplish the same task. However, their output is not in a fixed format and is not well suited to importing into a database for further analysis. Catalog is a fast, simple-to-use program that produces output in a standard fixed format that is easily manipulated by databases or other programs. It is easy to learn, and its output does not differ from one UNIX vendor's system to another's.
The output of Catalog is a fixed length record, which is easily processed by data manipulation programs (ex., databases).
By default it will provide a listing of all files. This can be customized by use of the -f (file) option to selectively identify only files meeting specific file name criteria (ex., *.c).
The user provides Catalog with appropriate options on the command line. Catalog can run from a script file which means that for forensic purposes it can run unattended.
The program should be run while the user has root privilege, so that it can search directories that normal users are not permitted to read.
The user must supply, as a minimum search criterion, a file type or path. Options are available for modifying how the program identifies files.
Catalog can search for specific file types (exs., *.c, *.doc), or search down selected paths. More than one file type, and more than one path can be used at once. The file types and paths provided by the user on the command line are used to build a matrix which Catalog uses to identify the files.
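The search described above can be pictured with a short Python sketch. It is not Catalog's actual algorithm, only an illustration under the assumption that file types behave like shell-style wildcards; the function name `find_files` and its parameters are hypothetical.

```python
import fnmatch
import os

def find_files(paths, patterns=("*",), recurse=True):
    """Yield the full path of every file under `paths` whose name matches any pattern."""
    for top in paths:
        for dirpath, dirnames, filenames in os.walk(top):
            for name in sorted(filenames):
                # A file is listed if it matches at least one requested file type.
                if any(fnmatch.fnmatch(name, pat) for pat in patterns):
                    yield os.path.join(dirpath, name)
            if not recurse:
                dirnames.clear()  # stop os.walk from descending into subdirectories

if __name__ == "__main__":
    # Roughly comparable to: catalog . -f "*.c" "*.doc"
    for path in find_files(["."], ["*.c", "*.doc"]):
        print(path)
```

Every path is combined with every file type, which is one way to read the "matrix" of paths and file types mentioned above.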
After Catalog has determined that it has enough information, it proceeds to find and list all the files fitting the criteria. It then prints the output to the screen or, if an output file was requested, writes it to that file. Catalog does NOT write to the hard disk unless specifically requested to do so by the user (i.e., you asked for an output file).
The output is a fixed length record that can be imported into a database for reference and cross matching with an output which is generated later. Depending on the type of output chosen, the length of the record changes. But within any one run of the program, the records will be the same size.
In tests cataloging a basic LINUX box on a 166 MHz 586, it took about 3 minutes to find and list over 25,000 files.
Since this program only lists the files, there is no alteration of the files' last access date which may be important in some instances.
The output of the program is intended to be placed in an output file for future reference.
Here is a sample of the default output to a file. The column headings are shown just for legibility and do not actually show up in the output file.
NAME     |SIZE  |UID| MODE  |Access time       |CREATE time       |WRITE time
u_base.c  11205  500 100644  06/15/1998 07:19a  06/15/1998 11:53c  06/15/1998 11:53w
The items in the output file are:
1: NAME: the complete filename, including path. (Path is not shown here
to conserve space.)
2: File size:
3: UID, User ID of the owner of the file. This can be crossmatched with the password file to find out who this is.
4: Mode: The file permissions. Notice there is more information than you are
normally used to seeing. This is because there is actually more information
in the file mode field than is usually indicated by the ls -l command.
5: Three file date/times: Access (a), Create (c), and Write/Modify (w). The letter that follows each time identifies which one it is.
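As a sketch of the UID cross-match mentioned in item 3, the following Python fragment builds a UID-to-login-name table from text in the standard passwd format (name:password:uid:gid:...). The sample passwd contents and the login name "examiner" are hypothetical; on a real system you would read /etc/passwd instead.

```python
def load_passwd(text):
    """Map numeric UID -> login name from passwd-format text."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            # Keep the first name seen for a UID, as several names may share one.
            users.setdefault(int(fields[2]), fields[0])
    return users

# Hypothetical passwd contents used only for illustration.
SAMPLE_PASSWD = """root:x:0:0:root:/root:/bin/bash
examiner:x:500:500:Example User:/home/examiner:/bin/bash
"""

users = load_passwd(SAMPLE_PASSWD)
print(users.get(500, "unknown"))  # owner of the sample record, which has UID 500
```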
The default name/path is roughly 80 characters long, making the overall record length about 150 characters long. The -w option allows you to alter the width of the path/name field for larger or smaller output records.
The various file modes are listed here as octal masks. Notice that the low-order 3 digits hold the normal file permissions, the next digit holds the set UID, set GID and sticky bits, and the remaining high-order digits identify the file type.

st_mode   100644   (this mode indicates a regular file)

S_IFSOCK  0140000  socket
S_IFLNK   0120000  symbolic link
S_IFREG   0100000  regular file
S_IFBLK   0060000  block device
S_IFDIR   0040000  directory
S_IFCHR   0020000  character device
S_IFIFO   0010000  fifo
S_ISUID   0004000  set UID bit
S_ISGID   0002000  set GID bit
S_ISVTX   0001000  sticky bit
S_IRWXU   0000700  user (file owner) has read, write and execute permission
S_IRUSR   0000400  user has read permission
S_IWUSR   0000200  user has write permission
S_IXUSR   0000100  user has execute permission
S_IRWXG   0000070  group has read, write and execute permission
S_IRGRP   0000040  group has read permission
S_IWGRP   0000020  group has write permission
S_IXGRP   0000010  group has execute permission
S_IRWXO   0000007  others have read, write and execute permission
S_IROTH   0000004  others have read permission
S_IWOTH   0000002  others have write permission
S_IXOTH   0000001  others have execute permission
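To see how the masks apply to the sample value 100644, the mode field can be decoded with Python's standard `stat` module, which defines the same constants. This is only an illustration of the bit layout, not part of Catalog; the helper `parse_mode` is a hypothetical name, and the sketch assumes the MODE field is printed as octal digits.

```python
import stat

def parse_mode(field):
    """Convert the MODE text of a record (assumed to be octal digits) to an integer."""
    return int(field, 8)

mode = parse_mode("100644")  # the st_mode value from the sample record

print(oct(stat.S_IFMT(mode)))   # file-type bits only
print(oct(stat.S_IMODE(mode)))  # permission bits, including set UID/GID and sticky
print(stat.S_ISREG(mode))       # is it a regular file?
print(stat.filemode(mode))      # the same mode in ls -l notation
```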
Usage:
catalog path/filetype -[options]
At least 1 initial file or path is recommended but not necessary.
For additional paths or filetypes use the -p and/or -f options. If only a file name is given, the current directory is used as the path and recursed from there.
If more than one path needs to be checked, the -p option is the only way to do it.
-d + delimiter Insert a delimiter between fields. The delimiter should be a single character. If it is not a printable ASCII character, you can enter the decimal ASCII value of the character. To use the pipe symbol ( | ), you may have to enclose it in quotes or enter its decimal value, 124 ( -d 124, or -d "|" ). (NOTE: Windows help does not display the surrounding quotes, but they are there.)
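A delimited record is convenient to split in a script. The Python sketch below assumes a run with -d 124 that places a pipe between the seven default fields, with each date and time kept together as one field; that layout, the sample line, and the names `FIELDS` and `parse_record` are assumptions for illustration, so check them against real output before relying on them.

```python
FIELDS = ("name", "size", "uid", "mode", "access", "create", "write")

def parse_record(line, delimiter="|"):
    """Split one delimited record into a dict, stripping fixed-width padding."""
    parts = [p.strip() for p in line.rstrip("\n").split(delimiter)]
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["size"] = int(record["size"])
    record["uid"] = int(record["uid"])
    record["mode"] = int(record["mode"], 8)  # mode digits are assumed to be octal
    return record

# Hypothetical record, modeled on the sample output shown earlier.
sample = "/work/u_base.c |11205 |500|100644|06/15/1998 07:19a|06/15/1998 11:53c|06/15/1998 11:53w"
print(parse_record(sample))
```

A file name that itself contains the delimiter would break this simple split, which is one reason to pick a delimiter that is unlikely to appear in file names.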
-f + filetype Additional filetypes to search for separated by spaces. This option allows for more than one file type to be searched for during the same run. (ex., \*.c \*.bat ). If only a path is provided on the command line, then file type defaults to \*.
Note: Because most shells expand wildcards before the program sees them, you must do one of the following when entering wildcard file names on the command line:
Either escape the wildcard with a backslash ( \* ) or quote the entire file type ( "*.c" ). Otherwise the shell will attempt to expand the wildcard, and you may not find all you are looking for. If no -f option is given, all files (*) are identified.
-g + #
-l + # That's an (ell), not a (one). Linux only.
Use these options to locate only files (g)reater than or (l)ess than a specific
size in bytes. Replace the # with a value. Currently a maximum file size value
of 4 GB is allowed.
-G + #
-L + # That's an (ELL), not a (one). Linux only.
Use these options to locate files greater than or equal to, or less than or
equal to, # days old. Replace the # with a value. Currently the calculation
is done in 24-hour periods counted back from the current system time. So if the
current system time is 1300 hours, and a file was made yesterday at 1200 hours,
it would be 2 days old rather than 1 day old. This is because yesterday at 1300
hours would be 24 hours prior to current time, and yesterday at 1200 hours
would be 25 hours prior; this equates to 2 days.
The calculations default to the time listed by the ls command (which is modification time). To base them on a different time, use the -t option.
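The day-count arithmetic in the example can be written out as a small Python sketch. It assumes that any partial 24-hour period is rounded up to a whole day, which is what the 1200/1300 example implies; Catalog's exact rule is not stated beyond that example. The function name `age_in_days` and the dates are hypothetical.

```python
import math
from datetime import datetime

SECONDS_PER_DAY = 24 * 60 * 60

def age_in_days(file_time, now):
    """Age in whole days, counting each started 24-hour period as a full day."""
    seconds = (now - file_time).total_seconds()
    return max(0, math.ceil(seconds / SECONDS_PER_DAY))

now = datetime(1998, 6, 16, 13, 0)  # current system time: 1300 hours
print(age_in_days(datetime(1998, 6, 15, 13, 0), now))  # 24 hours prior
print(age_in_days(datetime(1998, 6, 15, 12, 0), now))  # 25 hours prior
```

The first call yields 1 and the second yields 2, matching the description above.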
-N Print ONLY the full path and filename to the output record. This is an ideal option for obtaining only a list of files. No other information is printed.
-n In the output file, print only filename in the record and not the entire path. The other information, (exs., hash, date, time, etc.) is also printed. This is different from the -N in that -N only prints filename and path; it gives no other information.
-o + output Filename for output. Output can be redirected using > output. If redirection (>) is used, then this option is unnecessary.
-p + path(s) Additional paths to search. Can include multiple paths separated by spaces. (ex., -p /work /bin /etc). This option allows for searching more than one path at a time.
-r DO NOT recurse through path provided, default is to recurse. Use this option to do a single directory.
-v Silent run. NO VERBOSE. Do not print normal column headings above numbers. This provides cleaner screen output for redirection to a file. This can also be accomplished by setting an environment variable called SILENT to ON. (set SILENT=ON). The SILENT environment variable is used by Crckit also.
-t[acm] Print time as [a]ccess time, [c]hange time, or [m]odification time. On Linux the difference between the 'c'hange and 'm'odification times is easy to confuse. The 'c'hange time is the status change time (ls -cl). The 'm'odification time is the last write time, and is the default time listed by ls -l. The access time is the one listed by ls -ul.
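The three times selected by -t correspond to the standard stat fields st_atime, st_ctime and st_mtime. The Python sketch below shows how they can be read for one file; it is an illustration only, the helper names are hypothetical, and the date format is merely modeled on the sample output.

```python
import os
import time

def file_times(path):
    """Return the access, status-change and modification times as epoch seconds."""
    st = os.stat(path)  # reading file metadata does not alter the access time
    return {"a": st.st_atime, "c": st.st_ctime, "m": st.st_mtime}

def fmt(ts):
    """Format an epoch time in local time, month/day/year then hours:minutes."""
    return time.strftime("%m/%d/%Y %H:%M", time.localtime(ts))

if __name__ == "__main__":
    for letter, ts in file_times(".").items():
        print(fmt(ts) + letter)
```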
-z Display time using (ZULU) GMT time format. This is useful for keeping file times consistent. (Be certain that CMOS, TZ, and time zone settings are correct).
-w + # Limit filename length to # characters. If the full path is being used, the default is 80 characters of path+filename. If the -n option is used, the default filename printed is 15 characters (without the path). If the filename including path is more than 80 characters, the path is truncated at the front. (Notice that the uppercase -W and lowercase -w produce slightly different outputs. Experiment with both to find what suits your purposes best.)
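Truncating "at the front" means the end of the path, which includes the file name, is what survives. A minimal Python sketch of that behavior follows; padding short names with trailing spaces to keep the field a fixed width is an assumption, and `fit_name` is a hypothetical name.

```python
def fit_name(path, width=80):
    """Return `path` as exactly `width` characters: keep the tail, pad on the right."""
    if width <= 0:
        raise ValueError("width must be positive")
    # Slicing from the end drops leading characters when the path is too long.
    return path[-width:].ljust(width)

print(repr(fit_name("/work/src/u_base.c", 8)))
print(repr(fit_name("u_base.c", 15)))
```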
$ catalog / -o outputfilename
Do catalog of files for entire drive.
$ catalog /work
Do catalog of files in /work.
$ catalog /work -r
Do /work path without recursion.
$ catalog /work/\*.c
Do /work path for all *.c files (add -r for no recursion).
$ catalog /work -n
Do /work printing only 15 characters of filename.
$ catalog /work -w 30
Do /work printing 30 characters of filename.
Since this documentation was cloned from another program running under DOS,
you may occasionally find a reference that looks like a DOS path or drive
name. However, you must be certain, when using Catalog, to use the correct
slash on the Linux system to delineate paths.
Related Programs