=pod

Code submission from: Paul Schaap (schaappa@ozemail.com.au)

busFinder 1.0.0 Release Notes

WHY
===

WARNING: the company dictionary "Company" is biased towards Australia, sorry.
Would somebody volunteer to modify it for other countries?

I have often found myself wading through a lot of data at work that is of
dubious quality. It is really meant to contain just people, but business data
sneaks in, so in my spare time I researched and wrote this little beauty to
locate business names QUICKLY.

Basically the Company file (which MUST be sorted) is loaded into an array and
we jump around the array mathematically for a quick search (a binary search).
Easy, not. Each word of a name is upper-cased before the lookup, so the
Company entries need to be upper case, one word per line.

INSTALLATION
============

* Install MacPerl 5.1.4r4 or greater. (Thanks, Matthias.)

* Get the text editor Alpha, it is awesome and very fast; you will not regret
it. Once again, strictly not required.

USAGE
=====

1) Drop a file with names in it onto 'businessFinder Droplet'.

2) businessFinder presumes a pipe "|" delimited file with the name field as
the first field. If you want to use a different delimiter and/or field, alter
the $fieldDelimiter and $fieldPosition values (remember Perl counts from
zero).

3) Rows whose name contains a Company word are written to a file named after
the input file with "Business" appended; all other rows go to the input file
name with "Personal" appended.

CHANGES
=======

V1.0.0

* Well, none really; first cut.

CHEERS

schaapp@ozemail.com.au - I only read email once a fortnight, sorry.

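To illustrate the delimiter and field settings described under USAGE: assuming a hypothetical comma-delimited file with the name in the third field (this layout is made up for the example), the two values near the top of the script might be changed along these lines. This is only a sketch and has not been run against real data.

```perl
# Hypothetical input layout: id,postcode,name
$fieldDelimiter = ',';   # interpolated into the split() pattern as a regular expression
$fieldPosition  = 2;     # third field, because Perl counts from zero
```

Because $fieldDelimiter ends up inside a regular expression, a character that is special in a pattern has to be escaped. That is why the default is written as '\174' (octal for "|") rather than a bare '|'.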
=cut

#!/usr/bin/perl
#####################################################################
# PROGRAM : busFinder.pl
# BY      : Paul Schaap
# DATED   : 05/01/1998
# PURPOSE : To find business names
#####################################################################
print "STARTING busFinder.pl\n";
$start_time = time;
$fieldDelimiter = '\174';   # octal 174 is the pipe character "|"
$fieldPosition = 0;         # index of the name field, counting from zero
$input = $business = $personal = 0;

# Load Company Words
#====================
# The file must already be sorted in the order used by "gt"/"eq" below.
$c = 0;
open(COMPANY,"company") or die "Cannot open company !";
while(<COMPANY>){
    chomp;                  # strip the line ending only (chop could eat a real character)
    $C[$c]=$_;
    $c++;
}
close COMPANY;
$routime_time = $start_time;
$current_time = time;
$time_elapse = $current_time - $routime_time;
$routime_time = $current_time;
print "$c rows into company array ($time_elapse elapsed)\n";

# SEARCH STRINGS
@path = split(/:/,$ARGV[0]);        # ":" is the Mac OS path separator
$fileName = $path[@path-1];
$inputFile = $ARGV[0];
$businessFile = $inputFile."Business";
$personalFile = $inputFile."Personal";
print "Processing '$fileName'\n";
open(BUSINESS,">$businessFile") or die "Cannot open $businessFile\n";
open(PERSONAL,">$personalFile") or die "Cannot open $personalFile\n";
while(<>){
    $input++;
    if($input % 1000 == 0){
        print "$input rows processed.\n";
    }
    @IN = split(/$fieldDelimiter/);
    $row = $IN[$fieldPosition];
    chomp $row;
    $row =~ tr/a-z/A-Z/;            # the lookup is done in upper case
    $result=-1;
    SWITCH: {
        # DO COMPANY SEARCHING
        #======================
        # The row is a business as soon as any one word is in the dictionary.
        @NAME = split(/\s+/,$row);
        foreach $d (@NAME){
            $result = &busfinder($d);
            if($result != -1){
                last SWITCH;
            }
        }
    }
    if($result != -1){
        $business++;
        print BUSINESS $_;
    } else {
        $personal++;
        print PERSONAL $_;
    }
}
close BUSINESS;
MacPerl::SetFileInfo("ALFA", "TEXT", $businessFile);
close PERSONAL;
MacPerl::SetFileInfo("ALFA", "TEXT", $personalFile);
print "Business Names - $business\n";
print "Personal Names - $personal\n";
print time - $start_time, " elapsed\n";

# BUSFINDER ROUTINE
#===================
# Binary search of the sorted @C array.
# Returns the index of the matching entry, or -1 if the word is not found.
sub busfinder {
    my $low = 0;
    my $high = @C - 1;
    while($low <= $high){
        my $mid = int(($low + $high) / 2);
        if($C[$mid] eq $_[0]){
            return $mid;
        } elsif($C[$mid] gt $_[0]){
            $high = $mid - 1;
        } else {
            $low = $mid + 1;
        }
    }
    return -1;
}

exit;

__END__
# END