Perl, MacPerl, and MacJPerl

by Rich Morin and Andreas Marcel Riechert

HyperLib (Autumn 1998)


Perl is a powerful and increasingly popular programming language, used for CGI (Common Gateway Interface) scripts, scientific data analysis, system and network administration, and more. Perl has powerful data structures, dynamic memory allocation, regular expression handling, and string manipulation features. These allow Perl programmers to solve substantial problems with amazingly small amounts of code.

Although Perl was developed in the Unix community and is very popular there, Perl is available on many operating systems. In particular, it is available for Mac OS, under both the Finder and MPW (the Macintosh Programmer's Workshop).

Perl is available for essentially all versions of Unix and is distributed as a standard item on many. In particular, most open-source operating system distributions (e.g., FreeBSD and Linux) include a copy of Perl.

Because Perl is also available under MkLinux, it can be used as a bridge between the disparate MkLinux and Mac OS environments. You can even do Japanese text processing, as discussed below, in both environments.

Perl

Perl combines syntax and capabilities from a variety of sources. Its syntax is loosely derived from C, with additions from awk, sed, shell, and even BASIC-PLUS. Nonetheless, the language is consistent and regular.

Unlike some other modern programming languages, Perl does not attempt to be "orthogonal". That is, it does not provide only one tool for each given task. Rather, it provides many ways to do things, assuming that the programmer will select the desired tool.

This approach is very much in the spirit of Unix. Perl's motto, "There's More Than One Way To Do It" (TMTOWTDI), could easily have been coined to describe the hundreds of commands Unix systems provide.

This variety of commands can be a bit intimidating at first, but it soon becomes very comfortable. And, once a programmer gets used to having lots of nice tools, there is no turning back to less-powerful environments.

MacPerl

MacPerl is a very complete version of Perl 5, extended and optimized for the Mac OS environment. In particular, MacPerl provides support for Apple Events, AppleScript, AppleTalk, and more than a thousand Macintosh Toolbox calls.

MacPerl runs under the Mac OS Finder, as well as MPW (the Macintosh Programmer's Workshop). It is quite possible to write MacPerl scripts that run under standard Perl. Thus, you can write and test your Perl CGI scripts on your Macintosh, then move them to a MkLinux system for production use.
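One way to keep a script portable is to isolate the few places where the platforms differ. The following is only a sketch: it relies on Perl's $^O variable, which MacPerl sets to "MacOS", and on the Mac OS convention that relative paths start with a colon. The relative_path helper is a hypothetical name introduced for this example; it is not part of MacPerl or standard Perl.

```perl
use strict;
use warnings;

# Hypothetical helper (not a MacPerl or core function): build a relative
# path from its parts, using Mac OS syntax (":folder:file") when the OS
# name is "MacOS" and Unix syntax ("folder/file") otherwise.
sub relative_path {
    my ($os, @parts) = @_;
    return $os eq 'MacOS' ? ':' . join(':', @parts)
                          : join('/', @parts);
}

# $^O names the platform the script is running on.
my $log = relative_path($^O, 'logs', 'access.log');
print "Log file: $log\n";
```

The rest of the script can then use $log without caring whether it runs under MacPerl or under Perl on MkLinux.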

MacPerl has an integrated development environment, allowing easy access to online documentation and debugging features. It can also be used with the Alpha and BBEdit editors, yielding a very powerful and convenient programming environment.

Although MacPerl is not as popular as standard Perl, it is rapidly gaining attention. Apple's "Carbon Dater", used to check compatibility of applications with the new Carbon API, is written in MacPerl. Netscape has moved its "Mozilla" build environment from AppleScript to MacPerl.

Also, with the publication of "MacPerl: Power and Ease", MacPerl now has an introductory/reference book of its own. The book was written with the help and encouragement of MacPerl's active email list. Because it does not assume that the reader already knows how to program, it should encourage adventurous Macintosh users to try using MacPerl.

Japanese Text Processing

There are two plausible ways to do Japanese text processing in Perl. Standard Perl has no special support for Japanese, but it is quite capable of processing and displaying Japanese. JPerl (by Watanabe Hirofumi) has explicit support for Japanese, making many operations easier to perform.

Although JPerl would appear to be the obvious choice, it is not. JPerl has some real problems in compatibility and documentation; MacJPerl (MacPerl with the JPerl patch) has even more.

JPerl handles some things (e.g., regular expressions) quite differently than standard Perl does. So, JPerl code will often be incompatible with standard Perl. MacPerl also has a few differences from standard Perl and MacJPerl does not implement all of JPerl. On occasion, these differences may interact, yielding unexpected results.

And, although Perl, JPerl, and MacPerl are all well documented, MacJPerl (the combination) is not. In short, you might well encounter areas where MacJPerl differs in unexpected (and undocumented) ways from JPerl and/or MacPerl.

In addition, MacJPerl is still based on an old MacPerl release (5.1.4r4; November 1997). Using MacJPerl is thus likely to mean that you will miss bug-fixes and that newly released or updated modules won't work.

MacPerl, in contrast, is well supported and well documented. It retains compatibility with "standard" Perl and stays quite current with the Perl community. It has a very active user community and a history of regular updates and bug fixes.

It is much more difficult for a beginner to process Japanese on the standard ports of Perl than on the JPerl ports. If you choose this harder way, however, you will get portable scripts which will run on all systems, whether the JPerl patch is installed or not.

MacJPerl Advantages

MacJPerl has various advantages over MacPerl, many of which will be particularly useful to beginners. For instance, it supports Japanese extensions to regular expressions, chop, split, tr, and format.

Perl's character classes only support one-byte characters, not two-byte characters as used in Kanji and Kana. Therefore, patterns such as

  m/[KaKb]KcKd/      # Match KaKcKd or KbKcKd

or

  m/[Ha-Hn]+/        # Match one or more Hiragana characters

will only match in the intended way using JPerl. The tr (character translation) function has the same problem. A MacJPerl command to translate all Hiragana to Katakana would look like

  tr/Ha-Hn/Ka-Kn/

  

Note: In the examples above, Hx and Kx are shown as placeholders for the actual (two-byte) Hiragana and Katakana characters.

In standard Perl, it is necessary to write a fairly complicated function to achieve the same result.
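To give a feeling for what such a function involves, here is a minimal sketch for SJIS-encoded text. It assumes the usual SJIS layout, in which Hiragana occupy the two-byte codes 0x829F through 0x82F1 and Katakana start at 0x8340, with the trail byte 0x7F skipped. The names hira2kata_sjis and _kana_shift are hypothetical; they do not come from JPerl or from any of the packages mentioned in this article.

```perl
use strict;
use warnings;

# Hypothetical helper: map one character (one or two raw SJIS bytes)
# to its Katakana equivalent if it is Hiragana; otherwise return it as is.
sub _kana_shift {
    my ($ch) = @_;
    return $ch unless length($ch) == 2;
    my ($lead, $trail) = unpack 'C2', $ch;
    return $ch unless $lead == 0x82 && $trail >= 0x9F && $trail <= 0xF1;
    my $cell = $trail - 0x9E;        # position 1..83 within the kana row
    my $new  = $cell + 0x3F;         # same position in the Katakana row
    $new++ if $cell > 63;            # SJIS trail bytes skip 0x7F
    return pack 'C2', 0x83, $new;
}

# Translate all Hiragana in a string of raw SJIS bytes to Katakana.
# The pattern consumes the text one character at a time, so the second
# byte of a Kanji is never mistaken for the first byte of a Hiragana.
sub hira2kata_sjis {
    my ($bytes) = @_;
    $bytes =~ s/([\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]|[\x00-\xFF])/_kana_shift($1)/ge;
    return $bytes;
}
```

The character-by-character walk is the part that tr cannot do in standard Perl, because tr sees only individual bytes.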

MacJPerl's chop function removes the last character in a string, rather than the last byte. So, one-byte ASCII characters and two-byte Japanese characters are both handled correctly. The split and format functions also honor two-byte characters.
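For comparison, a character-aware chop can be approximated in standard Perl along the following lines. This is a sketch under stated assumptions: the input is a string of raw SJIS bytes, and sjis_chars and sjis_chop are hypothetical names, not JPerl functions. Unlike the built-in chop, which modifies its argument, sjis_chop returns a shortened copy.

```perl
use strict;
use warnings;

# Hypothetical helper: split raw SJIS bytes into a list of characters,
# each either a two-byte (lead + trail) sequence or a single byte.
# Call it in list context.
sub sjis_chars {
    my ($bytes) = @_;
    return $bytes =~ /[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]|[\x00-\xFF]/g;
}

# Hypothetical helper: return the string without its last character,
# whether that character is one byte or two bytes long.
sub sjis_chop {
    my ($bytes) = @_;
    my @chars = sjis_chars($bytes);
    pop @chars;
    return join '', @chars;
}
```

The same list of characters could serve as the basis for a character-aware length or split.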

MacJPerl also allows you to use Kanji without danger in double-quoted strings. (The two-byte values of some SJIS (Shift-JIS)-encoded Kanji characters include Perl metacharacters like "@". A single, seemingly harmless character could thus blow up your whole script in standard Perl.)
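In standard Perl, one possible workaround is to write such strings with hexadecimal escapes, so that no raw "@" or "\" byte ever appears in the source. The sketch below generates such a literal from raw SJIS bytes. The sjis_quote helper is a hypothetical name introduced here; it escapes every byte outside printable ASCII, plus the four characters that are special inside double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: turn raw SJIS bytes into a double-quoted Perl
# literal that is safe to paste into a standard Perl script. Bytes
# outside printable ASCII, and the characters " \ $ @, become \xNN.
sub sjis_quote {
    my ($bytes) = @_;
    (my $lit = $bytes) =~ s/([^\x20-\x7E]|["\\\$\@])/sprintf('\\x%02X', ord($1))/ge;
    return qq{"$lit"};
}

# Two SJIS characters: 0x835C ends in a backslash byte and 0x8940 ends
# in an at-sign byte. Both are trouble inside a plain "..." string.
print sjis_quote("\x83\x5C\x89\x40"), "\n";
```

The printed literal can be used in place of the original string; evaluating it yields the same bytes.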

In short, we recommend MacJPerl to Perl beginners and to programmers who are unfamiliar with regular expressions. MacJPerl's handling of Japanese characters will serve them well. But, because of the disadvantages described above, experienced users may well wish to move on to standard Perl.

Jedit

The MacPerl application has a built-in text editor which works well with Japanese fonts. (Just use the "Format" menu to define Osaka or some other Japanese font as your default font.)

Unfortunately, the built-in editor has a 32 KB limit and isn't that comfortable as a programming editor. Because BBEdit and Alpha don't support Japanese characters, we recommend Jedit for editing Japanese-related Perl code.

It is easy to define Jedit as the default editor for MacPerl. Register Jedit as the editor helper in the InternetConfig application which comes with the standard distribution of MacPerl. Jedit will then show up in MacPerl's menu bar.

Jedit is a powerful editor with many nice features. It has a grep-style (regular-expression-based) search engine, a Macro menu for AppleScripts, and diverse character conversion facilities.

Jedit is helpful in porting scripts between Macintosh and Unix. The Macintosh uses CR to separate lines of text and the SJIS code to process Japanese; Unix machines use LF and EUC, respectively. Jedit allows you to save and open files in both ways.
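The same conversion can also be scripted. The sketch below shows the arithmetic involved, assuming the input holds only ASCII, half-width Katakana, and JIS X 0208 characters in SJIS. The names mac_sjis_to_unix_euc and _sjis_pair_to_euc are hypothetical. In practice, the packages listed under Resources (jcode.pl, Kconv.pm) are the more robust choice.

```perl
use strict;
use warnings;

# Hypothetical helper: convert one two-byte SJIS character to EUC,
# going through its JIS code and then setting the high bit of each byte.
sub _sjis_pair_to_euc {
    my ($c1, $c2) = unpack 'C2', $_[0];
    $c1 -= ($c1 <= 0x9F) ? 0x71 : 0xB1;
    $c1  = $c1 * 2 + 1;
    $c2-- if $c2 > 0x7F;                  # undo the gap at 0x7F
    if ($c2 >= 0x9E) { $c2 -= 0x7D; $c1++ }
    else             { $c2 -= 0x1F }
    return pack 'C2', ($c1 | 0x80), ($c2 | 0x80);
}

# Hypothetical helper: convert Macintosh text (CR line ends, SJIS) to
# Unix text (LF line ends, EUC). Half-width Katakana (0xA1-0xDF) become
# two-byte EUC sequences prefixed with 0x8E.
sub mac_sjis_to_unix_euc {
    my ($bytes) = @_;
    $bytes =~ s{([\x81-\x9F\xE0-\xEF][\x40-\x7E\x80-\xFC])|([\xA1-\xDF])}
               {defined $1 ? _sjis_pair_to_euc($1) : "\x8E$2"}ge;
    # Explicit octal codes are used because MacPerl swaps the meanings
    # of "\n" and "\r" relative to Unix Perl.
    $bytes =~ tr/\015/\012/;
    return $bytes;
}
```

Going the other way (EUC and LF to SJIS and CR) reverses each step.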

The ZENKAKU (two-byte) space to HANKAKU (one-byte) space conversion routine is also helpful. One unquoted ZENKAKU space -- which is impossible to detect with your eyes -- will result in the error message "# Unrecognized character \201." and prevent your script from running.
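If Jedit is not at hand, stray ZENKAKU spaces can also be located with a short script. The sketch below assumes an SJIS-encoded script in which the ZENKAKU space is the byte pair 0x81 0x40; zenkaku_space_lines is a hypothetical name. It flags every occurrence, including harmless ones inside quoted strings and comments, so the reported lines still need a look.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 1-based numbers of the lines in an
# SJIS-encoded script that contain a ZENKAKU space (bytes 0x81 0x40).
# Lines may be separated by CR, LF, or CR LF.
sub zenkaku_space_lines {
    my ($script) = @_;
    my @hits;
    my $line_no = 0;
    for my $line (split /\015\012|\015|\012/, $script) {
        $line_no++;
        # Walk the line one character at a time, so that a Kanji ending
        # in byte 0x81 and followed by "@" is not reported by mistake.
        my @chars = $line =~ /[\x81-\x9F\xE0-\xFC][\x40-\x7E\x80-\xFC]|[\x00-\xFF]/g;
        push @hits, $line_no if grep { $_ eq "\x81\x40" } @chars;
    }
    return @hits;
}
```

A caller would read the script file in binary form, pass its contents to zenkaku_space_lines, and inspect the listed lines.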

Jedit doesn't have the same support for MacPerl that BBEdit and Alpha have, but the AppleScript Macro menu makes it easy to extend Jedit to fully support MacPerl.

  tell application "Jedit2"
    activate
    tell document 1
      set myKeyword to selection
    end tell
  end tell

  tell application "Shuck" to Show myKeyword

This little script, saved in Jedit's "Macro Menu Items" folder, will allow you to access the online help facility (Shuck) for the selected keyword in Jedit's front window. Extending Jedit to fully support MacPerl could be your chance to become a member of Jedit's "Hall of Fame".

It is also possible to extend Jedit with the MacPerl Word Services Extension (Michael Schuerig, www.uni-bonn.de/~uzs90z/). The following script will allow an item in Jedit's Tool menu to convert lower-case letters to upper case. (Save the script as a "Word Services Server" in MacPerl and register it in Jedit's Tool menu as a Word Service checker.)

  while (<STDIN>) {
    tr/a-z/A-Z/;     # Convert lower-case ASCII letters to upper case
    print STDOUT;    # Write the converted line ($_) to standard output
  }

Resources

For general information on Perl, start with www.perl.com and the books published by O'Reilly and Associates (www.oreilly.com).

MacPerl has a centralized resource in the MacPerl Pages (www.ptf.com/macperl). The Pages include pointers to information, the email list, and an online copy of "MacPerl: Power and Ease", an introductory/reference book on MacPerl.

The book itself is available from Prime Time Freeware (www.ptf.com) and many professional and technical bookstores:

MacPerl: Power and Ease
Vicki Brown and Chris Nandor
Prime Time Freeware, 1998
ISBN 1-881957-32-2
400 pp., HFS CD-ROM; $40 MSRP

Ken Lunde's "Perl & Multiple-byte Characters" (ftp://ftp.oreilly.com/pub/examples/nutshell/ujip/perl/perl97.pdf) explains the most common issues in using Japanese together with standard Perl. In the same directory, you will find a lot of ready-to-use scripts for Japanese. Look in ftp://ftp.iij.ad.jp/pub/IIJ/dist/utashiro/perl/ for other useful scripts and packages, including Utashiro Kazumasa's jcode.pl.

The Kconv.pm module allows you to convert between JIS, SJIS, and EUC-JP encodings. Because it is written in C (actually, XS code), it is much faster than other algorithmic code conversion routines written in Perl. See cybaba.kek.jp/~yosimoto/MacPerl/ for a port which you can use with MacPerl or MacJPerl.