NetBSD/gnu/dist/gawk/doc/awkforai.txt

Draft for ACM SIGPLAN Patterns (Language Trends)

1996

Why GAWK for AI?

Ronald P. Loui

Most people are surprised when I tell them what language we use in our
undergraduate AI programming class.  That's understandable.  We use
GAWK.  GAWK, Gnu's version of Aho, Weinberger, and Kernighan's old
pattern scanning language isn't even viewed as a programming language by
most people.  Like PERL and TCL, most prefer to view it as a "scripting
language."  It has no objects; it is not functional; it does no built-in
logic programming.  Their surprise turns to puzzlement when I confide
that (a) while the students are allowed to use any language they want;
(b) with a single exception, the best work consistently results from
those working in GAWK.  (footnote:  The exception was a PASCAL
programmer who is now an NSF graduate fellow getting a Ph.D. in
mathematics at Harvard.) Programmers in C, C++, and LISP haven't even
been close (we have not seen work in PROLOG or JAVA).

Why GAWK?

There are some quick answers that have to do with the pragmatics of
undergraduate programming.  Then there are more instructive answers that
might be valuable to those who debate programming paradigms or to those
who study the history of AI languages.  And there are some deep
philosophical answers that expose the nature of reasoning and symbolic
AI.  I think the answers, especially the last ones, can be even more
surprising than the observed effectiveness of GAWK for AI.

First it must be confessed that PERL programmers can cobble together AI
projects well, too.  Most of GAWK's attractiveness is reproduced in
PERL, and the success of PERL forebodes some of the success of GAWK.
Both are powerful string-processing languages that allow the programmer
to exploit many of the features of a UNIX environment.  Both provide
powerful constructions for manipulating a wide variety of data in
reasonably efficient ways.  Both are interpreted, which can reduce
development time.  Both have short learning curves.  The GAWK manual can
be consumed in a single lab session and the language can be mastered by
the next morning by the average student.  GAWK's automatic
initialization, implicit coercion, I/O support and lack of pointers
forgive many of the mistakes that young programmers are likely to make.
Those who have seen C but not mastered it are happy to see that GAWK
retains some of the same sensibilities while adding what must be
regarded as spoonsful of syntactic sugar.  Some will argue that
PERL has superior functionality, but for quick AI applications, the
additional functionality is rarely missed.  In fact, PERL's terse syntax
is not friendly when regular expressions begin to proliferate and
strings contain fragments of HTML, WWW addresses, or shell commands.
PERL provides new ways of doing things, but not necessarily ways of
doing new things.

In the end, despite minor difference, both PERL and GAWK minimize
programmer time.  Neither really provides the programmer the setting in
which to worry about minimizing run-time.

There are further simple answers.  Probably the best is the fact that
increasingly, undergraduate AI programming is involving the Web.  Oren
Etzioni (University of Washington, Seattle) has for a while been arguing
that the "softbot" is replacing the mechanical engineers' robot as the
most glamorous AI testbed.  If the artifact whose behavior needs to be
controlled in an intelligent way is the software agent, then a language
that is well-suited to controlling the software environment is the
appropriate language.  That would imply a scripting language.  If the
robot is KAREL, then the right language is "turn left; turn right." If
the robot is Netscape, then the right language is something that can
generate "netscape -remote 'openURL(http://cs.wustl.edu/~loui)'" with
elan.

Of course, there are deeper answers.  Jon Bentley found two pearls in
GAWK:  its regular expressions and its associative arrays.  GAWK asks
the programmer to use the file system for data organization and the
operating system for debugging tools and subroutine libraries.  There is
no issue of user-interface.  This forces the programmer to return to the
question of what the program does, not how it looks.  There is no time
spent programming a binsort when the data can be shipped to /bin/sort
in no time.  (footnote:  I am reminded of my IBM colleague Ben Grosof's
advice for Palo Alto:  Don't worry about whether it's highway 101 or 280.
Don't worry if you have to head south for an entrance to go north.  Just
get on the highway as quickly as possible.)

There are some similarities between GAWK and LISP that are illuminating.
Both provided a powerful uniform data structure (the associative array
implemented as a hash table for GAWK and the S-expression, or list of
lists, for LISP).  Both were well-supported in their environments (GAWK
being a child of UNIX, and LISP being the heart of lisp machines).  Both
have trivial syntax and find their power in the programmer's willingness
to use the simple blocks to build a complex approach.

Deeper still, is the nature of AI programming.  AI is about
functionality and exploratory programming.  It is about bottom-up design
and the building of ambitions as greater behaviors can be demonstrated.
Woe be to the top-down AI programmer who finds that the bottom-level
refinements, "this subroutine parses the sentence," cannot actually be
implemented.  Woe be to the programmer who perfects the data structures
for that heapsort when the whole approach to the high-level problem
needs to be rethought, and the code is sent to the junkheap the next day.

AI programming requires high-level thinking.  There have always been a few
gifted programmers who can write high-level programs in assembly language.
Most however need the ambient abstraction to have a higher floor.

Now for the surprising philosophical answers.  First, AI has discovered
that brute-force combinatorics, as an approach to generating intelligent
behavior, does not often provide the solution.  Chess, neural nets, and
genetic programming show the limits of brute computation.  The
alternative is clever program organization.  (footnote: One might add
that the former are the AI approaches that work, but that is easily
dismissed:  those are the AI approaches that work in general, precisely
because cleverness is problem-specific.)  So AI programmers always want
to maximize the content of their program, not optimize the efficiency
of an approach.  They want minds, not insects.  Instead of enumerating
large search spaces, they define ways of reducing search, ways of
bringing different knowledge to the task.  A language that maximizes
what the programmer can attempt rather than one that provides tremendous
control over how to attempt it, will be the AI choice in the end.

Second, inference is merely the expansion of notation.  No matter whether
the logic that underlies an AI program is fuzzy, probabilistic, deontic,
defeasible, or deductive, the logic merely defines how strings can be
transformed into other strings.  A language that provides the best
support for string processing in the end provides the best support for
logic, for the exploration of various logics, and for most forms of
symbolic processing that AI might choose to call "reasoning" instead of
"logic."  The implication is that PROLOG, which saves the AI programmer
from having to write a unifier, saves perhaps two dozen lines of GAWK
code at the expense of strongly biasing the logic and representational
expressiveness of any approach.

I view these last two points as news not only to the programming language
community, but also to much of the AI community that has not reflected on
the past decade's lessons.

In the puny language, GAWK, which Aho, Weinberger, and Kernighan thought
not much more important than grep or sed, I find lessons in AI's trends,
AI's history, and the foundations of AI.  What I have found not only
surprising but also hopeful, is that when I have approached the AI
people who still enjoy programming, some of them are not the least bit
surprised.


R. Loui (loui@ai.wustl.edu) is Associate Professor of Computer Science,
at Washington University in St. Louis.  He has published in AI Journal,
Computational Intelligence, ACM SIGART, AI Magazine, AI and Law, the ACM
Computing Surveys Symposium on AI, Cognitive Science, Minds and
Machines, Journal of Philosophy, and is on this year's program
committees for AAAI (National AI conference) and KR (Knowledge
Representation and Reasoning).