There exists a large body of Artificial Intelligence (AI) research on generating plans, i.e., linear or non-linear sequences of actions that transform an initial world state into some desired goal state. However, much of the planning research to date has been complicated, poorly understood, and unclear. Only a few planner developers have provided a thorough description of their systems, and the descriptions that do exist are usually unrealistically favorable, since the systems are tested only on the applications for which they were developed. As a result, it is difficult to evaluate these planners and to choose the best planner for a different domain. To be applicable to different planning problems, a planner should be domain-independent. Even so, one needs to know the circumstances under which a general planner works in order to determine its suitability for a specific domain.
This paper presents criteria for evaluating AI planners, grouped into three categories: (1) performance issues, (2) representational issues, and (3) communication issues. It also assesses four non-linear AI planners (NOAH, NONLIN, SIPE, and TWEAK) based on a study of the published literature and on communication (via electronic mail, meetings, and correspondence) with their developers.