The LOLITA natural language processor is one of an ever-increasing number of
large-scale systems written entirely in a functional programming language.
The system
consists of over 47,000 lines of Haskell code (excluding comments) and
is able to perform a
wide range of tasks such as semantic and pragmatic analysis of text, information
extraction
and query analysis. The efficiency of such a system is critical; interactive
tasks (such as query analysis) must not inconvenience the user with long
pauses, and batch-mode tasks (such as information extraction) must achieve an
adequate throughput.
For the past three years the profiling tools supplied with GHC and HBC
have been used
to analyse and reason about the complexity of the LOLITA system. The results
have been good; however, experience has shown that in a large system the
profiling life-cycle is often too long to make detailed analysis possible, and
the results are often misleading. In response
to these problems a profiler has been developed which allows the complete
set of program
costs to be recorded in so-called cost-centre stacks. These program costs
are then analysed
using a post-processing tool to allow the developer to explore the costs
of the program in
ways that are either not possible with existing tools or would require
repeated compilations
and executions of the program. The modifications made to the Glasgow Haskell
Compiler, which are based on a detailed cost semantics and an efficient
implementation scheme, are discussed.
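To make the mechanism concrete, the following minimal Haskell sketch
(illustrative only, not drawn from the LOLITA sources, and using present-day
GHC flag names) shows how cost centres are typically attached to expressions
with SCC annotations; under a stack-based scheme, each recorded cost is
attributed to the whole stack of enclosing cost centres rather than to a
single centre.

    -- Minimal sketch: cost-centre annotations on Haskell expressions.
    -- When compiled for profiling, the cost of evaluating each annotated
    -- expression is attributed to the named cost centre; a stack-based
    -- profiler also records the chain of enclosing cost centres.
    module Main (main) where

    parse :: String -> [String]
    parse input = {-# SCC "parse" #-} words input

    analyse :: [String] -> Int
    analyse tokens = {-# SCC "analyse" #-} length (filter ((> 3) . length) tokens)

    main :: IO ()
    main = print ({-# SCC "query" #-} analyse (parse "a short example query"))

    -- Illustrative build and run commands (flag names vary between compiler
    -- versions):
    --   ghc -prof -fprof-auto Main.hs
    --   ./Main +RTS -p -RTS    -- writes a cost report to Main.prof

A post-processing tool of the kind described here would then read such a cost
report and let the developer explore costs grouped by different cost-centre
stacks without recompiling or re-running the program.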
The results
of using this new profiling tool in the analysis of a number of Haskell
programs are also
presented. The overheads of the scheme are discussed, and the benefits of
this new system are
considered. An outline is also given of how this approach can be modified
to assist with the
tracing and debugging of programs.