A fleet of aircraft can be viewed as a set of degrading systems that experience variable loads as they fly missions and that require maintenance throughout their lifetime. Optimal fleet management aims to maximise fleet availability while minimising overall maintenance costs. To achieve this goal, individual aircraft, each with its own age and degradation path, need to operate cooperatively to keep fleet availability high while avoiding mechanical failure through scheduled preventive maintenance. In recent years, reinforcement learning (RL) has emerged as an effective method for optimising complex sequential decision-making problems. In this paper, we develop an RL framework to optimise the operation and maintenance of a fleet of aircraft. Three case studies, with varying numbers of aircraft in the fleet, are used to demonstrate that the RL policies can outperform traditional operation and maintenance strategies. As more aircraft are added to the fleet, the combinatorial explosion in the number of possible actions is identified as a major computational limitation. We conclude that the RL policy has the potential to support fleet management operators, and we call for further research on the application of multi-agent RL to fleet availability optimisation.