Four observers were trained in lameness assessment using a subjective five-category scoring system, and inter-observer agreement was assessed in four sessions at successive stages of training and experience. Inter-observer reliability increased over time and reached acceptable levels in the last session. Retrospectively simplified versions of the scoring system were already satisfactorily reliable at a fairly low level of training. For experienced raters, the original five-category scoring system is sufficiently reliable for on-farm welfare assessment.
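
To illustrate why a retrospectively simplified scale can look more reliable than the original one, the following is a minimal sketch rather than the study's analysis. It assumes that simplification means merging adjacent categories after scoring, and it uses mean pairwise exact agreement as a stand-in for whatever reliability statistic was actually used. The scores, the `THREE_LEVEL` mapping, and the function names are all hypothetical.

```python
from itertools import combinations


def pairwise_agreement(scores_by_observer):
    """Mean proportion of animals scored identically, over all observer pairs."""
    pairs = list(combinations(scores_by_observer, 2))
    total = 0.0
    for first, second in pairs:
        matches = sum(1 for x, y in zip(first, second) if x == y)
        total += matches / len(first)
    return total / len(pairs)


def collapse(scores, mapping):
    """Recode scores onto a coarser scale after the fact."""
    return [mapping[s] for s in scores]


# Made-up scores for 8 animals by 4 observers (1 = not lame ... 5 = severely lame).
# These numbers are illustrative only and are not data from the study.
observers = [
    [1, 2, 3, 4, 5, 2, 1, 3],
    [1, 3, 3, 4, 5, 2, 2, 3],
    [2, 2, 3, 5, 5, 1, 1, 3],
    [1, 2, 4, 4, 5, 2, 1, 2],
]

# Hypothetical simplification: merge categories 1-2 and 4-5 into one level each.
THREE_LEVEL = {1: 0, 2: 0, 3: 1, 4: 2, 5: 2}

five_level_agreement = pairwise_agreement(observers)
three_level_agreement = pairwise_agreement(
    [collapse(scores, THREE_LEVEL) for scores in observers]
)

print(f"five categories:  {five_level_agreement:.4f}")
print(f"three categories: {three_level_agreement:.4f}")
```

Merging categories can never lower exact agreement, because any two scores that matched on the five-point scale still match after recoding, while some near-misses between adjacent categories become matches. A chance-corrected statistic such as a weighted kappa would be a more cautious choice for real data, since coarser scales also raise the agreement expected by chance.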