Variations in human DNA, most frequently single-nucleotide polymorphisms (SNPs), can have functional consequences ranging from severe to none. The resulting variations in outcome (phenotype) can be compared across a spectrum, from cystic fibrosis through haemochromatosis to general familial risks in, for example, colorectal cancer (CRC). Cystic fibrosis and haemochromatosis have severe phenotypes with high penetrance, with signs and symptoms always or mostly present; thus, they have been easy to identify from family studies. However, although familial risk is known to contribute markedly to CRC, the variants responsible remain unknown. The sequencing of the human genome has now made possible the identification of these and other disease variants. Knowing the DNA sequence of an idealised individual adds little unless variants that increase (or decrease) disease risk relative to the norm can be identified. Such variants can be expected to be very common in the general population, but to have low penetrance and to change risk only to a limited extent. Many patients will not carry the risk variant and many ‘normal’ individuals will carry it. Thus, very large case–control cohorts are essential. These case–control cohorts can be analysed at three different levels: (1) individual SNPs; (2) individual genes; (3) genome-wide analysis (GWA). Level 1 looks for case–control differences for specific SNPs. At level 2, new technology can be applied to examine a range of SNPs within a gene to track differences in its regulation as well as in its function. Finally, at level 3, the whole genome could be marked with ≥0·5×10⁶ SNPs. The first two approaches involve selecting ‘candidate’ SNPs or genes, while GWA looks for any variation in the genome that is enriched in the cases. All three approaches carry the certainty that some apparently significant associations will arise by statistical chance alone, for which correction must be made. This problem is mitigated by large sample sizes and by independent replication cohorts.
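To make the case–control comparison and the need for chance correction concrete, the following is a minimal sketch in Python of a level 1 (single-SNP) analysis. It assumes a simple allelic 2×2 Pearson chi-square test and a Bonferroni correction; both are illustrative choices, not methods specified in the text, and other tests or corrections may be preferred in practice. The allele counts and the function names `allelic_test` and `bonferroni_threshold` are hypothetical.

```python
import math


def allelic_test(case_risk, case_other, control_risk, control_other):
    """Return (odds_ratio, chi2, p_value) for a 2x2 table of allele counts.

    Uses the Pearson chi-square statistic with 1 degree of freedom and
    no continuity correction (an assumption made for this sketch).
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, the upper-tail probability of chi2
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return odds_ratio, chi2, p_value


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical data: 2000 cases and 2000 controls, two alleles each.
# The risk allele is common in both groups and only modestly enriched in cases.
odds_ratio, chi2, p_value = allelic_test(1200, 2800, 1000, 3000)

# Threshold if 0.5 x 10^6 SNPs are tested at an overall alpha of 0.05.
threshold = bonferroni_threshold(0.05, 500_000)

print(f"odds ratio = {odds_ratio:.3f}, chi2 = {chi2:.2f}, p = {p_value:.2e}")
print(f"genome-wide threshold = {threshold:.1e}")
print("passes genome-wide threshold:", p_value < threshold)
```

With these made-up counts, the association is clearly significant as a single test but does not pass the corrected genome-wide threshold, which illustrates why common, low-penetrance variants call for very large cohorts and independent replication.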