1 Introduction
Consider two scenarios:
1. A consultant is asked to advise on a country’s choice of electoral system, and specifically a proposal for a legislature with 125 members ($S=125$) and a mean district magnitude of 5 ($M=5$). She knows that such a system is likely to feature $(MS)^{1/4} = 5$ seat-winning parties and that the seat share of the largest party, $s_1$, is $(MS)^{-1/8} \approx 45\%$ (Shugart and Taagepera 2017, 149), but legislators want to know the probability that a single party will have a majority.
2. A political scientist wishes to test a generative model of coalition formation (Golder, Golder, and Siegel 2012). She wishes to compare predicted coalition outcomes to observed outcomes in party systems with different numbers of seat-winning parties $N_{S0}$, but must first simulate distributions of seat shares.
In both scenarios, researchers lack ways of simulating realistic outcomes for party systems of different sizes. Although it is possible to perturb existing outcomes (Laver and Benoit 2015, 282–3), this approach cannot handle situations where researchers need to simulate party systems ex nihilo.
Here, we show how realistic party systems of a given size can be simulated using ordered and unordered Dirichlet distributions. The mean vector of these Dirichlet distributions is given by a formula for seat shares proposed by Taagepera and Allik (2006), which relies on only two parameters (party rank and the number of seat-winning parties $N_{S0}$). We show that the fit of these simulations to real-world data is almost as good as that of a saturated model where the seat share of the $i$th ranked party in a system of size $N_{S0}$ is given by the empirical mean for parties of that rank in systems of that size.
2 Theory
To the best of our knowledge, the only attempt to predict the size of vote- or seat-winning parties in a party system of size $N_{0}$ comes from Taagepera and Allik (2006), who suggest that the seat (vote) share of the largest party is equal to the geometric mean of two logical extremes: one extreme where the largest party wins $[100 - \epsilon]$% of seats (votes), where $\epsilon$ is some tiny share divided between the remaining parties, and another extreme where the largest party wins $[\frac{100}{N_{0}} + \epsilon]$% of seats (votes), and is only fractionally larger than the remaining parties. By repeatedly appealing to the geometric mean of logical extremes, they construct the following formula for the seat (vote) share of the $i$th largest party:
$$s_i = \frac{1 - \sum_{j=1}^{i-1} s_j}{\sqrt{N_{0} - i + 1}}, \qquad i = 1, \ldots, N_{0}. \tag{1}$$
Thus, for a system with five parties, the seat share of the first party is 45% (one divided by the square root of 5), the seat share of the second party is 28% (the remaining 55% divided by the square root of 4), and so on.
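The recursion described above can be sketched in a few lines of Python. This is an illustration only: the function name `logical_shares` is ours and does not come from the authors' replication code or R package.

```python
import math


def logical_shares(n0):
    """Expected shares for ranks 1..n0.

    Each party in turn takes the share still unallocated, divided by the
    square root of the number of parties that have not yet been allocated.
    """
    shares = []
    remaining = 1.0
    for rank in range(1, n0 + 1):
        share = remaining / math.sqrt(n0 - rank + 1)
        shares.append(share)
        remaining -= share
    return shares


# Five-party case discussed in the text.
print([round(s, 2) for s in logical_shares(5)])  # [0.45, 0.28, 0.16, 0.08, 0.03]
```

Because the divisor for the last-ranked party is the square root of one, that party receives whatever share remains, so the vector always sums to one.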
Taagepera and Allik (2006) also propose a second, “politically adjusted” model, which is like the model above, except that the seat share of “small parties” is half what it would be under the probabilistic model, with this surplus distributed between larger parties. Small parties are defined as parties whose rank is greater than $1/s_1$. Thus,
The predictions of these two models are compared visually to binned averages of election results for the $n$th largest parties taken from Mackie and Rose (1997). The authors conclude that the politically adjusted model fits the data better. Whether this conclusion is sound or not, these models remain deterministic. As such, they make it difficult to answer questions of the form, “what is the probability that a party system with five seat-winning parties will have a single party majority,” even if we know our best guess as to the seat share of the largest party remains 45%.
3 Methods
Modeling party systems is difficult because seat and vote shares are ordered compositional data. They are ordered data because, since different parties compete in different countries at different times, we typically lack any way of referring to parties except by their rank within the system, and so we refer to the seat share of the first-largest party, the seat share of the second-largest party, and so on. They are compositional data because both seat and vote shares add up to one. Compositional data can be modeled by transforming $d$-dimensional compositions into $(d-1)$-dimensional data through appropriate transforms (Aitchison 1986), or by using probability distributions defined on the simplex. The Dirichlet distribution is the most common such distribution.
A Dirichlet distribution is typically governed by a vector of strictly positive concentration parameters $\theta$. These parameters hold two different pieces of information. First, their relative magnitude determines the location of each element of the probability distribution. For instance, both three-dimensional simplexes $\mathbf{s}_A \sim \text{Dir}(\theta_A)$ with $\theta_A = \begin{bmatrix} 15 & 7.5 & 2.5 \end{bmatrix}^{\prime}$ and $\mathbf{s}_B \sim \text{Dir}(\theta_B)$ with $\theta_B = \begin{bmatrix} 0.30 & 0.15 & 0.05 \end{bmatrix}^{\prime}$ yield expected values of $\mathbb{E}[\mathbf{s}_A] = \mathbb{E}[\mathbf{s}_B] = \begin{bmatrix} 0.6 & 0.3 & 0.1 \end{bmatrix}^{\prime}$. Second, the absolute magnitude determines the scale of the corresponding distributions: while both simplexes $\mathbf{s}_A$ and $\mathbf{s}_B$ have identical expected values, the low values of $\theta_B$ result in high dispersion, and high density near the extremes of 0 and 1 for the elements of $\mathbf{s}_B$. In contrast, the high values of $\theta_A$ result in high concentration such that there is high density around the expected values of the elements of $\mathbf{s}_A$ (see Figure 1).
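The contrast between $\theta_A$ and $\theta_B$ can be checked with the standard closed-form moments of the Dirichlet distribution. The sketch below is illustrative; the helper name `dirichlet_mean_var` is ours.

```python
def dirichlet_mean_var(theta):
    """Closed-form means and variances of the components of Dir(theta)."""
    total = sum(theta)
    means = [t / total for t in theta]
    # Var[s_i] = E[s_i] * (1 - E[s_i]) / (sum(theta) + 1)
    variances = [m * (1 - m) / (total + 1) for m in means]
    return means, variances


theta_a = [15, 7.5, 2.5]
theta_b = [0.30, 0.15, 0.05]

mean_a, var_a = dirichlet_mean_var(theta_a)
mean_b, var_b = dirichlet_mean_var(theta_b)

print([round(m, 3) for m in mean_a], [round(v, 4) for v in var_a])
print([round(m, 3) for m in mean_b], [round(v, 4) for v in var_b])
```

Both vectors have the same means, but the variances under $\theta_B$ are far larger, because the parameters sum to 0.5 rather than 25.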
We therefore characterize the Dirichlet distribution in terms of a scalar concentration parameter $\alpha$ and a location vector of probabilities $\mathbf{p} = (p_1, p_2, \ldots, p_{N_{0}})$, $\sum \mathbf{p} = 1$:
$$\mathbf{s} \sim \text{Dir}(\theta), \qquad \theta = \alpha \mathbf{p}. \tag{3}$$
Parameterizing $\theta $ in terms of a product of a general concentration parameter $\alpha $ and a location vector $\mathbf {p}$ has attractive properties. It allows us to use past work which has formulated (deterministic) expectations regarding party seat (vote) shares $\mathbf {p}$ , while quantifying the dispersion around those expectations through the concentration parameter $\alpha $ , which can be estimated from real-world data.
We have described seat and vote share data as ordered data, but draws from Dirichlet distributions described by Equation (3) need not be ordered. Although Equation (1) gives us an ordered location vector ($\mathbf{p}$), whether or not draws from this distribution will be ordered will depend on the concentration parameter $\alpha$. If $\alpha$ is very large, draws from the distribution will more closely approximate the ordered location vector, and will in turn be more likely to be ordered. If $\alpha$ is small, as in our discussion above, values of all components will be more highly dispersed, and it becomes less likely that the resulting draws from a Dirichlet distribution $\text{Dir}(\alpha \mathbf{p})$ will be ordered.
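The dependence of ordering on $\alpha$ can be illustrated by simulation. The sketch below uses only the Python standard library and draws Dirichlet variates by normalizing independent Gamma variates, which is a standard construction. The function names and the chosen values of $\alpha$ are ours and are not taken from the authors' code.

```python
import math
import random


def logical_shares(n0):
    """Location vector p: the recursion for expected shares by rank."""
    shares, remaining = [], 1.0
    for rank in range(1, n0 + 1):
        share = remaining / math.sqrt(n0 - rank + 1)
        shares.append(share)
        remaining -= share
    return shares


def rdirichlet(theta, rng):
    """One draw from Dir(theta) via normalized Gamma(theta_i, 1) variates."""
    gammas = [rng.gammavariate(t, 1.0) for t in theta]
    total = sum(gammas)
    return [g / total for g in gammas]


def share_ordered(alpha, p, n_draws, rng):
    """Fraction of draws from Dir(alpha * p) that keep the rank order of p."""
    theta = [alpha * p_i for p_i in p]
    hits = 0
    for _ in range(n_draws):
        s = rdirichlet(theta, rng)
        if all(s[i] > s[i + 1] for i in range(len(s) - 1)):
            hits += 1
    return hits / n_draws


rng = random.Random(1)
p = logical_shares(5)  # largest party first
for alpha in (4, 40, 400):  # illustrative values only
    print(alpha, share_ordered(alpha, p, 2000, rng))
```

The printed fractions should rise with $\alpha$: at low concentration most draws violate the rank order, while at very high concentration almost all draws respect it.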
It is possible to guarantee an ordered draw by using an ordered Dirichlet distribution (van Dorp and Mazzuchi 2004):
where $\mathbf {p^{\star {}}}$ is an increasing ordered vector with length $N_{0} + 1$ , and values equal to the differences between successive values of $[0, \mathbf {p}, 1]$ . If our value of $\mathbf {p}$ for the five-party case is [0.03, 0.08, 0.16, 0.28, 0.45], then our value of $\mathbf {p^{\star {}}}$ is [0.03, 0.05, 0.08, 0.12, 0.17, 0.55]. Phrased slightly differently, the ordered Dirichlet is the result of generating Dirichlet-distributed differences between party shares and taking the cumulative sum. The parameter $\alpha $ acts as a concentration parameter, and can be interpreted in the same way as in the standard Dirichlet distribution.
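The construction of $\mathbf{p^{\star}}$ and the cumulative-sum step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Stan implementation: the function names are ours, and the sketch simply drops the final gap (the distance to 1) after cumulating. It does not attempt any further rescaling of the resulting values; the authors' own procedure for generating ordered Dirichlet deviates is described in their Supplementary Material.

```python
import random


def p_star(p_increasing):
    """Differences between successive values of [0, p, 1]."""
    padded = [0.0] + list(p_increasing) + [1.0]
    return [b - a for a, b in zip(padded, padded[1:])]


def ordered_draw(alpha, p_increasing, rng):
    """Draw gaps from Dir(alpha * p_star) and cumulate them.

    The gaps are strictly positive, so the running totals are strictly
    increasing. The last gap (the distance to 1) is not cumulated.
    """
    theta = [alpha * d for d in p_star(p_increasing)]
    gammas = [rng.gammavariate(t, 1.0) for t in theta]
    total = sum(gammas)
    gaps = [g / total for g in gammas]
    values, running = [], 0.0
    for gap in gaps[:-1]:
        running += gap
        values.append(running)
    return values


p = [0.03, 0.08, 0.16, 0.28, 0.45]  # five-party case, smallest first
print([round(d, 2) for d in p_star(p)])  # [0.03, 0.05, 0.08, 0.12, 0.17, 0.55]
print(ordered_draw(40, p, random.Random(1)))  # alpha = 40 is illustrative
```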
The ordered Dirichlet distribution respects the ordered property of the data, but poses practical problems. First, the ordered Dirichlet distribution requires shares to be strictly, not weakly, ordered. While vote shares in national elections are almost always strictly ordered, seat-winning parties sometimes win exactly the same number of seats. We deal with this by adding or subtracting negligible values from the seat shares of tied parties. Second, using the ordered Dirichlet means we cannot (directly) use certain useful analytic properties of the standard Dirichlet distribution, such as the expression for the variance of each component $s_i$: $\text{Var}\left[s_i\right] = \frac{p_i (1 - p_i)}{1 + \alpha}$ (Aitchison 1986, 59). This may not be a problem if our sole focus is simulation. We note these problems now, and return to them later when we discuss the performance of our models.
4 Models
We fit Dirichlet and ordered Dirichlet distributions to data drawn from parliamentary elections around the world. We estimate four different models:
• The null model: every element of $\mathbf{p}$ is equal to $\frac{1}{N_{0}}$, and $\alpha$ is estimated.
• The logical model: $\mathbf{p}$ is given by Equation (1), and $\mathbf{p^{\star}}$ by taking differences, with $\alpha$ estimated.
• The political model: $\mathbf{p}$ is given by Equation (2), and $\mathbf{p^{\star}}$ by taking differences, with $\alpha$ estimated.
• The saturated model: $\mathbf{p}$ is estimated for each size of party system ($N_0 = 2,\ldots,20$); $\alpha$ is estimated.
Note that the null model is the only model which is not estimated using an ordered Dirichlet distribution. In the null model, all components have the same expected value, and so the differences between these components are equal to zero. Because the concentration parameters of a Dirichlet distribution must be strictly greater than zero, it is not possible to estimate an ordered Dirichlet version of the null model.
Our focus is understandably on the second and third models. The null and saturated models provide performance benchmarks, but it seems unlikely that the null model will ever capture the patterns in the data. Each model is estimated on vote- and seat-share data, for both the Dirichlet and ordered Dirichlet distributions, for a total of 14 models. We estimate these models in Stan (Stan Development Team 2022); Stan code is given in the Supplementary Material, together with further details on the generation of ordered Dirichlet deviates.
5 Evaluation metrics
We evaluate models using the following metrics:
• Root mean squared error (RMSE): root mean squared error is calculated at the election level and then averaged across elections.
• Calibration: we calculate, for each election, the proportion of seat (vote) shares which were greater than or equal to the corresponding 5th percentile and less than or equal to the corresponding 95th percentile in the posterior distribution. We then average this across elections. Calibration ranges between 0% and 100%; values closer to 90% indicate a better model.
• Proportional error on $N_S$ (or $N_V$): we calculate for each simulation the effective number of simulated parties. We then subtract the actual effective number for each election. To draw meaningful comparisons across party systems with different effective numbers, we then divide this difference by the actual effective number. This quantity, expressed in percentages, ranges from $-100$ to $+100$. Values greater than zero indicate the effective number of parties was over-estimated; values closer to 0 indicate a better model.
• Proportional error on $s_1$ (or $v_1$): we take the share of the largest party in each simulation and subtract the actual share for each corresponding election. To enable comparison, we once again divide this difference by the actual share of the largest party. This quantity ranges from $-100$ to $+100$. Values greater than zero indicate the seat share of the largest party was over-estimated; values closer to 0 indicate a better model.
• Proportional error on $s_2$ (or $v_2$): as above, but for $s_2$ instead of $s_1$.
We calculate these quantities because each taps an important aspect of party systems. RMSE is closest to an overall measure of fit to the data. Calibration is important because ours is a probabilistic model, and in order to improve on deterministic models like that proposed by Taagepera and Allik (2006), we need to show that the set of shares to which we assign 90% probability actually occurs 90% of the time. Proportional error on $N_S$ is important because $N_S$ is the key continuous property of party systems, and arguably more important than the discrete measure of party system size $N_{S0}$. Finally, proportional error on $s_1$ and $s_2$ is needed to assess the claim that “political adjustments” are necessary to explain whether small parties lose a portion of the seat (or vote) share they would gain under a probabilistic model, and because the share of the largest party is arguably the second most important quantitative feature of a party system (Magyar 2022).
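The metrics above can be sketched for a single election as follows. This is a hedged illustration, not the authors' evaluation code. It assumes the usual Laakso-Taagepera definition of the effective number of parties (one divided by the sum of squared shares), uses a crude empirical quantile rather than a posterior summary from Stan, and all function names and example shares are hypothetical.

```python
import math


def effective_number(shares):
    """Effective number of parties, assumed here to be 1 / sum(s_i^2)."""
    return 1.0 / sum(s * s for s in shares)


def proportional_error(simulated, actual):
    """(simulated - actual) / actual, expressed in percent."""
    return 100.0 * (simulated - actual) / actual


def rmse(predicted, actual):
    """Root mean squared error across the parties of one election."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(actual))


def coverage(draws, actual, lower=0.05, upper=0.95):
    """Share of ranks whose actual value lies between the empirical
    lower and upper quantiles of the simulated values for that rank."""
    inside = 0
    for rank, value in enumerate(actual):
        column = sorted(d[rank] for d in draws)
        lo = column[int(lower * (len(column) - 1))]
        hi = column[int(upper * (len(column) - 1))]
        if lo <= value <= hi:
            inside += 1
    return inside / len(actual)


# Hypothetical election and one hypothetical simulated outcome.
actual = [0.48, 0.30, 0.14, 0.08]
simulated = [0.45, 0.28, 0.17, 0.10]

print(rmse(simulated, actual))
print(proportional_error(effective_number(simulated), effective_number(actual)))
print(proportional_error(simulated[0], actual[0]))
```

In an application, each quantity would be computed per election and then averaged across elections, as described in the list above.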
6 Data
We estimate our models using data from ParlGov (Döring and Manow 2021). ParlGov collects comprehensive information on electoral outcomes in a number of parliamentary and semi-presidential regimes. Information is recorded for all elections after 1945 or after full democratization, and for a limited number of countries from 1900. Parties are included if they won more than 1% of the vote or two seats or more. ParlGov covers 813 elections in 37 unique countries, far exceeding Mackie and Rose (1997). The raw number of seat- and vote-winning parties ranges from 2 to 20; the modal number of seat-winning parties is 5.
We use ParlGov data because its coverage of seat- and vote-shares in included elections is more complete than any other source we are aware of. ParlGov does, however, have certain limitations. Most notably, it lacks information on seat- and vote-shares in presidential regimes. It also does not cover elections in smaller parliamentary regimes located outside of Europe, such as the Westminster-model democracies in the Caribbean. We claim, however, that it would be unlikely, when conditioning on the number of seat- or vote-winning parties, for these systems to have very different expected seat- or vote-shares $\mathbf{p}$ (Shugart and Taagepera 2017, 187–92), or to alter substantially our parameter estimates for concentration $\alpha$.
7 Results
Table 1 shows evaluation metrics for models of seat shares. The null model performs poorly, with a large RMSE and an effective number of parties that is 13% too high (i.e., the model predicts more fragmentation than there really is). The logical models provide much better fit, as measured by the RMSE, and calibration that is close to nominal. The (unordered) logical model does give values of $N_{S}$ which are roughly 9% too high. However, this is not due to systematically underestimating the share of the two largest components: the average proportional errors on $s_1$ and $s_2$ are close to zero, and their 90% credible intervals encompass zero. The political models, which might address the issue of over-estimating $N_S$, provide a worse fit to the data, as measured by RMSE. The fit of the logical models is impressive, with RMSE within 7% of the value for the saturated model. When comparing between logical models, the ordered Dirichlet ends up giving a less realistic picture of the effective number of parties, and has a worse fit to the data as measured by RMSE. Given the greater ease of use of the unordered Dirichlet distribution, the ordered Dirichlet does not repay its greater complexity.
Table 2 presents the same metrics for vote share. As before, the political models are worse than the logical models, and the logical models are worse than the saturated model only by a small amount. Once again, the fit of the ordered Dirichlet models is inferior to the unordered models. All models save the political models over-estimate the effective number of parties, even the saturated models. Indeed, $N_V$ is very badly over-estimated in the saturated ordered model.
8 Conclusion
Our results show that realistic-looking party systems of a given size can be simulated using a (standard, unordered) Dirichlet distribution where mean seat or vote shares are given using Equation (1), and where the concentration parameter is roughly 40 (for seat shares) or 50 (for vote shares). We can achieve similar results using an ordered Dirichlet distribution, but the ordered Dirichlet generally provides a worse fit to real-world data, and is harder to work with than the standard Dirichlet. For these reasons, we recommend that researchers who are interested in simulating party systems use a standard Dirichlet distribution. Tools to simulate party systems of different sizes can be found in an accompanying R package sharesimulatoR and in an interactive web page.
The ability to simulate party systems allows researchers to answer practical questions (provided, of course, that they know, or have expectations regarding, the number of seat- or vote-winning parties). To return to the questions asked in the Introduction: a consultant who knows that the most likely number of seat-winning parties under a proposed system is 5 can use our work to show, through simulation, that the probability of a single-party majority is roughly 1 in 4. Researchers interested in coalition formation can use our work to evaluate the probability that a party system with five, seven, or nine seat-winning parties is an “open” system (per Laver and Benoit (2015), one where even the top-two parties do not have a majority). Because the number of seat- and vote-winning parties is strongly determined by the “seat product,” researchers evaluating proposed electoral systems can simulate likely distributions of seat and vote shares given predicted numbers of seat- and vote-winning parties (Shugart and Taagepera 2017, 149). In our view, the questions which we can now answer with this method of simulation (“what is the probability that a single party will have a majority?” and “what is the probability that no two parties will have a majority?”) are simple questions which are fundamental to the operation of a party system, and which could not have been satisfactorily answered without the simulation methods given here.
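The consultant's question can be approximated with a short simulation. The sketch below is not the sharesimulatoR package or the authors' Stan code: it is a standard-library Python illustration that assumes the recommended set-up of a standard Dirichlet with mean shares from Equation (1) and a concentration parameter of 40 for seat shares. Function names, the seed, and the number of draws are our own choices.

```python
import math
import random


def logical_shares(n0):
    """Mean shares by rank: remaining share over sqrt(parties left)."""
    shares, remaining = [], 1.0
    for rank in range(1, n0 + 1):
        share = remaining / math.sqrt(n0 - rank + 1)
        shares.append(share)
        remaining -= share
    return shares


def rdirichlet(theta, rng):
    """One draw from Dir(theta) via normalized Gamma(theta_i, 1) variates."""
    gammas = [rng.gammavariate(t, 1.0) for t in theta]
    total = sum(gammas)
    return [g / total for g in gammas]


def majority_probabilities(n0, alpha, n_draws, rng):
    """Simulated probability of (a) a single-party majority and
    (b) an "open" system where the two largest parties lack a majority."""
    theta = [alpha * p_i for p_i in logical_shares(n0)]
    single, open_system = 0, 0
    for _ in range(n_draws):
        s = sorted(rdirichlet(theta, rng), reverse=True)
        if s[0] > 0.5:
            single += 1
        if s[0] + s[1] <= 0.5:
            open_system += 1
    return single / n_draws, open_system / n_draws


p_single, p_open = majority_probabilities(5, 40, 20000, random.Random(2023))
print(p_single, p_open)
```

Under these assumptions the simulated probability of a single-party majority in a five-party system should come out in the neighborhood of one in four, in line with the figure quoted above. The same function can be called with seven or nine parties to examine how often the resulting systems are “open.”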
Acknowledgment
The authors thank the anonymous reviewers whose suggestions materially improved the manuscript.
Funding Statement
There are no funding sources to report for this letter.
Conflict of Interest
The authors are not aware of any conflicts of interest.
Data Availability Statement
Replication code for this article is available in Cohen and Hanretty (2023) at https://doi.org/10.7910/DVN/3WILXI.
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2023.13.