1. Introduction
As polarised radiation from distant galaxies makes its way to us, magnetised plasma along the way can cause the polarisation angle to change due to the Faraday effect. The amount of rotation depends on the squared wavelength of the radiation, and the rotation per squared wavelength is called the Faraday depth. Multiple Faraday depths may exist along one line-of-sight, and if a polarised source is observed at multiple wavelengths then these multiple depths can be disentangled. This can provide insight into the polarised structure of the source or the intervening medium.
Faraday rotation measure synthesis (RM synthesis) is a technique for decomposing a spectropolarimetric observation into flux at its Faraday depths $\phi$, the resulting distribution of depths being called a ‘Faraday dispersion function’ (FDF) or a ‘Faraday spectrum’. It was introduced by Brentjens & de Bruyn (2005) as a way to rapidly and reliably analyse the polarisation structure of complex and high-Faraday depth polarised observations.
A ‘Faraday simple’ observation is one for which there is only one Faraday depth, and in this simple case, the Faraday depth is also known as a ‘rotation measure’ (RM). All Faraday simple observations can be modelled as a polarised source with a thermal plasma of constant electron density and magnetic field (a ‘Faraday screen’; Brentjens & de Bruyn 2005; Anderson et al. 2015) between the observer and the source. A ‘Faraday complex’ observation is one which is not Faraday simple, and may differ from a Faraday simple source due to emission from the intervening plasma or a superposition of multiple screens (Brentjens & de Bruyn 2005). The complexity of a source tells us important details about the polarised structure of the source and along the line-of-sight, such as whether the intervening medium emits polarised radiation, or whether there are turbulent magnetic fields or varying electron densities in the neighbourhood. The complexity of nearby sources taken together can tell us about the magneto-ionic structure of the galactic and intergalactic medium between the sources and us as observers. O’Sullivan et al. (2017) show examples of simple and complex sources, and Figures 1 and 2 show examples of a simulated simple and a simulated complex FDF, respectively.
Identifying when an observation is Faraday complex is an important problem in polarised surveys (Sun et al. 2015), and with current surveys such as the Polarisation Sky Survey of the Universe’s Magnetism (POSSUM) larger than ever before, methods that can quickly characterise Faraday complexity en masse are increasingly useful. Being able to identify which sources are simple lets us produce a reliable rotation measure grid from background sources, and being able to identify which sources might be complex allows us to find sources to follow up with slower polarisation analysis methods that may require manual oversight, such as QU-fitting (as seen in, e.g. Miyashita et al. 2019; O’Sullivan et al. 2017). In this paper, we introduce five simple, interpretable features representing polarised spectra, use these features to train machine learning classifiers to identify Faraday complexity, and demonstrate their effectiveness on real and simulated data. We construct our features by comparing observed polarised sources to idealised polarised sources. The features are intuitive and can be estimated from real FDFs.
Section 2 provides a background to our work, including a summary of prior work and our assumptions on FDFs. Section 3 describes our approach to the Faraday complexity problem. Section 4 explains how we trained and evaluated our method. Finally, Section 5 discusses these results.
2. Faraday complexity
Faraday complexity is an observational property of a source: if multiple Faraday depths are observed within the same apparent source (e.g. due to multiple lines-of-sight being combined within a beam), then the source is complex. A source composed of multiple Faraday screens may produce observations consistent with many models (Sun et al. 2015), including simple sources, so there is some overlap between simple and complex sources. Faraday thickness is also a source of Faraday complexity: when the intervening medium between a polarised source and the observer also emits polarised light, the FDF cannot be characterised by a simple Faraday screen. As discussed in Section 2.2, we defer Faraday thick sources to future work. In this section, we summarise existing methods of Faraday complexity estimation and explain our assumptions and model of simple and complex polarised FDFs.
2.1. Prior work
There are multiple ways to estimate Faraday complexity, including detecting non-linearity in $\chi(\lambda^2)$ (Goldstein & Reed 1984), change in fractional polarisation as a function of frequency (Farnes, Gaensler, & Carretti 2014), non-sinusoidal variation in fractional polarisation in Stokes Q and U (O’Sullivan et al. 2012), counting components in the FDF (Law et al. 2011), minimising the Bayesian information criterion (BIC) over a range of simple and complex models (called ‘QU-fitting’; O’Sullivan et al. 2017), the method of Faraday moments (Anderson et al. 2015; Brown 2011), and deep convolutional neural network classifiers (CNNs; Brown et al. 2018). See Sun et al. (2015) for a comparison of these methods.
The most common approaches to estimating complexity are QU-fitting (e.g. O’Sullivan et al. 2017) and Faraday moments (e.g. Anderson et al. 2015). To our knowledge, there is currently no literature examining the accuracy of QU-fitting when applied to complexity classification specifically, though Miyashita et al. (2019) analyse its effectiveness at identifying the structure of two-component sources. Brown (2011) suggested Faraday moments as a method to identify complexity, a method later used by Farnes et al. (2014) and Anderson et al. (2015), but again no literature examines its accuracy. CNNs are the current state of the art, with an accuracy of 94.9% (Brown et al. 2018) on simulated ASKAP Band 1 and 3 data, and we will compare our results to this method.
2.2. Assumptions on Faraday dispersion functions
Before we can classify FDFs as Faraday complex or Faraday simple, we need to define FDFs and any assumptions we make about them. An FDF is a function that maps Faraday depth $\phi$ to complex polarisation. It is the distribution of Faraday depths in an observed polarisation spectrum. For a given observation, we assume there is a true, noise-free FDF F composed of at most two Faraday screens. This accounts for most actual sources (Anderson et al. 2015), and extension to three screens would cover most of the remainder: O’Sullivan et al. (2017) found that 89% of their sources were best explained by two or fewer screens, while the remainder were best explained by three screens. We model the screens by Dirac delta distributions:
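Up to possible intrinsic polarisation-angle phase factors (which we do not write explicitly), this model takes the form

$F(\phi) = A_0\,\delta(\phi - \phi_0) + A_1\,\delta(\phi - \phi_1).$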
$A_0$ and $A_1$ are the polarised flux of each Faraday screen, and $\phi_0$ and $\phi_1$ are the Faraday depths of the respective screens. With this model, a Faraday simple source is one which has $A_0 = 0$ , $A_1 = 0$ , or $\phi_0 = \phi_1$ . By using delta distributions to model each screen, we are assuming that there is no internal Faraday dispersion (which is typically associated with diffuse emission rather than the mostly compact sources we expect to find in wide-area polarised surveys). F generates a polarised spectrum of the form shown in Equation (2):
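Since the polarised spectrum is related to the FDF by $P(\lambda^2) = \int F(\phi)\, e^{2i\phi\lambda^2}\,\mathrm{d}\phi$ (Brentjens & de Bruyn 2005), Equation (2) for this model is, again up to intrinsic phase factors,

$P(\lambda^2) = A_0\, e^{2i\phi_0\lambda^2} + A_1\, e^{2i\phi_1\lambda^2}.$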
Such a spectrum would be observed as noisy samples at a number of squared wavelengths $\lambda^2_j, j \in [1, \dots, D]$. We model this noise as a complex Gaussian with standard deviation $\sigma$ and call the noisy observed spectrum $\hat P$:
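That is, with $n_j$ drawn independently from a zero-mean complex Gaussian of standard deviation $\sigma$,

$\hat P(\lambda^2_j) = P(\lambda^2_j) + n_j, \qquad j \in [1, \dots, D].$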
The constant variance of the noise is a simplifying assumption which may not hold for real data, and exploring this is a topic for future work. By performing RM synthesis (Brentjens & de Bruyn 2005) on $\hat P$ with uniform weighting we arrive at an observed FDF:
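With uniform weighting over the $D$ channels, the RM synthesis transform takes the standard form

$\hat F(\phi) = \frac{1}{D} \sum_{j=1}^{D} \hat P(\lambda^2_j)\, e^{-2i\phi\left(\lambda^2_j - \lambda^2_0\right)},$

where $\lambda^2_0$ is a reference value, commonly the mean of the sampled $\lambda^2_j$.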
Examples of F, $\hat F$ , P, and $\hat P$ for simple and complex observations are shown in Figures 1 and 2, respectively. Note that there are two reasons that the observed FDF $\hat F$ does not match the groundtruth FDF F. The first is the noise in $\hat P$ . The second arises from the incomplete sampling of $\hat P$ .
We do not consider external or internal Faraday dispersion in this work. External Faraday dispersion would broaden the delta functions of Equation (1) into peaks, and internal Faraday dispersion would broaden them into top-hat functions. All sources have at least a small amount of dispersion, as the Faraday depth is a bulk property of the intervening medium and is subject to noise, but the assumption we make is that this dispersion is sufficiently small that the groundtruth FDFs are well-modelled with delta functions. Faraday thick sources would also invalidate our assumptions, and we assume that there are none in our data, as Faraday thickness can be consistent with a two-component model depending on the wavelength sampling (e.g. Ma et al. 2019; Brentjens & de Bruyn 2005). Nevertheless, some external Faraday dispersion would be covered by our model, as, depending on the observing parameters, Faraday thick sources may appear as two screens (Van Eck et al. 2017).
To simulate observed FDFs we follow the method of Brown et al. (2018), which we describe in Appendix E.
3. Classification approach
The Faraday complexity classification problem is as follows: Given an FDF $\hat F$ , is it Faraday complex or Faraday simple? In this section we describe the features that we have developed to address this problem, which can be used in any standard machine learning classifier. We trained two classifiers on these features, which we describe here also.
3.1. Features
Our features are based on a simple idea: all simple FDFs look essentially the same, up to scaling and translation, while complex FDFs may deviate. A noise-free peak-normalised simple FDF $\hat F_{\mathrm{simple}}$ has the form
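$\hat F_{\mathrm{simple}}(\phi;\ \phi_s) = R(\phi - \phi_s),$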
where R is the rotation measure spread function (RMSF), the Fourier transform of the wavelength sampling function which is 1 at all observed wavelengths and 0 otherwise. $\phi_s$ traces out a curve in the space of all possible FDFs. In other words, $\hat F_{\mathrm{simple}}$ is a manifold parametrised by $\phi_s$. Our features are derived from relating an observed FDF to the manifold of simple FDFs (the ‘simple manifold’). We measure the distance of an observed FDF to the simple manifold using a distance measure $D_f$ that takes all values of the FDF into account:
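A plausible form of this minimisation (referenced below as Equation (6)) is

$\min_{\phi_s}\ D_f\big(\hat F(\phi)\,||\,\hat F_{\mathrm{simple}}(\phi;\ \phi_s)\big).$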
We propose two distances that have nice properties:
- invariant over changes in complex phase,
- translationally invariant in Faraday depth,
- zero for Faraday simple sources (i.e. when $A_0 = 0$, $A_1 = 0$, or $\phi_0 = \phi_1$) when there is no noise,
- symmetric in components (i.e. swapping $A_0 \leftrightarrow A_1$ and $\phi_0 \leftrightarrow \phi_1$ should not change the distance),
- increasing as $A_0$ and $A_1$ become closer to each other, and
- increasing as screen separation $|\phi_0 - \phi_1|$ increases over a large range.
Our features are constructed from this distance and its minimiser. In other words we look for the simple FDF $\hat{F}_{\mathrm{simple}}$ that is ‘closest’ to the observed FDF $\hat{F}$ . The minimiser $\phi_s$ is the Faraday depth of the simple FDF.
While we could choose any distance that operates on functions, we used the 2-Wasserstein ($W_2$) distance and the Euclidean distance. The $W_2$ distance operates on probability distributions and can be thought of as the minimum cost to ‘move’ one probability distribution to the other, where the cost of moving one unit of probability mass is the squared distance it is moved. Under the $W_2$ distance, the minimiser $\phi_s$ in Equation (6) can be interpreted as the Faraday depth that the FDF $\hat F$ would be observed to have if its complexity was unresolved (i.e. the weighted mean of its components). The Euclidean distance is the square root of the least squares loss, which is often used for fitting $\hat{F}_{\mathrm{simple}}$ to the FDF $\hat F$. Under the Euclidean distance, the minimiser $\phi_s$ is equivalent to the depth of the best-fitting single component under the assumption of Gaussian noise in $\hat F$. We calculated the $W_2$ distance using Python Optimal Transport (Flamary & Courty 2017), and we calculated the Euclidean distance using scipy.spatial.distance.euclidean (Virtanen et al. 2020). Further intuition about the two distances is provided in Section 3.2.
We denote by $\phi_w$ and $\phi_e$ the Faraday depths of the simple FDFs that minimise the 2-Wasserstein and Euclidean distances, respectively.
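As an illustration, both minimisers can be found with a simple grid search over trial depths. The following sketch uses Python Optimal Transport and SciPy; the names (`fdf_obs`, an `rmsf` callable) are illustrative rather than our exact implementation, and FDF magnitudes are compared after normalisation.

```python
import numpy as np
import ot  # Python Optimal Transport (Flamary & Courty 2017)
from scipy.spatial.distance import euclidean

def w2_distance(phi, f, g):
    """2-Wasserstein distance between two non-negative functions of Faraday
    depth, treated as sum-normalised distributions on the grid `phi`."""
    a, b = f / f.sum(), g / g.sum()
    cost = ot.dist(phi.reshape(-1, 1), phi.reshape(-1, 1), metric='sqeuclidean')
    return np.sqrt(ot.emd2(a, b, cost))

def closest_simple(phi, fdf_obs, rmsf, distance):
    """Grid search over trial depths for the simple FDF (a translated RMSF)
    closest to the observed FDF under the given distance."""
    f = np.abs(fdf_obs) / np.abs(fdf_obs).max()   # peak-normalised magnitude
    dists = [distance(phi, f, np.abs(rmsf(phi - p))) for p in phi]
    best = int(np.argmin(dists))
    return phi[best], dists[best]

# Example usage (illustrative names):
# phi_w, d_w2 = closest_simple(phi, fdf_obs, rmsf, w2_distance)
# phi_e, d_e = closest_simple(phi, fdf_obs, rmsf,
#                             lambda phi, f, g: euclidean(f, g))
```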
These features are depicted on an example FDF in Figure 3. For simple observed FDFs, the fitted Faraday depths $\phi_w$ and $\phi_e$ both tend to be close to the peak of the observed FDF. However, for complex observed FDFs, $\phi_w$ tends to lie at the average depth between the two major peaks of the observed FDF, closer to the higher peak. For notational convenience, we denote the Faraday depth at which the observed FDF has the largest magnitude as $\phi_a$, i.e.
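$\phi_a = \underset{\phi}{\operatorname{arg\,max}}\ |\hat F(\phi)|.$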
Note that in practice $\phi_a \approx \phi_e$. For complex observed FDFs, the Faraday depths $\phi_w$ and $\phi_a$ tend to differ (essentially by a fraction of the separation between the two screens). The difference between $\phi_w$ and $\phi_a$ therefore provides useful information to identify complex FDFs. When the observed FDF is simple, the 2-Wasserstein fit will overlap significantly with the observed FDF, hence the observed magnitudes $\hat F(\phi_w)$ and $\hat F(\phi_a)$ will be similar. However, for complex FDFs $\phi_w$ and $\phi_a$ are at different depths, leading to different values of $\hat F(\phi_w)$ and $\hat F(\phi_a)$. Therefore, the magnitudes of the observed FDF at the depths $\phi_w$ and $\phi_a$ indicate how different the observed FDF is from a simple FDF.
In summary, we provide the following features to the classifier:
- $\log |\phi_w - \phi_a|$,
- $\log \hat F(\phi_w)$,
- $\log \hat F(\phi_a)$,
- $\log D_{W_2}\big(\hat F(\phi)\,||\,\hat F_{\mathrm{simple}}(\phi;\ \phi_w)\big)$,
- $\log D_{E}\big(\hat F(\phi)\,||\,\hat F_{\mathrm{simple}}(\phi;\ \phi_e)\big)$,
where $D_E$ is the Euclidean distance, $D_{W_2}$ is the $W_2$ distance, $\phi_a$ is the Faraday depth of the FDF peak, $\phi_w$ is the minimiser for $W_2$ distance, and $\phi_e$ is the minimiser for Euclidean distance.
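Continuing the sketch above, these features could be assembled as follows; the variable names are again illustrative, and a small floor guards against taking the logarithm of zero.

```python
import numpy as np

# Hypothetical continuation: phi_w, phi_e, d_w2 and d_e come from the two
# grid searches above, and phi_a is the depth of the observed FDF peak.
eps = 1e-10
i_a = int(np.argmax(np.abs(fdf_obs)))           # index of the FDF peak
phi_a = phi[i_a]
i_w = int(np.argmin(np.abs(phi - phi_w)))       # index closest to phi_w
features = np.log([abs(phi_w - phi_a) + eps,
                   np.abs(fdf_obs[i_w]) + eps,
                   np.abs(fdf_obs[i_a]) + eps,
                   d_w2 + eps,
                   d_e + eps])
```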
3.2. Interpreting distances
Interestingly, in the case where there is no RMSF, Equation (6) with $W_2$ distance reduces to the Faraday moment already in common use:
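For the two-screen model of Equation (1), a plausible form of this reduction is

$\min_{\phi_s} D_{W_2}^2 = \frac{\sum_i A_i\,(\phi_i - \bar\phi)^2}{\sum_i A_i} = \frac{A_0 A_1\,(\phi_0 - \phi_1)^2}{(A_0 + A_1)^2}, \qquad \bar\phi = \frac{\sum_i A_i\,\phi_i}{\sum_i A_i},$

i.e. the amplitude-weighted variance of the Faraday depths.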
See Appendix A for the corresponding calculation. In this sense, the $W_2$ distance can be thought of as a generalised Faraday moment, and conversely Faraday moments can be interpreted as a distance from the simple manifold in the case where there is no RMSF. The Euclidean distance behaves quite differently in this case, and the resulting distance measure is totally independent of Faraday depth:
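A plausible form of the minimised distance in this case is

$\min_{\phi_e} D_E = \sqrt{2}\,\frac{\min(A_0, A_1)}{A_0 + A_1},$

which depends only on the relative amplitudes of the two screens.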
See Appendix B for the corresponding calculation.
3.3. Classifiers
We trained two classifiers on simulated observations using these features: logistic regression (LR) and extreme gradient boosted trees (XGB). These classifiers are useful together for understanding Faraday complexity classification. LR is a linear classifier that is readily interpretable by examining the weights it applies to each feature, and is one of the simplest possible classifiers. XGB is a powerful off-the-shelf non-linear ensemble classifier, and is an example of the decision tree ensembles widely used in astronomy (e.g. Machado Poletti Valle et al. 2020; Hložek et al. 2020). We used the scikit-learn implementation of LR and the XGBoost library for XGB. We optimised hyperparameters for XGB using a fork of xgboost-tuner (see footnote a), as utilised by Zhu, Ong, & Huttley (2020). We used $1\,000$ iterations of randomised parameter tuning, and the hyperparameters we found are tabulated in Table C.1. We optimised hyperparameters for LR using a fivefold cross-validation grid search implemented in sklearn.model_selection.GridSearchCV. The resulting hyperparameters are tabulated in Table D.1 in Appendix C.
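For illustration, training the two classifiers on the features of Section 3.1 might look like the following sketch; `X_train` and `y_train` are a hypothetical feature matrix and label vector, and the hyperparameter values shown here are placeholders rather than those tabulated in the appendices.

```python
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier

# X_train: (n_samples, 5) feature matrix; y_train: 1 = complex, 0 = simple.
lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
xgb = XGBClassifier(n_estimators=200, max_depth=4,
                    learning_rate=0.1).fit(X_train, y_train)

# Predicted probability of being Faraday complex for new FDFs.
p_complex_lr = lr.predict_proba(X_validation)[:, 1]
p_complex_xgb = xgb.predict_proba(X_validation)[:, 1]
```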
4. Experimental method and results
We applied our classifiers to classify simulated (Sections 4.2 and 4.3) and real (Section 4.4) FDFs. We replicated the experimental setup of Brown et al. (2018) for comparison with the state-of-the-art CNN classification method, and we also applied our method to 142 real FDFs observed with the Australia Telescope Compact Array (ATCA) from Livingston et al. (2021) and O’Sullivan et al. (2017).
4.1. Data
4.1.1. Simulated training and validation data
Our classifiers were trained and validated on simulated FDFs. We produced two sets of simulated FDFs, one for comparison with the state-of-the-art method in the literature and one for application to our observed FDFs (described in Section 4.1.2). We refer to the former as the ‘ASKAP’ dataset as it uses frequencies from the Australian Square Kilometre Array Pathfinder 12-antenna early science configuration. These frequencies included 900 channels from 700 to $1\,300$ and $1\,500$ to $1\,800$ MHz and were used to generate simulated training and validation data by Brown et al. (2018). We refer to the latter as the ‘ATCA’ dataset as it uses frequencies from the 1 to 3 GHz configuration of the ATCA. These frequencies included 394 channels from 1.29 to 3.02 GHz and match our real data. We simulated Faraday depths from $-50$ to 50 rad m$^{-2}$ for the ‘ASKAP’ dataset (matching Brown et al. 2018) and $-500$ to 500 rad m$^{-2}$ for the ‘ATCA’ dataset.
For each dataset, we simulated $100\,000$ FDFs, approximately half simple and half complex. We randomly allocated half of these FDFs to a training set and reserved the remaining half for validation. Each FDF had complex Gaussian noise added to the corresponding polarisation spectrum. For the ‘ASKAP’ dataset, we sampled the standard deviation of the noise uniformly between 0 and $\sigma_{\max} = 0.333$, matching the dataset of Brown et al. (2018). For the ‘ATCA’ dataset, we fit a log-normal distribution to the standard deviations of the O’Sullivan et al. (2017) data, from which we sampled our values of $\sigma$:
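A sketch of this step, with `sigma_osullivan` standing in for the per-source noise estimates (an assumed variable name; the fitted parameters are not reproduced here), is:

```python
from scipy import stats

# Fit a log-normal distribution to the observed noise standard deviations
# and draw noise levels for the simulated 'ATCA' FDFs.
shape, loc, scale = stats.lognorm.fit(sigma_osullivan, floc=0)
sigma_samples = stats.lognorm.rvs(shape, loc=loc, scale=scale, size=100_000)
```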
4.1.2. Observational data
We used two real datasets containing a total of 142 sources: 42 polarised spectra from Livingston et al. (2021) and 100 polarised spectra from O’Sullivan et al. (2017). These datasets were observed in similar frequency ranges on the same telescope (with different binning), but are in different parts of the sky. The Livingston data were taken near the Galactic Centre, and the O’Sullivan data were taken away from the plane of the Galaxy. There are more Faraday complex sources near the Galactic Centre, whereas sources away from the plane of the Galaxy are more often Faraday simple (Livingston et al. 2021). The similar frequency channels used in the two datasets result in almost identical RMSFs over the Faraday depth range we considered ($-500$ to 500 rad m$^{-2}$), so we expected that the classifiers would work equally well on both datasets with no need to retrain. We discarded the 26 Livingston sources with modelled Faraday depths outside of this Faraday depth range, which we do not expect to affect the applicability of our methods to wide-area surveys because these fairly high depths are not common.
Livingston et al. (2021) used RM-CLEAN (Heald 2008) to identify significant components in their FDFs. Some of these components had very high Faraday depths, up to $2\,000$ rad m$^{-2}$, but we chose to ignore these components in this paper as they are much larger than might be expected in a wide-area survey like POSSUM. They used the second Faraday moment (Brown 2011) to estimate Faraday complexity, with Faraday depths determined using scipy.signal.find_peaks on the cleaned FDFs, with a cut-off of seven times the noise of the polarised spectrum. Using this method, they estimated that 89% of their sources were Faraday complex, i.e. had a second Faraday moment greater than 0.
O’Sullivan et al. (2017) used the QU-fitting and model selection technique described in O’Sullivan et al. (2012). The QU-fitting models contained up to three Faraday screen components, as well as terms for internal and external Faraday dispersion. We ignore the Faraday thickness and dispersion for the purposes of this paper, as most sources were not found to have Faraday thickness, and dispersion is beyond the scope of our current work. Thirty-seven sources were best fitted by one component, 52 by two, and the remaining 11 by three.
4.2. Results on ‘ASKAP’ dataset
The accuracy of the LR and XGB classifiers on the ‘ASKAP’ testing set was 94.4 and 95.1%, respectively. The rates of true and false identifications are summarised in Table 1. These results are very close to those of the CNN presented by Brown et al. (2018), with a slightly higher true negative rate and a slightly lower true positive rate (recalling that positive sources are complex, and negative sources are simple). The accuracy of the CNN was 94.9%, slightly lower than our XGB classifier and slightly higher than our LR classifier. Both of our classifiers therefore produce similar classification performance to the CNN, with faster training time and easier interpretation.
4.3. Results on ‘ATCA’ dataset
The accuracy of the LR and XGB classifiers on the ‘ATCA’ dataset was 89.2 and 90.5%, respectively. The major differences between the ‘ATCA’ and the ‘ASKAP’ experiments are the range of the simulated Faraday depths and the distribution of noise levels. The ‘ASKAP’ dataset, to match past CNN work, only included depths from $-50$ to 50 rad m$^{-2}$, while the ‘ATCA’ dataset includes depths from $-500$ to 500 rad m$^{-2}$. The rates of true and false identifications are again shown in Table 1.
As we know the true Faraday depths of the components in our simulation, we can investigate the behaviour of these classifiers as a function of physical properties. Figure 4 shows the mean classifier prediction as a function of component depth separation and minimum component amplitude. This is closely related to the mean accuracy: everywhere in the plot domain except the left and bottom edges the true class is complex, so for any choice of prediction threshold the accuracy in that region is simply the fraction of sources whose predictions exceed the threshold.
4.4. Results on observed FDFs
We used the LR and XGB classifiers trained on the ‘ATCA’ dataset to estimate the probability that our 142 observed FDFs (Section 4.1.2) were Faraday complex. As these classifiers were trained on simulated data, they face the issue of the ‘domain gap’: the distribution of samples from a simulation differs from the distribution of real sources, and this affects performance on real data. Solving this issue is called ‘domain adaptation’, and how to do this is an open research question in machine learning (Zhang 2019; Pan & Yang 2010). Nevertheless, the features of our observations mostly fall in the same region of feature space as the simulations (Figure 5), and so we expect reasonably good domain transfer.
Two apparently complex sources in the Livingston sample are classified as simple with high probability by XGB. These outliers are on the very edge of the training sample (Figure 5) and the underdensity of training data here is likely the cause of this issue. LR does not suffer the same issue, producing plausible predictions for the entire dataset, and these sources are instead classified as complex with high probability.
With a threshold of 0.5, LR predicted that 96 and 83% of the Livingston and O’Sullivan sources were complex, respectively. This is in line with expectations that the Livingston data should have more Faraday complex sources than the O’Sullivan data due to their location near the Galactic Centre. XGB predicted that 93 and 100% of the Livingston and O’Sullivan sources were complex, respectively. Livingston et al. (2021) found that 90% of their sources were complex, and O’Sullivan et al. (2017) found that 64% of their sources were complex. This suggests that our classifiers are overestimating complexity, though it could also be the case that the methods used by Livingston and O’Sullivan underestimate complexity. Modifying the prediction threshold from 0.5 changes the estimated rate of Faraday complexity, and we show the estimated rates against threshold for both classifiers in Figure 6. We suggest that this result is indicative of our probabilities being uncalibrated, and a higher threshold should be chosen in practice. We chose to keep the threshold at 0.5 as this had the highest accuracy on the simulated validation data. The very high complexity rates of XGB and two outlying classifications indicate that the XGB classifier may be overfitting to the simulation and that it is unable to generalise across the domain gap.
Figures D.1 and D.2 show every observed FDF ordered by estimated Faraday complexity, alongside the models predicted by Livingston et al. (2021) and O’Sullivan et al. (2017), for LR and XGB, respectively. There is a clear visual trend of increasing apparent complexity with increasing predicted probability of being complex.
5. Discussion
On simulated data (Section 4.3), we achieve state-of-the-art accuracy. Our results on observed FDFs show that our classifiers produce plausible results, with Figures D.1 and D.2 showing a clear trend of apparent complexity. Some issues remain: we discuss the intrinsic overlap between simple and complex FDFs in Section 5.1 and the limitations of our method in Section 5.2.
5.1. Complexity and seeming ‘not simple’
Through this work, we found our methods limited by the significant overlap between complex and simple FDFs. Complex FDFs can be consistent with simple FDFs due to close Faraday components or very small amplitudes on the secondary component, and vice versa due to noise.
The main failure mode of our classifiers is misclassifying a complex source as simple (Table 1). Whether sources with close components or small secondary amplitudes should be considered complex is not clear, since for practical purposes they can be treated as simple: assuming the source is simple yields a very similar RM to the RM of the primary component, and thus would not negatively impact further data products such as an RM grid. The scenarios where we would want a Faraday complexity classifier rather than a polarisation structure model (large-scale analysis and wide-area surveys) do not seem to be disadvantaged by considering such sources simple. Additional sources similar to these are likely hidden in presumably ‘simple’ FDFs by the frequency range and spacing of the observations, just as these complex sources would be hidden in lower-resolution observations. Note also that misidentification of complex sources as simple is intrinsically a problem with complexity estimation even for models not well-represented by a simple FDF, as complex sources may conspire to appear consistent with a wide range of viable models, including simple ones (Sun et al. 2015).
Conversely, high-noise simple FDFs may be consistent with complex FDFs. One key question is how Faraday complexity estimators should behave as the noise increases: should high noise result in a complex prediction or a simple prediction, given that a complex or simple FDF would both be consistent with a noisy FDF? Occam’s razor suggests that we should choose the simplest suitable model, and so increasing noise should lead to predictions of less complexity. This is not how our classifiers operate, however: high-noise FDFs are different to the model simple FDFs and so are predicted to be ‘not simple’. In some sense our classifiers are not looking for complex sources, but are rather looking for ‘not simple’ sources.
5.2. Limitations
Our main limitations are our simplifying assumptions on FDFs and the domain gap between simulated and real observations. However, our proposed features (Section 3.1) can be applied to future improved simulations.
It is unclear what effect our simplifying assumptions have on the effectiveness of our simulation. The three main simplifications that may negatively affect our simulations are (1) limiting to two components, (2) assuming no external Faraday dispersion, and (3) assuming no internal Faraday dispersion (Faraday thickness). Future work will explore removing these simplifying assumptions, but will need to account for the increased difficulty in characterising the simulation with more components and without Faraday screens as components. Additionally, more work will be required to make sure that the rates of internal and external Faraday dispersion match what might be expected from real sources, or risk making a simulation that has too large a range of consistent models for a given source: for example, a two-component source could also be explained as a sufficiently wide or resolved-out Faraday thick source, or as a three-component source with a small third component. This greatly complicates the classification task.
Previous machine learning work (e.g. Brown et al. 2018) has not been applied to real FDF data, so this paper presents the first example of the domain gap arising in Faraday complexity classification. This is a problem that requires further research to solve. We have no good way to ensure that our simulation matches reality, so some amount of domain adaptation will always be necessary to train classifiers on simulated data and then apply these classifiers to real data. But with the low source counts in polarisation science (high-resolution spectropolarimetric observations currently number in the few hundreds), any machine learning method will need to be trained on simulations. This is not just a problem in Faraday complexity estimation, and domain adaptation is also an issue faced in the wider astroinformatics community: large quantities of labelled data are hard to come by, and some sources are very rare (e.g. gravitational wave detections or fast radio bursts; Zevin et al. 2017; Gebhard et al. 2019; Agarwal et al. 2020). LR seems to handle the domain gap better than XGB, with only a slightly lower accuracy on simulated data. Our results are plausible, and the distribution of our simulation overlaps well with the distribution of our real data (Figure 5).
6. Conclusion
We developed a simple, interpretable machine learning method for estimating Faraday complexity. Our interpretable features were derived by comparing observed FDFs to idealised simple FDFs, which we could determine both for simulated and real observations. We demonstrated the effectiveness of our method on both simulated and real data. Using simulated data, we found that our classifiers were 95% accurate, with near perfect recall (specificity) of Faraday simple sources. On simulated data that matched existing observations, our classifiers obtained an accuracy of 90%. Evaluating our classifiers on real data gave the plausible results shown in Figure D.1, and marks the first application of machine learning to observed FDFs. Future work will need to narrow the domain gap to improve transfer of classifiers trained on simulations to real, observed data.
Acknowledgements
This research was conducted in Canberra, on land for which the Ngunnawal and Ngambri people are the traditional and ongoing custodians. M. J. A. and J. D. L. were supported by the Australian Government Research Training Program. M. J. A. was supported by the Astronomical Society of Australia. The Australia Telescope Compact Array is part of the Australia Telescope National Facility which is funded by the Australian Government for operation as a National Facility managed by CSIRO. We acknowledge the Gomeroi people as the traditional owners of the Observatory site. We thank the anonymous referee for their comments on this work.
A. 2-Wasserstein begets Faraday moments
Minimising the 2-Wasserstein distance between a model FDF and the simple manifold gives the second Faraday moment of that FDF. Let $\tilde F$ be the sum-normalised model FDF and let $\tilde S$ be the sum-normalised simple model FDF:
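In the notation of Equation (1), writing the trial depth of the simple FDF as $\phi_w$, these are

$\tilde F(\phi) = \frac{A_0\,\delta(\phi - \phi_0) + A_1\,\delta(\phi - \phi_1)}{A_0 + A_1}, \qquad \tilde S(\phi) = \delta(\phi - \phi_w).$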
The $W_2$ distance, usually defined on probability distributions, can be extended to one-dimensional complex functions A and B by normalising them:
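One standard way of writing this is

$W_2(A, B)^2 = \inf_{\gamma \in \Gamma(A, B)} \iint |\phi - \phi'|^2\, \gamma(\phi, \phi')\, \mathrm{d}\phi\, \mathrm{d}\phi'$ (with $A$ and $B$ here standing for their normalised magnitudes $|A|/\int|A|\,\mathrm{d}\phi$ and $|B|/\int|B|\,\mathrm{d}\phi$),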
where $\Gamma(A, B)$ is the set of couplings of A and B, i.e. the set of joint probability distributions that marginalise to A and B; and $\inf_{\gamma \in \Gamma(A, B)}$ is the infimum over $\Gamma(A, B)$ . This can be interpreted as the minimum cost to ‘move’ one probability distribution to the other, where the cost of moving one unit of probability mass is the squared distance it is moved.
The set of couplings $\Gamma(\tilde F, \tilde S)$ is the set of all joint probability distributions $\gamma$ such that
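$\int \gamma(\phi, \phi')\,\mathrm{d}\phi' = \tilde F(\phi) \quad \text{and} \quad \int \gamma(\phi, \phi')\,\mathrm{d}\phi = \tilde S(\phi').$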
The coupling that minimises the integral in Equation (A3) will be the optimal transport plan between $\tilde F$ and $\tilde S$ . Since $\tilde F$ and $\tilde S$ are defined in terms of delta functions, the optimal transport problem reduces to a discrete optimal transport problem and the optimal transport plan is
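$\gamma^*(\phi, \phi') = \frac{A_0}{A_0 + A_1}\,\delta(\phi - \phi_0)\,\delta(\phi' - \phi_w) + \frac{A_1}{A_0 + A_1}\,\delta(\phi - \phi_1)\,\delta(\phi' - \phi_w).$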
In other words, to move the probability mass of $\tilde S$ to $\tilde F$ , a fraction $A_0/(A_0 + A_1)$ is moved from $\phi_w$ to $\phi_0$ and the complementary fraction $A_1/(A_0 + A_1)$ is moved from $\phi_w$ to $\phi_1$ . Then:
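$W_2(\tilde F, \tilde S)^2 = \frac{A_0\,(\phi_0 - \phi_w)^2 + A_1\,(\phi_1 - \phi_w)^2}{A_0 + A_1}.$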
To obtain the $W_2$ distance to the simple manifold, we need to minimise this over $\phi_w$ . Differentiate with respect to $\phi_w$ and set equal to zero to find
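$\phi_w = \frac{A_0\,\phi_0 + A_1\,\phi_1}{A_0 + A_1}.$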
Substituting this back in, we find
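$\min_{\phi_w} W_2(\tilde F, \tilde S)^2 = \frac{A_0 A_1\,(\phi_0 - \phi_1)^2}{(A_0 + A_1)^2},$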
which is the Faraday moment.
B. Euclidean distance in the no-RMSF case
In this section, we calculate the minimised Euclidean distance evaluated on a model FDF (Equation (1)). Let $\tilde F$ be the sum-normalised model FDF and let $\tilde S$ be the sum-normalised simple model FDF:
The Euclidean distance between $\tilde F$ and $\tilde S$ is then
Assume $\phi_0 \neq \phi_1$ (otherwise, $D_E$ will always be either 0 or $\sqrt{2}$). If $\phi_e = \phi_0$, then
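$D_E(\tilde F, \tilde S) = \sqrt{\left(1 - \frac{A_0}{A_0 + A_1}\right)^2 + \left(\frac{A_1}{A_0 + A_1}\right)^2} = \sqrt{2}\,\frac{A_1}{A_0 + A_1},$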
and similarly for $\phi_e = \phi_1$ . If $\phi_e \neq \phi_0$ and $\phi_e \neq \phi_1$ , then
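$D_E(\tilde F, \tilde S) = \sqrt{\left(\frac{A_0}{A_0 + A_1}\right)^2 + \left(\frac{A_1}{A_0 + A_1}\right)^2 + 1}.$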
The minimised Euclidean distance when $\phi_0 \neq \phi_1$ is therefore
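$\min_{\phi_e} D_E(\tilde F, \tilde S) = \sqrt{2}\,\frac{\min(A_0, A_1)}{A_0 + A_1}.$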
If $\phi_0 = \phi_1$ , then the minimised Euclidean distance is 0.
C. Hyperparameters for LR and XGB
This section contains tables of the hyperparameters that we used for our classifiers. Tables C.1 and D.1 tabulate the hyperparameters for XGB and LR, respectively, for the ‘ATCA’ dataset. Tables D.2 and D.3 tabulate the hyperparameters for XGB and LR, respectively, for the ‘ASKAP’ dataset.
D. Predictions on real data
This section contains Figures D.1 and D.2, which show the predicted probability of being Faraday complex for all real data used in this paper, drawn from Livingston et al. (2021) and O’Sullivan et al. (2017).
E. Simulating observed FDFs
We simulated FDFs by approximating them by arrays of complex numbers. An FDF F is approximated on the domain $[-\phi_{\max}, \phi_{\max}]$ by a vector $\textbf{\textit{F}} \in \mathbb{C}^d$:
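$F_k = F\!\left(-\phi_{\max} + k\,\delta\phi\right), \qquad k \in [0, \dots, d-1],$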
where $\delta\phi = (\phi_{\max} - \phi_{\min}) / d$ and d is the number of Faraday depth samples in the FDF. $\textbf{\textit{F}}$ is sampled by uniformly sampling its parameters:
We then generate a vector polarisation spectrum $\textbf{\textit{P}} \in \mathbb{C}^m$ from $\textbf{\textit{F}}$ using Equation (E4):
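$P_\ell = \sum_{k=0}^{d-1} F_k\, e^{2 i \phi_k \lambda^2_\ell}, \qquad \phi_k = -\phi_{\max} + k\,\delta\phi.$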
$\lambda^2_\ell$ is the discretised value of $\lambda^2$ at the $\ell$th index of $\textbf{\textit{P}}$. This requires a set of $\lambda^2$ values, which depends on the dataset being simulated. These values can be treated as the channel wavelengths at which the polarisation spectrum was observed. We then add Gaussian noise with variance $\sigma^2$ to each element of $\textbf{\textit{P}}$ to obtain a discretised noisy observation $\hat{\textbf{\textit{P}}}$. Finally, we perform RM synthesis using the Canadian Initiative for Radio Astronomy Data Analysis RM package (see footnote b), which is a Python module that implements a discrete version of RM synthesis:
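$\hat F_k = \frac{1}{m} \sum_{\ell=1}^{m} \hat P_\ell\, e^{-2 i \phi_k \left(\lambda^2_\ell - \lambda^2_0\right)},$

where $\lambda^2_0$ is a reference value, commonly the mean of the sampled $\lambda^2_\ell$ (the exact weighting and reference conventions are those of the package). As a minimal end-to-end illustration of this appendix, the pipeline can be sketched as follows; the variable and function names are assumptions, and the final step is a direct discrete RM synthesis rather than the CIRADA implementation.

```python
import numpy as np

def simulate_observed_fdf(phi, amps, depths, lam2, sigma, rng):
    """Simulate a noisy observed FDF from a Faraday depth grid `phi` (rad m^-2)
    and channel wavelengths squared `lam2` (m^2)."""
    # Discretised screen model (cf. Equation (1)): delta functions on the grid.
    F = np.zeros(phi.size, dtype=complex)
    for A, depth in zip(amps, depths):
        F[np.argmin(np.abs(phi - depth))] += A
    # Polarisation spectrum (cf. Equation (E4)).
    P = F @ np.exp(2j * np.outer(phi, lam2))
    # Complex Gaussian noise with standard deviation sigma in each of Q and U.
    P = P + sigma * (rng.standard_normal(lam2.size)
                     + 1j * rng.standard_normal(lam2.size))
    # Discrete RM synthesis with uniform weighting (Brentjens & de Bruyn 2005).
    lam2_0 = lam2.mean()
    return (P @ np.exp(-2j * np.outer(lam2 - lam2_0, phi))) / lam2.size

# Example usage (illustrative parameter values):
# fdf = simulate_observed_fdf(phi, [1.0, 0.4], [-20.0, 35.0], lam2, 0.1,
#                             np.random.default_rng(0))
```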