
Discrimination of fish populations using parasites: Random Forests on a ‘predictable’ host-parasite system

Published online by Cambridge University Press:  06 July 2010

A. PÉREZ-DEL-OLMO*
Affiliation:
Department of Applied Zoology/Hydrobiology, University of Duisburg-Essen, Universitätsstrasse 5, D-45141 Essen, Germany
F. E. MONTERO
Affiliation:
Department of Animal Biology, Plant Biology and Ecology, Autonomous University of Barcelona, Campus Universitari, 08193 Bellaterra, Barcelona, Spain
M. FERNÁNDEZ
Affiliation:
Fundación General de la Universitat de València & Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Parc Científic, Universitat de València, PO Box 22 085, 46071 Valencia, Spain
J. BARRETT
Affiliation:
IBERS, University of Aberystwyth, Ceredigion SY23 3DA, UK
J. A. RAGA
Affiliation:
Instituto Cavanilles de Biodiversidad y Biología Evolutiva, Parc Científic, Universitat de València, PO Box 22 085, 46071 Valencia, Spain
A. KOSTADINOVA
Affiliation:
Institute of Parasitology, Biology Centre v.v.i., Academy of Sciences of the Czech Republic, Branišovská 31, 370 05 České Budějovice, Czech Republic; Central Laboratory of General Ecology, Bulgarian Academy of Sciences, 2 Gagarin Street, 1113 Sofia, Bulgaria
*Corresponding author: University of Duisburg-Essen, Department of Applied Zoology/Hydrobiology, Universitätsstrasse 5, D-45141 Essen, Germany. Tel: +49 2011832250. Fax: +49 2011832179. E-mail: [email protected]

Summary

We address the effect of spatial scale and temporal variation on model generality when forming predictive models for fish assignment by applying a new data mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. We considered 2 basic classification problems in evaluating the importance of variations in parasite infracommunities for assignment of individual fish to their populations of origin: a multiclass task (2–5 population models, using 2 seasonal replicates from each of the populations) and a 2-class task (using 4 seasonal replicates each from 1 Atlantic and 1 Mediterranean population). The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means for model cross-validation on the baseline data, which allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF depends on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change, which stresses the importance of both temporal replication and model validation in parasite tagging studies.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2010

