algorithms for prediction in agriculture: An application of Feature importance and Feature Selection

Keywords:

Artificial Intelligence, Machine Learning, BORUTA

Abstract

The agriculture sector generates and collects large amounts of data. These data can be gathered, stored, and analyzed to support decision making and generate competitive value, and Machine Learning techniques have proven very effective in this market. In this work, supervised boosting-based classification models were used to predict disease in a crop, and the model with the best area under the curve (AUC) was identified. Light Gradient Boosting Machine (LightGBM), CatBoost Classifier, Extreme Gradient Boosting (XGBoost), Gradient Boosting Classifier, and AdaBoost models were used to classify the crop as healthy or sick. The LightGBM algorithm provided the best fit to the data, with an area under the curve of 0.76 when BORUTA variable selection was used.
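As an illustration of how candidate classifiers can be ranked by area under the curve, the following minimal sketch computes the AUC of the ROC curve from predicted scores. It assumes the ROC curve is the one meant, since the abstract does not say which curve was used. The labels, the scores, and the names `roc_auc`, `model_a`, and `model_b` are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the share of (sick, healthy) pairs ranked correctly.

    A tie between a sick and a healthy sample counts as half a correct pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sick and one healthy sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical ground truth (1 = sick, 0 = healthy) and made-up model scores.
labels = [1, 1, 0, 1, 0, 0]
candidate_scores = {
    "model_a": [0.9, 0.8, 0.7, 0.6, 0.4, 0.2],
    "model_b": [0.6, 0.4, 0.7, 0.5, 0.3, 0.8],
}

# Score every candidate and keep the one with the largest AUC.
auc_by_model = {name: roc_auc(labels, s) for name, s in candidate_scores.items()}
best_model = max(auc_by_model, key=auc_by_model.get)
print(auc_by_model, best_model)
```

The pairwise form is quadratic in the number of samples and is chosen here only for readability. In practice a library routine, applied to held-out predictions of each fitted boosting model, would play the same role.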

Published

07-01-2025

How to Cite

Silva, V. C., Silva Rocha, M., Amorim Faria, G., Xavier Junior, S. F. A., Almeida de Oliveira, T., & Bastos Peixoto, A. P. (2025). algorithms for prediction in agriculture: An application of Feature importance and Feature Selection. Sigmae, 13(4), 308–317. Retrieved from https://publicacoes.unifal-mg.edu.br/revistas/index.php/sigmae/article/view/2524

Section

Data Science & Machine Learning