Forecasting rare hydrological events by machine learning methods: case study of ice jams on the Pechora river
https://doi.org/10.55959/MSU0579-9414.5.80.1.6
Abstract
Rare hydrological events occur infrequently but are often catastrophic for people. Observations of them are correspondingly scarce, which produces the so-called class imbalance and hinders the development of reliable predictive models. The problem is especially acute for models built with machine learning algorithms, which are particularly sensitive to class-imbalanced samples. This study attempts to overcome the limitation by supplementing the training series with artificially generated events.
The study focuses on long-term forecasts of ice jams at the mouth of the Pechora River in the Arctic part of European Russia.
Data on ice jams were collected over a long observation period, and suitable predictors and models were selected. The following machine learning algorithms were used: k-nearest neighbors (KNN), logistic regression, gradient boosting (CatBoost), and a multilayer perceptron (MLP). All models showed higher modeling quality after artificial events were added to the series. This supports the promise of series supplementation for training models of rarely occurring processes.
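As a rough illustration of supplementing a series with artificial events, the sketch below generates synthetic minority-class samples by interpolating between existing ones, in the style of SMOTE (synthetic minority over-sampling). It is a minimal sketch under stated assumptions and not the authors' implementation: the function name `smote_like_oversample`, the two predictors, and all numeric values are hypothetical, and the abstract does not say which variant of the technique was applied.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # the new point lies on the segment between base and neighbour
        t = rng.random()
        synthetic.append(tuple(b + t * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical predictors for years with an ice jam (the rare class).
# Both the choice of predictors and the values are invented for illustration.
jam_years = [(0.9, 1.4), (1.1, 1.2), (1.0, 1.6), (1.2, 1.5)]
no_jam_count = 20  # assumed size of the majority class

# Add enough synthetic jam events to match the majority class size.
new_events = smote_like_oversample(jam_years, n_new=no_jam_count - len(jam_years))
balanced_jam_years = jam_years + new_events
print(len(balanced_jam_years))  # 20
```

In a setup like this, synthetic events would be added only to the training portion of the series, so that model quality is still assessed on observed events.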
About the Authors
S. M. Iglin
Russian Federation
Laboratory of Hydroinformatics
Moscow
V. M. Moreido
Russian Federation
Laboratory of Hydroinformatics
Moscow
K. I. Golovnin
Russian Federation
Laboratory of Hydroinformatics
Moscow
For citations:
Iglin S.M., Moreido V.M., Golovnin K.I. Forecasting rare hydrological events by machine learning methods: case study of ice jams on the Pechora river. Lomonosov Geography Journal. 2025;(1):87-97. (In Russ.) https://doi.org/10.55959/MSU0579-9414.5.80.1.6