Effective nonpoint source (NPS) pollution management requires frequent water quality monitoring, which is often costly to implement in practice. Statistical techniques and machine learning methods make it possible to identify and focus on fundamental environmental variables that are closely related to the NPS pollutants of interest. This study developed surrogate models to predict the concentrations of suspended sediment (SS) and total phosphorus (T-P) from turbidity and runoff discharge rates using multiple linear regression (MLR) and random forest (RF) methods. The RF models provided acceptable performance in predicting SS and T-P, especially when runoff discharge rates were high, and they outperformed the MLR models in all cases. This finding highlights the potential of RF models as a tool for identifying fundamental environmental variables that can be measured relatively inexpensively, or are freely available, yet still provide the information required to quantify the concentrations of NPS pollutants. The analysis of relative variable importance showed that the temporal variations of SS and T-P concentrations were explained more effectively by variations in turbidity than by variations in runoff discharge rate. This study demonstrated that advanced statistical techniques such as machine learning can help to improve the efficiency of NPS pollutant monitoring.
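The surrogate-modeling workflow summarized above can be illustrated with a minimal sketch. It is not the study's implementation: scikit-learn is an assumed library choice, and the data, value ranges, response function, and hyperparameters are all synthetic or hypothetical. The sketch fits an MLR model and an RF model to predict SS from turbidity and runoff discharge, then reads the RF relative variable importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic, illustrative data only: NOT the study's measurements.
rng = np.random.default_rng(0)
n = 400
turbidity = rng.uniform(1.0, 300.0, n)   # NTU (hypothetical range)
discharge = rng.uniform(0.1, 20.0, n)    # m^3/s (hypothetical range)

# Assumed response: SS depends nonlinearly on turbidity, weakly on discharge.
ss = 0.05 * turbidity**1.5 + 2.0 * np.sqrt(discharge) + rng.normal(0.0, 5.0, n)

X = np.column_stack([turbidity, discharge])
X_train, X_test, y_train, y_test = train_test_split(
    X, ss, test_size=0.25, random_state=0
)

# Multiple linear regression (MLR) surrogate.
mlr = LinearRegression().fit(X_train, y_train)

# Random forest (RF) surrogate; n_estimators is an arbitrary choice here.
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

# Held-out goodness of fit for both surrogates.
r2_mlr = r2_score(y_test, mlr.predict(X_test))
r2_rf = r2_score(y_test, rf.predict(X_test))

# Impurity-based relative importance of each predictor in the RF model.
importance = dict(zip(["turbidity", "discharge"], rf.feature_importances_))

print(f"MLR R^2: {r2_mlr:.3f}")
print(f"RF  R^2: {r2_rf:.3f}")
print(f"RF relative importance: {importance}")
```

Because the synthetic response was constructed to be dominated by turbidity, the importance ranking here is built in by assumption and says nothing about the study's results. With field data, the same comparison would be run on measured turbidity, discharge, and laboratory SS or T-P concentrations.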