Abstract
Using large image data sets has been the conventional strategy to improve object detection, but it increases annotation effort and training cost and does not guarantee robust transfer to new sites. Here we quantify the value of a small, diverse training set for floating macroplastic detection by jointly evaluating performance, computational cost, annotation effort, and cross-site transferability. We compile four river-camera data sets: three from Indonesia, The Netherlands, and Vietnam (training/internal validation) and one from Italy (external validation; single-site, single-day data from a long-term camera monitoring system). All four are harmonized into 13 litter classes and a five-level tiering scheme (progressive class aggregation). We train YOLOv7 and YOLOv8 models and compare site-specific data sets with a merged "Mixed" data set (999 images) spanning heterogeneous environmental conditions. Results show that data sets with more diverse backgrounds (Type II; e.g., The Netherlands) achieve higher performance per annotation than homogeneous data sets (Type I; e.g., Indonesia, Vietnam), whereas naïvely merging data sets can degrade internal validation performance unless the merge is accompanied by feature-aware filtering. Class aggregation substantially increases overall detection skill, with consistent gains across data sets when moving from fine (Tier 4) to coarse (Tier 0) label spaces. Finally, internal validation does not reliably predict performance at the external site, underscoring the need for transferability-aware data set design and evaluation. Overall, our findings emphasize that data diversity and curation, rather than data set size alone, are the key levers for scaling river plastic detection toward broader deployment.
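As a rough illustration of what "progressive class aggregation" means mechanically, the sketch below relabels annotations by mapping each fine-grained class to a coarser parent class at each tier. It is a minimal sketch under stated assumptions: the class names, the intermediate tier, and the helpers `aggregate` and `label_space` are hypothetical and do not reproduce the paper's actual 13 classes or its five tier definitions.

```python
# Hypothetical illustration of progressive class aggregation ("tiering").
# The fine labels and tier mappings below are invented for this sketch;
# they are NOT the paper's 13 classes or its five tier definitions.

FINE_CLASSES = ["bottle", "bag", "foam", "wrapper"]  # hypothetical fine labels

TIER_MAPS = {
    # Finest tier: every fine label keeps its own class (identity mapping).
    4: {c: c for c in FINE_CLASSES},
    # Hypothetical intermediate tier: fine labels grouped into broader parents.
    2: {"bottle": "hard item", "foam": "hard item",
        "bag": "soft item", "wrapper": "soft item"},
    # Coarsest tier: every object collapses into a single class.
    0: {c: "litter" for c in FINE_CLASSES},
}


def aggregate(labels, tier):
    """Map fine-grained annotation labels onto the label space of `tier`."""
    mapping = TIER_MAPS[tier]
    return [mapping[label] for label in labels]


def label_space(tier):
    """Return the sorted list of distinct class names that exist at `tier`."""
    return sorted(set(TIER_MAPS[tier].values()))


if __name__ == "__main__":
    annotations = ["bottle", "bag", "bag", "foam", "wrapper"]
    # Walk from the finest to the coarsest tier and show the relabeled output.
    for tier in sorted(TIER_MAPS, reverse=True):
        print(f"Tier {tier}: {len(label_space(tier))} classes ->",
              aggregate(annotations, tier))
```

Running it reports 4, 2, and 1 classes for tiers 4, 2, and 0. Because aggregation only rewrites labels, the same images and bounding boxes can be reused at every tier; in a YOLO-style pipeline this would presumably amount to remapping class indices in the label files before training, which is an assumption about the workflow rather than a description of the authors' code.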

Keywords: River plastic detection, image-based algorithms, remote sensing, environmental monitoring, machine learning, data transferability, small datasets
How to cite: Saddi, K. C., van Emmerik, T. H. M., Miglino, D., Poggi, M., Isgrò, F., Tasseron, P. F., et al. (2026). Exploring the transferability of image‐based algorithms for river plastic detection: The value of small mixed data sets. Water Resources Research, 62, e2025WR040605. https://doi.org/10.1029/2025WR040605
