
## Summary

Ultralytics v8.4.40 introduces per-image precision/recall/F1 tracking during validation (led by PR #24089 from @Laughing-q), making it much easier to see exactly which images your model handles well or poorly.

## Key Changes

- New per-image validation metrics added to results: `precision`, `recall`, `f1`, `tp`, `fp`, `fn` for each image.
  - Exposed via `metrics.box.image_metrics` (and also for `seg` and `pose` where applicable).
- Detection validation pipeline updated to store the image name and compute image-level stats consistently with the validation matching logic.
- Distributed (multi-GPU) validation now gathers and merges `image_metrics` correctly across ranks, so results remain complete in larger training setups.
- Metrics classes extended with:
  - `image_metrics` storage
  - update helpers
  - clear/reset helpers to prevent stale metrics between runs.
- Docs updated across validation/task guides (detect, segment, pose, OBB, insights, custom trainer) with examples showing how to access per-image metrics.
- Version bump: 8.4.39 → 8.4.40

## Purpose & Impact

- Faster debugging of weak samples: You can now pinpoint problematic images directly instead of relying only on dataset-wide averages.
- Better dataset curation: Find images causing high false positives/false negatives and decide whether to relabel, augment, or rebalance.
- More actionable model evaluation: Teams get practical, image-level insight for error analysis and iterative improvement.
- Reliable at scale: Works cleanly in multi-GPU validation, so enterprise and research workflows benefit too.
- Broad usability: Useful for both beginners and advanced users working with YOLO models, especially YOLO26 validation workflows.

## What's Changed

- `ultralytics 8.4.40` Per-image Precision and Recall by @Laughing-q in https://github.com/ultralytics/ultralytics/pull/24089

**Full Changelog**: https://github.com/ultralytics/ultralytics/compare/v8.4.39...v8.4.40
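
As a rough illustration of the error-analysis workflow this release enables, the sketch below ranks images by F1 recomputed from per-image `tp`/`fp`/`fn` counts. It does not call Ultralytics itself: the helper names `prf1` and `worst_images`, the sample data, and the assumed layout (a dict mapping image name to a dict of counts) are hypothetical, so check the updated docs for the actual structure of `metrics.box.image_metrics` before adapting it. The library may also handle edge cases, such as images with no labels, differently.

```python
from typing import Dict, List, Tuple


def prf1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, f1) from counts, using 0.0 where undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def worst_images(
    image_metrics: Dict[str, Dict[str, int]], k: int = 3
) -> List[Tuple[str, float]]:
    """Return the k images with the lowest F1, lowest first."""
    scored = [
        (name, prf1(m["tp"], m["fp"], m["fn"])[2])
        for name, m in image_metrics.items()
    ]
    return sorted(scored, key=lambda item: item[1])[:k]


# Hypothetical stand-in for per-image results. With Ultralytics installed,
# the release notes suggest the real values live on `metrics.box.image_metrics`
# after validation; the dict layout used here is an assumption.
sample = {
    "img_001.jpg": {"tp": 8, "fp": 2, "fn": 0},
    "img_002.jpg": {"tp": 1, "fp": 3, "fn": 4},
    "img_003.jpg": {"tp": 5, "fp": 0, "fn": 5},
}

ranked = worst_images(sample, k=2)
for name, f1 in ranked:
    print(f"{name}: F1={f1:.3f}")
```

Sorting by ascending F1 surfaces the images most worth inspecting first; sorting by `fp` or `fn` instead would separate spurious-detection problems from missed-object problems.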
