Object Detection in Street Images

 


This group project was for my Deep Learning course. We trained the YOLOv13 model (the latest YOLO version at the time of writing) on street images and tuned its hyperparameters to detect objects such as cars, people, bicycles, traffic lights, and traffic signs. Because our GPU resources were limited, our best results come from training on a single GPU for under 8 hours. Our best model was trained on the BDD100K detection dataset with tuned hyperparameters. It performs significantly better than the base model, which was pre-trained on the COCO dataset.





Some of the parameters that we changed from the default values are:

  • Batch Size (batch). Larger batch sizes sped up training and improved accuracy but required more memory. Our best run used an image size of 960, a batch size of 8, and 25 epochs.
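  • Optimizer. Keeping optimizer='auto' gave us the best mAP values. In this mode, however, we could not tune the momentum and initial learning rate. The Ultralytics trainer ignores the lr0 and momentum settings when optimizer='auto' and chooses the optimizer, learning rate, and momentum automatically.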
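  • Conf. Confidence threshold: detections scoring below it are discarded. Increasing it from 0 to 0.1 gave the best results.
  • IoU. Intersection-over-union threshold used by non-maximum suppression (NMS) to remove overlapping duplicate boxes. Decreasing it from 0.7 to 0.6 gave the best results.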
  • Warmup bias learning rate. Increasing it from 0 to 0.1 gave the best results.
  • Box. Weight of the bounding-box regression loss in the total loss, which balances box regression against classification. Decreasing it from 7.5 to 0.07 gave the best results. When identifying objects for drivers, we care more about naming objects correctly than about localizing them precisely.
  • Mixup. Probability of applying MixUp augmentation, which blends two images into one. Increasing it from 0 to 0.1 gave the best results.
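
The Conf and IoU thresholds both act on the model's raw predictions at validation and prediction time. The following is a minimal pure-Python sketch of that post-processing step. It first drops boxes whose score does not exceed conf. It then runs greedy, class-aware non-maximum suppression using the iou threshold. This is a simplified stand-in, not the Ultralytics implementation, which runs batched on the GPU. The (x1, y1, x2, y2, score, class_id) tuple layout and the function names box_iou and filter_detections are assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed layout for illustration: (x1, y1, x2, y2, score, class_id), pixel corners.
Detection = Tuple[float, float, float, float, float, int]


def box_iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(dets: List[Detection], conf: float = 0.1,
                      iou: float = 0.6) -> List[Detection]:
    """Confidence filtering followed by greedy per-class NMS (a sketch)."""
    # 1. Confidence threshold: keep only boxes scoring above `conf`,
    #    ordered from highest to lowest score.
    candidates = sorted((d for d in dets if d[4] > conf),
                        key=lambda d: d[4], reverse=True)
    kept: List[Detection] = []
    # 2. Greedy NMS: keep a box only if it does not overlap an already-kept
    #    box of the same class by more than `iou`.
    for d in candidates:
        if all(k[5] != d[5] or box_iou(k, d) <= iou for k in kept):
            kept.append(d)
    return kept


if __name__ == "__main__":
    dets = [
        (100, 100, 200, 200, 0.90, 0),  # car
        (110, 105, 205, 210, 0.80, 0),  # near-duplicate car box
        (300, 300, 350, 400, 0.05, 1),  # low-confidence person
        (105, 100, 200, 195, 0.70, 2),  # overlaps the car but is another class
    ]
    # Keeps the first car box and the class-2 box.
    print(filter_detections(dets, conf=0.1, iou=0.6))
```

With conf=0.1 and iou=0.6, the near-duplicate car box is suppressed because its IoU with the first box is about 0.75. The low-confidence person box is also dropped. The overlapping box of a different class survives because suppression is applied per class. Raising iou or lowering conf lets more boxes through.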

Our results suggest that larger batch sizes and more epochs could improve performance further.
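
For reference, the best-run settings can be gathered into a single set of training arguments. The sketch below assumes the Ultralytics-style YOLO(...).train(...) interface that the YOLOv13 code builds on. The weights file yolov13n.pt, the dataset file bdd100k.yaml, and the helper names best_run_args and train_best_run are hypothetical placeholders, not the exact files or code we used. The batch and epochs overrides are where a larger-batch, longer-training experiment would plug in.

```python
from typing import Any, Dict

# Best-run settings; keys follow Ultralytics training-argument names.
BEST_RUN_ARGS: Dict[str, Any] = {
    "imgsz": 960,           # input image size
    "batch": 8,             # images per batch (limited by GPU memory)
    "epochs": 25,
    "optimizer": "auto",    # optimizer, lr0 and momentum chosen automatically
    "conf": 0.1,            # confidence threshold used when validating
    "iou": 0.6,             # NMS IoU threshold used when validating
    "warmup_bias_lr": 0.1,
    "box": 0.07,            # box-loss weight (default 7.5)
    "mixup": 0.1,           # probability of MixUp augmentation
}


def best_run_args(**overrides: Any) -> Dict[str, Any]:
    """Return a copy of the best-run settings with any overrides applied."""
    return {**BEST_RUN_ARGS, **overrides}


def train_best_run(weights: str = "yolov13n.pt",    # hypothetical weights file
                   data: str = "bdd100k.yaml",      # hypothetical dataset YAML
                   **overrides: Any):
    """Train on a single GPU (device 0) with the best-run settings."""
    # Imported lazily so the settings above are usable without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data, device=0, **best_run_args(**overrides))
```

For example, train_best_run(batch=16, epochs=50) would try a larger batch and a longer schedule, provided the GPU has enough memory for a batch of 16 at image size 960.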


