Introduction

If you're training models for object detection, you can interactively visualize bounding boxes in Weights & Biases. This short demo focuses on driving scenes, testing a YoloV3 net pretrained on MSCOCO on images from the Berkeley Deep Drive 100K dataset. The API for logging bounding boxes is flexible and intuitive. Below, I explain the interaction controls for this tool and a few ways you might use it to analyze your models.

This approach can help with object detection on many other kinds of images, from microscope slides to X-rays to satellite imagery and beyond. You can read more about understanding driving scenes in this report and more about Lyft's self-driving car dataset in this report.

High-level view: Many examples on validation data

Logging a few validation images per run gives you an overall sense of the model's performance. This model does pretty well, especially considering the transfer from MSCOCO to Berkeley Deep Drive without finetuning. You can use the controls to turn each class label visualization on or off and focus on the most relevant predictions.

One issue with the current model is that it often labels larger cars (like vans or SUVs) as both "truck" and "car"; this might be easier to notice if you toggle the red "truck" and blue "car" labels on and off. Another pattern is that in some images, detection boxes are systematically lower than the ground truth. If you toggle the blue "car" label, you may notice that the bounding box sometimes leaves out the top of the car. This may be an issue with anchor boxes tuned for MSCOCO rather than BDD. Reducing the stride between anchor boxes, increasing their total number, or learning a set specific to the BDD data and aspect ratio may help.
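If you want to experiment with BDD-specific anchors, one common approach is to cluster the ground-truth box dimensions with k-means, in the spirit of how YoloV3's original anchors were chosen. A minimal sketch, assuming gt_widths and gt_heights are arrays of ground-truth box sizes in pixels (these names are placeholders, not part of the code below):

import numpy as np
from sklearn.cluster import KMeans

# stack ground-truth (width, height) pairs; gt_widths/gt_heights are assumed
box_dims = np.stack([gt_widths, gt_heights], axis=1)
# YoloV3 uses 9 anchors (3 per detection scale)
kmeans = KMeans(n_clusters=9, random_state=0).fit(box_dims)
# sort the candidate anchors from smallest to largest by area
anchors = sorted(kmeans.cluster_centers_.tolist(), key=lambda wh: wh[0] * wh[1])
print(anchors)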

Zooming in: Different classes in a specific model

Setting up a full-screen media panel lets you see the details of individual photos (scroll down in the panel for more). Now it's easier to read the numerical scores, confirm that visually smaller or more distant objects tend to have lower confidence scores, and notice what kinds of mistakes the model tends to make, like missing the vans in the first example. On the bright side, focusing on the third photo, we can see that the overlapping truck/car detection might not be a concern: the truck confidence score is only 48.96, while the car's (which we can read if we turn off the red "truck" label) is 96.4. Also, in the very last image, we can find a bizarre false positive "train" on the left-hand side.

Controls

If you click on the Settings icon in the top left corner of a media panel, you will see this pop-up for interacting with the images:

[Screenshot: media panel settings pop-up]

Code

You can find the full API documentation here. It enables flexible logging in many configurations:
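For instance, here is a minimal sketch of logging a single image with one box. The project name, file path, and values are illustrative; when "domain" is omitted, coordinates are interpreted as fractions of the image size:

import wandb

wandb.init(project="bbox-demo")  # illustrative project name

# a single box over one image; coordinates here are fractions of image size
image = wandb.Image("example.jpg", boxes={
    "predictions": {
        "box_data": [{
            "position": {"minX": 0.1, "maxX": 0.4, "minY": 0.2, "maxY": 0.7},
            "class_id": 0,
            "box_caption": "car (0.92)",
            "scores": {"score": 0.92}}],
        "class_labels": {0: "car"}}})
wandb.log({"example": image})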

Here is the logging code I use in this report. My pretrained YoloV3 model returns three parallel lists (the boxes, labels, and confidence scores for each detection); the function below also takes the filename of the input validation image and its width and height:

import wandb
# load_img comes from Keras in my setup; adjust the import to your framework
from keras.preprocessing.image import load_img

# this is the order in which my classes will be displayed
display_ids = {"car" : 0, "truck" : 1, "person" : 2, "traffic light" : 3, "stop sign" : 4,
               "bus" : 5, "bicycle": 6, "motorbike" : 7, "parking meter" : 8, "bench": 9,
               "fire hydrant" : 10, "aeroplane" : 11, "boat" : 12, "train": 13}
# this is a reverse map of the integer class id to the string class label
class_id_to_label = { int(v) : k for k, v in display_ids.items()}

def bounding_boxes(filename, v_boxes, v_labels, v_scores, log_width, log_height):
    # load raw input photo
    raw_image = load_img(filename, target_size=(log_height, log_width))
    all_boxes = []
    # collect the data for each bounding box in this image
    for b_i, box in enumerate(v_boxes):
        # get coordinates and labels
        box_data = {"position" : {
          "minX" : box.xmin,
          "maxX" : box.xmax,
          "minY" : box.ymin,
          "maxY" : box.ymax},
          "class_id" : display_ids[v_labels[b_i]],
          # optionally caption each box with its class and score
          "box_caption" : "%s (%.3f)" % (v_labels[b_i], v_scores[b_i]),
          "domain" : "pixel",
          "scores" : { "score" : v_scores[b_i] }}
        all_boxes.append(box_data)

    # log to wandb: raw image, predictions, and dictionary of class labels for each class id
    box_image = wandb.Image(raw_image, boxes = {"predictions": {"box_data": all_boxes, "class_labels" : class_id_to_label}})
    return box_image
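To log a batch of validation images, you can collect the returned wandb.Image objects and log them in one call. A usage sketch, where val_filenames and yolo_predict are hypothetical placeholders for your validation file list and a helper that returns the three lists above (BDD frames are 1280x720):

# usage sketch: val_filenames and yolo_predict are hypothetical placeholders
wandb.init(project="yolo-bdd-demo")  # illustrative project name

log_images = []
for filename in val_filenames[:20]:
    v_boxes, v_labels, v_scores = yolo_predict(filename)
    log_images.append(bounding_boxes(filename, v_boxes, v_labels, v_scores, 1280, 720))

# all the images appear together in one media panel in the W&B UI
wandb.log({"bounding_boxes": log_images})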