Section 1

Initial search

We are going to use a U-Net-style network.

The main idea is to replace the encoder part with a ResNet, which is efficient at recognizing features. This also lets us use a pre-trained network for the encoder.
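The report does not say which library was used to build the network. As one possible sketch, segmentation_models_pytorch wires an ImageNet pre-trained ResNet encoder into a U-Net decoder; the class count below is a placeholder:

```python
import segmentation_models_pytorch as smp

# U-Net whose encoder is an ImageNet pre-trained ResNet-34.
# num_classes is a placeholder; it should match the street-scene label set.
num_classes = 32

model = smp.Unet(
    encoder_name="resnet34",      # swap for "resnet18", "resnet50", ... to vary the encoder
    encoder_weights="imagenet",   # pre-trained weights for the encoder part
    in_channels=3,
    classes=num_classes,
)
```

With this kind of setup, changing the encoder_name string is enough to compare different ResNet depths across runs, while the decoder is still trained from scratch.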

We start by running several variants of the architecture on smaller images (256 x 256) to get a first idea of what works well:
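The exact search space is not listed in the report; a rough, hedged sketch of launching such a scan and tracking each run with W&B might look like this (the encoders, learning rates, project name, and synthetic batches are all placeholders):

```python
import torch
import torch.nn.functional as F
import segmentation_models_pytorch as smp
import wandb

num_classes = 32

for encoder in ["resnet18", "resnet34"]:          # hypothetical variants
    for lr in [1e-3, 3e-4]:                       # hypothetical learning rates
        run = wandb.init(
            project="semantic-segmentation",      # assumed project name
            config={"encoder": encoder, "lr": lr, "image_size": 256},
            reinit=True,
        )
        model = smp.Unet(encoder, encoder_weights="imagenet", classes=num_classes)
        opt = torch.optim.Adam(model.parameters(), lr=lr)

        for step in range(100):
            # Random tensors stand in for real 256 x 256 street scenes and label masks.
            images = torch.randn(4, 3, 256, 256)
            masks = torch.randint(0, num_classes, (4, 256, 256))
            loss = F.cross_entropy(model(images), masks)
            opt.zero_grad()
            loss.backward()
            opt.step()
            wandb.log({"step": step, "loss": loss.item()})
        run.finish()
```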

We then look at all the runs and try to group them to understand which ones are the most valuable.
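Once the runs are in, the W&B API can pull their configs and summaries into a table for grouping; the project path and config keys below are assumptions:

```python
import pandas as pd
import wandb

api = wandb.Api()
runs = api.runs("boris/semantic-segmentation")    # hypothetical entity/project path

rows = []
for run in runs:
    summary = run.summary._json_dict              # final logged metrics for the run
    rows.append({
        "name": run.name,
        "encoder": run.config.get("encoder"),
        "lr": run.config.get("lr"),
        "accuracy": summary.get("accuracy"),
    })

df = pd.DataFrame(rows)
# Grouping by encoder gives a quick view of which variants look most valuable.
print(df.groupby("encoder")["accuracy"].mean().sort_values(ascending=False))
```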

Conclusions

A few conclusions appear from this first batch of runs:

Finally, it seems that a reasonable target to aim for when moving to larger images is an accuracy of 0.9.

Featured Report

This report is a saved snapshot of Boris' research. He's published this example so you can see how to use W&B to visualize training and keep track of your work. Feel free to add a visualization, click on graphs and data, and play with features. Your edits won't overwrite his work.

Project Description

Boris uses various approaches to parse street scenes. He's using a U-shaped network and varying the encoders, weight decay, learning rate, pre-training approach, and more.

Section 2

Mid-size images

We train a few models that keep the original aspect ratio but use a reduced size of 320 x 180.
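320 x 180 keeps the 16:9 ratio of the source frames. The data pipeline isn't shown in the report; as a hedged sketch, one detail worth getting right is that label masks must be resized with nearest-neighbor interpolation so class indices are not blended (file names below are placeholders):

```python
from PIL import Image
from torchvision.transforms import InterpolationMode
from torchvision.transforms import functional as TF

# Bilinear for the photo, nearest-neighbor for the integer label mask.
image = TF.resize(Image.open("scene.png"), [180, 320], interpolation=InterpolationMode.BILINEAR)
mask = TF.resize(Image.open("scene_mask.png"), [180, 320], interpolation=InterpolationMode.NEAREST)
```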

However, the results are not as good, showing that:

Let's go to a higher resolution!

Section 3

Final model

For our final model, we use images of 640 x 360. Our goal is to get more than 90% accuracy.
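The accuracy metric itself isn't defined in the snapshot; for segmentation it usually means per-pixel accuracy, sometimes ignoring a "void" class. A minimal sketch under that assumption:

```python
import torch

def pixel_accuracy(logits, target, ignore_index=None):
    """Fraction of pixels whose predicted class matches the label.

    logits: (N, C, H, W) raw network output; target: (N, H, W) class indices.
    ignore_index optionally excludes a "void" class; whether the report's metric
    does this is an assumption.
    """
    pred = logits.argmax(dim=1)
    valid = torch.ones_like(target, dtype=torch.bool) if ignore_index is None else target != ignore_index
    return (pred[valid] == target[valid]).float().mean().item()
```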

While we quickly reach 89% accuracy, it is very difficult to get to 90%.

We finally succeed in getting above 90%, mainly by adjusting the learning rate (with both ResNet 34 and ResNet 18). While those runs are longer, we actually reach our target early and never improve much afterwards.
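The report doesn't spell out how the learning rate was adjusted; one common approach in PyTorch is a one-cycle schedule whose peak learning rate is the knob being tuned. A self-contained sketch with placeholder values:

```python
import torch

# Stand-in model; in practice this would be the U-Net from the earlier sketch.
model = torch.nn.Conv2d(3, 32, kernel_size=1)
steps_per_epoch, epochs = 500, 10                 # hypothetical sizes

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer,
    max_lr=3e-4,                                  # the value being tuned across runs (hypothetical)
    total_steps=steps_per_epoch * epochs,
)

# One scheduler.step() per optimizer step ramps the LR up, then anneals it back down.
for _ in range(steps_per_epoch * epochs):
    optimizer.step()
    scheduler.step()
```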

A minor improvement (<1%) also comes from running the training in two phases:
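The two phases aren't detailed in this snapshot. With a pre-trained encoder, a common pattern (an assumption here, not necessarily what was done) is to first train the decoder with the encoder frozen, then unfreeze everything and fine-tune at a lower learning rate:

```python
import segmentation_models_pytorch as smp
import torch

model = smp.Unet("resnet34", encoder_weights="imagenet", classes=32)

# Phase 1: freeze the pre-trained encoder and train only the decoder.
for p in model.encoder.parameters():
    p.requires_grad = False
phase1_optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=3e-4   # hypothetical LR
)
# ... train for a few epochs ...

# Phase 2: unfreeze everything and fine-tune the whole network more gently.
for p in model.encoder.parameters():
    p.requires_grad = True
phase2_optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)  # hypothetical, lower LR
# ... continue training ...
```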

Section 4

Sample predictions

The quality of the predictions changes a lot with barely a 1% difference in accuracy!
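One way to make those differences visible is to log predictions as segmentation overlays in W&B; the class labels and arrays below are placeholders standing in for real samples:

```python
import numpy as np
import wandb

class_labels = {0: "road", 1: "building", 2: "sky"}     # hypothetical label set

# Random arrays stand in for a 640 x 360 street scene, its prediction, and its ground truth.
image = np.random.randint(0, 255, (360, 640, 3), dtype=np.uint8)
pred_mask = np.random.randint(0, 3, (360, 640))
true_mask = np.random.randint(0, 3, (360, 640))

wandb.init(project="semantic-segmentation")             # assumed project name
wandb.log({
    "sample": wandb.Image(image, masks={
        "prediction":   {"mask_data": pred_mask, "class_labels": class_labels},
        "ground_truth": {"mask_data": true_mask, "class_labels": class_labels},
    })
})
wandb.finish()
```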