In this project, I explore different models for identifying plant and animal species from photos. I've trained small convnets and compared fine-tuning of existing architectures (InceptionV3, ResNet, Inception ResNet V2, Xception), varying the layer up to which the network is frozen. I also analyzed performance on higher-level categories (e.g. mammals versus insects versus birds).

This is my log of progress as I try different approaches.

Here's a sample of the images from the iNaturalist dataset:

Fine-tune standard CNNs on small data


3-10 epochs of pre-training and more data help

Notes Fri Jan 18

Got the basic code running: load InceptionV3, add a fully connected (FC) layer of configurable size on top, then an output layer of size num_classes. Pretrain for pe epochs, currently using RMSprop. Freeze every layer up to layer fl, then train the remaining layers for e epochs, currently using SGD with momentum and a lower learning rate. These defaults come directly from the Keras tutorial on fine-tuning InceptionV3 on a new set of classes.
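Here is a rough sketch of that two-stage setup, not the project's actual code. The function names, the fc_size default of 1024, the learning rate of 1e-4, and the momentum of 0.9 are assumptions chosen to resemble the Keras tutorial. train_data is a hypothetical dataset yielding image batches with one-hot labels.

```python
def freeze_up_to(layers, freeze_layer):
    """Mark layers[:freeze_layer] untrainable and every later layer trainable.

    Works on any objects with a `trainable` attribute, such as Keras layers.
    """
    for index, layer in enumerate(layers):
        layer.trainable = index >= freeze_layer


def build_model(num_classes, fc_size=1024, weights="imagenet"):
    """InceptionV3 base, global average pooling, one FC layer, softmax output."""
    # Imported lazily so freeze_up_to stays usable without TensorFlow installed.
    from tensorflow import keras

    base = keras.applications.InceptionV3(weights=weights, include_top=False)
    x = keras.layers.GlobalAveragePooling2D()(base.output)
    x = keras.layers.Dense(fc_size, activation="relu")(x)
    outputs = keras.layers.Dense(num_classes, activation="softmax")(x)
    return base, keras.Model(inputs=base.input, outputs=outputs)


def two_stage_training(base, model, train_data, pe, e, fl, lr=1e-4):
    """Pretrain the new top layers for pe epochs, then fine-tune for e epochs."""
    from tensorflow import keras

    # Stage 1: freeze the whole base so only the newly added layers learn.
    freeze_up_to(base.layers, len(base.layers))
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
    model.fit(train_data, epochs=pe)

    # Stage 2: keep layers before fl frozen, train the rest with a small
    # learning rate. Recompiling is needed for the trainable flags to apply.
    freeze_up_to(model.layers, fl)
    model.compile(
        optimizer=keras.optimizers.SGD(learning_rate=lr, momentum=0.9),
        loss="categorical_crossentropy",
    )
    model.fit(train_data, epochs=e)
```

A lower fl leaves more of the pretrained network free to adapt, at the cost of longer training and more risk of overfitting on small data.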

Experiment 1


Fine-tuning InceptionV3

Freeze fewer layers for higher accuracy

Notes Thurs Jan 24

Per-class accuracy TBD: the run crashed, probably because it asked for far too many batches. Consider dividing the actual size of the validation set in the Keras callback.
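One possible reading of that fix is to derive the number of validation batches from the validation set size and the batch size, rather than using a fixed number. The helper name below is hypothetical, and it is only a guess that this caused the crash.

```python
import math


def validation_steps(num_val_examples, batch_size):
    """Number of batches needed to cover the validation set exactly once."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Round up so a final partial batch is still evaluated.
    return math.ceil(num_val_examples / batch_size)
```

For example, 5,000 validation images with a batch size of 32 need 157 batches; asking a generator for many more than that can stall or exhaust it.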

Visualizing differences between freeze layers

Per-class precision with InceptionV3

More data for kingdoms

How well do we perform on each of the 10 classes? Look at the per-class precision for different models (only 5K examples, hence the noisy, jagged plots).
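For reference, per-class precision is the fraction of predictions for a class that are correct. This is a minimal sketch, assuming integer class labels from 0 to num_classes - 1. The function name is illustrative and not taken from the project code.

```python
from collections import Counter


def per_class_precision(y_true, y_pred, num_classes):
    """Return precision for each class, or None if the class was never predicted."""
    predicted = Counter()       # how often each class was predicted
    true_positives = Counter()  # how often that prediction was correct
    for truth, guess in zip(y_true, y_pred):
        predicted[guess] += 1
        if truth == guess:
            true_positives[guess] += 1
    return [
        true_positives[c] / predicted[c] if predicted[c] else None
        for c in range(num_classes)
    ]
```

With only a few hundred validation examples per class, a handful of flipped predictions moves these ratios noticeably, which matches the jagged curves.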

Notes Mon Jan 28



Running commands

Per-class precision with InceptionV3

Varying base models

Use InceptionV3

Fine-tuning a well-known, high-accuracy convnet pretrained on ImageNet is a great strategy for vision tasks, especially for nature photos, which are very similar to ImageNet images. Which base network should we choose?


Visualizing the results of varying base models

Vary freeze layer and base model

Long runs; not much difference

This hyperparameter is theoretically interesting but would take a lot of compute to fine-tune, so stick with InceptionV3 for now.

Results of varying freeze layer and base model