Fully Convolutional Neural Networks for
Classification, Detection & Segmentation
or, all your computer wanted to know about horses
Iasonas Kokkinos
Ecole Centrale Paris / INRIA Saclay
& G. Papandreou, P.-A. Savalle, S. Tsogkas,
L-C Chen, K. Murphy, A. Yuille, A. Vedaldi
Fully convolutional neural networks
[figure: a standard DCNN, with convolutional layers followed by fully connected layers]
Fully convolutional neural networks
● Fully connected layers: 1x1 spatial convolution kernels
● Allows the network to process images of arbitrary size
P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus and Y. LeCun, OverFeat, ICLR, 2014
M. Oquab, L. Bottou, I. Laptev, J. Sivic, Weakly Supervised Object Recognition with CNNs, TR2014
J. Long, E. Shelhamer, T. Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR 15
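As an illustrative sketch of this equivalence (NumPy, not any of the cited papers' code): a fully connected layer applied at one spatial position is the same linear map as a bank of 1x1 convolution filters, so the same weights can be slid over a feature map of any spatial size to produce a dense score map.

```python
import numpy as np

def fc_as_1x1_conv(feat, W, b):
    """Apply an FC layer densely at every spatial position.

    feat: (H, W, C_in) feature map; W: (C_out, C_in); b: (C_out,).
    Returns an (H, W, C_out) score map: a 1x1 convolution.
    """
    return np.einsum('hwc,oc->hwo', feat, W) + b

rng = np.random.default_rng(0)
feat = rng.standard_normal((5, 7, 16))   # arbitrary spatial size
W = rng.standard_normal((20, 16))        # FC weights, 20 output classes
b = rng.standard_normal(20)

scores = fc_as_1x1_conv(feat, W, b)      # dense (5, 7, 20) score map

# At any single position this matches the ordinary FC layer:
assert np.allclose(scores[2, 3], W @ feat[2, 3] + b)
```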
Fully convolutional neural networks (FCNN)
● Fast (shared convolutions)
● Simple (dense)
Part 1: FCNNs for classification & detection (CVPR’15)
G. Papandreou (TTI / Google), P.-A. Savalle (ECP / Cisco)
Part 2: FCNNs for semantic segmentation (ICLR 15?)
G. Papandreou (TTI / Google), L.-C. Chen (UCLA), K. Murphy (Google), A. Yuille (UCLA)
Part 2.5: FCNNs for part segmentation (ongoing)
G. Papandreou (TTI / Google), S. Tsogkas (ECP), A. Vedaldi (Oxford)
Part 1: FCNNs for classification & detection
G. Papandreou
P.-A. Savalle
G. Papandreou, I. Kokkinos and P. A. Savalle,
Untangling Local and Global Deformations in Deep Convolutional Networks for Image
Classification and Sliding Window Detection, arXiv:1412.0296, 2014 & CVPR 2015
Scale-invariant classification

Classification should be category-dependent, not scale-dependent: build a scale pyramid and evaluate the classifier at every scale,
$x \rightarrow \{x_{s_1}, \dots, x_{s_K}\}$
$F(x) \rightarrow \{F(x_{s_1}), \dots, F(x_{s_K})\}$
Scale-invariant classification

Three ways of dealing with an object’s scale:
● Invariant classifier: poor discriminative power
● Classifier mixture: smaller training sets
● Scale-tuned classifier: requires normalized data

Multi-scale evaluation, $x \rightarrow \{x_{s_1}, \dots, x_{s_K}\}$, $F(x) \rightarrow \{F(x_{s_1}), \dots, F(x_{s_K})\}$, treats the scaled versions as a MIL ‘bag’ of features.

Average fusion:
$F'(x) = \frac{1}{K} \sum_{k=1}^{K} F(x_{s_k})$

This work (max fusion):
$F'(x) = \max_k F(x_{s_k})$
A. Howard. Some improvements on deep convolutional neural network based image classification, 2013.
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition, 2014.
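A tiny numerical sketch of the two fusion rules (the numbers are illustrative, not from the paper): average fusion blends all scales, while max fusion lets the best-matching scale dominate, in the spirit of multiple-instance learning over the ‘bag’ of scales.

```python
import numpy as np

# Class scores F(x_{s_k}) for 3 classes, computed at K = 3 scales.
scores_per_scale = np.array([
    [0.1, 0.7, 0.2],   # scale s_1
    [0.3, 0.2, 0.5],   # scale s_2
    [0.9, 0.1, 0.0],   # scale s_3 (object roughly matches this scale)
])

# Average fusion (Howard; Simonyan & Zisserman):
avg_fused = scores_per_scale.mean(axis=0)   # F'(x) = (1/K) sum_k F(x_{s_k})

# Max fusion (this work): pick the best-responding scale per class.
max_fused = scores_per_scale.max(axis=0)    # F'(x) = max_k F(x_{s_k})
```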
Multiple Instance Learning via Max-Pooling

[Pipeline: input I(x,y) (220x220x3) → scale pyramid I(x,y,s) → stitched Patchwork(x,y) → FCNN → dense score maps F(x,y) (1x1x20) → max-pooling → class score]

$F'(x) = \max_k F(x_{s_k})$

End-to-end training!

Baseline: max-pooled net   13.0%
Epitomic DCNN              11.9%  (~1% gain)
Epitomic DCNN + MIL        10.0%  (~2% gain)

Top-5 error. All DCNNs have 6 convolutional and 2 fully connected layers.
Towards Object Detection

[Pipeline: input I(x,y) (220x220x3) → scale pyramid I(x,y,s) → stitched Patchwork(x,y) → FCNN → dense score maps F(x,y) (1x1x20)]

Search over position and scale: done!
Missing: aspect ratio
‘Squeeze-Invariant’ classification

Handle aspect ratio with a hyperbolic mapping of the input. The same trade-offs as for scale apply:
● Invariant classifier: poor discriminative power
● Classifier mixture: smaller training sets
● Ratio-tuned classifier: requires normalized data
The Greeks did it first: Procrustes
F.L. Bookstein, Morphometric Tools for Landmark Data, Cambridge University Press, 1991.
T.F. Cootes, C.J. Taylor, D.H. Cooper and J. Graham, Active Shape Models - their Training and Application, Computer Vision and Image Understanding 61: 38-59, 1995.

Detection on Procrustes’ bed
[figure: ‘car’ and ‘window’ examples warped to a common frame]
Explicit search over aspect ratio, scale & position
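A minimal sketch of what this search enumerates (illustrative, not the paper's code): each (scale, aspect-ratio) pair warps the input to a different height and width at roughly constant area; one FCNN forward pass per warped image then scans all positions at that scale and ratio.

```python
import numpy as np

def candidate_shapes(base=100, scales=(0.5, 1.0, 2.0),
                     ratios=(0.5, 1.0, 2.0)):
    """Enumerate warped input sizes for an explicit scale/ratio search.

    `base` is a nominal side length; `r` is the height/width ratio, and
    area is held roughly constant within each scale.
    """
    shapes = []
    for s in scales:
        for r in ratios:
            h = int(round(base * s * np.sqrt(r)))
            w = int(round(base * s / np.sqrt(r)))
            shapes.append((h, w))
    return shapes

shapes = candidate_shapes()   # 9 warped sizes, incl. (100, 100)
```

The position search comes for free from the FCNN's dense score maps; only scale and aspect ratio need explicit enumeration.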
Pascal VOC: best sliding-window detector

1st row: ours + VGG network:  56.4 mAP (6-10 seconds)
2nd row: R-CNN + VGG network: 62.2 mAP (60 seconds)
3rd row: R-CNN + AlexNet:     54.2 mAP (10 seconds)
4th row: end-to-end DPM:      46.9 mAP

L. Wan, D. Eigen and R. Fergus, End-to-End Integration of a Convolutional Network, Deformable Parts Model and Non-Maximum Suppression, arXiv 2014 & CVPR 2015

56.4 mAP is a first shot:
- no hinge loss
- no hard negative mining
- smaller (100x100) inputs, smaller network
Lots of room for improvement!
Part 2: FCNNs for semantic segmentation
G. Papandreou (TTI / Google)
L.-C. Chen (UCLA), K. Murphy (Google), A. Yuille (UCLA)
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy and A. Yuille
Semantic Image Segmentation with Deep Convolutional Nets and Fully
Connected CRFs, http://arxiv.org/abs/1412.7062
Semantic segmentation task
System outline
J. Long, E. Shelhamer, T. Darrell, FCNNs for Semantic Segmentation, CVPR 15
P. Krähenbühl and V. Koltun, Efficient Inference in Fully Connected CRFs with
Gaussian Edge Potentials, NIPS 2011
Repurposing DCNNs for semantic segmentation
● Accelerate CNN evaluation by ‘hard dropout’ & fine-tuning
● In VGG: subsample the first FC layer’s kernels from 7x7 to 3x3
● Decrease the score-map stride from 32 to 8 with the ‘atrous’ (with holes) algorithm
Runs at 8 FPS.
M. Holschneider, et al, A real-time algorithm for signal analysis with the help of the
wavelet transform, Wavelets, Time-Frequency Methods and Phase Space, 1989.
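A hedged 1-D illustration of the atrous trick (NumPy, illustrative only): instead of subsampling the signal and filtering, which coarsens the output stride, keep the full-resolution signal and dilate the filter by inserting zeros between its taps. Both read the same input samples, so the responses agree at the shared positions while the atrous version keeps a dense output.

```python
import numpy as np

def dilate(kernel, rate):
    """Insert rate-1 zeros between filter taps (convolution 'with holes')."""
    out = np.zeros((len(kernel) - 1) * rate + 1)
    out[::rate] = kernel
    return out

x = np.arange(16.0) ** 2          # a simple 1-D test signal
k = np.array([1.0, -2.0, 1.0])    # discrete Laplacian

# Standard route: subsample by 2, then filter (stride-2 output).
subsampled = np.convolve(x[::2], k, mode='valid')

# Atrous route: filter the full signal with the rate-2 dilated kernel.
atrous = np.convolve(x, dilate(k, 2), mode='valid')

# Dense output, agreeing with the subsampled one at shared positions:
assert np.allclose(atrous[::2], subsampled)
```

The same identity, applied in 2-D to the network's convolutions, is what lets the score-map stride drop from 32 to 8 without retraining the filters from scratch.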
FCNN-DCRF: Full & densely connected

FCNN-based labelling, refined by a densely connected CRF:
● Large CNN receptive field: good accuracy (+), but worse performance near boundaries (-)
● Dense CRF: sharpens boundaries using image-based information
P. Krähenbühl and V. Koltun, Efficient Inference in Fully Connected CRFs with Gaussian Edge
Potentials, NIPS 2011
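As a reminder of the model being used (following Krähenbühl & Koltun, NIPS 2011), the dense CRF minimizes an energy whose unary terms come from the FCNN and whose pairwise terms connect every pair of pixels through two Gaussian kernels, over pixel positions $p_i$ and colors $I_i$:

```latex
E(\mathbf{x}) = \sum_i \theta_i(x_i) + \sum_{i \neq j} \theta_{ij}(x_i, x_j),
\qquad \theta_i(x_i) = -\log P(x_i \mid \text{FCNN})
\]
\[
\theta_{ij}(x_i, x_j) = \mu(x_i, x_j)\left[
  w_1 \exp\!\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\alpha^2}
                  -\frac{\|I_i - I_j\|^2}{2\sigma_\beta^2}\right)
+ w_2 \exp\!\left(-\frac{\|p_i - p_j\|^2}{2\sigma_\gamma^2}\right)\right]
```

with the Potts compatibility $\mu(x_i, x_j) = [x_i \neq x_j]$. The first (bilateral) kernel pulls nearby, similarly colored pixels toward the same label, sharpening boundaries; the second enforces spatial smoothness.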
Indicative Results
Raw score maps
After dense CRF
Improvements due to the fully connected (dense) CRF

Krähenbühl et al. (TextonBoost unaries): 27.6 → 29.1 (+1.5)
Our work (FCNN unaries):                 61.3 → 65.2 (+3.9)
Comparisons to the Fully Convolutional Net
[figures: ground truth / FCN-8s / our work]
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic
segmentation. arXiv:1411.4038, 2014.
Comparisons to the TTI-Zoomout system
[figures: ground truth / TTI-Zoomout / our work]
M. Mostajabi, P. Yadollahpour, and G. Shakhnarovich. Feedforward semantic
segmentation with zoom-out features. arXiv:1412.0774, 2014
Comparison to state-of-the-art (Pascal VOC test)

Pre-CNN:          up to 50%
CNN:              60-64%
CNN + CRF:        >67%
Pascal train:     67%
COCO + Pascal:    71%

G. Papandreou et al., Weakly- and Semi-Supervised Learning of a DCNN for Semantic Image Segmentation, arXiv 2015
Part 2.5: FCNNs for part segmentation (ongoing)
S. Tsogkas
G. Papandreou
A. Vedaldi
Part Segmentation data
• AeroplanOID
• PASCAL-Part
A. Vedaldi, S. Mahendran, S. Tsogkas, S. Maji, R. Girshick, J. Kannala, E. Rahtu, I. Kokkinos, M. B. Blaschko, D. Weiss, B. Taskar, K. Simonyan, N. Saphra and S. Mohamed, Understanding Objects in Detail with Fine-grained Attributes, CVPR, 2014
X. Chen, R. Mottaghi, X. Liu, S. Fidler, R. Urtasun and A. L. Yuille, Detect What You Can: Detecting and Representing Objects using Holistic Models and Body Parts, CVPR, 2014
Part segmentation pipeline

[Pipeline: input image → DCNN → CRF → full segmentation into parts: head, neck, torso, tail, legs, hooves]
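A minimal sketch of the labelling step (illustrative NumPy; the part names and random scores are placeholders, not the system's actual classes or outputs): the DCNN emits one score map per part class, and a per-pixel argmax already yields a part segmentation before CRF refinement.

```python
import numpy as np

# Hypothetical part classes for the horse example on the slide.
parts = ['background', 'head', 'neck', 'torso', 'tail', 'legs', 'hooves']

rng = np.random.default_rng(1)
score_maps = rng.standard_normal((len(parts), 4, 4))  # (classes, H, W)

# Per-pixel argmax over part classes gives the raw part labelling;
# the dense CRF would then refine these labels using image evidence.
labels = score_maps.argmax(axis=0)                    # (H, W) label map
```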
Preliminary results
[figures: input / ground truth / our result, with part labels such as ‘arms’ and ‘legs’]
Thanks!
P.-A. Savalle
L-C. Chen
S. Tsogkas
G. Papandreou
K. Murphy
A. Yuille
A. Vedaldi