Add ResNet-18 and ResNet-50 backbones, and add VOC 2007 Cat Dog dataset
potterhsu committed Dec 28, 2018
1 parent eb11069 commit 9f763ed
Showing 5 changed files with 274 additions and 29 deletions.
202 changes: 175 additions & 27 deletions README.md
@@ -22,8 +22,8 @@ An easy implementation of Faster R-CNN in PyTorch.

* PASCAL VOC 2007

* Train: 2007 trainval (5011 samples)
* Eval: 2007 test (4952 samples)
* Train: 2007 trainval (5011 images)
* Eval: 2007 test (4952 images)

<table>
<tr>
@@ -168,6 +168,52 @@ An easy implementation of Faster R-CNN in PyTorch.
<td>0.1</td>
<td>70000</td>
</tr>
<tr>
<td>Ours</td>
<td>ResNet-18</td>
<td>GTX 1080 Ti</td>
<td>~ 19.4</td>
<td>~ 38.7</td>
<td>0.6783</td>
<td>600</td>
<td>1000</td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[128, 256, 512]</td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>300</td>
<td>0.001</td>
<td>0.9</td>
<td>0.0005</td>
<td>50000</td>
<td>0.1</td>
<td>70000</td>
</tr>
<tr>
<td>Ours</td>
<td>ResNet-50</td>
<td>GTX 1080 Ti</td>
<td>~ 8.7</td>
<td>~ 22.4</td>
<td>0.7402</td>
<td>600</td>
<td>1000</td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[128, 256, 512]</td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>300</td>
<td>0.001</td>
<td>0.9</td>
<td>0.0005</td>
<td>50000</td>
<td>0.1</td>
<td>70000</td>
</tr>
<tr>
<td>ruotianluo/pytorch-faster-rcnn</td>
<td>ResNet-101</td>
@@ -222,7 +268,7 @@ An easy implementation of Faster R-CNN in PyTorch.
</td>
<td>ResNet-101</td>
<td>GTX 1080 Ti</td>
<td>~ 6.3</td>
<td>5 ~ 6</td>
<td>~ 11.8</td>
<td>0.7538</td>
<td>600</td>
@@ -247,8 +293,8 @@ An easy implementation of Faster R-CNN in PyTorch.
* MS COCO 2017

* Train: 2017 Train = 2015 Train + 2015 Val - 2015 Val Sample 5k (117266 samples)
* Eval: 2017 Val = 2015 Val Sample 5k (formerly known as `minival`) (4952 samples)
* Train: 2017 Train = 2015 Train + 2015 Val - 2015 Val Sample 5k (117266 images)
* Eval: 2017 Val = 2015 Val Sample 5k (formerly known as `minival`) (4952 images)

<table>
<tr>
@@ -331,21 +377,21 @@ An easy implementation of Faster R-CNN in PyTorch.
<td>~ 5.1</td>
<td>~ 8.9</td>
<td>0.287</td>
<td>800</td>
<td>1333</td>
<td><b>800</b></td>
<td><b>1333</b></td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[64, 128, 256, 512]</td>
<td><b>[64, 128, 256, 512]</b></td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>1000</td>
<td><b>1000</b></td>
<td>0.001</td>
<td>0.9</td>
<td>0.0001</td>
<td>900000</td>
<td><b>0.0001</b></td>
<td><b>900000</b></td>
<td>0.1</td>
<td>1200000</td>
<td><b>1200000</b></td>
</tr>
<tr>
<td>ruotianluo/pytorch-faster-rcnn</td>
@@ -404,21 +450,21 @@ An easy implementation of Faster R-CNN in PyTorch.
<td>~ 4.7</td>
<td>~ 7.8</td>
<td>0.352</td>
<td>800</td>
<td>1333</td>
<td><b>800</b></td>
<td><b>1333</b></td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[64, 128, 256, 512]</td>
<td><b>[64, 128, 256, 512]</b></td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>1000</td>
<td><b>1000</b></td>
<td>0.001</td>
<td>0.9</td>
<td>0.0001</td>
<td>900000</td>
<td><b>0.0001</b></td>
<td><b>900000</b></td>
<td>0.1</td>
<td>1200000</td>
<td><b>1200000</b></td>
</tr>
<tr>
<td>
@@ -431,26 +477,128 @@ An easy implementation of Faster R-CNN in PyTorch.
<td>~ 4.5</td>
<td>~ 7.5</td>
<td>0.358</td>
<td>800</td>
<td>1333</td>
<td><b>800</b></td>
<td><b>1333</b></td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[32, 64, 128, 256, 512]</td>
<td><b>[32, 64, 128, 256, 512]</b></td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>1000</td>
<td><b>1000</b></td>
<td>0.001</td>
<td>0.9</td>
<td>0.0001</td>
<td>900000</td>
<td><b>0.0001</b></td>
<td><b>900000</b></td>
<td>0.1</td>
<td>1200000</td>
<td><b>1200000</b></td>
</tr>
</table>

> Scroll to right for more configurations
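
The `anchor_ratios` and `anchor_sizes` columns in the tables above jointly determine how many anchor shapes are placed at each feature-map location. The following is a minimal sketch of that combination, not the repository's own anchor generator. It assumes that a ratio `(a, b)` means height / width = a / b and that every anchor keeps an area of `size ** 2`; the function name `anchor_shapes` is hypothetical.

```python
import math
from typing import List, Tuple


def anchor_shapes(ratios: List[Tuple[int, int]], sizes: List[int]) -> List[Tuple[float, float]]:
    """Return a (width, height) pair for every ratio/size combination.

    Assumption (not confirmed by the source): a ratio (a, b) means
    height / width = a / b, and each anchor covers an area of size ** 2.
    """
    shapes = []
    for a, b in ratios:
        ratio = a / b
        for size in sizes:
            width = size * math.sqrt(1 / ratio)
            height = size * math.sqrt(ratio)
            shapes.append((width, height))
    return shapes


# Values taken from the anchor_ratios / anchor_sizes columns of the VOC rows.
shapes = anchor_shapes([(1, 2), (1, 1), (2, 1)], [128, 256, 512])
print(len(shapes))  # 9 shapes: 3 ratios x 3 sizes
```

Under the same assumption, the COCO rows with four or five sizes would yield 12 or 15 shapes per location.
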
* PASCAL VOC 2007 Cat Dog

* Train: 2007 trainval with all categories other than cat and dog dropped (750 images)
* Eval: 2007 test with all categories other than cat and dog dropped (728 images)
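
The commit page does not show how the Cat Dog subset is built, so the following is only a sketch of one plausible filtering step. It assumes annotations are available as a mapping from image id to a list of class labels, and that images left with no cat or dog object are removed entirely (which would be consistent with the much smaller image counts). The names `keep_cat_dog` and `KEPT_CATEGORIES` and the toy data are hypothetical.

```python
from typing import Dict, List

# Categories retained in the subset; everything else is dropped.
KEPT_CATEGORIES = {'cat', 'dog'}


def keep_cat_dog(annotations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Drop every label outside KEPT_CATEGORIES, then drop images with nothing left."""
    filtered = {}
    for image_id, labels in annotations.items():
        kept = [label for label in labels if label in KEPT_CATEGORIES]
        if kept:  # assumption: images without any cat or dog are removed
            filtered[image_id] = kept
    return filtered


# Toy annotations, not real VOC data.
toy = {
    '000001': ['dog', 'person'],
    '000002': ['car'],
    '000003': ['cat', 'cat'],
}
subset = keep_cat_dog(toy)
```
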

<table>
<tr>
<th>Implementation</th>
<th>Backbone</th>
<th>GPU</th>
<th>Training Speed (FPS)</th>
<th>Inference Speed (FPS)</th>
<th>mAP</th>
<th>image_min_side</th>
<th>image_max_side</th>
<th>anchor_ratios</th>
<th>anchor_sizes</th>
<th>pooling_mode</th>
<th>train_pre_rpn_nms_top_n</th>
<th>train_post_rpn_nms_top_n</th>
<th>eval_pre_rpn_nms_top_n</th>
<th>eval_post_rpn_nms_top_n</th>
<th>learning_rate</th>
<th>momentum</th>
<th>weight_decay</th>
<th>step_lr_size</th>
<th>step_lr_gamma</th>
<th>num_steps_to_finish</th>
</tr>
<tr>
<td>Ours</td>
<td>ResNet-18</td>
<td>GTX 1080 Ti</td>
<td>~ 19.4</td>
<td>~ 56.2</td>
<td>0.3776</td>
<td>600</td>
<td>1000</td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[128, 256, 512]</td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>300</td>
<td>0.001</td>
<td>0.9</td>
<td>0.0005</td>
<td><b>700</b></td>
<td>0.1</td>
<td><b>1000</b></td>
</tr>
<tr>
<td>Ours</td>
<td>ResNet-18</td>
<td>GTX 1080 Ti</td>
<td>~ 19.4</td>
<td>~ 56.2</td>
<td>0.6175</td>
<td>600</td>
<td>1000</td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[128, 256, 512]</td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>300</td>
<td>0.001</td>
<td>0.9</td>
<td>0.0005</td>
<td><b>2000</b></td>
<td>0.1</td>
<td><b>3000</b></td>
</tr>
<tr>
<td>Ours</td>
<td>ResNet-18</td>
<td>GTX 1080 Ti</td>
<td>~ 19.4</td>
<td>~ 56.2</td>
<td>0.7639</td>
<td>600</td>
<td>1000</td>
<td>[(1, 2), (1, 1), (2, 1)]</td>
<td>[128, 256, 512]</td>
<td>align</td>
<td>12000</td>
<td>2000</td>
<td>6000</td>
<td>300</td>
<td>0.001</td>
<td>0.9</td>
<td>0.0005</td>
<td><b>7000</b></td>
<td>0.1</td>
<td><b>10000</b></td>
</tr>
</table>

> Scroll to right for more configurations
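
The `learning_rate`, `step_lr_size`, `step_lr_gamma` and `num_steps_to_finish` columns describe a step-decay schedule. As a rough sketch, assuming the decay follows the usual step rule (the same closed form as PyTorch's `StepLR`, which the source does not confirm is used here), the learning rate at a given step can be computed as follows. The function name `learning_rate_at` is hypothetical, and the defaults come from the first Cat Dog row.

```python
def learning_rate_at(step: int,
                     base_lr: float = 0.001,
                     step_lr_size: int = 700,
                     step_lr_gamma: float = 0.1) -> float:
    """Step decay: multiply the base rate by gamma once per completed step_lr_size steps.

    Assumption: this mirrors the closed form of a standard step scheduler;
    it is not copied from the repository.
    """
    return base_lr * step_lr_gamma ** (step // step_lr_size)


# First Cat Dog row: 1000 steps in total, a single decay at step 700.
before_decay = learning_rate_at(699)
after_decay = learning_rate_at(700)
```

With these values the model trains at 0.001 for the first 700 steps and at roughly 0.0001 for the remaining 300, so the three Cat Dog rows differ only in how long each phase lasts.
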

## Requirements

8 changes: 7 additions & 1 deletion backbone/base.py
@@ -5,13 +5,19 @@

class Base(object):

    OPTIONS = ['vgg16', 'resnet101']
    OPTIONS = ['vgg16', 'resnet18', 'resnet50', 'resnet101']

    @staticmethod
    def from_name(name: str) -> Type['Base']:
        if name == 'vgg16':
            from backbone.vgg16 import Vgg16
            return Vgg16
        elif name == 'resnet18':
            from backbone.resnet18 import ResNet18
            return ResNet18
        elif name == 'resnet50':
            from backbone.resnet50 import ResNet50
            return ResNet50
        elif name == 'resnet101':
            from backbone.resnet101 import ResNet101
            return ResNet101
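
`from_name` maps a backbone name to its class through an `if`/`elif` chain. For comparison, here is a minimal sketch of the same dispatch written as a dictionary lookup. It is an alternative design, not what the commit does: the stub classes stand in for the real backbones, and `Backbone`, `REGISTRY` and the error message are hypothetical.

```python
from typing import Dict, Type


class Backbone:
    """Stand-in for backbone.base.Base (hypothetical, for illustration only)."""


class ResNet18(Backbone):
    pass


class ResNet50(Backbone):
    pass


# Name-to-class table; adding a backbone means adding one entry.
REGISTRY: Dict[str, Type[Backbone]] = {
    'resnet18': ResNet18,
    'resnet50': ResNet50,
}


def from_name(name: str) -> Type[Backbone]:
    """Return the backbone class registered under `name`, or raise ValueError."""
    try:
        return REGISTRY[name]
    except KeyError:
        raise ValueError(f'unknown backbone {name!r}; options: {sorted(REGISTRY)}') from None


backbone_class = from_name('resnet18')
```

One trade-off: the `if`/`elif` version in the diff imports each backbone module only when it is requested, whereas a plain dictionary needs every class imported up front.
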
44 changes: 44 additions & 0 deletions backbone/resnet18.py
@@ -0,0 +1,44 @@
from typing import Tuple, Callable

import torchvision
from torch import nn, Tensor
from torch.nn import functional as F

import backbone.base


class ResNet18(backbone.base.Base):

    def __init__(self, pretrained: bool):
        super().__init__(pretrained)

    def features(self) -> Tuple[nn.Module, Callable[[Tensor], Tensor], nn.Module, Callable[[Tensor], Tensor], int, int]:
        resnet18 = torchvision.models.resnet18(pretrained=self._pretrained)

        # list(resnet18.children()) consists of following modules
        #   [0] = Conv2d, [1] = BatchNorm2d, [2] = ReLU, [3] = MaxPool2d,
        #   [4] = Sequential(BasicBlock...), [5] = Sequential(BasicBlock...),
        #   [6] = Sequential(BasicBlock...), [7] = Sequential(BasicBlock...),
        #   [8] = AvgPool2d, [9] = Linear
        children = list(resnet18.children())
        features = children[:-3]
        num_features_out = 256

        hidden = children[-3]
        num_hidden_out = 512

        for parameters in [feature.parameters() for i, feature in enumerate(features) if i <= 4]:
            for parameter in parameters:
                parameter.requires_grad = False

        features = nn.Sequential(*features)

        return features, self.pool_handler, hidden, self.hidden_handler, num_features_out, num_hidden_out

    def pool_handler(self, pool: Tensor) -> Tensor:
        return pool

    def hidden_handler(self, hidden: Tensor) -> Tensor:
        hidden = F.adaptive_max_pool2d(input=hidden, output_size=1)
        hidden = hidden.view(hidden.shape[0], -1)
        return hidden
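
The slicing in `features()` is easy to misread, so here is a small stand-alone sketch that applies the same indices to a list of names instead of real modules. The names are the attribute names torchvision uses for the ten top-level ResNet children; using plain strings is an illustration choice so that the snippet runs without PyTorch.

```python
# Top-level children of a torchvision ResNet, in order (names only, no modules).
children = ['conv1', 'bn1', 'relu', 'maxpool',
            'layer1', 'layer2', 'layer3', 'layer4',
            'avgpool', 'fc']

# Same slicing as in features(): everything up to layer3 is the shared feature extractor...
features = children[:-3]
# ...and layer4 becomes the per-RoI "hidden" head; avgpool and fc are discarded.
hidden = children[-3]

# Same condition as the freezing loop: indices 0..4 have requires_grad set to False.
frozen = [name for i, name in enumerate(features) if i <= 4]

print(features[-1], hidden)  # layer3 layer4
```

This is why `num_features_out` is the channel count after `layer3` (256 for ResNet-18, 1024 for ResNet-50) and `num_hidden_out` is the channel count after `layer4` (512 and 2048 respectively).
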
44 changes: 44 additions & 0 deletions backbone/resnet50.py
@@ -0,0 +1,44 @@
from typing import Tuple, Callable

import torchvision
from torch import nn, Tensor
from torch.nn import functional as F

import backbone.base


class ResNet50(backbone.base.Base):

    def __init__(self, pretrained: bool):
        super().__init__(pretrained)

    def features(self) -> Tuple[nn.Module, Callable[[Tensor], Tensor], nn.Module, Callable[[Tensor], Tensor], int, int]:
        resnet50 = torchvision.models.resnet50(pretrained=self._pretrained)

        # list(resnet50.children()) consists of following modules
        #   [0] = Conv2d, [1] = BatchNorm2d, [2] = ReLU, [3] = MaxPool2d,
        #   [4] = Sequential(Bottleneck...), [5] = Sequential(Bottleneck...),
        #   [6] = Sequential(Bottleneck...), [7] = Sequential(Bottleneck...),
        #   [8] = AvgPool2d, [9] = Linear
        children = list(resnet50.children())
        features = children[:-3]
        num_features_out = 1024

        hidden = children[-3]
        num_hidden_out = 2048

        for parameters in [feature.parameters() for i, feature in enumerate(features) if i <= 4]:
            for parameter in parameters:
                parameter.requires_grad = False

        features = nn.Sequential(*features)

        return features, self.pool_handler, hidden, self.hidden_handler, num_features_out, num_hidden_out

    def pool_handler(self, pool: Tensor) -> Tensor:
        return pool

    def hidden_handler(self, hidden: Tensor) -> Tensor:
        hidden = F.adaptive_max_pool2d(input=hidden, output_size=1)
        hidden = hidden.view(hidden.shape[0], -1)
        return hidden
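
`hidden_handler` reduces each channel of the `layer4` output to a single value with a global max pool and then flattens the result to one vector per RoI. The sketch below reproduces that reduction on nested Python lists shaped `(N, C, H, W)` so it can be run without PyTorch. It illustrates the shape change only, and the function name `global_max_pool` is hypothetical.

```python
from typing import List

Tensor4D = List[List[List[List[float]]]]


def global_max_pool(batch: Tensor4D) -> List[List[float]]:
    """Reduce a (N, C, H, W) nested list to (N, C) by taking the max over H and W."""
    return [
        [max(max(row) for row in channel) for channel in sample]
        for sample in batch
    ]


# One sample with two 2x2 channels.
batch = [[
    [[1.0, 5.0], [3.0, 2.0]],
    [[-1.0, -4.0], [-2.0, -3.0]],
]]
pooled = global_max_pool(batch)
print(pooled)  # [[5.0, -1.0]]
```

In the real handler the result has shape `(N, num_hidden_out)`, that is 512 values per RoI for ResNet-18 and 2048 for ResNet-50.
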