[DL Paper Review] ResNet (Deep Residual Learning for Image Recognition) Paper Review

by Glory_Choi 2023. 9. 20.

📌 Introduction

This post reviews the deep learning paper Deep Residual Learning for Image Recognition in a way that beginners can follow. If you spot any mistakes or have questions, please leave a comment.

 

Overview

In CNNs, depth has long been regarded as a key factor, and deeper models have generally performed better. Deep learning researchers therefore treated network depth as crucial, and deep convolutional neural networks (DCNNs) drove breakthroughs in image classification.

The ResNet team asked whether simply adding more layers is enough to improve performance, and ran experiments to find out.

 

 

The results are shown in the figure above (plain networks on CIFAR-10). Contrary to the expectation that deeper models perform better, simply increasing depth did not make the network easy to train. As the figure shows, the 56-layer model had higher training error and higher test error than the 20-layer model, so it performed worse.

 

The ResNet team discusses two reasons why deep models are hard to train.

  • Convergence Problem
    • Cause: vanishing/exploding gradients, which hamper convergence from the start of training.
    • Solution: largely addressed by normalized initialization and intermediate normalization layers.
  • Degradation Problem
    • Cause: as depth increases, accuracy saturates and then degrades rapidly. One might suspect overfitting, but that is not the cause, because the deeper model's training error also rises. The paper also argues that vanishing gradients are unlikely to be the cause, since the plain networks are trained with batch normalization; it treats degradation as an optimization difficulty.
    • Solution: the residual learning framework.

 

A deeper model can always be constructed from a shallower one by adding identity mapping layers, so in principle its training error should be no higher than that of its shallower counterpart. In experiments, however, the solvers could not find solutions as good as this constructed one. The paper therefore proposes the deep residual learning framework, in which the stacked layers fit a residual mapping instead of directly fitting the desired underlying mapping.
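The "solution by construction" argument can be sketched in a few lines. This is only a toy illustration under assumptions of my own: the layers are tiny fully connected ReLU layers with hand-picked weights (the names `forward`, `shallow`, and `deeper` are hypothetical), not the convolutional networks used in the paper.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, v):
    # Matrix-vector product, with W given as a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def identity_matrix(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def forward(layers, x):
    # Plain feed-forward net: every layer computes relu(W x).
    for W in layers:
        x = relu(linear(W, x))
    return x

# Hypothetical 2-layer "shallow" network with arbitrary weights.
shallow = [[[0.5, -1.0], [2.0, 0.25]],
           [[1.0, 1.0], [-0.5, 3.0]]]

# "Deeper" network: the same layers followed by four identity layers.
# Activations are already non-negative after a ReLU, so relu(I x) == x.
deeper = shallow + [identity_matrix(2) for _ in range(4)]

x = [1.0, 2.0]
out_shallow = forward(shallow, x)
out_deeper = forward(deeper, x)
print(out_shallow, out_deeper)
```

Both networks compute the same function, so the deeper one has at least one solution that is as good as the shallow one. The degradation problem is that training does not find it.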

 

Let H(x) be the mapping the stacked layers were originally meant to fit. The paper has the nonlinear layers fit F(x) = H(x) - x instead, that is, H(x) minus the input x. The original mapping then becomes H(x) = F(x) + x, and the ResNet team hypothesizes that optimizing the residual function F(x) is easier than optimizing H(x) directly.

 

The formulation F(x) + x can be realized with a shortcut connection, which skips one or more layers. An identity shortcut adds no extra parameters and no computation beyond an element-wise addition, which is its main advantage.

 

To restate: x is the input. It passes through the stacked layers (weight layers, ReLU, and so on) to produce F(x), and the original input x (the identity) is then added, so the block outputs F(x) + x.
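The structure described above can be sketched as a small function. This is a minimal sketch, assuming a two-layer residual branch F(x) = W2 · relu(W1 · x) on plain vectors instead of convolutional feature maps; `residual_block`, `W1`, and `W2` are illustrative names, and batch normalization is left out.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, v):
    # Matrix-vector product, with W given as a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    # Residual branch: F(x) = W2 * relu(W1 * x)
    f = linear(W2, relu(linear(W1, x)))
    # Shortcut: add the unchanged input, then apply the final ReLU.
    return relu([fi + xi for fi, xi in zip(f, x)])

def scaled_identity(n, s):
    return [[s if i == j else 0.0 for j in range(n)] for i in range(n)]

x = [1.0, 2.0, 3.0]

# With all-zero weights F(x) = 0, so the block reduces to the identity.
y_identity = residual_block(x, scaled_identity(3, 0.0), scaled_identity(3, 0.0))

# With non-zero weights the block outputs x plus a learned correction.
y_residual = residual_block(x, scaled_identity(3, 1.0), scaled_identity(3, 0.5))
print(y_identity, y_residual)
```

The first call shows why an identity mapping is easy to represent here: the weights only need to go to zero.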

 

Deep Residual Learning

A conventional DCNN learns a function that gradually transforms the input x into the desired output H(x). In ResNet, with H(x) = F(x) + x, the layers learn only the additional amount F(x) = H(x) - x. If the identity mapping is optimal, training only has to push F(x) toward 0.

Eq. (1): $y = F(x, \{W_i\}) + x$

If the input and output have the same dimensions, Eq. (1) is used.

Eq. (2): $y = F(x, \{W_i\}) + W_s x$

If they differ, a linear projection W_s is applied on the shortcut simply to match the dimensions.
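The two shortcut forms can be sketched as follows. This is a rough sketch on plain vectors, assuming a two-layer residual branch and hand-picked weights; the names `shortcut`, `residual_block`, and `Ws` are illustrative, and the ReLU after the addition is omitted to mirror the equations.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, v):
    # Matrix-vector product, with W given as a list of rows.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def shortcut(x, Ws=None):
    # Identity when the dimensions already match, projection Ws * x otherwise.
    return list(x) if Ws is None else linear(Ws, x)

def residual_block(x, W1, W2, Ws=None):
    f = linear(W2, relu(linear(W1, x)))   # residual branch F(x)
    s = shortcut(x, Ws)
    if len(f) != len(s):
        raise ValueError("F(x) and the shortcut must have the same dimension")
    return [fi + si for fi, si in zip(f, s)]

# Input is 2-d, output is 3-d, so a 3x2 projection is needed on the shortcut.
x = [1.0, 2.0]
W1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                     # 2-d -> 3-d
W2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]      # 3-d -> 3-d
Ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]                     # 2-d -> 3-d
y = residual_block(x, W1, W2, Ws)
print(y)
```

Calling `residual_block(x, W1, W2)` without `Ws` raises a `ValueError` here, which is exactly the dimension mismatch the projection resolves.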

The figure above shows the architectures of VGG-19, the 34-layer plain network, and the 34-layer residual network. The plain network is a baseline inspired by VGG nets and designed for comparison with ResNet. It has fewer filters than VGG and lower complexity: 3.6 billion FLOPs, which is only 18% of VGG-19 (19.6 billion FLOPs).

 

Deeper Bottleneck Architectures

Left: building block, right: bottleneck block

To build deeper networks, the building block is modified into a bottleneck block: a stack of 1×1, 3×3, and 1×1 convolutions, in which the 1×1 layers first reduce and then restore the dimensions. Parameter-free identity shortcuts are particularly important for the bottleneck architecture, and the 50-layer ResNet stays close to the 34-layer ResNet in computational cost (3.8 vs. 3.6 billion FLOPs).
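A quick weight count shows why the two block designs cost about the same. The channel sizes below (64-d for the building block, 256-d in and out with a 64-d middle for the bottleneck block) follow the paper's figure; the helper `conv_params` is a hypothetical name, and biases and batch normalization parameters are ignored.

```python
def conv_params(kernel, c_in, c_out):
    # Number of weights in a kernel x kernel convolution (no bias, no BN).
    return kernel * kernel * c_in * c_out

# Building block: two 3x3 convolutions on 64 channels.
basic_block = conv_params(3, 64, 64) + conv_params(3, 64, 64)

# Bottleneck block: 1x1 reduces 256 -> 64, 3x3 works on 64, 1x1 restores 64 -> 256.
bottleneck_block = (conv_params(1, 256, 64)
                    + conv_params(3, 64, 64)
                    + conv_params(1, 64, 256))

print(basic_block, bottleneck_block)
```

The bottleneck block has three layers and four times wider input and output, yet slightly fewer weights than the two-layer building block.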

The 152-layer ResNet is also notable for having lower complexity than VGG-16/19 (11.3 billion FLOPs vs. 15.3/19.6 billion). In addition, the 50/101/152-layer ResNets are clearly more accurate than the 34-layer one.
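The complexity comparison can be checked with simple arithmetic. The FLOP counts below are the ones reported in the paper (in billions); the dictionary name `flops_billion` and its keys are just labels chosen for this sketch.

```python
# FLOPs in billions, as reported in the paper.
flops_billion = {
    "VGG-16": 15.3,
    "VGG-19": 19.6,
    "34-layer": 3.6,
    "50-layer": 3.8,
    "101-layer": 7.6,
    "152-layer": 11.3,
}

# The 34-layer network relative to VGG-19, as a percentage.
pct_34_of_vgg19 = 100 * flops_billion["34-layer"] / flops_billion["VGG-19"]

# Extra cost of going from 34 to 50 layers, as a percentage.
pct_50_over_34 = 100 * (flops_billion["50-layer"] / flops_billion["34-layer"] - 1)

print(round(pct_34_of_vgg19), round(pct_50_over_34, 1))
for name in ("VGG-16", "VGG-19"):
    print(name, "costs more than the 152-layer ResNet:",
          flops_billion[name] > flops_billion["152-layer"])
```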

 

Implementation

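  • Input: 224 × 224 crop, taken after resizing the image for scale augmentation
  • Batch normalization (BN) after each convolution and before the activation
  • Weight initialization (He initialization)
  • SGD with a mini-batch size of 256
  • Learning rate starts at 0.1 and is divided by 10 when the error plateaus
  • Up to 60 × 10^4 iterations
  • Weight decay 0.0001, momentum 0.9
  • No dropout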
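The settings above can be collected into a small configuration object. This is a sketch only: `TrainConfig` and its field names are hypothetical, and the ImageNet training set size of about 1.28 million images is an assumption used for a rough epoch estimate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    crop_size: int = 224             # 224 x 224 input crop
    batch_size: int = 256            # SGD mini-batch size
    base_lr: float = 0.1             # initial learning rate
    max_iterations: int = 60 * 10**4
    weight_decay: float = 1e-4
    momentum: float = 0.9
    use_batch_norm: bool = True
    use_dropout: bool = False

cfg = TrainConfig()

# Assumption: roughly 1.28 million ImageNet training images.
IMAGENET_TRAIN_IMAGES = 1_280_000

# Upper bound on the number of passes over the training set.
max_epochs = cfg.max_iterations * cfg.batch_size / IMAGENET_TRAIN_IMAGES
print(cfg)
print(max_epochs)
```

Under that assumption, 60 × 10^4 iterations at batch size 256 correspond to at most about 120 epochs.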

 

Conclusion

The paper reports experiments on several datasets and publishes the comparative results.

 

The 34-layer ResNet has lower training error than the 18-layer ResNet and performs better. Comparing the two 18-layer networks (plain vs. ResNet), accuracy is similar, but the 18-layer ResNet converges faster.

This indicates that the degradation problem is well addressed and that good accuracy can be obtained even as depth increases.

 

ResNet-34 was evaluated with three shortcut options.

 

A) Zero-padding shortcuts are used for dimension matching. In this case all shortcuts are parameter-free.

B) Projection shortcuts are used only when the dimensions increase. All other shortcuts are identity.

C) All shortcuts are projection shortcuts.

 

All three options performed better than the plain model, ranked A < B < C. The paper attributes A < B to the zero-padded dimensions in A having no residual learning, and B < C to the extra parameters introduced by the additional projection shortcuts.
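The difference between options A and B can be made concrete on plain vectors. This is a toy sketch with assumptions of my own: `zero_pad_shortcut` and `projection_shortcut` are hypothetical helpers, `f` stands in for an already computed residual branch output, and the projection weights are arbitrary.

```python
def zero_pad_shortcut(x, out_dim):
    # Option A: copy x and fill the new dimensions with zeros (no parameters).
    return list(x) + [0.0] * (out_dim - len(x))

def projection_shortcut(x, Ws):
    # Option B: learned linear projection Ws * x (adds len(Ws) * len(x) weights).
    return [sum(w * a for w, a in zip(row, x)) for row in Ws]

x = [1.0, 2.0]                 # 2-d input
f = [0.5, 0.5, 0.5, 0.5]       # hypothetical 4-d output of the residual branch

# Option A: the last two outputs equal f alone, so those dimensions
# receive nothing from the shortcut.
y_a = [fi + si for fi, si in zip(f, zero_pad_shortcut(x, 4))]

# Option B: every output dimension receives a contribution from x.
Ws = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
y_b = [fi + si for fi, si in zip(f, projection_shortcut(x, Ws))]

params_a = 0
params_b = len(Ws) * len(x)
print(y_a, y_b, params_a, params_b)
```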

 

Because the differences among the three options are small, projection shortcuts are not essential for addressing the degradation problem. To reduce memory/time complexity and model size, the rest of the paper therefore does not use option C. Identity shortcuts are particularly important for keeping the complexity of the bottleneck architecture from increasing.
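A rough weight count illustrates why identity shortcuts matter so much for bottleneck blocks. The sketch assumes the 256-d bottleneck block from the paper's figure and a 1×1 projection from 256 to 256 channels on the shortcut; `conv_params` is a hypothetical helper, and biases and batch normalization parameters are ignored.

```python
def conv_params(kernel, c_in, c_out):
    # Number of weights in a kernel x kernel convolution (no bias, no BN).
    return kernel * kernel * c_in * c_out

# Residual branch of a 256-d bottleneck block: 1x1, 3x3, 1x1 convolutions.
bottleneck_branch = (conv_params(1, 256, 64)
                     + conv_params(3, 64, 64)
                     + conv_params(1, 64, 256))

# A projection shortcut connects the two 256-d ends of the block.
projection = conv_params(1, 256, 256)

with_identity = bottleneck_branch                 # identity shortcut adds nothing
with_projection = bottleneck_branch + projection
growth = with_projection / with_identity
print(with_identity, with_projection, round(growth, 2))
```

In this count the projection alone is nearly as large as the whole residual branch, so using it on every shortcut would roughly double the block's size.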

 

 

The 50-layer ResNet is built by replacing each 2-layer block in the 34-layer ResNet with a 3-layer bottleneck block, and the 101-layer and 152-layer ResNets are built by using more of these 3-layer blocks. Even though depth increases substantially, they still have lower complexity than the VGG-16/19 models and achieve considerably higher accuracy with no degradation problem. Option B is used for dimension matching.
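The layer counts follow directly from the number of blocks per stage. The per-stage block counts below are the ones listed in the paper's architecture table; the names `ARCHITECTURES` and `depth` are hypothetical.

```python
# (layers per block, blocks in each of the four stages)
ARCHITECTURES = {
    34: (2, [3, 4, 6, 3]),     # 2-layer building blocks
    50: (3, [3, 4, 6, 3]),     # same block counts, 3-layer bottleneck blocks
    101: (3, [3, 4, 23, 3]),
    152: (3, [3, 8, 36, 3]),
}

def depth(layers_per_block, blocks_per_stage):
    # +2 counts the first 7x7 convolution and the final fully connected layer.
    return layers_per_block * sum(blocks_per_stage) + 2

depths = {name: depth(*spec) for name, spec in ARCHITECTURES.items()}
print(depths)
```

Swapping the 16 two-layer blocks of the 34-layer network for three-layer bottleneck blocks gives 3 × 16 + 2 = 50 weighted layers.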

 

In conclusion, ResNet produced significant results and remains one of the most cited papers in deep learning. If you are short on time, reviewing the key concepts is enough, but if you have the time, be sure to read the paper itself!

 

Comments and questions are always appreciated.