
A Network That Trains At Zero Init

Here is a network that still trains with zero initialization.

Two weeks ago I had a job interview with ByteDance where I was asked whether a zero-initialized network can train. My answer was yes, and I failed the interview.

Maybe I didn't make myself clear.

To better illustrate my answer, I wrote this notebook to prove that a zero-initialized network can be trained, given proper supervision.

The common belief that a zero-initialized network cannot be trained rests on the assumption that shallow layers can only receive supervision from deep layers, while deep layers can only receive information from shallow layers. Under zero initialization, shallow layers therefore receive zero or constant gradients, while deep layers receive constant (all-zero) activations, so no parameter can move.
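To make the failure mode concrete, here is a minimal sketch (mine, not taken from the notebook) of a plain zero-initialized MLP whose gradients are all exactly zero:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Plain MLP with every weight set to exactly zero (biases omitted for clarity;
# with biases, only the final bias would move, merely learning the label prior).
mlp = nn.Sequential(
    nn.Linear(8, 16, bias=False), nn.ReLU(),
    nn.Linear(16, 2, bias=False),
)
for p in mlp.parameters():
    nn.init.zeros_(p)

x = torch.randn(4, 8)           # random inputs
y = torch.randint(0, 2, (4,))   # random labels
nn.functional.cross_entropy(mlp(x), y).backward()

# The first layer's gradient is zero because the downstream weights are zero;
# the last layer's gradient is zero because its input activations are zero.
print([p.grad.abs().max().item() for p in mlp.parameters()])  # [0.0, 0.0]
```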

However, a network can be designed to break this assumption and still perform better than random guessing; a sketch of one such design follows, and the notebook has the details.
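For illustration, here is one minimal design that breaks the assumption (my own sketch, not necessarily the notebook's architecture): identity skip connections carry the raw input past the zero weights, so deep layers see non-constant information and supervision reaches shallow layers. Zero-initializing residual branches is a known trick (cf. Fixup and ReZero); here every learnable weight starts at zero.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ZeroInitResNet(nn.Module):
    """Residual net whose every learnable weight starts at exactly zero."""
    def __init__(self, dim=8, depth=4, classes=2):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Linear(dim, dim, bias=False) for _ in range(depth)]
        )
        self.head = nn.Linear(dim, classes, bias=False)
        for p in self.parameters():
            nn.init.zeros_(p)

    def forward(self, x):
        h = x
        for branch in self.branches:
            # Identity skip: at zero init each branch contributes nothing,
            # so the raw input reaches the head (and gradients flow back).
            h = h + branch(torch.tanh(h))
        return self.head(h)

# Toy task with no linearly separable structure (XOR of two coordinate signs),
# so beating 50% accuracy requires the hidden layers to actually train.
x = torch.randn(512, 8)
y = ((x[:, 0] > 0) ^ (x[:, 1] > 0)).long()

model = ZeroInitResNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(2000):
    opt.zero_grad()
    nn.functional.cross_entropy(model(x), y).backward()
    opt.step()

acc = (model(x).argmax(dim=1) == y).float().mean().item()
print(f"train accuracy: {acc:.2f}")  # typically well above the 0.50 baseline
```

The very first update moves only the head, since the branch gradients are still zero at that point; once the head is nonzero, gradients flow back through the identity skips into every branch, and the symmetry is broken.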
