Here is a network that still trains with zero initialization.
Two weeks ago I had a job interview with ByteDance where I was asked whether a zero-initialized network can train. My answer was yes, and I failed the interview.
Maybe I didn't make myself clear.
To better illustrate my answer, I wrote this notebook to show that a zero-initialized network can be trained given proper supervision.
The common belief that a zero-initialized network cannot be trained rests on an assumption about how information flows: shallow layers can only receive supervision (gradients) from deeper layers, while deep layers can only receive information (activations) from shallower layers. Under zero initialization, shallow layers therefore receive zero or constant gradients, deep layers receive zero or constant activations, and neither end can break the symmetry.
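To see that belief in action, here is a minimal PyTorch sketch (my own illustration, not taken from the notebook) of a plain two-layer MLP with every weight and bias set to zero. After one backward pass, the gradients of both weight matrices are exactly zero; at most the last-layer bias can move.

```python
import torch
import torch.nn as nn

# Plain two-layer MLP, every parameter initialized to zero.
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
for p in net.parameters():
    nn.init.zeros_(p)

x = torch.randn(16, 4)
y = torch.randint(0, 2, (16,))
nn.functional.cross_entropy(net(x), y).backward()

# The hidden activations are ReLU(0) = 0, so the output weight sees a
# zero input; the output weight is zero, so no gradient reaches layer 1.
print(net[0].weight.grad.abs().max())  # tensor(0.) -- first layer is stuck
print(net[2].weight.grad.abs().max())  # tensor(0.) -- only the last bias can move
```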
However, a network can be designed to break this assumption and still perform better than random guessing. See this notebook for details.
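As one concrete construction (a minimal sketch of the idea, not necessarily the notebook's exact architecture): give every block an identity skip connection, so activations and gradients can bypass the zero-valued weights in both directions. With `x + tanh(Wx)` blocks, the forward pass is the identity at initialization, the output head receives a data-dependent gradient on the first step, and from the second step onward every layer receives nonzero gradients through the skips.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class ZeroInitResidualNet(nn.Module):
    """All learned parameters start at zero; identity skips carry the signal."""
    def __init__(self, dim=2, n_blocks=3, n_classes=2):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))
        self.head = nn.Linear(dim, n_classes)
        for p in self.parameters():
            nn.init.zeros_(p)

    def forward(self, x):
        for block in self.blocks:
            # tanh'(0) = 1, so gradients pass through even at zero init
            # (ReLU(0) would have a zero derivative and stall the block).
            x = x + torch.tanh(block(x))
        return self.head(x)

# Toy 2-class problem: the sign of the sum of the coordinates.
x = torch.randn(512, 2)
y = (x.sum(dim=1) > 0).long()

net = ZeroInitResidualNet()
opt = torch.optim.SGD(net.parameters(), lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(net(x), y)
    loss.backward()
    opt.step()

acc = (net(x).argmax(dim=1) == y).float().mean()
print(f"train accuracy: {acc:.2f}")  # well above the 0.5 random-guess baseline
```

The key point is that the identity skip delivers nonzero activations forward and nonzero gradients backward without passing through any zero-valued weight, which is exactly the flow the common belief assumes is impossible.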