[20210613] Weekly AI ArXiv 만담 #13

Closed

jungwoo-ha opened this issue Jun 12, 2021 · 5 comments
Comments

@jungwoo-ha
Owner

jungwoo-ha commented Jun 12, 2021

@veritas9872

ICML 2021 Accepted Papers (Initial)
The ICML 2021 accepted papers are out. I'll pick out some interesting-looking ones and share them over the coming week.
https://icml.cc/Conferences/2021/AcceptedPapersInitial

CALM (feature attribution)
New research out of NAVER, which I'm posting on behalf of Jung-Woo Ha, haha.
https://github.com/naver-ai/calm
I haven't found the paper on arXiv yet, but I'm looking forward to deep learning attribution research that can overcome the shortcomings of the existing Grad-CAM.

Single Image Depth Estimation using Wavelet Decomposition (CVPR 2021)
https://arxiv.org/abs/2106.02022
Sharing this one for a bit of nostalgia. Most deep learning for vision is (so far) trained with CNNs directly in the image domain, but one can also consider converting the input into a different feature representation, e.g. with a wavelet transform or a Fourier transform, and training on that.
Although it is not done in this paper, one could, for example, apply a Fourier transform and then use a complex-valued transformer. Conveniently, complex-number training support was added starting with PyTorch 1.8.1, so if you are looking for a research topic, it may be worth a look (a minimal sketch follows below).
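
Not from the paper above, just to make the idea concrete: a hedged sketch of moving an image into the frequency domain with torch.fft and backpropagating through a complex-valued weight. The tensor shapes and the element-wise "filter" are made up for illustration.

```python
import torch

# Minimal sketch (assumes PyTorch >= 1.8 with torch.fft and complex autograd).
img = torch.randn(1, 3, 64, 64)                       # fake image batch
spec = torch.fft.fft2(img)                            # complex64 spectrum, same shape
weight = torch.randn(64, 64, dtype=torch.cfloat, requires_grad=True)
out = spec * weight                                   # element-wise complex "filter" (illustrative)
loss = out.abs().mean()                               # real-valued scalar loss
loss.backward()                                       # gradient flows into the complex weight
print(weight.grad.dtype)                              # torch.complex64
```

Note that, at the time, most built-in nn modules did not accept complex parameters, so a complex-valued transformer would mostly need custom layers built from operations like the one above.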

@jshin49

jshin49 commented Jun 13, 2021

@nick-jhlee

nick-jhlee commented Jun 13, 2021

  • Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

    • Bogazici University, Telecom Paris, INRIA, Vector Institute
    • Lots of techniques for compressing neural networks have come out recently, and roughly speaking they all seem to work well
    • This paper: why do they work??
    • Theoretical results
      • Suppose both theories hold simultaneously: the heavy-tail theory of SGD & the mean-field regime of neural networks
      • Then the networks are l_p-compressible!
      • Magnitude pruning, singular value pruning, and node pruning provably work! (a minimal magnitude-pruning sketch follows this list)
      • Novel error and generalization bounds for compressible networks!
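
Just to make "magnitude pruning" concrete (not part of the paper; the tensor and keep ratio are placeholders), a minimal unstructured magnitude-pruning sketch:

```python
import torch

def magnitude_prune(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Unstructured magnitude pruning: keep only the largest-|w| entries."""
    k = max(1, int(weight.numel() * keep_ratio))
    # threshold = k-th largest magnitude = (numel - k + 1)-th smallest
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return weight * (weight.abs() >= threshold)

w = torch.randn(256, 256)
w_pruned = magnitude_prune(w, keep_ratio=0.1)    # keep the top 10% by magnitude
print((w_pruned != 0).float().mean().item())     # ~0.10
```
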
  • Regularization in ResNet with Stochastic Depth

    • University of Oxford, Huawei Technologies France
    • ECCV 2016 had an approach that varies the depth stochastically! (Huang et al., 2016)
      • For each mini-batch, drop residual blocks with some probability by bypassing them with an identity mapping! (a minimal sketch of such a block follows below)
      • In practice this cut training time substantially and noticeably improved test accuracy!
    • This paper: why does it work??
    • Theoretical results
      • Enforces flatness of the loss surface as an explicit regulariser
      • Explicit quantification of the "best" survival probability
      • At large depth, the effect is similar to Gaussian noise injection!
      • And more
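
A minimal sketch of a residual block with stochastic depth, assuming a plain conv-BN-ReLU branch; the survival probability and layer sizes are placeholders, not the values used in either paper:

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly bypassed during training (Huang et al., 2016).
    `survival_prob` is the probability of keeping the residual branch; at test
    time the branch output is scaled by that probability (its expected value)."""
    def __init__(self, channels: int, survival_prob: float = 0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.survival_prob:
                return torch.relu(x + self.branch(x))
            return x                                   # bypass the block: identity mapping
        return torch.relu(x + self.survival_prob * self.branch(x))
```
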
  • E(n) Equivariant Graph Neural Networks (ICML 2021)

    • University of Amsterdam (Max Welling is a co-author...!)
    • Point clouds, molecular structures, n-body particle simulations: translation and rotation symmetries!! => E(n)-equivariant...
    • Much less computational overhead than existing methods! + It handles E(n) for n > 2 with ease!
    • Beats existing methods on dynamical systems, graph autoencoders, and molecular property prediction! + Faster! (a sketch of one layer follows below)
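
A rough sketch of the kind of E(n)-equivariant message-passing layer the paper describes, on a fully connected graph. Messages depend only on invariant quantities (features and squared distances), and coordinates are updated along relative difference vectors. The MLP sizes and the inclusion of self-pairs are simplifications for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class EGNNLayer(nn.Module):
    """One E(n)-equivariant message-passing layer on a fully connected graph (sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.phi_x = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1))
        self.phi_h = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, h, x):
        # h: (N, dim) node features, x: (N, n) coordinates in n-dimensional space
        N = h.size(0)
        diff = x.unsqueeze(1) - x.unsqueeze(0)               # (N, N, n)  x_i - x_j
        dist2 = (diff ** 2).sum(-1, keepdim=True)            # (N, N, 1)  E(n)-invariant
        h_i = h.unsqueeze(1).expand(N, N, -1)
        h_j = h.unsqueeze(0).expand(N, N, -1)
        m = self.phi_e(torch.cat([h_i, h_j, dist2], dim=-1)) # (N, N, dim) messages
        x_new = x + (diff * self.phi_x(m)).mean(dim=1)       # equivariant coordinate update
        h_new = h + self.phi_h(torch.cat([h, m.sum(dim=1)], dim=-1))
        return h_new, x_new

h, x = torch.randn(5, 16), torch.randn(5, 3)   # 5 nodes with 3-D coordinates
h2, x2 = EGNNLayer(dim=16)(h, x)
```
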

  • Barlow Twins: Self-Supervised Learning via Redundancy Reduction (ICML 2021)
    • FAIR, NYU (LeCun is on the author list!)
    • Existing SSL methods go to great lengths(?) to avoid trivial solutions
    • Barlow Twins!
      • Inspired by the "redundancy-reduction principle" from neuroscience
      • Method: "measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible."
      • Trivial solutions never arise: it forces "the representation vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors."
      • Negative-sample-free method
      • Related to the information bottleneck principle
    • New SOTA in the semi-supervised low-data regime, and performance comparable to SOTA on ImageNet classification, transfer classification, and object detection (a sketch of the loss follows below)
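
A minimal sketch of the cross-correlation loss described above: normalize each embedding dimension over the batch, compute the cross-correlation matrix between the two views, and push it toward the identity. The off-diagonal weight `lam` here is a placeholder, not necessarily the value used in the paper:

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: (N, D) embeddings of two distorted views of the same batch of images."""
    N, D = z_a.shape
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)        # normalize each dimension over the batch
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.T @ z_b / N                           # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()            # pull diagonal toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # push off-diagonal toward 0
    return on_diag + lam * off_diag

loss = barlow_twins_loss(torch.randn(128, 512), torch.randn(128, 512))
```
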

  • Chasing Sparsity in Vision Transformers: An End-to-End Exploration
    • University of Texas at Austin, Microsoft
    • The ideal kind of pruning!
      • End-to-end training!
      • Low training memory overhead
      • No accuracy loss
    • "Specifically, instead of training full ViTs, we dynamically extract and train sparse subnetworks, while sticking to a fixed small parameter budget. Our approach jointly optimizes model parameters and explores connectivity throughout training, ending up with one sparse network as the final output." (a prune-and-regrow sketch follows below)
    • "our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0.28% top-1 accuracy, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings"

  • Implicit Representations of Meaning in Neural Language Models (ACL 2021)
    • MIT
    • Main question: just how much meaning is actually represented by a neural language model (NLM)??
    • "in simple semantic domains, they build representations of situations and entities that encode logical descriptions of each entity’s dynamic state."
    • Empirically validate this claim!
      • Existing "probing": predict semantic roles from NLM embeddings
      • Probing proposed here: recover a representation of the situation described by a discourse (a toy probing sketch follows this list)
      • Hypothesis under test: LMs represent (a particular class of) information states
      • See the paper for the exact experimental setup! (<- too complicated to fit here...)
    • Future directions: improving factuality and coherence, correcting biases... etc.
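
For readers unfamiliar with probing, a toy sketch of the general recipe (freeze the LM, fit a small probe on its hidden states to recover some property). The data, label space, and linear probe here are stand-ins, not the paper's situation-recovery setup:

```python
import torch
import torch.nn as nn

hidden = torch.randn(1024, 768)             # stand-in for frozen LM hidden states
labels = torch.randint(0, 4, (1024,))       # stand-in for per-entity state labels
probe = nn.Linear(768, 4)                   # the probe is the only thing trained
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(hidden), labels)
    loss.backward()
    opt.step()

acc = (probe(hidden).argmax(-1) == labels).float().mean().item()
print(f"probe accuracy: {acc:.2f}")         # high accuracy => the states are decodable
```
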

@jwlee-ml

Here is the link for the RL-based chip design mentioned above.
https://www.nature.com/articles/d41586-021-01515-9?fbclid=IwAR2m-A7IbIWAMQiddsAUJ_v6R2TCz5arnfBwbnRzUzBAB0dQClNmP5BUHaU

@jshin49

jshin49 commented Jun 13, 2021

stochastic depth on Transformers

Reducing Transformer Depth on Demand with Structured Dropout

  • ICLR 2020
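
A rough sketch of structured dropout over whole Transformer layers in the spirit of that paper; the layer count, width, and drop probability are placeholders, and the paper's inference-time layer pruning is not shown:

```python
import torch
import torch.nn as nn

class LayerDropEncoder(nn.Module):
    """Transformer encoder whose layers are randomly skipped during training."""
    def __init__(self, num_layers: int = 6, d_model: int = 256, p_drop: float = 0.2):
        super().__init__()
        self.p_drop = p_drop
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4) for _ in range(num_layers)
        )

    def forward(self, x):  # x: (seq_len, batch, d_model)
        for layer in self.layers:
            if self.training and torch.rand(1).item() < self.p_drop:
                continue                      # drop the entire layer for this batch
            x = layer(x)
        return x
```
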
