[20210613] Weekly AI ArXiv 만담 #13

Closed

jungwoo-ha opened this issue Jun 12, 2021 · 5 comments
Comments

@jungwoo-ha
Owner

jungwoo-ha commented Jun 12, 2021

@veritas9872

ICML 2021 Accepted Papers (Initial)
The ICML 2021 accepted papers are out. I'll pick out some interesting-looking ones and share them over the coming week.
https://icml.cc/Conferences/2021/AcceptedPapersInitial

CALM (feature attribution)
New research out of NAVER, which I'm posting on behalf of Jung-Woo Ha, haha.
https://github.com/naver-ai/calm
I haven't found the paper on arXiv yet, but I'm looking forward to deep learning attribution research that can overcome the shortcomings of the existing Grad-CAM.

Single Image Depth Estimation using Wavelet Decomposition (CVPR 2021)
https://arxiv.org/abs/2106.02022
Sharing this one for a bit of nostalgia. Most deep learning for vision is (so far) trained with CNNs directly in the image domain, but one can also consider converting the input into a different feature representation, e.g. with a wavelet transform or a Fourier transform, and training on that.
Although it is not done in this paper, one could, for example, apply a Fourier transform and then use a complex-valued transformer. Conveniently, complex-number training support was added starting with PyTorch 1.8.1, so if you are looking for a research topic, it may be worth a look (a minimal sketch follows below).
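
Not from the paper above, just to make the idea concrete: a hedged sketch of moving an image into the frequency domain with torch.fft and backpropagating through a complex-valued weight. The tensor shapes and the element-wise "filter" are made up for illustration.

```python
import torch

# Minimal sketch (assumes PyTorch >= 1.8 with torch.fft and complex autograd).
img = torch.randn(1, 3, 64, 64)                       # fake image batch
spec = torch.fft.fft2(img)                            # complex64 spectrum, same shape
weight = torch.randn(64, 64, dtype=torch.cfloat, requires_grad=True)
out = spec * weight                                   # element-wise complex "filter" (illustrative)
loss = out.abs().mean()                               # real-valued scalar loss
loss.backward()                                       # gradient flows into the complex weight
print(weight.grad.dtype)                              # torch.complex64
```

Note that, at the time, most built-in nn modules did not accept complex parameters, so a complex-valued transformer would mostly need custom layers built from operations like the one above.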

@jshin49

jshin49 commented Jun 13, 2021

@nick-jhlee

nick-jhlee commented Jun 13, 2021

  • Heavy Tails in SGD and Compressibility of Overparametrized Neural Networks

    • Bogazici University, Telecom Paris, INRIA, Vector Institute
    • Lots of techniques for compressing neural networks have come out recently, and roughly speaking they all seem to work well
    • This paper: why do they work??
    • Theoretical results
      • Suppose both theories hold simultaneously: the heavy-tail theory of SGD & the mean-field regime of neural networks
      • Then the networks are l_p-compressible!
      • Magnitude pruning, singular value pruning, and node pruning provably work! (a minimal magnitude-pruning sketch follows this list)
      • Novel error and generalization bounds for compressible networks!
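
Just to make "magnitude pruning" concrete (not part of the paper; the tensor and keep ratio are placeholders), a minimal unstructured magnitude-pruning sketch:

```python
import torch

def magnitude_prune(weight: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Unstructured magnitude pruning: keep only the largest-|w| entries."""
    k = max(1, int(weight.numel() * keep_ratio))
    # threshold = k-th largest magnitude = (numel - k + 1)-th smallest
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k + 1).values
    return weight * (weight.abs() >= threshold)

w = torch.randn(256, 256)
w_pruned = magnitude_prune(w, keep_ratio=0.1)    # keep the top 10% by magnitude
print((w_pruned != 0).float().mean().item())     # ~0.10
```
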
  • Regularization in ResNet with Stochastic Depth

    • University of Oxford, Huawei Technologies France
    • ECCV 2016 had an approach that varies the depth stochastically! (Huang et al., 2016)
      • For each mini-batch, drop residual blocks with some probability by bypassing them with an identity mapping! (a minimal sketch of such a block follows below)
      • In practice this cut training time substantially and noticeably improved test accuracy!
    • This paper: why does it work??
    • Theoretical results
      • Enforces flatness of the loss surface as an explicit regulariser
      • Explicit quantification of the "best" survival probability
      • At large depth, the effect is similar to Gaussian noise injection!
      • And more
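
A minimal sketch of a residual block with stochastic depth, assuming a plain conv-BN-ReLU branch; the survival probability and layer sizes are placeholders, not the values used in either paper:

```python
import torch
import torch.nn as nn

class StochasticDepthBlock(nn.Module):
    """Residual block that is randomly bypassed during training (Huang et al., 2016).
    `survival_prob` is the probability of keeping the residual branch; at test
    time the branch output is scaled by that probability (its expected value)."""
    def __init__(self, channels: int, survival_prob: float = 0.8):
        super().__init__()
        self.survival_prob = survival_prob
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        if self.training:
            if torch.rand(1).item() < self.survival_prob:
                return torch.relu(x + self.branch(x))
            return x                                   # bypass the block: identity mapping
        return torch.relu(x + self.survival_prob * self.branch(x))
```
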
  • E(n) Equivariant Graph Neural Networks (ICML 2021)

    • University of Amsterdam (Max Welling is a co-author...!)
    • Point clouds, molecular structures, n-body particle simulations: translation and rotation symmetries!! => E(n)-equivariant...
    • Much less computational overhead than existing methods! + It handles E(n) for n > 2 with ease!
    • Beats existing methods on dynamical systems, graph autoencoders, and molecular property prediction! + Faster! (a sketch of one layer follows below)
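
A rough sketch of the kind of E(n)-equivariant message-passing layer the paper describes, on a fully connected graph. Messages depend only on invariant quantities (features and squared distances), and coordinates are updated along relative difference vectors. The MLP sizes and the inclusion of self-pairs are simplifications for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class EGNNLayer(nn.Module):
    """One E(n)-equivariant message-passing layer on a fully connected graph (sketch)."""
    def __init__(self, dim: int):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(), nn.Linear(dim, dim))
        self.phi_x = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(), nn.Linear(dim, 1))
        self.phi_h = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(), nn.Linear(dim, dim))

    def forward(self, h, x):
        # h: (N, dim) node features, x: (N, n) coordinates in n-dimensional space
        N = h.size(0)
        diff = x.unsqueeze(1) - x.unsqueeze(0)               # (N, N, n)  x_i - x_j
        dist2 = (diff ** 2).sum(-1, keepdim=True)            # (N, N, 1)  E(n)-invariant
        h_i = h.unsqueeze(1).expand(N, N, -1)
        h_j = h.unsqueeze(0).expand(N, N, -1)
        m = self.phi_e(torch.cat([h_i, h_j, dist2], dim=-1)) # (N, N, dim) messages
        x_new = x + (diff * self.phi_x(m)).mean(dim=1)       # equivariant coordinate update
        h_new = h + self.phi_h(torch.cat([h, m.sum(dim=1)], dim=-1))
        return h_new, x_new

h, x = torch.randn(5, 16), torch.randn(5, 3)   # 5 nodes with 3-D coordinates
h2, x2 = EGNNLayer(dim=16)(h, x)
```
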

  • Barlow Twins: Self-Supervised Learning via Redundancy Reduction (ICML 2021)
    • FAIR, NYU (LeCun is on the author list!)
    • Existing SSL methods go to great lengths(?) to avoid trivial solutions
    • Barlow Twins!
      • Inspired by the "redundancy-reduction principle" from neuroscience
      • Method: "measuring the cross-correlation matrix between the outputs of two identical networks fed with distorted versions of a sample, and making it as close to the identity matrix as possible."
      • Trivial solutions never arise: it forces "the representation vectors of distorted versions of a sample to be similar, while minimizing the redundancy between the components of these vectors."
      • Negative-sample-free method
      • Related to the information bottleneck principle
    • New SOTA in the semi-supervised low-data regime, and performance comparable to SOTA on ImageNet classification, transfer classification, and object detection (a sketch of the loss follows below)
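
A minimal sketch of the cross-correlation loss described above: normalize each embedding dimension over the batch, compute the cross-correlation matrix between the two views, and push it toward the identity. The off-diagonal weight `lam` here is a placeholder, not necessarily the value used in the paper:

```python
import torch

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    """z_a, z_b: (N, D) embeddings of two distorted views of the same batch of images."""
    N, D = z_a.shape
    z_a = (z_a - z_a.mean(0)) / z_a.std(0)        # normalize each dimension over the batch
    z_b = (z_b - z_b.mean(0)) / z_b.std(0)
    c = z_a.T @ z_b / N                           # (D, D) cross-correlation matrix
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()            # pull diagonal toward 1
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()  # push off-diagonal toward 0
    return on_diag + lam * off_diag

loss = barlow_twins_loss(torch.randn(128, 512), torch.randn(128, 512))
```
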

  • Chasing Sparsity in Vision Transformers: An End-to-End Exploration
    • University of Texas at Austin, Microsoft
    • The ideal kind of pruning!
      • End-to-end training!
      • Low training memory overhead
      • No accuracy loss
    • "Specifically, instead of training full ViTs, we dynamically extract and train sparse subnetworks, while sticking to a fixed small parameter budget. Our approach jointly optimizes model parameters and explores connectivity throughout training, ending up with one sparse network as the final output." (a prune-and-regrow sketch follows below)
    • "our sparsified DeiT-Small at (5%, 50%) sparsity for (data, architecture), improves 0.28% top-1 accuracy, and meanwhile enjoys 49.32% FLOPs and 4.40% running time savings"

  • Implicit Representations of Meaning in Neural Language Models (ACL 2021)
    • MIT
    • Main question: just how much meaning is actually represented by a neural language model (NLM)??
    • "in simple semantic domains, they build representations of situations and entities that encode logical descriptions of each entity’s dynamic state."
    • Empirically validate this claim!
      • Existing "probing": predict semantic roles from NLM embeddings
      • Probing proposed here: recover a representation of the situation described by a discourse (a toy probing sketch follows this list)
      • Hypothesis under test: LMs represent (a particular class of) information states
      • See the paper for the exact experimental setup! (<- too complicated to fit here...)
    • Future directions: improving factuality and coherence, correcting biases... etc.
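
For readers unfamiliar with probing, a toy sketch of the general recipe (freeze the LM, fit a small probe on its hidden states to recover some property). The data, label space, and linear probe here are stand-ins, not the paper's situation-recovery setup:

```python
import torch
import torch.nn as nn

hidden = torch.randn(1024, 768)             # stand-in for frozen LM hidden states
labels = torch.randint(0, 4, (1024,))       # stand-in for per-entity state labels
probe = nn.Linear(768, 4)                   # the probe is the only thing trained
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

for _ in range(100):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(probe(hidden), labels)
    loss.backward()
    opt.step()

acc = (probe(hidden).argmax(-1) == labels).float().mean().item()
print(f"probe accuracy: {acc:.2f}")         # high accuracy => the states are decodable
```
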

@jwlee-ml

Here is the link for the RL-based chip design mentioned above.
https://www.nature.com/articles/d41586-021-01515-9?fbclid=IwAR2m-A7IbIWAMQiddsAUJ_v6R2TCz5arnfBwbnRzUzBAB0dQClNmP5BUHaU

@jshin49

jshin49 commented Jun 13, 2021

stochastic depth on Transformers

Reducing Transformer Depth on Demand with Structured Dropout

  • ICLR 2020
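
A rough sketch of structured dropout over whole Transformer layers in the spirit of that paper; the layer count, width, and drop probability are placeholders, and the paper's inference-time layer pruning is not shown:

```python
import torch
import torch.nn as nn

class LayerDropEncoder(nn.Module):
    """Transformer encoder whose layers are randomly skipped during training."""
    def __init__(self, num_layers: int = 6, d_model: int = 256, p_drop: float = 0.2):
        super().__init__()
        self.p_drop = p_drop
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead=4) for _ in range(num_layers)
        )

    def forward(self, x):  # x: (seq_len, batch, d_model)
        for layer in self.layers:
            if self.training and torch.rand(1).item() < self.p_drop:
                continue                      # drop the entire layer for this batch
            x = layer(x)
        return x
```
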
