[20210404] Weekly Arxiv Chat #4

jungwoo-ha · 2021-03-29T13:30:07Z

AI News
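- ACL 2021 rebuttal period ended
- Interspeech 2021 submission deadline passed --> great work, everyone.
- Prof. Dong-Sung Cho's lecture for CEOs and business school professors (https://dbr.donga.com/article/view/1206/article_no/10000/ac/magazine)
- Korean Academy of Science and Technology: panel discussion on developing talent in the AI era (https://www.youtube.com/watch?v=uZxd1GOiLx4)
  - Important: it starts at the 7:40 mark.
  - Industry participation in AI curricula
  - Need to use online courses and link them with bootcamp-style programs outside the regular curriculum
  - Computational thinking needs to be taught from an early age, but in the end securing classroom hours is what matters
  - Expand the technical research personnel program (alternative military service) + relax visas for foreign researchers + fall-semester system
  - AI is a matter of societal change that goes beyond industry, and should be recognized as such
  - Need measures that can dramatically increase the supply of talent itself
  - Enrollment quotas, classroom hours, and so on: ultimately the education system as a whole needs an overhaul.
- NAVER hiring
  - Monthly experienced-hire openings: https://www.naver-monthlyopening.com/
    - Applications are accepted from the 1st to the 10th of each month; interviews take place mid-month or at month end
  - NAVER new-graduate open recruitment: https://www.naver-recruit.com/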
ArXiv
- EfficientNetV2: Smaller Models and Faster Training
  - It's finally out: EfficientNet V2
  - The target is ViT. With ImageNet-21k pretraining it supposedly tramples ViT, ResNet-RS, and LambdaNetworks alike?
- Putting NeRF on a Diet: Semantically Consistent Few-Shot View Synthesis
  - Pieter Abbeel's group at UC Berkeley
  - CLIP features now make it into NeRF too
  - Uses a CLIP ViT as the encoder to compute a semantic consistency loss.. neat.
  - Demo videos are here: https://www.ajayj.com/dietnerf, code is coming soon~
- A Survey on Natural Language Video Localization
  - NLVL? Retrieving the scene that matches a natural-language query --> extracting the specific moment in a video that a natural-language description refers to
  - Difference from action recognition? Looks similar to the relationship between classification and text-to-image retrieval
  - Supervised and weakly supervised
  - Compares the performance and characteristics of a wide variety of methods on three datasets.
- Using Python for Model Inference in Deep Learning
  - A packaging approach for improving deep learning model inference and serving performance
  - Runs multiple Python interpreters in a single process..
  - Work from Facebook AI Research
- Mesh Graphormer
  - from MS
  - Reconstructs a 3D human mesh from a single image
  - A graph residual block made of Norm+FC+2GCN+Norm+FC is inserted right after the self-attention (SA) inside the transformer
  - Joints and vertices are handled through positional encoding
  - Mesh reconstruction is moving to Transformers now too
- CUPID: Adaptive Curation of Pre-training Data for Video-and-Language Representation Learning
  - MS Cloud
  - A somewhat heuristic but effective curation-based pretraining technique for transfer learning when there is a domain gap between the video-text pretraining data and the finetuning data
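- Many-to-English Machine Translation Tools, Data, and Pretrained Models
  - Data, tools, and models for translating into English, released for people around the world, from USC
  - Demo page: http://rtg.isi.edu/many-eng/
- Unsupervised Sound Localization via Iterative Contrastive Learning
  - Ming-Hsuan Yang, UC Merced
  - Localizes sound sources in video, training without annotations by iterating pseudo-regions + contrastive learning
  - Building GT data in this area takes a lot of grunt work, so this looks quite useful.
- Jigsaw Clustering for Unsupervised Visual Representation Learning
  - CVPR 2021 oral
  - Based on jigsaw clustering, a self-supervised classic,
  - Supposedly beats MoCo v2 on COCO??
  - Works with a single batch, so there is less hardware overhead, and it can get by with a smaller batch size than existing methods
  - Contrastive methods based on data augmentation (DA) have to build a dual batch of originals and augmented copies
  - The GitHub repo is still empty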
- FeTaQA: Free-form Table Question Answering
  - Yale U, Salesforce, and others
  - Built a QA set from Wikipedia with 10k pairs of table, question, free-form answer, and supporting table cells
  - https://github.com/Yale-LILY/FeTaQA --> currently 404
- Anchor Pruning for Object Detection
  - Detector compression usually focuses on the backbone.
  - This work prunes the detector's anchors; might help anyone who needs a truly ultra-lightweight detector?
- Enriched Music Representations with Multiple Cross-modal Contrastive Learning
  - Contrastive learning now reaches music as well
  - Cross-modal training on diverse data such as audio, genre metadata, playlists, and so on
  - Yuntae Kim of Kakao is among the authors
- SCALoss: Side and Corner Aligned Loss for Bounding Box Regression
  - Detectors typically treat IoU as the key metric and do bounding box (BB) regression based on the overlap area
  - This work adds side and corner alignment to the loss --> reportedly consistent gains over existing losses across various detectors..
- Improving Calibration for Long-Tailed Recognition
  - CVPR 2021
  - For cases with severe class imbalance (long-tail)
  - Representation learning and classifier learning are usually done separately
  - But calibration has been largely neglected so far --> smoothing that accounts for per-class instance counts and over-confidence + shifted BN.
  - Reportedly performs well on the CIFAR and ImageNet-LT datasets
  - There are tons of real-world long-tail cases (shopping categories especially), so this looks quite useful.
  - https://github.com/Jia-Research-Lab/MiSLAS but the GitHub repo is still empty
- In&Out : Diverse Image Outpainting via GAN Inversion
  - Outpainting a given image --> unlike inpainting, which fills in the inside of an image, outpainting draws what lies outside it
  - Can produce panorama shots via GAN inversion.
  - Demo: https://yccyenchicheng.github.io/InOut/
- Is Label Smoothing Truly Incompatible with Knowledge Distillation: An Empirical Study
  - Label smoothing is said to be a poor match for KD....
  - This paper analyzes that claim with actual experiments: in which cases it fails to work, and by how much..
  - Reportedly an ICLR 2021 paper
  - Project page: http://zhiqiangshen.com/projects/LS_and_KD/index.html
- ReMix: Towards Image-to-Image Translation with Limited Data
  - A data augmentation (DA) method to use when data is scarce in I2I
  - Interpolates at the feature level... plus a new content loss based on perceptual relations
  - Quality looks good at first glance. (CVPR 2021)
- VITON-HD: High-Resolution Virtual Try-On via Misalignment-Aware Normalization
  - Prof. Jaegul Choo's lab, KAIST Graduate School of AI
  - If you're interested in virtual try-on, take a look~ supports 1024*768 resolution..
- The User behind the Abuse: A Position on Ethics and Explainability
  - Summarizes AI ethics, data issues, and related topics concerning internet users and communities
  - Facebook AI London
  - Worth reading for background.
- Two picks out of personal interest
  - Rethinking Spatial Dimensions of Vision Transformers
  - Rainbow Memory: Continual Learning with a Memory of Diverse Samples

veritas9872 · 2021-03-29T13:49:50Z

Efficient Linear Transformers with Kernel Methods:

Rethinking Attention with Performers:
Paper: https://openreview.net/forum?id=Ua6zuk0WRH
GitHub: https://github.com/google-research/google-research/tree/master/performer/fast_attention
Blog: https://ai.googleblog.com/2020/10/rethinking-attention-with-performers.html
PyTorch Implementation (HuggingFace): https://github.com/norabelrose/transformers-plus-performers/blob/master/src/transformers/modeling_performer_attention.py

Random Feature Attention:
Paper: https://openreview.net/forum?id=QtTKTdVrFBB

At ICLR 2021, two papers that use kernel methods to bring the O(N^2) cost of self-attention down to O(N) were selected for an oral session and as a spotlight paper. They come from Google and DeepMind; instead of computing the softmax directly, they compute attention through a kernel (the same notion of kernel as in SVMs).

This tackles quadratic growth, one of the Transformer's biggest problems, and I expect the approach to see a lot of progress.
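
To make the O(N^2) to O(N) idea above concrete, here is a minimal NumPy sketch, not the code from either paper. It assumes positive random features of the form exp(w·x - |x|^2/2), whose inner product equals exp(q·k) in expectation, and then uses associativity to avoid building the N x N attention matrix. The function names, sizes, and random inputs are all illustrative.

```python
import numpy as np

def positive_random_features(x, w):
    """Map rows of x to features whose inner product approximates exp(q . k).

    For w ~ N(0, I), E[phi(q) . phi(k)] = exp(q . k).
    """
    m = w.shape[0]
    half_sq_norm = 0.5 * np.sum(x ** 2, axis=-1, keepdims=True)
    return np.exp(x @ w.T - half_sq_norm) / np.sqrt(m)

def linear_kernel_attention(q, k, v, w):
    """Kernelized attention in O(N): never forms the N x N matrix."""
    d = q.shape[-1]
    q, k = q / d ** 0.25, k / d ** 0.25      # so q . k matches q . k / sqrt(d)
    phi_q = positive_random_features(q, w)   # (N, m)
    phi_k = positive_random_features(k, w)   # (N, m)
    kv = phi_k.T @ v                         # (m, d_v), summed over keys once
    normalizer = phi_q @ phi_k.sum(axis=0)   # (N,), row sums of the implicit matrix
    return (phi_q @ kv) / normalizer[:, None]

def quadratic_kernel_attention(q, k, v, w):
    """Same kernel, but materializes the N x N matrix (for comparison only)."""
    d = q.shape[-1]
    q, k = q / d ** 0.25, k / d ** 0.25
    scores = positive_random_features(q, w) @ positive_random_features(k, w).T
    return (scores / scores.sum(axis=1, keepdims=True)) @ v

rng = np.random.default_rng(0)
n_tokens, dim, n_features = 64, 8, 256       # illustrative sizes
q = 0.5 * rng.standard_normal((n_tokens, dim))
k = 0.5 * rng.standard_normal((n_tokens, dim))
v = rng.standard_normal((n_tokens, dim))
w = rng.standard_normal((n_features, dim))   # random projection directions

out_linear = linear_kernel_attention(q, k, v, w)
out_quadratic = quadratic_kernel_attention(q, k, v, w)
print(np.allclose(out_linear, out_quadratic))
```

Both functions compute the same quantity; only the order of the matrix products differs, which is where the linear cost in sequence length comes from. How closely this tracks exact softmax attention depends on the number of random features.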

tteon · 2021-04-02T08:45:41Z

Unsupervised Hyperbolic Representation Learning via Message Passing Auto-Encoders

Paper: https://arxiv.org/pdf/2103.16046.pdf
GitHub: https://github.com/junhocho/HGCAE

To compare conventional embeddings in Euclidean space against the hyperbolic-space embedding the authors propose, the paper uses link prediction and node clustering tasks and shows that the hyperbolic approach performs better.

Question: In the link prediction results in Table 2, DBGAN, one of the baselines, actually did better on the Pubmed (bio) dataset. Since molecular structures are three-dimensional, I expected hyperbolic space to give better performance, but that was not the case. Reading the DBGAN paper and digging into why this happened might yield some interesting insight. If I find any, I will share it here. :)

jshin49 · 2021-04-04T08:24:24Z

Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
- Blog Post
- A systematic study of the Transformer's components
- Self-attention vs skip connections and MLP (with regard to rank collapse)
- Proposes a path decomposition method for analyzing SANs: SANs as an ensemble of shallow networks
- Verify the theory with experiments on common transformer architectures.
  - when skip connections are removed, all networks exhibit a rapid rank collapse,
  - adding MLP or skip connections either stops or drastically slows down rank collapse
  - short paths are responsible for the majority of SANs’ expressive power.
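
The rank-collapse claim above can be illustrated with a small toy experiment. This is a rough sketch under simplifying assumptions, not the paper's setup: single-head attention with random untrained weights, no MLP, and a Frobenius-norm residual measured against the mean token instead of the norm used in the paper. All names and sizes are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)    # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relative_residual(x):
    """||X - 1 mean(X)^T||_F / ||X||_F; 0 means all tokens are identical (rank 1)."""
    centered = x - x.mean(axis=0, keepdims=True)
    return np.linalg.norm(centered) / np.linalg.norm(x)

def run_attention_stack(depth, skip, n_tokens=8, dim=16, seed=0):
    """Apply `depth` random self-attention layers, with or without skip connections."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_tokens, dim))
    history = [relative_residual(x)]
    for _ in range(depth):
        wq, wk, wv = (rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(3))
        attn = softmax((x @ wq) @ (x @ wk).T / np.sqrt(dim))
        out = attn @ x @ wv
        x = x + out if skip else out          # skip connection is the only difference
        history.append(relative_residual(x))
    return history

pure = run_attention_stack(depth=12, skip=False)
with_skip = run_attention_stack(depth=12, skip=True)
for layer, (a, b) in enumerate(zip(pure, with_skip)):
    print(f"layer {layer:2d}  pure attention: {a:.2e}   with skip: {b:.2e}")
```

In this toy setting the pure-attention stack drives the residual toward zero, meaning every token converges to the same vector, while the stack with skip connections keeps the tokens distinguishable.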

jshin49 · 2021-04-04T12:56:47Z

For next week

Weekly NLP by Jiho Park, "From Model-centric to Data-centric AI Development"
- Review of A Chat with Andrew on MLOps: From Model-centric to Data-centric AI
- AI = Code + Data
- MLOps = efficient & continual data collection method
Shortformer: Better Language Modeling using Shorter Inputs
- As the name says, an LM that uses short sequences
- curriculum learning style
- A more efficient way of handling positional encoding
- UWNLP & Facebook AI
All NLP Tasks Are Generation Tasks: A General Pretraining Framework
GPT Understands, Too
- Reminds me of Plug and Play Language Models from Uber
Generating images with sparse representations
ViViT: A Video Vision Transformer

jshin49 · 2021-04-04T14:24:04Z

Poincaré Embeddings for Learning Hierarchical Representations

veritas9872 · 2021-04-09T01:41:56Z

Prof. Stephen Boyd has released a paper and library called Minimum-Distortion Embedding (MDE).
It can be used much like t-SNE to compress and visualize data efficiently, and also to change the representation of images and similar data. The Python library is implemented on top of PyTorch, so it can easily use a GPU and process large amounts of data quickly.
It isn't deep learning, but it looks very helpful if you often work with representation spaces.
The paper is very long, so I recommend looking for the summary in the documentation.

Paper: https://arxiv.org/abs/2103.02559
GitHub: https://github.com/cvxgrp/pymde
Documentation: https://web.stanford.edu/~boyd/papers/min_dist_emb.html
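
As a rough illustration of what a minimum-distortion embedding is, here is a NumPy sketch of the quadratic special case only; it does not use the PyMDE API. With distortion w_ij * |x_i - x_j|^2 on each edge and a centered, standardized embedding, the problem reduces to taking the bottom non-constant eigenvectors of the graph Laplacian. The function name and the ring-graph example are hypothetical, and a connected graph is assumed.

```python
import numpy as np

def quadratic_mde(edges, weights, n_items, embedding_dim=2):
    """Minimize sum_ij w_ij * ||x_i - x_j||^2 subject to X^T 1 = 0, (1/n) X^T X = I.

    Assumes the graph is connected, so the constant vector is the only
    eigenvector of the Laplacian with eigenvalue zero.
    """
    laplacian = np.zeros((n_items, n_items))
    for (i, j), w in zip(edges, weights):
        laplacian[i, i] += w
        laplacian[j, j] += w
        laplacian[i, j] -= w
        laplacian[j, i] -= w
    _, eigvecs = np.linalg.eigh(laplacian)    # eigenvalues in ascending order
    # Skip the constant eigenvector, keep the next `embedding_dim` ones.
    return np.sqrt(n_items) * eigvecs[:, 1:embedding_dim + 1]

# Toy data: 12 items arranged on a ring, each linked to its neighbor.
n = 12
edges = [(i, (i + 1) % n) for i in range(n)]
weights = np.ones(len(edges))
X = quadratic_mde(edges, weights, n, embedding_dim=2)
print(X.shape)
```

For other distortion functions there is no closed form like this, which is where an iterative, GPU-friendly solver such as the library above becomes useful.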

jungwoo-ha closed this as completed May 5, 2021

jnhwkim mentioned this issue Aug 8, 2021

[20210808] Weekly AI ArXiv Chat #20

Closed

ghlee0304 mentioned this issue Mar 27, 2022

[20220327] Weekly AI ArXiv Chat - Episode 45 (Stanford AI Index Report special) #45

Closed
