Westlake University
- https://bingyang-20.github.io/
A Python implementation of “Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer” [TASLP 2024]
A description of "RealMAN: A Real-Recorded and Annotated Microphone Array Dataset for Dynamic Speech Enhancement and Localization" [NeurIPS 2024]
Impulse response generation based on a state-of-the-art geometric sound propagation engine.
Some comprehensive papers about speaker diarization
The official PyTorch implementation of "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors" [ICASSP 2024] and "LS-EEND: long-form streaming…
The official repo of NBC & SpatialNet for multichannel speech separation, denoising, and dereverberation
The missing star history graph of GitHub repos - https://star-history.com
Deep-learning-based implementation of the popular Hungarian algorithm that helps solve the assignment problem.
A Python implementation of “SRP-DNN: Learning Direct-Path Phase Difference for Multiple Moving Sound Source Localization” [ICASSP 2022]
The official PyTorch implementation of FN-SSL & IPDnet for sound source localization [INTERSPEECH 2023 & TASLP 2024]
A library that contains a rich collection of performant PyTorch model metrics, a simple interface to create new metrics, a toolkit to facilitate metric computation in distributed training and tools…
This repo contains the scripts, models, and required files for the Deep Noise Suppression (DNS) Challenge.
A list of publicly available room impulse response datasets and scripts to download them.
Gammatone-based spectrograms, using gammatone filterbanks or Fourier transform weightings.
Measuring room impulse responses with Python and sounddevice
👫 Joint Discriminative and Generative Learning for Person Re-identification. CVPR'19 (Oral) 👫
End-to-End Object Detection with Transformers
High-Resolution Image Synthesis with Latent Diffusion Models
Implementation of the Denoising Diffusion Probabilistic Model in PyTorch
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
A Transformer seq2seq model: a program that can build a language translator from a parallel corpus
Pyroomacoustics is a package for audio signal processing for indoor applications. It was developed as a fast prototyping platform for beamforming algorithms in indoor scenarios.