Plans? #1
Hi Eduardo,
What are your plans for this code? It is described as unfinished and has not been updated lately. I am interested in working on this, and even more interested in code for this subsequent paper, which extends the technique to get much better results: https://arxiv.org/abs/1607.02173
Hello, I just merged the 'intra_net' branch into master, since it had most of the working code. We had plans to implement the paper you mention, which is a continuation of the original work from the folks at MERL, but unfortunately we had other priorities; the 'end2end' branch was supposed to initiate that work. I didn't re-test the code, and since it has been a while since we last touched this project, something may be broken or out of place. Feel free to file an issue if that's the case.

IF, and that's a big if because I really don't encourage you to do so, you wish to delve into spaghetti code, check our speech enhancement branches: 'irm' for ideal ratio masks and 'softmask' for some really experimental spectral masks based on the deep clustering framework.
@zhr1201 I've forked the code and updated it to use Keras 2 and TensorFlow 1.1. I haven't cleaned it up or made it publicly available yet, though. Let me know if that would be useful to you. I too have had good results on small-ish data sets (~1 hour or so of training data). Is your data set available for me to try?
@gauss256 I guess my implementation is just like jcsilva's repository: same network structure, vanilla L2 loss after filtering out the silent TF bins. Different optimizers with different hyperparameters have been tried, and it just won't converge. I'm worried there is some trick the authors forgot to mention in the paper. I haven't run jcsilva's code yet since I haven't installed Keras on my lab's server. His code seems to make sense, and I don't know what problem he encountered. If you get any ideas about why jcsilva labels the repo as unfinished, please let me know. Thanks bro.
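For reference, here is a minimal sketch of the loss being discussed: the deep clustering affinity cost with silent bins dropped. This is not the code from either repo; it assumes TF 1.x, embeddings `V` of shape (batch, T*F, D), one-hot labels `Y` of shape (batch, T*F, C), and a -40 dB silence threshold (the threshold value is my assumption, not from the thread):

```python
import tensorflow as tf  # TF 1.x style, matching the comments above

def dc_loss(V, Y, log_mag, threshold_db=40.0):
    """Deep clustering loss ||VV^T - YY^T||_F^2, expanded so the huge
    (T*F x T*F) affinity matrix is never built explicitly.

    V: (batch, TF, D) embeddings, Y: (batch, TF, C) one-hot labels,
    log_mag: (batch, TF) log-magnitude of the mixture spectrogram.
    """
    # Drop "silent" TF bins: anything more than threshold_db below the
    # per-utterance maximum (exact threshold is an assumption).
    max_per_utt = tf.expand_dims(tf.reduce_max(log_mag, axis=1), 1)
    keep = tf.cast(log_mag > max_per_utt - threshold_db, tf.float32)
    V = V * tf.expand_dims(keep, -1)
    Y = Y * tf.expand_dims(keep, -1)

    # ||VV^T - YY^T||^2 = ||V^T V||^2 - 2 ||V^T Y||^2 + ||Y^T Y||^2
    VtV = tf.matmul(V, V, transpose_a=True)  # (batch, D, D)
    VtY = tf.matmul(V, Y, transpose_a=True)  # (batch, D, C)
    YtY = tf.matmul(Y, Y, transpose_a=True)  # (batch, C, C)
    return (tf.reduce_sum(tf.square(VtV))
            - 2.0 * tf.reduce_sum(tf.square(VtY))
            + tf.reduce_sum(tf.square(YtY)))
```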
This work is labeled unfinished because we never got around to preparing the code for the SDR results in the original paper, so we didn't fully reproduce the paper's pipeline. However, there should be enough code to generate binary masks for speech separation, and training should converge without much hassle. As mentioned above, the soft masks from the end-to-end extended work were not implemented.

As for the resulting quality and practical applications, we never went very far. We were investigating speech enhancement techniques at the time, and MERL's speech separation approach turned out to be overkill (and didn't really produce great results for us). There seems to be a limit to what can be achieved with single-channel sources. We were applying all this work to Brazilian Portuguese speech processing. If you are curious about what we used, we started with the benchmark dataset provided here, along with the CHiME3 noise dataset for data augmentation. I don't know whether the latter is still available for public use, as there are more recent CHiME challenges.

MERL's latest work focuses on music separation (awesome work, btw): http://www.merl.com/publications/docs/TR2017-010.pdf It looks much more promising in terms of practicality than single-channel speaker separation, IMO. It also looks easier to implement.
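For anyone landing here: the binary-mask generation mentioned above amounts to running k-means on the per-bin embeddings at test time. A rough sketch (the shapes and the sklearn call are my assumptions, not code from this repo):

```python
import numpy as np
from sklearn.cluster import KMeans

def binary_masks(embeddings, n_sources=2):
    """Cluster per-bin embeddings into n_sources groups and build
    one binary mask per source.

    embeddings: (T, F, D) array produced by the network.
    Returns: (n_sources, T, F) boolean masks.
    """
    T, F, D = embeddings.shape
    labels = KMeans(n_clusters=n_sources).fit_predict(
        embeddings.reshape(T * F, D))
    labels = labels.reshape(T, F)
    return np.stack([labels == k for k in range(n_sources)])

# Each source is then recovered by masking the mixture STFT and
# inverting it: source_k = istft(masks[k] * mixture_stft)
```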
@akira-miasato
@gauss256 Hi, are you going to open-source your TensorFlow extension of this repository? I am going to use the DC algorithm in source separation experiments for my master's thesis, and your code would be of great help to me. Thanks in advance.
Yes, since there is some interest, I will do that. I'll see if I can get to it this weekend.
@isklyar I have updated my fork here: https://github.com/SingSoftNext/deep-clustering It's not pretty, but it works. I'm now going to turn my attention to implementing the algorithm in this follow-on paper: https://arxiv.org/abs/1611.08930 It's probably not hard to adapt the DC code to DANet. If someone gets there before I do, please let us know!
@gauss256 Great, thank you!! Most probably I will also work on extending vanilla DC with some type of end-to-end training, either with DANet or an enhancement network. I will let you know if I achieve something in this direction.
@isklyar @gauss256 My TensorFlow implementation has also just been updated; I'm about to look for a job and hadn't been updating my GitHub much before. You are welcome to check it out. If you want fair performance in reverberant conditions, training on a reverberant set gives satisfactory results. By the way, I'm now also working on DANet and have some results on small two-mixture datasets. For DC, the loss function is invariant to the number of sources (two or three mixtures). In DANet, however, it's hard to handle one-, two-, and three-speaker mixtures within a single TensorFlow graph, especially the control flow for every sample in a batch of data.
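For context, the DANet training step being discussed boils down to a couple of matrix products. A hedged sketch based on the paper (https://arxiv.org/abs/1611.08930); shapes and names are my assumptions:

```python
import tensorflow as tf

def danet_masks(V, Y, eps=1e-8):
    """Form one attractor per source as the label-weighted mean of the
    embeddings, then derive soft masks by similarity to each attractor.

    V: (batch, TF, D) embeddings, Y: (batch, TF, C) ideal assignments.
    Returns: (batch, TF, C) soft masks.
    """
    # Attractor a_c = sum_tf(v_tf * y_tf,c) / sum_tf(y_tf,c)
    VtY = tf.matmul(V, Y, transpose_a=True)               # (batch, D, C)
    counts = tf.reduce_sum(Y, axis=1)                     # (batch, C)
    attractors = VtY / (tf.expand_dims(counts, 1) + eps)  # (batch, D, C)
    # Soft masks from embedding/attractor similarity
    return tf.nn.softmax(tf.matmul(V, attractors))        # (batch, TF, C)

# At test time the true Y is unknown, so attractors come from k-means
# centers (or fixed "anchor" attractors) instead -- which is the part
# that makes variable speaker counts awkward in a static graph.
```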
@zhr1201 I hope you will be able to post your DANet code. It would speed things up to have that as a starting point.
@gauss256 We are not able to open-source our implementation of DANet right away because we are now working with a company. Sorry about that.
@gauss256 Did you go forward with the implementation of DANet?
Yes, we have a rough implementation of DANet. We would be happy to collaborate. My email address is in my GitHub profile; send me a message there.
@zhr1201 As you mentioned, DC is not currently suitable for practical use, so you moved to DANet. How is DANet's real-time performance on an embedded processor?
@LiJiongliang It is not a real-time model, because you need to feed in chunks of frames at once (approximately 0.8 s of data).
@zhr1201 In DANet, if you use the anchored attractor points, you can implement it in "real time": the main delay is that of the Fourier transform, so here 32 ms, well under the 0.8 s you suggest. Just to explain where the 0.8 s comes from and why we can do better: the first training phase uses 100 frames of STFT as the input; each frame is 32 ms long, but frames are computed every 8 ms, so 100 * 8 ms = 0.8 s.
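To make the latency arithmetic above concrete (the frame and hop sizes come from the comment above; the 8 kHz sample rate is my assumption, common for WSJ0-2mix):

```python
SAMPLE_RATE = 8000                # Hz, assumed
WIN = int(0.032 * SAMPLE_RATE)    # 32 ms window -> 256 samples
HOP = int(0.008 * SAMPLE_RATE)    # 8 ms hop -> 64 samples
FRAMES = 100                      # STFT frames fed to the network at once

chunk_seconds = FRAMES * HOP / SAMPLE_RATE  # 100 * 8 ms = 0.8 s buffering
frame_seconds = WIN / SAMPLE_RATE           # 32 ms: the anchored-attractor
                                            # lower bound mentioned above
print(chunk_seconds, frame_seconds)         # 0.8 0.032
```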