No effective results #7

Open
hiroki1953 opened this issue Aug 12, 2024 · 3 comments
Comments

@hiroki1953

Nice to meet you.
I was very interested in this research and tried it myself.

I am running the following two commands, but the output shows no significant change from the input. (I want to change the melody.)

I have read the source code and tried a few ideas, but what argument values should I set to get better results?

Please give me some advice.

python main_pc_extract_inv.py --source_prompt "A high quality recording of wind instruments and strings playing. " --target_neg_prompt "low quality" --init_aud "../sample_audio/MDDBBeethoven.wav" --model_id "cvssp/audioldm2-music" --results_path "../result" --n_evs 3

python apply_drift.py --extraction_path "../result/audioldm2-music/MDDBBeethoven/pmt_A_high_quality_recording_of_wind_instruments_and_strings_playing. __neg__low_quality/sNone_pc-both_cfgd3_driftNone-None_it50_c1.0e-03_1723454811.pt" --drift_start 0 --drift_end 50 --amount 1 --evs 1 2 3 --combine_evs

@HilaManor
Owner

HilaManor commented Aug 13, 2024

Hi, thanks for the issue.
In diffusion, the generation process starts from a high timestep (e.g., 200) and ends at 0.
Choosing a starting timestep (drift_start, 0 in your example) that is lower than the ending timestep (drift_end, 50) results in no editing at all.

Try swapping the 0 and the 50 in your apply_drift command. Then you should start hearing some changes :)
To control how much changes, try adjusting the timestep range; e.g., 150->50 would change more.
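
For reference, here is a minimal Python sketch that assembles the corrected command line with the two timesteps swapped, plus the wider 150->50 variant. It only uses flags that already appear in this thread; the helper name `build_apply_drift_command` and the placeholder path are hypothetical and not part of the repository.

```python
import shlex


def build_apply_drift_command(extraction_path, drift_start, drift_end, amount, evs):
    """Assemble an apply_drift.py command line from the flags used in this thread."""
    args = [
        "python", "apply_drift.py",
        "--extraction_path", extraction_path,
        "--drift_start", str(drift_start),
        "--drift_end", str(drift_end),
        "--amount", str(amount),
        "--evs", *[str(ev) for ev in evs],
        "--combine_evs",
    ]
    # shlex.join quotes each argument so the string can be pasted into a shell.
    return shlex.join(args)


# Placeholder path (hypothetical); use the .pt file written by main_pc_extract_inv.py.
EXTRACTION_PATH = "path/to/extraction.pt"

# Swapped values: start at the higher timestep (50) and end at the lower one (0).
swapped = build_apply_drift_command(EXTRACTION_PATH, 50, 0, 1, [1, 2, 3])
# Wider range suggested above: 150 -> 50.
wider = build_apply_drift_command(EXTRACTION_PATH, 150, 50, 1, [1, 2, 3])

print(swapped)
print(wider)
```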

I'll add a value check to raise an error to prevent people getting confused by this, thanks!

Edit: added on commit c369ad3
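
A check of this kind could look roughly like the sketch below. This is not the code from the commit; the function name `check_drift_range` and the error message are assumptions made for illustration. It only encodes the rule stated above: the starting timestep must not be lower than the ending one.

```python
def check_drift_range(drift_start, drift_end):
    """Raise ValueError when the drift range would produce no edit.

    Generation runs from a high timestep down to 0, so drift_start is
    expected to be the higher value and drift_end the lower one.
    """
    if drift_start < drift_end:
        raise ValueError(
            f"drift_start ({drift_start}) is lower than drift_end ({drift_end}); "
            "this results in no editing. Try swapping the two values."
        )


# The corrected range passes silently.
check_drift_range(50, 0)

# The range from the original report is rejected.
try:
    check_drift_range(0, 50)
except ValueError as err:
    print(err)
```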

HilaManor added a commit that referenced this issue Aug 14, 2024
@hiroki1953
Author

Thank you.
After various trials, I was able to reproduce the edited melody.
I have one question: what is the appropriate value for the amount?
When I tried it several times with 1.0, I didn't see any significant changes, but when I raised it to 100, I saw a significant change.

@HilaManor
Owner

Since it's an unsupervised method, there is no "appropriate" value, and it's a bit of a trial-and-error process to find a value that you are satisfied with.

In the first method of adding the PCs (which you are using now), a different PC is added for each timestep in the time range. I generally saw that amount=-40,40 yields a significant change.
In the second method, where the same PC is added for each timestep in the time range (by setting --use_specific_ts_pc), using amount=-2,2 was already significant.
The difference is that the first method changes multiple elements (different PCs), whereas the second changes a specific element more strongly across all timesteps, and that change accumulates.
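
To make the accumulation argument concrete, here is a toy Python sketch. It is an illustration only, not how the model applies the PCs: it ignores the denoising dynamics entirely, and it assumes the per-timestep directions in the first method are mutually orthogonal unit vectors, which the real PCs need not be. The number of steps and all names are hypothetical.

```python
import math


def norm(vec):
    """Euclidean length of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def total_shift(directions, amount):
    """Sum amount * direction over all timesteps and return the resulting vector."""
    total = [0.0] * len(directions[0])
    for direction in directions:
        for i, x in enumerate(direction):
            total[i] += amount * x
    return total


num_steps = 50  # number of timesteps in the edited range (illustrative)

# Toy "first method": a different, mutually orthogonal unit direction per timestep.
different = [[1.0 if i == t else 0.0 for i in range(num_steps)] for t in range(num_steps)]
# Toy "second method": the same unit direction repeated at every timestep.
same = [[1.0] + [0.0] * (num_steps - 1) for _ in range(num_steps)]

shift_different = norm(total_shift(different, amount=1.0))
shift_same = norm(total_shift(same, amount=1.0))

# With the same amount, the repeated direction adds up linearly with the number
# of steps, while the orthogonal directions only grow with its square root.
print(shift_different, shift_same)
```

Under these toy assumptions, the same `amount` moves the latent much further along one direction when the direction is repeated, which is consistent with the second method needing a far smaller amount.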

The change is added in the latent space of the model, which means that adding too much will start to degrade the quality of the results. I haven't found where that happens, though (with audio, I didn't try above 40 for the first method or 2 for the second).
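
Since finding a value is trial and error, one way to organize the search is a small symmetric grid of candidate amounts, each passed as `--amount` in a separate run. The sketch below is only a suggestion: the bounds 40 and 2 come from the ranges mentioned above, while the helper name `symmetric_amounts` and the grid spacing are assumptions.

```python
def symmetric_amounts(max_abs, steps_per_side):
    """Return evenly spaced nonzero amounts between -max_abs and max_abs."""
    step = max_abs / steps_per_side
    positive = [step * k for k in range(1, steps_per_side + 1)]
    # Mirror the positive values so both edit directions are covered.
    return [-a for a in reversed(positive)] + positive


# First method (a different PC per timestep): tried up to 40 in absolute value.
first_method = symmetric_amounts(40, 4)
# Second method (the same PC at every timestep): tried up to 2 in absolute value.
second_method = symmetric_amounts(2, 4)

print(first_method)
print(second_method)
```

Listening to the outputs in order of increasing absolute amount makes it easier to notice where quality starts to degrade.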
