No effective results #7

hiroki1953 · 2024-08-12T12:12:48Z

Nice to meet you.
I was very interested in this research and tried it myself.

I am executing the following two commands, but I am not seeing any significant changes from the input values. (I want to change the melody.)

I have read the source code and tried to devise some ideas, but what argument values should I set to get better results?

Please give me some advice.

python main_pc_extract_inv.py --source_prompt "A high quality recording of wind instruments and strings playing. " --target_neg_prompt "low quality" --init_aud "../sample_audio/MDDBBeethoven.wav" --model_id "cvssp/audioldm2-music" --results_path "../result" --n_evs 3

python apply_drift.py --extraction_path "../result/audioldm2-music/MDDBBeethoven/pmt_A_high_quality_recording_of_wind_instruments_and_strings_playing. __neg__low_quality/sNone_pc-both_cfgd3_driftNone-None_it50_c1.0e-03_1723454811.pt" --drift_start 0 --drift_end 50 --amount 1 --evs 1 2 3 --combine_evs


HilaManor · 2024-08-13T15:38:56Z

Hi, thanks for the issue.
In diffusion, the generation process starts from a high-value timestep (e.g., 200) and ends at 0.
Choosing a starting timestep (drift_start, 0 in your example) that is lower than the ending timestep (drift_end, 50) results in no editing at all.

Try swapping the 0 and the 50 in your apply_drift command. Then you should start hearing some changes :)
After that, to control how much the audio changes, try playing with the timestep range, e.g., 150->50 would change more.

I'll add a value check to raise an error to prevent people getting confused by this, thanks!

Edit: added on commit c369ad3
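
For readers hitting the same problem, the kind of value check described above could look roughly like the sketch below. This is not the code from commit c369ad3; the function names `validate_drift_range` and `parse_args` are hypothetical, and only the `--drift_start` and `--drift_end` flags are taken from the command in this thread.

```python
import argparse


def validate_drift_range(drift_start: int, drift_end: int) -> None:
    """Reject ranges where the start timestep is below the end timestep.

    Generation runs from a high timestep down to 0, so a range such as
    start=0, end=50 selects nothing and the edit has no effect.
    """
    if drift_start < drift_end:
        raise ValueError(
            f"drift_start ({drift_start}) must not be lower than "
            f"drift_end ({drift_end}); try swapping the two values."
        )


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical minimal parser: the real script accepts more options.
    parser = argparse.ArgumentParser()
    parser.add_argument("--drift_start", type=int, required=True)
    parser.add_argument("--drift_end", type=int, required=True)
    args = parser.parse_args(argv)
    validate_drift_range(args.drift_start, args.drift_end)
    return args
```

With this sketch, `parse_args(["--drift_start", "50", "--drift_end", "0"])` passes, while the original `--drift_start 0 --drift_end 50` raises a `ValueError` instead of silently producing unedited audio.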

Solves #7 confusion

hiroki1953 · 2024-08-14T15:10:31Z

Thank you.
After various trials, I was able to reproduce the edited melody.
I have one question: what is the appropriate value for the amount?
When I tried it several times with 1.0, I didn't see any significant changes, but when I raised it to 100, I saw a significant change.

HilaManor · 2024-08-15T13:17:51Z

Since it's an unsupervised method, there is no "appropriate" value, and it's a bit of a trial-and-error process to find a value that you are satisfied with.

In the first method of adding the PCs (which you now use), a different PC is added for each timestep in the time-range. With this method I generally saw that an amount between -40 and 40 yields a significant change.
In the second method, where the same PC is added for each timestep in the time-range (by setting --use_specific_ts_pc), an amount between -2 and 2 was already significant.
The difference is that the first method changes multiple elements (different PCs), whereas the second changes one specific element more strongly, because the same change accumulates across all timesteps.
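
The accumulation argument can be illustrated with a toy sketch. It is not the repository's implementation: it ignores the denoiser entirely, uses made-up 2-D "PCs", and assumes the range runs from `drift_start` down to (but not including) `drift_end`. The names `timesteps_in_range` and `total_shift` are hypothetical.

```python
def timesteps_in_range(drift_start, drift_end):
    # Timesteps count down; the end is treated as exclusive (an assumption).
    return list(range(drift_start, drift_end, -1))


def total_shift(pcs_by_timestep, drift_start, drift_end, amount, specific_ts=None):
    """Sum the shifts added to the latent over the time-range.

    specific_ts=None  -> first method: use each timestep's own PC.
    specific_ts=<int> -> second method: reuse that timestep's PC everywhere.
    """
    dim = len(next(iter(pcs_by_timestep.values())))
    total = [0.0] * dim
    for t in timesteps_in_range(drift_start, drift_end):
        pc = pcs_by_timestep[t if specific_ts is None else specific_ts]
        total = [acc + amount * x for acc, x in zip(total, pc)]
    return total


# Toy PCs: even timesteps point along the first axis, odd ones along the second.
pcs = {t: [1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(1, 5)}

print(total_shift(pcs, 4, 0, amount=1.0))                 # [2.0, 2.0]
print(total_shift(pcs, 4, 0, amount=1.0, specific_ts=4))  # [4.0, 0.0]
```

In this toy setting the first method spreads the same total budget over two directions, while the second pushes all of it along one direction, which is consistent with much smaller amounts already being audible when the same PC is reused.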

The change is added in the latent space of the model, which means that adding too much will start to deteriorate the quality of the results. I haven't found the point where this happens, though: with audio I didn't try amounts above 40 for the first method or above 2 for the second.

HilaManor added a commit that referenced this issue Aug 14, 2024

Update main_pc_apply_drift.py

c369ad3

Solves #7 confusion
