
The 4th order coefficient of FLUX does not show a clear relationship between the output_diff and the predicted_output_diff #20

Closed

phyllispeng123 opened this issue Jan 7, 2025 · 8 comments

phyllispeng123 commented Jan 7, 2025

[Fig 1]
[Fig 2]

I used 400 prompts from https://huggingface.co/datasets/k-mktr/improved-flux-prompts to generate 400 pairs of (modulated_input_diff, output_diff); each has 49 values per prompt (one per pair of adjacent steps), since I use the following hyperparameters (a generation sketch follows):

num_inference_steps = 50, 
guidance_scale=3.5, 
max_sequence_length=256, 
generator=torch.Generator(device).manual_seed(42)
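
For context, the generation setup is roughly the following (a minimal sketch assuming the diffusers FluxPipeline and the FLUX.1-dev checkpoint; the original post names neither, and the diffs would be recorded inside the transformer's forward pass, e.g. via a hook):

import torch
from diffusers import FluxPipeline

device = "cuda"
# Assumed checkpoint; any FLUX checkpoint with this architecture would do for the measurement.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16).to(device)

prompt = "..."  # one of the 400 prompts from the dataset above
image = pipe(
    prompt,
    num_inference_steps=50,          # 50 steps -> 49 adjacent-step diffs per prompt
    guidance_scale=3.5,
    max_sequence_length=256,
    generator=torch.Generator(device).manual_seed(42),
).images[0]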

The result is satisfying in that modulated_input_diff and output_diff from my 400 generated pairs consistently show a stable, close relationship across different prompts (Fig 2). However, I ran into some problems when I used the 4th-order coefficients provided in ./TeaCache4FLUX/teacache_flux.py:

  1. I don't see an obvious relationship in either log(output_diff) vs. log(predicted_output_diff) or output_diff vs. predicted_output_diff using my own data (Fig 1).
  2. I did the 4th-order polynomial fit with my own data and got different coefficients, [-34.84608751, -10.79323838, 16.39479138, -1.21976726, 0.12762022], but they also show a poor relationship.
  3. I find that the L1 loss between output_diff and predicted_output_diff decreases as the order of the fit increases (I tried orders 1 to 10; a sketch of this order sweep follows the code below).

The code is displayed below; I wonder if I am doing something wrong? (BTW, the TeaCache speed-up and quality are marvelous in both FLUX and HunyuanVideo!)

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

input_diff = pd.read_csv('./input_diff.csv')    #### modulated input diff csv, shape = (400, 49)
output_diff = pd.read_csv('./output_diff.csv')  #### output diff csv, shape = (400, 49)
x = input_diff.mean()   #### mean over the 400 prompts, one value per step
y = output_diff.mean()
coefficients = [4.98651651e+02, -2.83781631e+02, 5.58554382e+01, -3.82021401e+00, 2.64230861e-01]
rescale_func = np.poly1d(coefficients)
ypred = rescale_func(x)
plt.clf()
plt.figure(figsize=(8, 8))
plt.plot(np.log(x), np.log(y), '*', label='log original values', color='green')
plt.plot(np.log(x), np.log(ypred), '.', label='log polyfit values', color='blue')
plt.xlabel('4th order true fitting')
plt.legend(loc=4)
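
For point 3, the order sweep was along these lines (a minimal sketch; the CSV file names and the use of plain mean absolute error as the "L1 loss" are assumptions):

import numpy as np
import pandas as pd

input_diff = pd.read_csv('./input_diff.csv')    #### modulated input diff csv, shape = (400, 49)
output_diff = pd.read_csv('./output_diff.csv')  #### output diff csv, shape = (400, 49)
x, y = input_diff.mean(), output_diff.mean()

for order in range(1, 11):
    #### fit a polynomial of the given order and measure the L1 error of its predictions
    ypred = np.poly1d(np.polyfit(x, y, order))(x)
    print(f"order {order}: L1 = {np.abs(y - ypred).mean():.6f}")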
LiewFeng (Collaborator) commented Jan 7, 2025

There may be some differences between our implementations.

  1. We use relative L1 loss instead of plain L1 loss; not sure which you are using.
  2. The output in our setting is the residual output, i.e., output hidden states - input hidden states, rather than the output hidden states themselves, since we cache the residual output (sketched below).
  3. The coeff is calculated under the 28-step setting. It should work well for the 50-step setting.
  4. The output hidden states are the ones before being normed.
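
In code, the two quantities being related read roughly as follows (a minimal sketch; the variable names and dummy shapes are placeholders, not the exact code in teacache_flux.py):

import torch

def rel_l1(cur, prev):
    # Relative L1 distance: mean absolute change, normalized by the previous step's magnitude.
    return (cur - prev).abs().mean() / prev.abs().mean()

# Dummy tensors standing in for hidden states at two adjacent denoising steps
# (shapes are arbitrary placeholders, not FLUX's real dimensions).
inp, prev_inp = torch.randn(1, 256, 64), torch.randn(1, 256, 64)   # input hidden states
out, prev_out = torch.randn(1, 256, 64), torch.randn(1, 256, 64)   # output hidden states (before norm)
mod, prev_mod = torch.randn(1, 256, 64), torch.randn(1, 256, 64)   # modulated inputs

# Point 2: the quantity that is cached and fitted is the residual output,
# i.e. output hidden states minus input hidden states.
res, prev_res = out - inp, prev_out - prev_inp

input_diff = rel_l1(mod, prev_mod)     # x of the polynomial fit
output_diff = rel_l1(res, prev_res)    # y of the polynomial fit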

phyllispeng123 (Author) commented Jan 7, 2025

There may be some differences between our implementations.

  1. We use relative L1 loss instead of plain L1 loss; not sure which you are using.
  2. The output in our setting is the residual output, i.e., output hidden states - input hidden states, rather than the output hidden states themselves, since we cache the residual output.
  3. The coeff is calculated under the 28-step setting. It should work well for the 50-step setting.
  4. The output hidden states are the ones before being normed.

  1. I used equation (4) from the paper as my L1 loss, then computed output_diff = L1_loss(output hidden states, previous output hidden states) and input_diff = L1_loss(input hidden states, previous input hidden states).
  2. Does that mean the "model output diff" in the paper always refers to the residual output? Do Fig. 3 and Fig. 5 in the paper also show the relation between L1_loss(input hidden states, previous input hidden states) and L1_loss(output hidden states, input hidden states)? (With the loss denominators being the previous input hidden states and the input hidden states, respectively?) I can't find where you define the model output diff, and I had assumed it should be the difference between the output hidden states and the previous output hidden states.
  3. Thanks for your confirmation!
  4. Yes, I use the output hidden states as the model output, and they are before being normed.

LiewFeng (Collaborator) commented Jan 7, 2025

L1_rel(modulated input, previous modulated input) and L1_rel(residual output, previous residual output)

phyllispeng123 (Author) commented Jan 7, 2025

L1_rel(modulated input, previous modulated input) and L1_rel(residual output, previous residual output)

[Screenshot 2025-01-07 15:41:57]

OK!! Now I get the point!
I regenerated 100 pairs of (modulated input diff, residual output diff) as you described above, using prompts from https://huggingface.co/datasets/k-mktr/improved-flux-prompts, and got very different coefficients, [-76.48384686, 15.27823855, 11.35678576, -0.87895694, 0.12150872], whereas yours are [4.98651651e+02, -2.83781631e+02, 5.58554382e+01, -3.82021401e+00, 2.64230861e-01]. I wonder whether the difference is due to the dataset? Could you share which dataset, and how many prompts, you used to generate the residual output and modulated input? Did you make any further adjustments to the coefficients?

My way of getting the coefficients is shown below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def find_coefficient():
    output_diff = pd.read_csv('./output_diff.csv')  ### residual output diff csv, shape=(100, 49): 100 prompts, 50 inference steps -> 49 diffs each
    input_diff = pd.read_csv('./input_diff.csv')    ### modulated input diff csv, shape=(100, 49)

    #### take the mean over the 100 prompts
    x = input_diff.mean()
    y = output_diff.mean()

    #### 4th order fit with my own data
    coefficients = np.polyfit(x, y, 4)
    rescale_func = np.poly1d(coefficients)
    ypred = rescale_func(x)
    plt.clf()
    plt.figure(figsize=(8, 8))
    plt.plot(np.log(x), np.log(y), '*', label='log residual output diff values', color='green')
    plt.plot(np.log(x), np.log(ypred), '.', label='log polyfit values', color='blue')
    plt.xlabel('log input_diff')
    plt.ylabel('log residual_output_diff')
    plt.ylim(-3, 1)
    plt.legend(loc=4)
    plt.title('4th order My Polynomial fitting')
    plt.tight_layout()
    plt.savefig('residual_polynomial_fitting_log.png')

    #### 4th order fit using the TeaCache coefficients
    coefficients = [4.98651651e+02, -2.83781631e+02, 5.58554382e+01, -3.82021401e+00, 2.64230861e-01]
    rescale_func = np.poly1d(coefficients)
    ypred = rescale_func(x)
    plt.clf()
    plt.figure(figsize=(8, 8))
    plt.plot(np.log(x), np.log(y), '*', label='log residual output diff values', color='green')
    plt.plot(np.log(x), np.log(ypred), '.', label='log polyfit values', color='blue')
    plt.xlabel('log input_diff')
    plt.ylabel('log residual_output_diff')
    plt.legend(loc=4)
    plt.ylim(-3, 1)
    plt.title('4th order TeaCache Polynomial fitting')
    plt.tight_layout()
    plt.savefig('residual_polynomial_fitting_loggt.png')

LiewFeng (Collaborator) commented Jan 7, 2025

70 prompts from here.

Maybe you can try with 28 inference steps.

LiewFeng (Collaborator) commented Jan 9, 2025

Closed due to inactivity. Feel free to reopen it if necessary.

hkunzhe commented Jan 20, 2025

L1_rel(modulated input, previous modulated input) and L1_rel(residual output, previous residual output)

The relative L1 distance should be

relative_l1_distance = (torch.abs(prev - cur).mean()) / torch.abs(prev).mean()

Is it correct? @LiewFeng

LiewFeng (Collaborator)

@hkunzhe Yes.
