Value error when loading saved DIEN model #511

Open

jefflao opened this issue Feb 20, 2023 · 0 comments

jefflao commented Feb 20, 2023

Describe the bug

I am training the DIEN model on a dataset with around 20 categorical features and 5 user behavior columns, all of which are strings. I can save the model with keras.save_model in .h5 format, but the following error is thrown when I try to load it with keras.load_model:

  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/utils/generic_utils.py", line 668, in deserialize_keras_object
    deserialized_obj = cls.from_config(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 670, in from_config
    input_tensors, output_tensors, created_layers = reconstruct_from_config(
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 1298, in reconstruct_from_config
    process_node(layer, node_data)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/functional.py", line 1244, in process_node
    output_tensors = layer(input_tensors, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 764, in __call__
    self._maybe_build(inputs)
  File "/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer_v1.py", line 2086, in _maybe_build
    self.build(input_shapes)
  File "/usr/local/lib/python3.8/dist-packages/deepctr/layers/sequence.py", line 255, in build
    raise ValueError('A `AttentionSequencePoolingLayer` layer requires '
ValueError: A `AttentionSequencePoolingLayer` layer requires inputs of a 3 tensor with shape (None,1,embedding_size),(None,T,embedding_size) and (None,1) Got different shapes: [TensorShape([None, 15, 35]), TensorShape([None, 1, 35]), TensorShape([None, 1])]

This seems to be an issue in how the model is reconstructed inside load_model(). More details are in the Additional context section below.
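
For reference, the layer expects its three inputs in the order query, keys, keys_length. Here is a minimal sketch of my own (embedding size 35 and T=15 taken from my model, constructor left at its defaults) that builds the layer with the shapes it asks for:

    import tensorflow as tf
    from deepctr.layers.sequence import AttentionSequencePoolingLayer

    query = tf.keras.Input(shape=(1, 35))                    # (None, 1, embedding_size)
    keys = tf.keras.Input(shape=(15, 35))                    # (None, T, embedding_size)
    keys_length = tf.keras.Input(shape=(1,), dtype='int32')  # (None, 1)

    # Builds fine in this order; in the reloaded model query and keys arrive
    # swapped, which is exactly the shape mismatch in the traceback above.
    pooled = AttentionSequencePoolingLayer()([query, keys, keys_length])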

To Reproduce

Model:

    from deepctr.models import DIEN
    from tensorflow.keras.optimizers import Adam

    model = DIEN(
        feature_columns,
        behavior_feat_list,
        dnn_hidden_units=[256, 128, 64],
        dnn_dropout=0.5,
        gru_type='AUGRU',
        use_negsampling=False,
        att_activation='sigmoid',
    )
    model.compile(Adam(learning_rate=1e-5), 'binary_crossentropy', metrics=['binary_crossentropy'])

Train and save model:

    from tensorflow.python.keras.models import save_model

    history = model.fit(train_inputs,
                        train_labels,  # array of binary 'click' targets
                        verbose=True,
                        epochs=1,
                        batch_size=32,
                        validation_split=0.1,
                        )
    save_model(
        model,
        'dien.h5',
        save_format='h5',
    )

Load model (the part that raises the exception):

    from deepctr.layers import custom_objects
    from tensorflow.python.keras.models import load_model

    loaded_model = load_model('dien.h5', custom_objects)

Operating environment:

  • python version: 3.8
  • tensorflow version: 2.2-2.5 (TF >= 2.6 hits numpy/TF compatibility issues)
  • deepctr version: 0.9.3
  • CUDA version: 11.7
  • NVIDIA driver version: 515.65.01
  • base docker image: tensorflow/tensorflow:2.5.1-gpu

Additional context

I could not try TensorFlow versions older than 2.2 due to driver compatibility issues. DeepCTR also does not work with 2.6 <= TF <= 2.11.

My model has the following structure (from `model.summary()`):

genre (InputLayer)              [(None, 1)]          0                                            
__________________________________________________________________________________________________
hist_genre (InputLayer)         [(None, 15)]         0                                            
__________________________________________________________________________________________________
...
hash_28 (Hash)                  (None, 1)            0           genre[0][0]                      
__________________________________________________________________________________________________
hash_15 (Hash)                  (None, 1)            0           genre[0][0]                      
__________________________________________________________________________________________________
hash_3 (Hash)                   (None, 15)           0           hist_genre[0][0]                 
__________________________________________________________________________________________________
...
sparse_seq_emb_hist_genre (Embe multiple             404         hash_3[0][0]                     
                                                                 hash_15[0][0]                    
                                                                 hash_28[0][0]                    
__________________________________________________________________________________________________
concat (Concat)                 (None, 15, 35)       0           sparse_seq_emb_hist_category[0][0
                                                                 sparse_seq_emb_hist_channel[0][0]
                                                                 sparse_seq_emb_hist_episode[0][0]
                                                                 sparse_seq_emb_hist_genre[0][0]  
                                                                 sparse_seq_emb_hist_part[0][0]   
                                                                 sparse_seq_emb_hist_feature0[0][
__________________________________________________________________________________________________
seq_length (InputLayer)         [(None, 1)]          0                                            
__________________________________________________________________________________________________
gru1 (DynamicGRU)               (None, 15, 35)       7455        concat[0][0]                     
                                                                 seq_length[0][0]                 
__________________________________________________________________________________________________
concat_2 (Concat)               (None, 1, 35)        0           sparse_seq_emb_hist_category[2][0
                                                                 sparse_seq_emb_hist_channel[2][0]
                                                                 sparse_seq_emb_hist_episode[2][0]
                                                                 sparse_seq_emb_hist_genre[2][0]  
                                                                 sparse_seq_emb_hist_part[2][0]   
                                                                 sparse_seq_emb_hist_feature0[2][
__________________________________________________________________________________________________
attention_sequence_pooling_laye (None, 1, 15)        10081       concat_2[0][0]                   
                                                                 gru1[0][0]                       
                                                                 seq_length[0][0]              
...

Following the stack trace and a lot of extra debug messages, I believe load_model does not pass the inputs to the embedding layers in the same order as the original model when reconstructing it. Specifically, in tensorflow/python/keras/engine/functional.py, reconstruct_from_config(config, custom_objects, created_layers) builds each layer as soon as all of its inputs are ready. As a result, the outputs of an embedding layer in the reconstructed model, such as sparse_seq_emb_hist_genre, can end up with the embedded historical behavior sequence (of shape (None, 15)) ahead of the embedded sparse feature (of shape (None, 1)), i.e. output[0] is the embedded behavior sequence instead of output[1].
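
For reference, this is the kind of check I used while debugging: dump the input shapes of every recorded call (inbound node) of the shared embedding layer, in order. The helper below is my own and relies on the private _inbound_nodes attribute; the layer name comes from the summary above.

    import tensorflow as tf

    def inbound_shapes(model, layer_name):
        """Input shapes of each call of `layer_name`, in the recorded order."""
        layer = model.get_layer(layer_name)
        return [[tuple(t.shape.as_list()) for t in tf.nest.flatten(node.input_tensors)]
                for node in layer._inbound_nodes]

    # On the original model this matches the summary, e.g. [[(None, 15)], [(None, 1)], [(None, 1)]].
    # reconstruct_from_config can replay the calls in a different order, which
    # reorders the output references such as sparse_seq_emb_hist_genre[i][0].
    print(inbound_shapes(model, 'sparse_seq_emb_hist_genre'))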

Multiple Hash layers for the same input are also created when the model initializes the key embedding and the query embedding for the attention layer, because there is no sharing mechanism. This likely does not cause a real issue, since the duplicate hashes should be identical.
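
A quick way to see the duplication (again a helper of my own that touches private attributes) is to count how many Hash layers consume each raw input tensor:

    from collections import Counter
    import tensorflow as tf
    from deepctr.layers.utils import Hash

    hash_inputs = Counter()
    for layer in model.layers:
        if isinstance(layer, Hash):
            for node in layer._inbound_nodes:
                for t in tf.nest.flatten(node.input_tensors):
                    hash_inputs[t.name] += 1

    # e.g. the 'genre' input feeds both hash_15 and hash_28, as in the summary above.
    print(hash_inputs)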

I was able to work around this by changing the order of the embedding look-up initialization in the DIEN model, deepctr/models/sequence/dien.py:

    keys_emb_list = embedding_lookup(embedding_dict, features, history_feature_columns,
                                     return_feat_list=history_fc_names, to_list=True)
    dnn_input_emb_list = embedding_lookup(embedding_dict, features, sparse_feature_columns,
                                          mask_feat_list=history_feature_list, to_list=True)
    # Move query embeddings from the first being initialized to the last.
    query_emb_list = embedding_lookup(embedding_dict, features, sparse_feature_columns,
                                      return_feat_list=history_feature_list, to_list=True)

This modification is definitely not safe. Please let me know if anyone has a better solution. Thank you in advance.
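
For anyone trying the same patch, this is a sanity check one could run after rebuilding DeepCTR with the change (the file name, tolerance, and reuse of train_inputs are illustrative): save, reload, and compare predictions.

    import numpy as np
    from tensorflow.python.keras.models import save_model, load_model
    from deepctr.layers import custom_objects

    save_model(model, 'dien_patched.h5', save_format='h5')
    reloaded = load_model('dien_patched.h5', custom_objects)

    # With the reordered embedding look-ups the reload succeeds and the two
    # models should produce identical predictions.
    before = model.predict(train_inputs, batch_size=32)
    after = reloaded.predict(train_inputs, batch_size=32)
    assert np.allclose(before, after, atol=1e-6)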
