
Converting from integer-tokens to one-hot tokens gives different results. #179

Closed

codetalker7 opened this issue May 14, 2024 · 2 comments

@codetalker7

I tried to use the "colbert-ir/colbertv2.0" pretrained checkpoint for a task (it's essentially a BERT model plus a linear layer; for this issue we focus only on the BERT model). Here is how I loaded the model:

using CUDA
using Flux
using OneHotArrays
using Test
using Transformers
using Transformers.TextEncoders

const PRETRAINED_BERT = "colbert-ir/colbertv2.0"

bert_config = Transformers.load_config(PRETRAINED_BERT)
bert_tokenizer = Transformers.load_tokenizer(PRETRAINED_BERT)
bert_model = Transformers.load_model(PRETRAINED_BERT)

const VOCABSIZE = size(bert_tokenizer.vocab.list)[1]

Now, we'll simply run the bert_model over a bunch of sentences.

docs = [
    "hello world",
    "thank you!",
    "a",
    "this is some longer text, so length should be longer",
]

encoded_text = encode(bert_tokenizer, docs)
ids, mask = encoded_text.token, encoded_text.attention_mask

Above, by default, ids is a OneHotArray. We convert it to an integer matrix containing integer token IDs:

integer_ids = Matrix(onecold(ids))

As expected, the bert_model gives the same results on the integer-ids as well as the one-hot encodings:

julia> @test isequal(bert_model((token = integer_ids, attention_mask=mask)), bert_model((token = ids, attention_mask=mask)))
Test Passed

Note that we can also convert from integer_ids back to the OneHotArray using the onehotbatch function. Here's a test just for a sanity check:

julia> @test isequal(ids, onehotbatch(integer_ids, 1:VOCABSIZE))             # test passes
Test Passed

However, if we convert back from the integer ids to the one-hot encodings, and use the converted one-hot encodings in the bert_model, the model throws an error:

julia> bert_model((token = onehotbatch(integer_ids, 1:VOCABSIZE), attention_mask=mask))
ERROR: ArgumentError: invalid index: false of type Bool
Stacktrace:
  [1] to_index(i::Bool)
    @ Base ./indices.jl:293
  [2] to_index(A::Matrix{Float32}, i::Bool)
    @ Base ./indices.jl:277
  [3] _to_indices1(A::Matrix{Float32}, inds::Tuple{Base.OneTo{Int64}}, I1::Bool)
    @ Base ./indices.jl:359
  [4] to_indices
    @ ./indices.jl:354 [inlined]
  [5] to_indices
    @ ./indices.jl:355 [inlined]
  [6] to_indices
    @ ./indices.jl:344 [inlined]
  [7] view
    @ ./subarray.jl:176 [inlined]
  [8] _view(X::Matrix{Float32}, colons::Tuple{Colon}, k::Bool)
    @ NNlib ~/.julia/packages/NNlib/Fg3DQ/src/scatter.jl:38
  [9] gather!(dst::Array{Float32, 4}, src::Matrix{Float32}, idx::OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}})
    @ NNlib ~/.julia/packages/NNlib/Fg3DQ/src/gather.jl:107
 [10] gather
    @ ~/.julia/packages/NNlib/Fg3DQ/src/gather.jl:46 [inlined]
 [11] Embed
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/embed.jl:43 [inlined]
 [12] macro expansion
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:108 [inlined]
 [13] WithArg
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:103 [inlined]
 [14] apply_on_namedtuple
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:80 [inlined]
 [15] macro expansion
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/layer.jl:0 [inlined]
 [16] (::Transformers.Layers.CompositeEmbedding{Tuple{Transformers.Layers.WithArg{(:token,), Transformers.Layers.Embed{Nothing, Matrix{Float32}}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:position,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.FixedLenPositionEmbed{Matrix{Float32}}, typeof(identity)}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:segment,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.Embed{Nothing, Matrix{Float32}}, typeof(Transformers.HuggingFace.bert_ones_like)}}}})(nt::NamedTuple{(:token, :attention_mask), Tuple{OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}}, NeuralAttentionlib.LengthMask{1, Vector{Int32}}}})
    @ Transformers.Layers ~/.julia/packages/Transformers/lD5nW/src/layers/layer.jl:620
 [17] apply_on_namedtuple
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:80 [inlined]
 [18] macro expansion
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:0 [inlined]
 [19] Chain
    @ ~/.julia/packages/Transformers/lD5nW/src/layers/architecture.jl:319 [inlined]
 [20] (::Transformers.HuggingFace.HGFBertModel{Transformers.Layers.Chain{Tuple{Transformers.Layers.CompositeEmbedding{Tuple{Transformers.Layers.WithArg{(:token,), Transformers.Layers.Embed{Nothing, Matrix{Float32}}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:position,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.FixedLenPositionEmbed{Matrix{Float32}}, typeof(identity)}}, Transformers.Layers.WithOptArg{(:hidden_state,), (:segment,), Transformers.Layers.ApplyEmbed{Base.Broadcast.BroadcastFunction{typeof(+)}, Transformers.Layers.Embed{Nothing, Matrix{Float32}}, typeof(Transformers.HuggingFace.bert_ones_like)}}}}, Transformers.Layers.DropoutLayer{Transformers.Layers.LayerNorm{Vector{Float32}, Vector{Float32}, Float32}, Nothing}}}, Transformer{NTuple{12, Transformers.Layers.PostNormTransformerBlock{Transformers.Layers.DropoutLayer{Transformers.Layers.SelfAttention{NeuralAttentionlib.MultiheadQKVAttenOp{Nothing}, Transformers.Layers.Fork{Tuple{Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}}, Nothing}, Transformers.Layers.LayerNorm{Vector{Float32}, Vector{Float32}, Float32}, Transformers.Layers.DropoutLayer{Transformers.Layers.Chain{Tuple{Transformers.Layers.Dense{typeof(gelu), Matrix{Float32}, Vector{Float32}}, Transformers.Layers.Dense{Nothing, Matrix{Float32}, Vector{Float32}}}}, Nothing}, Transformers.Layers.LayerNorm{Vector{Float32}, Vector{Float32}, Float32}}}, Nothing}, Transformers.Layers.Branch{(:pooled,), (:hidden_state,), Transformers.HuggingFace.BertPooler{Transformers.Layers.Dense{typeof(tanh_fast), Matrix{Float32}, Vector{Float32}}}}})(nt::NamedTuple{(:token, :attention_mask), Tuple{OneHotArrays.OneHotArray{UInt32, 2, 3, Matrix{UInt32}}, NeuralAttentionlib.LengthMask{1, Vector{Int32}}}})
    @ Transformers.HuggingFace ~/.julia/packages/Transformers/lD5nW/src/huggingface/implementation/bert/load.jl:51
 [21] top-level scope
    @ REPL[26]:1
 [22] top-level scope
    @ ~/.julia/packages/CUDA/s5N6v/src/initialization.jl:190

Am I missing something here?

@chengchingwen
Owner

You should use integer_ids = reinterpret(Int32, ids) and OneHotArray{VOCABSIZE}(integer_ids). The OneHotArray used in Transformers is different from the one in Flux, and the error happens because that OneHotArray (the one produced by onehotbatch) does not overload gather.
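
A minimal sketch of that suggestion, assuming OneHotArray here is the one-hot type Transformers uses internally (provided by the PrimitiveOneHot package) rather than the one from OneHotArrays.jl:

# `ids`, `mask`, `VOCABSIZE`, and `bert_model` are from the snippets above.
integer_ids = reinterpret(Int32, ids)             # view the encoder's one-hot tokens as Int32 token IDs
onehot_ids = OneHotArray{VOCABSIZE}(integer_ids)  # rebuild Transformers' own OneHotArray

# The reconstructed one-hot array should now pass through the Embed/gather call:
bert_model((token = onehot_ids, attention_mask = mask))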

@codetalker7
Author

Thanks for this! I didn't notice that the package was using its own OneHotArray.
