fix: SwTokenizer getstate #136
Hello @Bing-su,
https://docs.python.org/ko/3.11/library/pickle.html?highlight=pickle#object.__setstate__
(kiwi)
kiwipiepy on pickle via △ v3.27.0 via 🐍 v3.10.12 via 🅒 kiwi took 2s
❯ python .\test\test_transformers_addon.py
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
(kiwi)
kiwipiepy on pickle via △ v3.27.0 via 🐍 v3.10.12 via 🅒 kiwi took 3s
❯
I also ran a test that pickles the tokenizer with several pickle libraries and then compares the results.

import pickle
import dill
import cloudpickle
import kiwipiepy.transformers_addon
from transformers import AutoTokenizer

repo = "kiwi-farm/roberta-base-32k"
orig = AutoTokenizer.from_pretrained(repo)
with open("pk1.pkl", "wb") as f:
    pickle.dump(orig, f)
with open("pk2.pkl", "wb") as f:
    dill.dump(orig, f)
with open("pk3.pkl", "wb") as f:
    cloudpickle.dump(orig, f)

from itertools import permutations

with open("pk1.pkl", "rb") as f:
    upk1 = pickle.load(f)
with open("pk2.pkl", "rb") as f:
    upk2 = dill.load(f)
with open("pk3.pkl", "rb") as f:
    upk3 = cloudpickle.load(f)

for (tk1, tk2) in permutations([orig, upk1, upk2, upk3], 2):
    for (k, v1), (_, v2) in zip(tk1.__dict__.items(), tk2.__dict__.items()):
        if k != "_tokenizer":
            assert getattr(tk1, k) == getattr(tk2, k)
        else:
            assert vars(getattr(tk1, k)) == vars(getattr(tk2, k))
print("ok!")
|
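@Bing-su Printing only the properties can make it look as if everything works, but I expect an error once the code path that calls the internal object implemented in C++ is involved. As expected, the test hits a segmentation fault at the point where kiwi is used after unpickling. On the C++ side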
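One way to catch this kind of failure is for the round-trip test to call into the restored object instead of only comparing attributes. The sketch below is a hypothetical helper, not part of kiwipiepy or transformers; the name `behaves_same_after_roundtrip` and its parameters are assumptions made for illustration, and a plain dict stands in for the tokenizer so the example runs without any model download.

```python
import pickle


def behaves_same_after_roundtrip(obj, call, inputs):
    """Return True if call(obj, x) gives equal results before and after pickling.

    Hypothetical helper: `call` is any function that exercises the object,
    so the restored copy is actually used rather than just inspected.
    """
    restored = pickle.loads(pickle.dumps(obj))
    return all(call(obj, x) == call(restored, x) for x in inputs)


# Illustrative use with a plain dict standing in for a tokenizer vocabulary.
vocab = {"hello": 0, "world": 1}
ok = behaves_same_after_roundtrip(
    vocab, lambda v, word: v.get(word), ["hello", "missing"]
)
```

For the tokenizer in this thread, `call` could be something along the lines of `lambda tk, s: tk.encode(s)`. A segmentation fault cannot be caught by an assertion, but it would crash the test process, which still surfaces the problem.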
|
You are right. I will test further and come back. Thank you.
fixes: #135
https://docs.python.org/ko/3.11/library/pickle.html?highlight=pickle#pickling-class-instances
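Starting with Python 3.11, this problem appears to have been resolved by defining a default behavior for the case where __getstate__ is not defined.
The same error also occurs on Python 3.11.
It is still needed on Python 3.10 and below.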
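A minimal sketch of the general pattern under discussion, assuming a wrapper whose native handle cannot be pickled and must be rebuilt on load. `NativeHandle` and `Tokenizer` here are hypothetical stand-ins written for illustration; they are not kiwipiepy's actual classes, and the real `SwTokenizer` fix may look different.

```python
import pickle


class NativeHandle:
    """Hypothetical stand-in for a C++-backed object that cannot be pickled."""

    def __init__(self, model_path):
        self.model_path = model_path

    def __reduce__(self):
        # Simulate an extension type that refuses to be serialized.
        raise TypeError("NativeHandle cannot be pickled")

    def tokenize(self, text):
        return text.split()


class Tokenizer:
    """Hypothetical wrapper that owns a native handle."""

    def __init__(self, model_path):
        self.model_path = model_path
        self._handle = NativeHandle(model_path)

    def __getstate__(self):
        # Copy the instance dict and drop the handle pickle cannot serialize.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then rebuild the native handle from them.
        self.__dict__.update(state)
        self._handle = NativeHandle(self.model_path)

    def tokenize(self, text):
        return self._handle.tokenize(text)


restored = pickle.loads(pickle.dumps(Tokenizer("model.bin")))
```

Defining both methods explicitly keeps the behavior the same across Python versions, since the wrapper no longer relies on whatever default state handling the interpreter provides.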