Is there a significant speed improvement when calling a transformers tokenizer on a batch compared to calling it per item?
Is calling a tokenizer on a whole batch significantly faster than calling it on each item in the batch? For example:
encodings = tokenizer(sentences)
# vs
encodings = [tokenizer(x) for x in sentences]
I ended up just timing both, in case anyone else is interested:
%%timeit
for _ in range(10**4): tokenizer("Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
785 ms ± 24.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
tokenizer(["Lorem ipsum dolor sit amet, consectetur adipiscing elit."]*10**4)
266 ms ± 6.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
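For these 10**4 sentences that works out to roughly a 3x speedup for the batched call (785 ms vs 266 ms). The result probably depends on the tokenizer class, the text length, and the machine, so it may be worth re-measuring on your own data. Below is a minimal sketch of the same comparison outside a notebook, using `timeit` from the standard library. The names `toy_tokenizer` and `compare_batch_vs_per_item` are hypothetical helpers made up for illustration, and the toy tokenizer is only a stand-in so the script runs without downloading anything; its timings say nothing about a real tokenizer.

```python
import timeit


def toy_tokenizer(text):
    """Hypothetical stand-in for a tokenizer.

    Accepts a single string or a list of strings, like a transformers
    tokenizer does, and returns fake "ids" (here just the word lengths).
    """
    if isinstance(text, str):
        return {"input_ids": [len(tok) for tok in text.split()]}
    return {"input_ids": [[len(tok) for tok in t.split()] for t in text]}


def compare_batch_vs_per_item(tokenizer, sentences, repeats=5):
    """Time one call per sentence against a single call on the whole list."""
    sentences = list(sentences)
    # Best-of-N wall-clock time for the per-item loop.
    per_item = min(
        timeit.repeat(lambda: [tokenizer(s) for s in sentences],
                      number=1, repeat=repeats)
    )
    # Best-of-N wall-clock time for the single batched call.
    batched = min(
        timeit.repeat(lambda: tokenizer(sentences), number=1, repeat=repeats)
    )
    speedup = per_item / batched if batched > 0 else float("inf")
    return {"per_item_s": per_item, "batched_s": batched, "speedup": speedup}


sentences = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit."] * 10**4
result = compare_batch_vs_per_item(toy_tokenizer, sentences)
print(f"per item: {result['per_item_s']:.4f} s")
print(f"batched:  {result['batched_s']:.4f} s")
print(f"speedup:  {result['speedup']:.2f}x")
```

To measure a real tokenizer, pass the object you already have (for instance one loaded with `AutoTokenizer.from_pretrained`) in place of `toy_tokenizer`; the original post does not say which checkpoint was used, so none is assumed here.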