benchmark: include convert latency in bench_append_paged_kv_cache (#590)
```
model: l1b seqlens: [1, 1, 1, 1, 1, 1, 1, 1] convert: 45us 1layer: 7us 16layers: 151us throughput: 4.936GB/s
model: l1b seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] convert: 42us 1layer: 14us 16layers: 271us throughput: 1434.769GB/s
model: l1b seqlens: [5000] convert: 44us 1layer: 14us 16layers: 272us throughput: 1438.581GB/s
model: l1b seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 46us 1layer: 14us 16layers: 274us throughput: 1440.357GB/s
---
model: l3b seqlens: [1, 1, 1, 1, 1, 1, 1, 1] convert: 42us 1layer: 7us 28layers: 226us throughput: 9.946GB/s
model: l3b seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] convert: 43us 1layer: 22us 28layers: 647us throughput: 1896.687GB/s
model: l3b seqlens: [5000] convert: 42us 1layer: 22us 28layers: 646us throughput: 1898.796GB/s
model: l3b seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 41us 1layer: 22us 28layers: 648us throughput: 1890.115GB/s
---
model: l8b seqlens: [1, 1, 1, 1, 1, 1, 1, 1] convert: 41us 1layer: 7us 32layers: 252us throughput: 9.940GB/s
model: l8b seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] convert: 42us 1layer: 21us 32layers: 730us throughput: 1905.826GB/s
model: l8b seqlens: [5000] convert: 41us 1layer: 22us 32layers: 729us throughput: 1903.697GB/s
model: l8b seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 47us 1layer: 22us 32layers: 737us throughput: 1899.630GB/s
---
model: l70b-tp8 seqlens: [1, 1, 1, 1, 1, 1, 1, 1] convert: 42us 1layer: 6us 80layers: 552us throughput: 1.283GB/s
model: l70b-tp8 seqlens: [4993, 1, 1, 1, 1, 1, 1, 1] convert: 41us 1layer: 9us 80layers: 800us throughput: 539.484GB/s
model: l70b-tp8 seqlens: [5000] convert: 41us 1layer: 9us 80layers: 788us throughput: 548.648GB/s
model: l70b-tp8 seqlens: [625, 625, 625, 625, 625, 625, 625, 625] convert: 41us 1layer: 10us 80layers: 803us throughput: 537.731GB/s
```
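As a rough sanity check on how the throughput column relates to the latencies, the sketch below recomputes the single-layer time implied by a few of the reported throughputs. It rests on assumptions that this commit does not state: that l1b, l3b, l8b and l70b-tp8 are Llama 3.2 1B, Llama 3.2 3B, Llama 3.1 8B and a Llama 70B-class model sharded 8 ways (so 1 KV head per GPU), that K/V are fp16, and that the benchmark counts one read of the new K/V plus one write into the paged cache per layer, in decimal GB. The helper names `bytes_moved_per_layer` and `implied_layer_time_us` are hypothetical.

```python
# Back-of-envelope check of the reported throughput figures.
# ASSUMPTIONS (not stated in the commit): model shapes taken from public
# Llama configs, fp16 K/V, throughput = (read + write bytes per layer) / 1layer time.

BYTES_PER_ELEM = 2  # fp16 (assumed)

# model -> (num_kv_heads per GPU, head_dim), assumed
MODELS = {
    "l1b": (8, 64),
    "l3b": (8, 128),
    "l8b": (8, 128),
    "l70b-tp8": (1, 128),  # 8 KV heads split across 8 GPUs (assumed)
}


def bytes_moved_per_layer(model: str, num_tokens: int) -> int:
    kv_heads, head_dim = MODELS[model]
    # K and V tensors, each shaped [num_tokens, kv_heads, head_dim]
    kv_bytes = 2 * num_tokens * kv_heads * head_dim * BYTES_PER_ELEM
    # read once from the input tensors, written once into the paged cache
    return 2 * kv_bytes


def implied_layer_time_us(model: str, num_tokens: int, throughput_gbps: float) -> float:
    # time = bytes / (GB/s * 1e9), converted to microseconds
    return bytes_moved_per_layer(model, num_tokens) / (throughput_gbps * 1e9) * 1e6


# (model, total tokens across seqlens, reported throughput GB/s, reported 1layer us)
REPORTED = [
    ("l1b", 8, 4.936, 7),
    ("l1b", 5000, 1438.581, 14),
    ("l3b", 5000, 1898.796, 22),
    ("l8b", 5000, 1903.697, 22),
    ("l70b-tp8", 8, 1.283, 6),
    ("l70b-tp8", 5000, 548.648, 9),
]

if __name__ == "__main__":
    for model, tokens, tput, layer_us in REPORTED:
        est = implied_layer_time_us(model, tokens, tput)
        print(f"{model:9s} tokens={tokens:5d} implied={est:6.2f}us reported={layer_us}us")
```

Under these assumptions each implied time lands within 0.5us of the rounded `1layer` figure, which suggests the throughput is derived from the single-layer latency rather than from the whole-model or `convert` time. Treat that as an inference from the numbers, not something the benchmark states.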