How to cast a [u8] array larger than 8 bytes to an integer?

Question

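Because of the length of the array, I can't use i32::from_ne_bytes() directly. Of course, the following works, especially since the code will only run on a CPU architecture that supports unaligned access (or, due to the small length, the whole array might get stored across multiple CPU registers).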

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    println!("1 == {}", unsafe{std::ptr::read(&buf[1])} as i32);
}

But is there a cleaner way to do this while still not copying the array?

Answer 1

Extract a 4-byte &[u8] slice and use try_into() to convert it into a [u8; 4] array, which is the type i32::from_ne_bytes() takes. Then you can call i32::from_ne_bytes().

use std::convert::TryInto;

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    println!("{}", i32::from_ne_bytes((&buf[1..5]).try_into().unwrap()));
}

Output (on a little-endian target):

302055424
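The result above depends on the byte order of the machine, because from_ne_bytes() uses native endianness. If the byte order of the data is known, a variant along the following lines may be preferable. This is only a sketch: the helper name read_i32_le is hypothetical, and it assumes the data is little-endian. It also returns None instead of panicking when the range is out of bounds.

```rust
use std::convert::TryInto;

/// Hypothetical helper: read a little-endian i32 from `buf` at `offset`.
/// Returns `None` if fewer than 4 bytes are available at that offset.
fn read_i32_le(buf: &[u8], offset: usize) -> Option<i32> {
    let end = offset.checked_add(4)?;
    // `get` performs the bounds check without panicking.
    let bytes: [u8; 4] = buf.get(offset..end)?.try_into().ok()?;
    Some(i32::from_le_bytes(bytes))
}

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    println!("{:?}", read_i32_le(&buf, 1)); // Some(302055424)
    println!("{:?}", read_i32_le(&buf, 8)); // None, only 2 bytes remain
}
```

For big-endian data, i32::from_be_bytes() can be swapped in the same way.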


Answer 2

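TL;DR: Realistically, just use John Kugelman's solution; copying 4 bytes isn't measurable.

The biggest "measured" difference was 0.09 ps (239.79 - 239.70). That's 90 femtoseconds, or 0.00009 nanoseconds. Running the benchmarks again would yield wildly different results (within the picosecond range).

Measuring the copying of 4 bytes is not realistic. We're way below nanoseconds; this is pure noise.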

| test | `#[bench]` | `criterion` |
| --- | --- | --- |
| `try_into` | 0 ns | 239.79 ps |
| reinterpret | 0 ns | 239.70 ps |
| bit unpack | 0 ns | 239.74 ps |
| `b.iter(\|\| 1)` | | 240.18 ps |
| `b.iter(\|\| 1)` | | 239.73 ps |
| `b.iter(\|\| 1)` | | 239.68 ps |

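For fun, change all the tests to b.iter(|| 1), and you'll get similar results fluctuating in the picosecond range.

The biggest difference for the b.iter(|| 1) tests was 0.5 ps (240.18 - 239.68). That's a "measured" difference of 0.5 ps, which is 500 femtoseconds, or 0.0005 nanoseconds.

That's actually a bigger difference than when we did "actual" "work". It's pure noise.

You're talking about copying 4 bytes. That's not going to be measurable, even if "every microsecond counts". This alone is measurable neither in microseconds nor in nanoseconds.

(I'll refrain from repeating what was already said in the comments.)

If you don't want to use TryInto, then you can use some good old bit unpacking and bit shifting. (Out-of-bounds indexing will panic.) Note that this form always interprets the bytes as little-endian, so it matches from_ne_bytes() only on little-endian targets.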

let i = (buf[1] as i32) |
        (buf[2] as i32) <<  8 |
        (buf[3] as i32) << 16 |
        (buf[4] as i32) << 24;
println!("{}", i);
// Prints `302055424`
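To make the byte order of the shifting approach explicit, here is a small sketch comparing it against the standard library conversions. The function names unpack_le and unpack_be are made up for illustration; the only claim is that the shift patterns agree with i32::from_le_bytes() and i32::from_be_bytes().

```rust
/// Hypothetical helper: least significant byte first (little-endian).
fn unpack_le(b: [u8; 4]) -> i32 {
    (b[0] as i32) | (b[1] as i32) << 8 | (b[2] as i32) << 16 | (b[3] as i32) << 24
}

/// Hypothetical helper: most significant byte first (big-endian).
fn unpack_be(b: [u8; 4]) -> i32 {
    (b[0] as i32) << 24 | (b[1] as i32) << 16 | (b[2] as i32) << 8 | (b[3] as i32)
}

fn main() {
    // Same four bytes as buf[1..5] in the examples above.
    let b = [0u8, 0, 1, 0x12];
    assert_eq!(unpack_le(b), i32::from_le_bytes(b));
    assert_eq!(unpack_be(b), i32::from_be_bytes(b));
    println!("{} {}", unpack_le(b), unpack_be(b)); // 302055424 274
}
```

Unlike from_ne_bytes(), both variants give the same answer on every target, which matters if the bytes come from a file or network format.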

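Alternatively, you can reinterpret buf as a *const i32 pointer and read through it. However, dereferencing a raw pointer is unsafe. (Unlike slice indexing, an out-of-bounds access here does not panic; it is undefined behavior. The same goes for creating a &i32 from a pointer that is not 4-byte aligned, regardless of what the CPU supports.)

// Caution: `&*(ptr as *const i32)` on an unaligned pointer is undefined behavior,
// so read the value with `read_unaligned` instead of forming a reference.
let i = unsafe { (buf.as_ptr().add(1) as *const i32).read_unaligned() };
println!("{}", i);
// Prints `302055424` on a little-endian target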
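If the pointer-based approach is used at all, it can be wrapped so that the bounds check happens before any unsafe code runs. The following is a minimal sketch under that assumption; the helper name read_i32_unaligned is hypothetical, and the value is read in native byte order, like from_ne_bytes().

```rust
/// Hypothetical helper: read a native-endian i32 from `buf` at `offset`
/// without requiring 4-byte alignment. Returns `None` when out of bounds.
fn read_i32_unaligned(buf: &[u8], offset: usize) -> Option<i32> {
    let end = offset.checked_add(4)?;
    if end > buf.len() {
        return None;
    }
    // SAFETY: `offset..offset + 4` was checked to lie inside `buf`, and
    // `read_unaligned` places no alignment requirement on the pointer.
    Some(unsafe { (buf.as_ptr().add(offset) as *const i32).read_unaligned() })
}

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    println!("{:?}", read_i32_unaligned(&buf, 1));
    println!("{:?}", read_i32_unaligned(&buf, 7)); // None, only 3 bytes remain
}
```

In practice this does the same thing as the try_into() version, so the safe form is usually the simpler choice.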

So you need the best-performing solution for copying 4 bytes. Well, let's take John Kugelman's solution and the two solutions above and benchmark them.

// benches/bench.rs
#![feature(test)]

extern crate test;
use test::Bencher;

use std::convert::TryInto;

#[bench]
fn bench_try_into(b: &mut Bencher) {
    b.iter(|| {
        let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
        i32::from_ne_bytes((&buf[1..5]).try_into().unwrap())
    });
}

#[bench]
fn bench_reinterpret(b: &mut Bencher) {
    b.iter(|| {
        let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
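        // Caution: forming `&i32` from an unaligned pointer is undefined behavior;
        // kept here as originally benchmarked.
        unsafe { &*((buf.as_ptr().offset(1)) as *const i32) }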
    });
}

#[bench]
fn bench_bit_unpack(b: &mut Bencher) {
    b.iter(|| {
        let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
        (buf[1] as i32) | (buf[2] as i32) << 8 | (buf[3] as i32) << 16 | (buf[4] as i32) << 24
    });
}

Now let's benchmark it by executing cargo +nightly bench.

running 3 tests
test bench_bit_unpack  ... bench:           0 ns/iter (+/- 0)
test bench_reinterpret ... bench:           0 ns/iter (+/- 0)
test bench_try_into    ... bench:           0 ns/iter (+/- 0)

Just as I assumed, copying 4 bytes isn't measurable.

Now, let's try benchmarking with criterion. Maybe the test crate is (realistically) limited to nanosecond resolution, who knows.

// benches/bench.rs
use criterion::{criterion_group, criterion_main, Criterion};
use std::convert::TryInto;

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("try_into", |b| {
        b.iter(|| {
            let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
            i32::from_ne_bytes((&buf[1..5]).try_into().unwrap())
        })
    });

    c.bench_function("reinterpret", |b| {
        b.iter(|| {
            let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
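            // Caution: forming `&i32` from an unaligned pointer is undefined behavior;
            // kept here as originally benchmarked.
            unsafe { &*((buf.as_ptr().offset(1)) as *const i32) }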
        })
    });

    c.bench_function("bit_unpack", |b| {
        b.iter(|| {
            let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
            (buf[1] as i32) | (buf[2] as i32) << 8 | (buf[3] as i32) << 16 | (buf[4] as i32) << 24
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

# Cargo.toml
[dev-dependencies]
criterion = "0.3.3"

[[bench]]
name = "bench"
harness = false

Now, let's benchmark it by executing cargo bench.

try_into                time:   [239.69 ps 239.79 ps 239.91 ps]
                        change: [+0.0101% +0.0700% +0.1316%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  7 (7.00%) high severe

reinterpret             time:   [239.63 ps 239.70 ps 239.78 ps]
                        change: [-0.7006% -0.2163% +0.0525%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 11 outliers among 100 measurements (11.00%)
  4 (4.00%) high mild
  7 (7.00%) high severe

bit_unpack              time:   [239.65 ps 239.74 ps 239.84 ps]
                        change: [-0.0768% +0.0775% +0.2867%] (p = 0.45 > 0.05)
                        No change in performance detected.
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  8 (8.00%) high severe

| test | `#[bench]` | `criterion` |
| --- | --- | --- |
| `try_into` | 0 ns | 239.79 ps |
| reinterpret | 0 ns | 239.70 ps |
| bit unpack | 0 ns | 239.74 ps |

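So the average measurements were 239.79 ps, 239.70 ps, and 239.74 ps. The biggest "measured" difference is therefore 0.09 ps. That's 90 femtoseconds, or 0.00009 nanoseconds. Running the benchmarks again will yield different results. Measuring something as small as copying 4 bytes in isolation is just not realistic.

Sure, in that instant, "reinterpret" was "the fastest", but we're way below nanoseconds; this is pure noise.

Use whichever solution you prefer; there isn't any measurable or significant performance difference between them.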
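One plausible reason all three numbers are nearly identical is that buf is a compile-time constant inside each closure, so the optimizer may fold the whole computation away and leave only loop overhead. This is an assumption, not something verified against the generated code. A rough std-only sketch of how the input could be hidden from the optimizer with std::hint::black_box looks like this; the function name parse, the iteration count, and the timing loop are all illustrative and are not a substitute for criterion.

```rust
use std::convert::TryInto;
use std::hint::black_box;
use std::time::Instant;

/// Illustrative function under test: the `try_into` solution.
fn parse(buf: &[u8; 10]) -> i32 {
    i32::from_ne_bytes((&buf[1..5]).try_into().unwrap())
}

fn main() {
    let buf: [u8; 10] = [0, 0, 0, 1, 0x12, 14, 50, 120, 250, 6];
    let iters: u32 = 1_000_000;

    let start = Instant::now();
    let mut acc = 0i32;
    for _ in 0..iters {
        // `black_box` hides the input so the call is harder to constant-fold.
        acc = acc.wrapping_add(parse(black_box(&buf)));
    }
    let elapsed = start.elapsed();

    // Using the accumulated value keeps the loop from being removed entirely.
    println!("acc = {}, roughly {:?} per iteration", black_box(acc), elapsed / iters);
}
```

Even with this change, the expected outcome is the same as above: the differences between the three approaches should remain well inside the noise.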

For fun, change all the tests to b.iter(|| 1), and you'll get similar results fluctuating in the picosecond range.

c.bench_function("1", |b| b.iter(|| 1_i32));
c.bench_function("2", |b| b.iter(|| 1_i32));
c.bench_function("3", |b| b.iter(|| 1_i32));

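Running the benchmarks will yield similar results. I ran it once and got 240.18 ps, 239.73 ps, and 239.68 ps. That's a "measured" difference of 0.5 ps, which is 500 femtoseconds, or 0.0005 nanoseconds.

That's actually a bigger difference than when we did "actual" "work". Again, this is pure noise. It isn't enough "work" to measure in any meaningful way.

Again, use whichever solution you prefer; there isn't any measurable or significant performance difference between them.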

arrays

x86-64

rust