Unexpected difference between write and read performance with cryptsetup 1.4.3 and 1.6.6 on Debian Wheezy (7.9) and Debian Jessie (8.2)

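I have been setting up RAID1 arrays and encrypting them with cryptsetup using the default options. The RAID arrays are meant to use 2 drives, but for now I have only 1 drive in each RAID1 array, so that I can compare the performance between them.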

Performance

Unencrypted array

Write performance

    dd if=/dev/zero of=/media/storage/Temp/test.img bs=100M count=10
    10+0 records in
    10+0 records out
    1048576000 bytes (1.0 GB) copied, 7.35153 s, 143 MB/s

top output:

    top - 10:30:02 up 2 days, 19:18, 2 users, load average: 0.00, 0.16, 0.72
    Tasks: 147 total, 3 running, 144 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.1 us, 21.4 sy, 0.0 ni, 75.0 id, 0.9 wa, 0.0 hi, 2.7 si, 0.0 st
    KiB Mem: 4044256 total, 1135880 used, 2908376 free, 224624 buffers
    KiB Swap: 7812496 total, 123488 used, 7689008 free, 470796 cached

      PID USER     PR  NI   VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
    11591 root     20   0   109m  100m   572 R  98.5  2.5   0:03.12 dd
    11592 root     20   0      0     0     0 R  98.5  0.0   0:00.24 flush-9:1
      203 root     20   0      0     0     0 S  52.1  0.0   0:15.59 md1_raid1

Everything here is as expected.

Read performance

    hdparm -t /dev/md1

    /dev/md1:
     Timing buffered disk reads: 574 MB in 3.01 seconds = 190.95 MB/sec

Encrypted array

Write performance

    dd if=/dev/zero of=/dev/mapper/galerkin_storage bs=100M count=100
    100+0 records in
    100+0 records out
    10485760000 bytes (10 GB) copied, 209.058 s, 50.2 MB/s

top output:

    top - 10:12:20 up 2 days, 19:00, 2 users, load average: 5.65, 2.92, 1.60
    Tasks: 149 total, 6 running, 143 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.1 us, 21.4 sy, 0.0 ni, 74.9 id, 0.9 wa, 0.0 hi, 2.7 si, 0.0 st
    KiB Mem: 4044256 total, 3749816 used, 294440 free, 3155712 buffers
    KiB Swap: 7812496 total, 132464 used, 7680032 free, 40892 cached

      PID USER     PR  NI   VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
    10940 root     20   0      0     0     0 R  99.0  0.0   1:49.99 kworker/2:1
    11538 root     20   0      0     0     0 R  94.5  0.0   1:28.32 kworker/3:1
    11486 root     20   0      0     0     0 R  63.0  0.0   2:13.37 kworker/1:2
    11489 root     20   0      0     0     0 R  27.0  0.0   0:52.80 flush-253:0
    10910 root     20   0      0     0     0 R  22.5  0.0   2:06.59 kworker/0:2
     1305 root     20   0      0     0     0 S  18.0  0.0 338:40.46 md3_raid1
    11490 root     20   0      0     0     0 S  13.5  0.0   1:31.37 kworker/0:1
    11539 root     20   0   109m  100m   572 D  13.5  2.5   0:23.25 dd

Read performance

    hdparm -t /dev/mapper/galerkin_storage

    /dev/mapper/galerkin_storage:
     Timing buffered disk reads: 84 MB in 3.03 seconds = 27.73 MB/sec

Using dd

    dd if=/dev/mapper/galerkin_storage of=/dev/null bs=100M count=100
    100+0 records in
    100+0 records out
    10485760000 bytes (10 GB) copied, 369.272 s, 28.4 MB/s

top output:

    top - 10:29:49 up 3 days, 19:18, 2 users, load average: 2.14, 2.69, 1.69
    Tasks: 148 total, 2 running, 146 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.1 us, 15.8 sy, 0.0 ni, 81.4 id, 0.8 wa, 0.0 hi, 2.0 si, 0.0 st
    KiB Mem: 4044256 total, 1586852 used, 2457404 free, 1070080 buffers
    KiB Swap: 7812496 total, 115916 used, 7696580 free, 67056 cached

      PID USER     PR  NI   VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
    13963 root     20   0      0     0     0 R  84.9  0.0   3:55.93 kworker/2:0
    13773 root     20   0      0     0     0 S  30.3  0.0   2:38.38 kworker/3:2
    14158 root     20   0   109m  100m   572 D  18.2  2.5   0:08.50 dd
    14170 robert   20   0  23168  1448  1076 R   6.1  0.0   0:00.02 top
        1 root     20   0  10648   708   704 S   0.0  0.0   0:05.26 init
        2 root     20   0      0     0     0 S   0.0  0.0   0:00.17 kthreadd
        3 root     20   0      0     0     0 S   0.0  0.0   1:05.31 ksoftirqd/0
        5 root     20   0      0     0     0 S   0.0  0.0   0:00.00 kworker/u:0
        6 root     rt   0      0     0     0 S   0.0  0.0   0:00.14 migration/0

My conclusions

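Write performance seems to be limited by my CPU, since top reports the kworker threads using 60-98% CPU. I can accept that; my dual-core Intel Atom is not made for performance. What surprises me is that read performance (1) is lower than write performance and (2) does not seem to be limited by the CPU.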
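
To make the gap concrete, here is a minimal sketch that recomputes the throughput from the two dd summary lines of the encrypted array (cryptsetup 1.4.3 runs above). The function name `dd_throughput` is only illustrative, and the regular expression assumes the summary format shown in this post.

```python
import re

# Summary lines copied from the dd runs on the encrypted array above.
DD_WRITE = "10485760000 bytes (10 GB) copied, 209.058 s, 50.2 MB/s"
DD_READ = "10485760000 bytes (10 GB) copied, 369.272 s, 28.4 MB/s"


def dd_throughput(summary: str) -> float:
    """Return throughput in MB/s (10**6 bytes per second) from a dd summary line."""
    match = re.search(r"(\d+) bytes .* copied, ([\d.]+) s", summary)
    if match is None:
        raise ValueError(f"not a dd summary line: {summary!r}")
    n_bytes = int(match.group(1))
    seconds = float(match.group(2))
    return n_bytes / seconds / 1e6


write_mbs = dd_throughput(DD_WRITE)
read_mbs = dd_throughput(DD_READ)
print(f"write: {write_mbs:.1f} MB/s")
print(f"read:  {read_mbs:.1f} MB/s")
print(f"read/write ratio: {read_mbs / write_mbs:.2f}")
```

On these figures the read rate comes out at a bit more than half the write rate.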

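My expectation is that read performance should be roughly equal to write performance. Should I simply update to the latest version of Debian instead of doing archaeology? Is the cryptsetup version I am using (1.4.3) not as multithreaded for reads as it is for writes? Writes seem to use 4 different kworker threads, whereas the top output for reads shows only 2 busy kworkers.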
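
As a rough way to compare the two cases, the sketch below sums the %CPU of the busy kworker rows in the two top snapshots of the encrypted array (cryptsetup 1.4.3). The helper `kworker_load` is hypothetical, and the parsing assumes the 12-column top layout shown in this post, with the CPU number taken from the `kworker/N:M` name.

```python
from typing import List, Tuple

# Process rows copied from the top snapshots above: one during the dd write,
# one during the dd read.
TOP_WRITE = """\
10940 root 20 0 0 0 0 R 99.0 0.0 1:49.99 kworker/2:1
11538 root 20 0 0 0 0 R 94.5 0.0 1:28.32 kworker/3:1
11486 root 20 0 0 0 0 R 63.0 0.0 2:13.37 kworker/1:2
11489 root 20 0 0 0 0 R 27.0 0.0 0:52.80 flush-253:0
10910 root 20 0 0 0 0 R 22.5 0.0 2:06.59 kworker/0:2
1305 root 20 0 0 0 0 S 18.0 0.0 338:40.46 md3_raid1
11490 root 20 0 0 0 0 S 13.5 0.0 1:31.37 kworker/0:1
11539 root 20 0 109m 100m 572 D 13.5 2.5 0:23.25 dd
"""
TOP_READ = """\
13963 root 20 0 0 0 0 R 84.9 0.0 3:55.93 kworker/2:0
13773 root 20 0 0 0 0 S 30.3 0.0 2:38.38 kworker/3:2
14158 root 20 0 109m 100m 572 D 18.2 2.5 0:08.50 dd
"""


def kworker_load(rows: str) -> Tuple[List[str], float]:
    """Return the CPUs with a busy kworker and the summed kworker %CPU."""
    cpus = set()
    total = 0.0
    for line in rows.splitlines():
        fields = line.split()
        # Columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) != 12 or not fields[11].startswith("kworker/"):
            continue
        cpu_percent = float(fields[8])
        if cpu_percent > 0:
            # "kworker/2:1" -> CPU "2"
            cpus.add(fields[11].split("/")[1].split(":")[0])
            total += cpu_percent
    return sorted(cpus), total


for label, rows in (("write", TOP_WRITE), ("read", TOP_READ)):
    cpus, total = kworker_load(rows)
    print(f"{label}: busy kworkers on CPUs {cpus}, total {total:.1f} %CPU")
```

A single top snapshot is only a sample, so this illustrates the asymmetry rather than measuring it.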

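I have already looked at the question "Very poor performance of the LUKS / LVM / RAID combination under Debian Squeeze", but I do not seem to have the same problem: my top output shows 4 kcryptd processes, which suggests that my cryptsetup really is multithreaded.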

Background information

The RAID1 arrays currently contain only 1 drive each, because I want to compare them with each other. luksDump of my encrypted device:

    LUKS header information for /dev/md3

    Version:        1
    Cipher name:    aes
    Cipher mode:    cbc-essiv:sha256
    Hash spec:      sha1
    Payload offset: 4096
    MK bits:        256
    MK digest:
    MK salt:
    MK iterations:  12250
    UUID:           022e94a0-9dce-45c1-806b-9fb54cfabf9b

    Key Slot 0: ENABLED
        Iterations:             49360
        Salt:
        Key material offset:    8
        AF stripes:             4000
    Key Slot 1: DISABLED
    Key Slot 2: DISABLED
    Key Slot 3: DISABLED
    Key Slot 4: DISABLED
    Key Slot 5: DISABLED
    Key Slot 6: DISABLED
    Key Slot 7: DISABLED
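
For reference, a small sketch of how the cipher in use can be read off this header. It assumes the `Key: value` layout of the LUKS1 dump above; `parse_luks_header` and the derived `benchmark_row` tuple are illustrative names, not part of cryptsetup.

```python
# Header fields copied from the luksDump output above (digest and salt were
# left empty in the post).
LUKS_DUMP = """\
LUKS header information for /dev/md3

Version:        1
Cipher name:    aes
Cipher mode:    cbc-essiv:sha256
Hash spec:      sha1
Payload offset: 4096
MK bits:        256
MK digest:
MK salt:
MK iterations:  12250
"""


def parse_luks_header(dump: str) -> dict:
    """Split each 'Key: value' line at the first colon; skip lines without one."""
    fields = {}
    for line in dump.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


header = parse_luks_header(LUKS_DUMP)
cipher_spec = f"{header['Cipher name']}-{header['Cipher mode']}"
key_bits = int(header["MK bits"])

# The label and key size under which `cryptsetup benchmark` lists this cipher.
chaining_mode = header["Cipher mode"].split("-")[0]
benchmark_row = (f"{header['Cipher name']}-{chaining_mode}", f"{key_bits}b")

print(cipher_spec, key_bits, benchmark_row)
```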

Kernel

    uname -ra
    Linux galerkin 3.2.0-4-amd64 #1 SMP Debian 3.2.73-2+deb7u2 x86_64 GNU/Linux

Debian version

    cat /etc/debian_version
    7.9

cryptsetup version

    cryptsetup --version
    cryptsetup 1.4.3

The encrypted array was created with

 cryptsetup -v luksFormat /dev/md3 --key-file=/root/key-file 

The RAID array was set up with

 mdadm --create /dev/md3 --level=1 --raid-devices=2 /dev/sda4 missing 

cpuinfo

    cat /proc/cpuinfo
    processor   : 0
    vendor_id   : GenuineIntel
    cpu family  : 6
    model       : 28
    model name  : Intel(R) Atom(TM) CPU D525 @ 1.80GHz
    stepping    : 10
    microcode   : 0x107
    cpu MHz     : 1800.136
    cache size  : 512 KB

The CPU is reported four times as above.

Edit: I gave the wrong version in the title. The correct one is 7.9 (Wheezy).

Edit: updated to cryptsetup 1.6.6

    cryptsetup benchmark
    # Tests are approximate using memory only (no storage IO).
    PBKDF2-sha1       204800 iterations per second
    PBKDF2-sha256     151703 iterations per second
    PBKDF2-sha512      79824 iterations per second
    PBKDF2-ripemd160  169562 iterations per second
    PBKDF2-whirlpool   30913 iterations per second
    #  Algorithm | Key |  Encryption |  Decryption
         aes-cbc   128b    39.5 MiB/s    43.5 MiB/s
     serpent-cbc   128b    29.3 MiB/s    32.0 MiB/s
     twofish-cbc   128b    34.0 MiB/s    46.4 MiB/s
         aes-cbc   256b    30.6 MiB/s    32.8 MiB/s
     serpent-cbc   256b    29.8 MiB/s    32.0 MiB/s
     twofish-cbc   256b    34.4 MiB/s    46.5 MiB/s
         aes-xts   256b    43.0 MiB/s    44.2 MiB/s
     serpent-xts   256b    31.5 MiB/s    32.3 MiB/s
     twofish-xts   256b    33.1 MiB/s    34.2 MiB/s
         aes-xts   512b    32.7 MiB/s    33.2 MiB/s
     serpent-xts   512b    31.8 MiB/s    32.3 MiB/s
     twofish-xts   512b    33.4 MiB/s    34.1 MiB/s
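
To pick out the row that matters for this volume, the cipher table can be parsed as in the sketch below. It assumes the six-field row layout printed by cryptsetup 1.6.6 above; `parse_benchmark` is an illustrative helper, not a cryptsetup interface.

```python
# Cipher table copied from the `cryptsetup benchmark` output above.
BENCHMARK = """\
#  Algorithm | Key |  Encryption |  Decryption
     aes-cbc   128b    39.5 MiB/s    43.5 MiB/s
 serpent-cbc   128b    29.3 MiB/s    32.0 MiB/s
 twofish-cbc   128b    34.0 MiB/s    46.4 MiB/s
     aes-cbc   256b    30.6 MiB/s    32.8 MiB/s
 serpent-cbc   256b    29.8 MiB/s    32.0 MiB/s
 twofish-cbc   256b    34.4 MiB/s    46.5 MiB/s
     aes-xts   256b    43.0 MiB/s    44.2 MiB/s
 serpent-xts   256b    31.5 MiB/s    32.3 MiB/s
 twofish-xts   256b    33.1 MiB/s    34.2 MiB/s
     aes-xts   512b    32.7 MiB/s    33.2 MiB/s
 serpent-xts   512b    31.8 MiB/s    32.3 MiB/s
 twofish-xts   512b    33.4 MiB/s    34.1 MiB/s
"""


def parse_benchmark(text: str) -> dict:
    """Map (algorithm, key size) to (encryption, decryption) rates in MiB/s."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row looks like: aes-cbc 256b 30.6 MiB/s 32.8 MiB/s
        if len(fields) == 6 and fields[3] == "MiB/s" and fields[5] == "MiB/s":
            rows[(fields[0], fields[1])] = (float(fields[2]), float(fields[4]))
    return rows


rows = parse_benchmark(BENCHMARK)
enc, dec = rows[("aes-cbc", "256b")]
print(f"aes-cbc 256b: encryption {enc} MiB/s, decryption {dec} MiB/s")
```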

New performance measurements of the encrypted array with cryptsetup 1.6.6

Write performance

    dd if=/dev/zero of=/dev/mapper/galerkin_storage bs=100M count=100
    100+0 records in
    100+0 records out
    10485760000 bytes (10 GB) copied, 207.493 s, 50.5 MB/s

top output during the write:

    top - 21:42:48 up 22 min, 2 users, load average: 2.96, 1.07, 0.69
    Tasks: 142 total, 7 running, 135 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.2 us, 12.6 sy, 0.0 ni, 82.6 id, 4.2 wa, 0.0 hi, 0.4 si, 0.0 st
    KiB Mem: 4044256 total, 3252544 used, 791712 free, 2721776 buffers
    KiB Swap: 7812496 total, 44 used, 7812452 free, 65520 cached

      PID USER     PR  NI   VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
     4379 root     20   0      0     0     0 R  93.5  0.0   0:24.72 kworker/1:2
     4377 root     20   0      0     0     0 R  82.5  0.0   0:03.55 kworker/2:0
     4378 root     20   0      0     0     0 R  82.5  0.0   0:31.93 kworker/3:1
     4336 root     20   0      0     0     0 R  55.0  0.0   0:33.53 kworker/0:0
      189 root     20   0      0     0     0 S  44.0  0.0   0:13.94 md3_raid1
     4380 root     20   0   105m  100m   540 R  11.0  2.5   0:09.26 dd
     4396 robert   20   0  23348  1396  1032 R  11.0  0.0   0:00.03 top
        1 root     20   0  15468   900   740 S   0.0  0.0   0:01.15 init

Read performance

    dd if=/dev/mapper/galerkin_storage of=/dev/null bs=100M count=100
    100+0 records in
    100+0 records out
    10485760000 bytes (10 GB) copied, 368.387 s, 28.5 MB/s

top output during the read:

    top - 21:25:17 up 4 min, 2 users, load average: 0.57, 0.20, 0.09
    Tasks: 141 total, 2 running, 139 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 0.8 us, 3.7 sy, 0.0 ni, 91.9 id, 3.6 wa, 0.0 hi, 0.1 si, 0.0 st
    KiB Mem: 4044256 total, 1055628 used, 2988628 free, 611612 buffers
    KiB Swap: 7812496 total, 0 used, 7812496 free, 130004 cached

      PID USER     PR  NI   VIRT   RES   SHR S  %CPU %MEM     TIME+ COMMAND
       11 root     20   0      0     0     0 R  54.5  0.0   0:07.55 kworker/0:1
       30 root     20   0      0     0     0 S  30.3  0.0   0:10.40 kworker/2:1
        9 root     20   0      0     0     0 S  24.2  0.0   0:02.59 kworker/1:0
     4287 root     20   0   105m  100m   540 D  24.2  2.5   0:04.63 dd
     4288 root     20   0      0     0     0 S  12.1  0.0   0:04.24 kworker/3:2
     4306 robert   20   0  23348  1404  1032 R   6.1  0.0   0:00.02 top
        1 root     20   0  15468   900   740 S   0.0  0.0   0:01.13 init

With hdparm

    hdparm -t /dev/mapper/galerkin_storage

    /dev/mapper/galerkin_storage:
     Timing buffered disk reads: 84 MB in 3.06 seconds = 27.44 MB/sec

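So read performance is still far below write performance. If I interpret luksDump correctly, I have aes-cbc with a 256-bit key. The benchmark command indicates that read performance should be in the region of what my dd benchmark shows. The write performance, however, is unexpectedly high. One thing struck me: I had previously filled the encrypted partition from /dev/zero, so perhaps no write actually needs to be performed because the data is already zero?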
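
The comparison can be worked through as follows. This is only a sketch: it uses the aes-cbc 256b figures from the cryptsetup benchmark and the dd results under cryptsetup 1.6.6, and it assumes (this is not stated in the benchmark output) that the benchmark figure reflects a single thread.

```python
MIB = 1024 ** 2  # bytes per MiB, the unit used by cryptsetup benchmark
MB = 10 ** 6     # bytes per MB, the unit used by dd

# aes-cbc 256b row of the cryptsetup 1.6.6 benchmark above (MiB/s).
BENCH_ENC_MIB_S = 30.6
BENCH_DEC_MIB_S = 32.8

# dd results on the encrypted array with cryptsetup 1.6.6 (MB/s).
DD_WRITE_MB_S = 50.5
DD_READ_MB_S = 28.5


def mib_to_mb(rate_mib_s: float) -> float:
    """Convert a rate from MiB/s to MB/s."""
    return rate_mib_s * MIB / MB


enc_mb_s = mib_to_mb(BENCH_ENC_MIB_S)
dec_mb_s = mib_to_mb(BENCH_DEC_MIB_S)

# Ratio of measured throughput to the benchmark figure.
write_ratio = DD_WRITE_MB_S / enc_mb_s
read_ratio = DD_READ_MB_S / dec_mb_s

print(f"benchmark: encryption {enc_mb_s:.1f} MB/s, decryption {dec_mb_s:.1f} MB/s")
print(f"write is {write_ratio:.2f}x the benchmark encryption rate")
print(f"read is {read_ratio:.2f}x the benchmark decryption rate")
```

Under that assumption, the read rate sits just below the benchmark's decryption figure, while the write rate exceeds the encryption figure, which would be consistent with encryption being spread over several cores.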