如何debugging呆滞的进程性能

我有一个testing应用程序,它将一个string写入一个文件10000次。 我在两台服务器上运行这个,我得到完全不同的结果。

[SYSTEM 1] $ time ./test_depth real 0m0.152s user 0m0.070s sys 0m0.082s $ strace -cT ./test_depth % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 57.38 0.000420 0 10000 write 30.19 0.000221 0 9924 fadvise64 12.43 0.000091 0 10000 gettid 0.00 0.000000 0 11 read 0.00 0.000000 0 66 54 open 0.00 0.000000 0 11 close 0.00 0.000000 0 7 6 stat 0.00 0.000000 0 13 fstat 0.00 0.000000 0 2 lseek 0.00 0.000000 0 32 mmap 0.00 0.000000 0 15 mprotect 0.00 0.000000 0 2 munmap 0.00 0.000000 0 3 brk 0.00 0.000000 0 2 rt_sigaction 0.00 0.000000 0 1 rt_sigprocmask 0.00 0.000000 0 1 1 access 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 uname 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 1 1 unlink 0.00 0.000000 0 1 symlink 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 3 1 futex 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.000732 30105 63 total [SYSTEM 2] $ time ./test_depth real 1m0.252s user 0m0.163s sys 0m0.307s $ strace -cT ./test_depth % time seconds usecs/call calls errors syscall ------ ----------- ----------- --------- --------- ---------------- 98.29 0.147977 15 10000 write 1.56 0.002355 0 9925 fadvise64 0.14 0.000217 0 10000 gettid 0.00 0.000006 1 11 close 0.00 0.000000 0 11 read 0.00 0.000000 0 57 45 open 0.00 0.000000 0 3 3 stat 0.00 0.000000 0 13 fstat 0.00 0.000000 0 2 lseek 0.00 0.000000 0 32 mmap 0.00 0.000000 0 15 mprotect 0.00 0.000000 0 2 munmap 0.00 0.000000 0 3 brk 0.00 0.000000 0 2 rt_sigaction 0.00 0.000000 0 1 rt_sigprocmask 0.00 0.000000 0 1 1 access 0.00 0.000000 0 1 execve 0.00 0.000000 0 2 uname 0.00 0.000000 0 3 fcntl 0.00 0.000000 0 1 unlink 0.00 0.000000 0 1 symlink 0.00 0.000000 0 1 getrlimit 0.00 0.000000 0 1 arch_prctl 0.00 0.000000 0 3 1 futex 0.00 0.000000 0 1 set_tid_address 0.00 0.000000 0 1 set_robust_list ------ ----------- ----------- --------- --------- ---------------- 100.00 0.150555 30093 50 total 

该过程在第二个系统上运行约一分钟。 看着这个strace,它在每个write电话上花费15微秒。 我如何从这里继续诊断问题? 这看起来像在第二个系统上的光盘写入需要更长的时间。 我如何确定它是真的吗? 而且,还有什么我应该看的?

在两种情况下使用的坐骑:

 1) - /dev/sda1 on / type ext4 (rw) 2) - /dev/mapper/VolGroup-lv_home on /home type ext4 (rw) 

顺便说一下,日志正在使用glog lib编写。 如果使用简单的printf ,则不会出现此问题。 所以这看起来像是一个图书馆的问题。 🙁

 [root@localhost src]# hdparm -I /dev/sda /dev/sda: SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0d 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ATA device, with non-removable media Standards: Likely used: 1 Configuration: Logical max current cylinders 0 0 heads 0 0 sectors/track 0 0 -- Logical/Physical Sector size: 512 bytes device size with M = 1024*1024: 0 MBytes device size with M = 1000*1000: 0 MBytes cache/buffer size = unknown Capabilities: IORDY not likely Cannot perform double-word IO R/W multiple sector transfer: not supported DMA: not supported PIO: pio0 [root@localhost src]# df -k Filesystem 1K-blocks Used Available Use% Mounted on /dev/mapper/VolGroup-lv_root 51606140 6267948 42716752 13% / tmpfs 8141564 1016 8140548 1% /dev/shm /dev/sda3 495844 37728 432516 9% /boot /dev/mapper/VolGroup-lv_home 1090839440 1022652 1034405284 1% /home [root@localhost src]# ./lshw -class disk *-disk description: SCSI Disk product: PERC H310 vendor: DELL physical id: 2.0.0 bus info: scsi@0:2.0.0 logical name: /dev/sda version: 2.12 serial: 00677e1515fc98ef19000d93f7a0a38c size: 1117GiB (1199GB) capabilities: partitioned partitioned:dos configuration: ansiversion=5 logicalsectorsize=512 sectorsize=512 signature=31c27a66 

操作系统:

 [SYSTEM 1] Linux snapoffice 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64 x86_64 x86_64 GNU/Linux [SYSTEM 2] Linux localhost.localdomain 2.6.32-279.el6.x86_64 #1 SMP Fri Jun 22 12:19:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux