Regarding the TeraSort demo in Hadoop: please advise whether this symptom is expected, or whether the workload should be distributed across the cluster.
I started Hadoop (3 nodes in the cluster) and ran the TeraSort benchmark on it.
I expected all 3 nodes to be busy, with all CPUs fully utilized (up to 400%). However, only the node where the job was started is busy, and even there the CPU is not fully utilized. For example, if the job is started on sydspark02, top shows the following.
I would like to know whether this is expected, or whether there is a configuration problem that prevents the workload from being distributed across the nodes.
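In case it is relevant: my understanding is that MapReduce jobs only distribute across the cluster when they are submitted to YARN rather than run in-process by the LocalJobRunner, which is controlled by `mapreduce.framework.name` in mapred-site.xml. A minimal sketch of that file (illustrative only, not necessarily what my cluster has):

```xml
<!-- mapred-site.xml (sketch): tells MapReduce to submit jobs to YARN
     instead of running them in-process on the submitting node -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```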
sydspark02
top - 13:37:12 up 5 days, 2:58, 2 users, load average: 0.22, 0.06, 0.12
Tasks: 134 total, 1 running, 133 sleeping, 0 stopped, 0 zombie
%Cpu(s): 27.5 us, 2.7 sy, 0.0 ni, 69.8 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem: 8175980 total, 1781888 used, 6394092 free, 68 buffers
KiB Swap: 0 total, 0 used, 0 free. 532116 cached Mem

 PID USER   PR NI    VIRT    RES   SHR S  %CPU %MEM   TIME+ COMMAND
2602 hadoop 20  0 2191288 601352 22988 S 120.7  7.4 0:15.52 java
1197 hadoop 20  0  105644    976     0 S   0.3  0.0 0:00.16 sshd
2359 hadoop 20  0 2756336 270332 23280 S   0.3  3.3 0:08.87 java
sydspark01
top - 13:38:32 up 2 days, 19:28, 2 users, load average: 0.15, 0.07, 0.11
Tasks: 141 total, 1 running, 140 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 1.2 sy, 0.0 ni, 96.6 id, 2.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 10240364 total, 10092352 used, 148012 free, 648 buffers
KiB Swap: 0 total, 0 used, 0 free. 8527904 cached Mem

  PID USER   PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
10635 hadoop 20  0 2766264 238540 23160 S  3.0  2.3 0:11.15 java
11353 hadoop 20  0 2770680 287504 22956 S  1.0  2.8 0:08.97 java
11057 hadoop 20  0 2888396 327260 23068 S  0.7  3.2 0:12.42 java
sydspark03
top - 13:44:21 up 5 days, 1:01, 1 user, load average: 0.00, 0.01, 0.05
Tasks: 124 total, 1 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.7 us, 0.0 sy, 0.0 ni, 99.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 8175980 total, 4552876 used, 3623104 free, 1156 buffers
KiB Swap: 0 total, 0 used, 0 free. 3818884 cached Mem

  PID USER   PR NI    VIRT    RES   SHR S %CPU %MEM   TIME+ COMMAND
29374 hadoop 20  0 2729012 204180 22952 S  3.0  2.5 0:07.47 java
> sbin/start-dfs.sh
Starting namenodes on [sydspark01]
sydspark01: starting namenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-namenode-sydspark01.out
sydspark03: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark03.out
sydspark02: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark02.out
sydspark01: starting datanode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-datanode-sydspark01.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-sydspark01.out

> sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-sydspark01.out
sydspark01: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark01.out
sydspark03: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark03.out
sydspark02: starting nodemanager, logging to /usr/local/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-sydspark02.out

> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen 100000000 /user/hadoop/terasort-input
> hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort /user/hadoop/terasort-input /user/hadoop/terasort-output
slaves
sydspark01
sydspark02
sydspark03
core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://sydspark01:9000</value>
  </property>
</configuration>
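I have not changed yarn-site.xml beyond the basics. For reference, the YARN properties that govern how many containers each node can run are set there; a sketch of the relevant entries, with the Hadoop 2.7 default values shown purely for illustration (not my actual file):

```xml
<!-- yarn-site.xml (sketch): per-node resources that the NodeManager
     advertises to the ResourceManager for scheduling containers.
     Values below are the Hadoop 2.7 defaults, shown for illustration. -->
<configuration>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>8192</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
</configuration>
```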
Ubuntu 14.04.4 LTS on VMware, 4 CPUs
Hadoop 2.7.3
Java 8
When I run JMC (Java Mission Control) against the DataNode on the node executing the job:
CPU
Only about 25% of the CPU resources (1 of the 4 CPUs) is being used.
Memory