Apache Hadoop

I had not set up an Apache Hadoop environment in more than two years. Yesterday I built one again, and I am recording the process here for future reference.
Host role assignment
The NameNode and DFSZKFailoverController roles run on the oversea-stable and bus-stable servers; software required: JDK, Hadoop 2.9.1
The ResourceManager role runs on the oversea-stable server; software required: JDK, Hadoop 2.9.1
The JournalNode, DataNode, and NodeManager roles run on the open-stable, permission-stable, and sp-stable servers; software required: JDK, Hadoop 2.9.1
The ZooKeeper cluster's QuorumPeerMain role runs on the open-stable, permission-stable, and sp-stable servers; software required: JDK, ZooKeeper 3.4.12

1. Environment setup
(1) Set the hostname on each machine and configure local name resolution (the hostnames and the resolution entries must match exactly, otherwise the JournalNodes will fail to start)

[root@oversea-stable ~]# cat /etc/hosts
192.168.20.68 oversea-stable
192.168.20.67 bus-stable
192.168.20.66 open-stable
192.168.20.65 permission-stable
192.168.20.64 sp-stable
[root@oversea-stable ~]#

Then sync this file to all machines.
(2) Synchronize the clocks on all nodes (a sketch for steps (2) and (3) follows the profile settings below)
(3) Distribute the JDK and install it on all nodes
(4) Configure environment variables
Add the following settings to /etc/profile:

export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/opt/hadoop
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
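
For steps (2) and (3), here is a minimal sketch, assuming root SSH access between the nodes, ntpdate installed, and a JDK RPM placed at /tmp/jdk.rpm (the NTP server name and the paths are assumptions; adjust to your environment):

# sync the clock on every node against an NTP server (placeholder pool name)
ntpdate pool.ntp.org

# from the admin host: push the JDK to the other nodes and install it
for h in bus-stable open-stable permission-stable sp-stable; do
    scp /tmp/jdk.rpm $h:/tmp/ && ssh $h "rpm -ivh /tmp/jdk.rpm"
done

# make the new /etc/profile settings take effect in the current shell
source /etc/profile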

2. Configure SSH keys, including a copy for the local host (passwordless login is needed even when ssh-ing into the local machine)
Do the following on all machines:
(1) Create the hadoop user: useradd hadoop
(2) Set the hadoop user's password: echo "xxxxxxxx" | passwd --stdin hadoop
On one of the servers, switch to the hadoop user: su - hadoop
and generate an SSH key pair: ssh-keygen -b 2048 -t rsa
Append the public key to the authorized keys (cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys), then sync the whole .ssh directory to the other servers: scp -r .ssh server_name:~/
On every server, switch to the hadoop user and verify passwordless login to all the other servers.
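
A quick verification loop (a sketch; run as hadoop on each server in turn):

for h in oversea-stable bus-stable open-stable permission-stable sp-stable; do
    ssh -o StrictHostKeyChecking=no $h hostname
done

Each iteration should print the remote hostname without prompting for a password.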

3. Configure ZooKeeper
Set up the ZooKeeper cluster on the open-stable, permission-stable, and sp-stable servers as follows:

[root@open-stable ~]# chmod o+w /opt
[root@open-stable ~]# su - hadoop
[hadoop@open-stable ~]$  wget http://mirrors.hust.edu.cn/apache/zookeeper/zookeeper-3.4.12/zookeeper-3.4.12.tar.gz
[hadoop@open-stable ~]$  tar xfz zookeeper-3.4.12.tar.gz  -C  /opt
[hadoop@open-stable ~]$ cd /opt/
[hadoop@open-stable opt]$  mv zookeeper{-3.4.12,}
[hadoop@open-stable opt]$ cd zookeeper/
[hadoop@open-stable zookeeper]$ cp conf/zoo_sample.cfg  conf/zoo.cfg
[hadoop@open-stable zookeeper]$ vim conf/zoo.cfg
[hadoop@open-stable zookeeper]$ grep -Pv "^(#|$)" conf/zoo.cfg
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/opt/zookeeper/zkdata
dataLogDir=/opt/zookeeper/zklogs
clientPort=2181
server.6=open-stable:2888:3888
server.5=permission-stable:2888:3888
server.4=sp-stable:2888:3888
[hadoop@open-stable zookeeper]$  mkdir zkdata
[hadoop@open-stable zookeeper]$  mkdir zklogs
[hadoop@open-stable zookeeper]$ echo 6 > zkdata/myid
[hadoop@open-stable zookeeper]$ bin/zkServer.sh start
The other servers are configured identically, except that each writes its own ID into zkdata/myid to match its server.N entry in zoo.cfg (5 on permission-stable, 4 on sp-stable), e.g. as sketched below.
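
A sketch of replicating the install from open-stable (assumes /opt was made writable for hadoop on each target, as was done on open-stable above):

for h in permission-stable sp-stable; do
    rsync -az /opt/zookeeper $h:/opt/
done
# each node's myid must match its server.N line in zoo.cfg
ssh permission-stable 'echo 5 > /opt/zookeeper/zkdata/myid && /opt/zookeeper/bin/zkServer.sh start'
ssh sp-stable 'echo 4 > /opt/zookeeper/zkdata/myid && /opt/zookeeper/bin/zkServer.sh start'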

[hadoop@open-stable zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@open-stable zookeeper]$ 

[hadoop@permission-stable zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[hadoop@permission-stable zookeeper]$

[hadoop@sp-stable zookeeper]$ bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower
[hadoop@sp-stable zookeeper]$

4. Configure Hadoop
(1) On one of the machines, configure Hadoop as follows:

[hadoop@oversea-stable ~]$ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.9.1/hadoop-2.9.1.tar.gz
[hadoop@oversea-stable ~]$  tar xfz hadoop-2.9.1.tar.gz -C /opt/
[hadoop@oversea-stable ~]$  cd /opt/
[hadoop@oversea-stable opt]$   ln -s hadoop-2.9.1 hadoop
[hadoop@oversea-stable opt]$     cd hadoop/etc/hadoop

[hadoop@oversea-stable hadoop]$ grep JAVA_HOME hadoop-env.sh 
export JAVA_HOME=/usr/java/latest
[hadoop@oversea-stable hadoop]$

[hadoop@oversea-stable hadoop]$ tail -14  core-site.xml    
<configuration>
  <!-- Set the HDFS nameservice to inspiryhdfs -->
  <property>
        <name>fs.defaultFS</name>
        <value>hdfs://inspiryhdfs</value>
  </property>
    <!-- Hadoop temporary directory -->
  <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmp</value>
  </property>
    <!-- ZooKeeper quorum addresses -->
  <property>
        <name>ha.zookeeper.quorum</name>
        <value>open-stable:2181,permission-stable:2181,sp-stable:2181</value>
  </property>
</configuration>
[hadoop@oversea-stable hadoop]$

[hadoop@oversea-stable hadoop]$ tail -50 hdfs-site.xml 
<configuration>
  <!-- The HDFS nameservice is inspiryhdfs; it must match core-site.xml -->
  <property>
        <name>dfs.nameservices</name>
        <value>inspiryhdfs</value>
  </property>
    <!-- The inspiryhdfs nameservice has two NameNodes: nn1 and nn2 -->
  <property>
        <name>dfs.ha.namenodes.inspiryhdfs</name>
        <value>nn1,nn2</value>
  </property>
    <!-- RPC and HTTP addresses of nn1 -->
  <property>
        <name>dfs.namenode.rpc-address.inspiryhdfs.nn1</name>
        <value>oversea-stable:9000</value>
  </property>
  <property>
        <name>dfs.namenode.http-address.inspiryhdfs.nn1</name>
        <value>oversea-stable:50070</value>
  </property>
    <!-- RPC and HTTP addresses of nn2 -->
  <property>
        <name>dfs.namenode.rpc-address.inspiryhdfs.nn2</name>
        <value>bus-stable:9000</value>
  </property>
  <property>
        <name>dfs.namenode.http-address.inspiryhdfs.nn2</name>
        <value>bus-stable:50070</value>
  </property>
    <!-- JournalNodes that hold the NameNode's shared edit log -->
  <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://open-stable:8485;permission-stable:8485;sp-stable:8485/inspiryhdfs</value>
  </property>
    <!-- Where the JournalNodes store their data on local disk -->
  <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/opt/hadoop/journal</value>
  </property>
    <!-- Enable automatic NameNode failover -->
  <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
  </property>
    <!-- Proxy provider clients use to find the active NameNode during failover -->
  <property>
        <name>dfs.client.failover.proxy.provider.inspiryhdfs</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
    <!-- Fencing method -->
  <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
  </property>
    <!-- sshfence needs passwordless SSH; point it at the private key -->
  <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/hadoop/.ssh/id_rsa</value>
  </property>
</configuration>
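
One caveat: the sshfence method logs into the failed NameNode's host over SSH and kills the NameNode with fuser, so the package that provides fuser (psmisc on RPM-based systems) must be installed on both NameNode hosts:

yum install -y psmisc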

Specify that MapReduce runs on the YARN framework:
[hadoop@oversea-stable hadoop]$ cp mapred-site.xml{.template,}
[hadoop@oversea-stable hadoop]$ tail -6  mapred-site.xml   
<configuration>
  <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
  </property>
</configuration>
[hadoop@oversea-stable hadoop]$

Specify the DataNode nodes:
[hadoop@oversea-stable hadoop]$ cat slaves 
open-stable
permission-stable
sp-stable
[hadoop@oversea-stable hadoop]$

[hadoop@oversea-stable hadoop]$ tail -11 yarn-site.xml 
<configuration>
<!-- Site specific YARN configuration properties -->
  <!-- ResourceManager address -->
  <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>oversea-stable</value>
  </property>
    <!-- Auxiliary service loaded by the NodeManagers: the MapReduce shuffle server -->
  <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
  </property>
</configuration>
[hadoop@oversea-stable hadoop]$

(2) Sync the configured Hadoop to the other servers:

[hadoop@oversea-stable opt]$  rsync -avzoptgl hadoop-2.9.1 bus-stable:/opt/
[hadoop@oversea-stable opt]$  rsync -avzoptgl hadoop-2.9.1 open-stable:/opt/
[hadoop@oversea-stable opt]$  rsync -avzoptgl hadoop-2.9.1 permission-stable:/opt/
[hadoop@oversea-stable opt]$  rsync -avzoptgl hadoop-2.9.1 sp-stable:/opt/

Create the hadoop soft link on each of the other servers, e.g. as sketched below.
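
A minimal sketch (run as hadoop from oversea-stable, using the passwordless SSH set up earlier):

for h in bus-stable open-stable permission-stable sp-stable; do
    ssh $h "ln -s /opt/hadoop-2.9.1 /opt/hadoop"
done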

(3) Start the JournalNodes (hadoop-daemons.sh, plural, runs the command on every host listed in the slaves file, which here are exactly the three JournalNode hosts):

sbin/hadoop-daemons.sh start journalnode

Format the NameNode on oversea-stable and start the primary NameNode:

hadoop namenode -format
sbin/hadoop-daemon.sh start namenode 

[hadoop@oversea-stable hadoop]$ ls /opt/hadoop/tmp/dfs/name/current/
    fsimage_0000000000000000000      seen_txid
    fsimage_0000000000000000000.md5  VERSION

(4) Sync the metadata to the standby NameNode
After formatting and starting the NameNode on oversea-stable, sync the NameNode metadata onto the bus-stable node so that the NameNode does not have to be formatted a second time (also make sure the /opt/hadoop/tmp directory exists on bus-stable). On bus-stable, run:

bin/hdfs namenode -bootstrapStandby

sbin/hadoop-daemon.sh start namenode

5. Format ZKFC (this lets the NameNodes report their state to ZooKeeper)
hdfs zkfc -formatZK
(If formatting fails, check that the ZooKeeper addresses given in core-site.xml are exactly right.)
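
To confirm the HA znode was created, check from the ZooKeeper CLI (a sketch; run on any ZooKeeper node):

/opt/zookeeper/bin/zkCli.sh -server open-stable:2181
# then, at the zkCli prompt, /hadoop-ha should contain the inspiryhdfs nameservice:
ls /hadoop-ha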

6. Start HDFS

[hadoop@oversea-stable hadoop]$ sbin/start-dfs.sh 
Starting namenodes on [oversea-stable bus-stable]
bus-stable: starting namenode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-namenode-bus-stable.out
oversea-stable: starting namenode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-namenode-oversea-stable.out
sp-stable: starting datanode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-datanode-sp-stable.out
permission-stable: starting datanode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-datanode-permission-stable.out
open-stable: starting datanode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-datanode-open-stable.out
Starting journal nodes [open-stable permission-stable sp-stable]
sp-stable: starting journalnode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-journalnode-sp-stable.out
open-stable: starting journalnode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-journalnode-open-stable.out
permission-stable: starting journalnode, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-journalnode-permission-stable.out
Starting ZK Failover Controllers on NN hosts [oversea-stable bus-stable]
oversea-stable: starting zkfc, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-zkfc-oversea-stable.out
bus-stable: starting zkfc, logging to /opt/hadoop-2.9.1/logs/hadoop-hadoop-zkfc-bus-stable.out
[hadoop@oversea-stable hadoop]$
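
The active/standby state of each NameNode can also be queried from the command line (nn1 and nn2 are the IDs defined in hdfs-site.xml):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2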

7. Start YARN. (If the NameNode and ResourceManager are not on the same machine, YARN cannot be started from the NameNode; it must be started on the ResourceManager machine.)

[hadoop@oversea-stable hadoop]$ sbin/start-yarn.sh 
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-resourcemanager-oversea-stable.out
sp-stable: starting nodemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-nodemanager-sp-stable.out
open-stable: starting nodemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-nodemanager-open-stable.out
permission-stable: starting nodemanager, logging to /opt/hadoop-2.9.1/logs/yarn-hadoop-nodemanager-permission-stable.out
[hadoop@oversea-stable hadoop]$
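
To confirm that every NodeManager registered with the ResourceManager, list the nodes:

yarn node -list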

8. Verify the roles on each node

[hadoop@oversea-stable ~]$ jps
4389 DFSZKFailoverController
5077 ResourceManager
25061 Jps
4023 NameNode
[hadoop@oversea-stable ~]$

[hadoop@bus-stable ~]$ jps
9073 Jps
29956 NameNode
30095 DFSZKFailoverController
[hadoop@bus-stable ~]$

[hadoop@open-stable ~]$ jps
2434 DataNode
421 QuorumPeerMain
2559 JournalNode
2847 NodeManager
11903 Jps
[hadoop@open-stable ~]$

[hadoop@permission-stable ~]$ jps
30489 QuorumPeerMain
32505 JournalNode
9689 Jps
32380 DataNode
303 NodeManager
[hadoop@permission-stable ~]$

[hadoop@sp-stable ~]$ jps
29955 DataNode
30339 NodeManager
30072 JournalNode
6792 Jps
28060 QuorumPeerMain
[hadoop@sp-stable ~]$

Open http://oversea-stable:50070/ and http://bus-stable:50070/ in a browser.
(screenshot: the two NameNode web UIs)
The pages show that bus-stable is in the active state and oversea-stable is in standby. Next, test NameNode high availability: when bus-stable goes down, does oversea-stable take over automatically?
Kill the NameNode process on bus-stable:

[root@bus-stable ~]# jps
1614 NameNode
2500 Jps
1929 DFSZKFailoverController
[root@bus-stable ~]# kill -9 1614

Refresh http://bus-stable:50070/ again; it is now unreachable. Then refresh http://oversea-stable:50070/:
oversea-stable is now in the active state, which shows that failover works. The highly available Hadoop cluster is now complete.
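
The killed NameNode can then be brought back, and it will rejoin the cluster as the standby (run on bus-stable as hadoop):

sbin/hadoop-daemon.sh start namenode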

Open http://oversea-stable:8088 to view the Hadoop cluster state, as shown below:
(screenshot: YARN ResourceManager web UI)

9. Using Hadoop

[hadoop@oversea-stable hadoop]$ hdfs dfs -ls /   
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2018-06-15 10:32 /data
[hadoop@oversea-stable ~]$  hdfs dfs -put /tmp/notepad.txt  /data/notepad.txt
[hadoop@oversea-stable ~]$ cd /opt/hadoop
[hadoop@oversea-stable hadoop]$ ls share/hadoop/mapreduce/
hadoop-mapreduce-client-app-2.9.1.jar         hadoop-mapreduce-client-jobclient-2.9.1.jar        lib
hadoop-mapreduce-client-common-2.9.1.jar      hadoop-mapreduce-client-jobclient-2.9.1-tests.jar  lib-examples
hadoop-mapreduce-client-core-2.9.1.jar        hadoop-mapreduce-client-shuffle-2.9.1.jar          sources
hadoop-mapreduce-client-hs-2.9.1.jar          hadoop-mapreduce-examples-2.9.1.jar
hadoop-mapreduce-client-hs-plugins-2.9.1.jar  jdiff
[hadoop@oversea-stable hadoop]$ 
[hadoop@oversea-stable hadoop]$ 
[hadoop@oversea-stable hadoop]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.1.jar wordcount /data  /out1
18/06/15 11:04:53 INFO client.RMProxy: Connecting to ResourceManager at oversea-stable/192.168.20.68:8032
18/06/15 11:04:54 INFO input.FileInputFormat: Total input files to process : 1
18/06/15 11:04:54 INFO mapreduce.JobSubmitter: number of splits:1
18/06/15 11:04:54 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/06/15 11:04:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528979206314_0002
18/06/15 11:04:55 INFO impl.YarnClientImpl: Submitted application application_1528979206314_0002
18/06/15 11:04:55 INFO mapreduce.Job: The url to track the job: http://oversea-stable:8088/proxy/application_1528979206314_0002/
18/06/15 11:04:55 INFO mapreduce.Job: Running job: job_1528979206314_0002
18/06/15 11:05:02 INFO mapreduce.Job: Job job_1528979206314_0002 running in uber mode : false
18/06/15 11:05:02 INFO mapreduce.Job:  map 0% reduce 0%
18/06/15 11:05:08 INFO mapreduce.Job:  map 100% reduce 0%
18/06/15 11:05:14 INFO mapreduce.Job:  map 100% reduce 100%
18/06/15 11:05:14 INFO mapreduce.Job: Job job_1528979206314_0002 completed successfully
18/06/15 11:05:14 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=68428
                FILE: Number of bytes written=535339
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=88922
                HDFS: Number of bytes written=58903
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=3466
                Total time spent by all reduces in occupied slots (ms)=3704
                Total time spent by all map tasks (ms)=3466
                Total time spent by all reduce tasks (ms)=3704
                Total vcore-milliseconds taken by all map tasks=3466
                Total vcore-milliseconds taken by all reduce tasks=3704
                Total megabyte-milliseconds taken by all map tasks=3549184
                Total megabyte-milliseconds taken by all reduce tasks=3792896
        Map-Reduce Framework
                Map input records=1770
                Map output records=5961
                Map output bytes=107433
                Map output materialized bytes=68428
                Input split bytes=100
                Combine input records=5961
                Combine output records=2366
                Reduce input groups=2366
                Reduce shuffle bytes=68428
                Reduce input records=2366
                Reduce output records=2366
                Spilled Records=4732
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=145
                CPU time spent (ms)=2730
                Physical memory (bytes) snapshot=505479168
                Virtual memory (bytes) snapshot=4347928576
                Total committed heap usage (bytes)=346554368
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=88822
        File Output Format Counters 
                Bytes Written=58903
[hadoop@oversea-stable hadoop]$
[hadoop@oversea-stable hadoop]$ hdfs dfs -ls /out1/             
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-06-15 11:05 /out1/_SUCCESS
-rw-r--r--   3 hadoop supergroup      58903 2018-06-15 11:05 /out1/part-r-00000
[hadoop@oversea-stable hadoop]$ 
[hadoop@oversea-stable hadoop]$ hdfs dfs -cat /out1/part-r-00000
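
The output file holds one word and its count per line; a quick look at the beginning (a sketch):

hdfs dfs -cat /out1/part-r-00000 | head -20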

Running the job with custom map and reduce functions (via Hadoop Streaming) looks like this:

[hadoop@oversea-stable hadoop]$  hadoop jar /opt/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.9.1.jar -file /opt/map.py -mapper /opt/map.py -file /opt/reduce.py -reducer /opt/reduce.py -input /data/notepad.txt -output /out2
18/06/15 14:30:32 WARN streaming.StreamJob: -file option is deprecated, please use generic option -files instead.
packageJobJar: [/opt/map.py, /opt/reduce.py, /tmp/hadoop-unjar5706672822735184593/] [] /tmp/streamjob6067385394162603509.jar tmpDir=null
18/06/15 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at oversea-stable/192.168.20.68:8032
18/06/15 14:30:33 INFO client.RMProxy: Connecting to ResourceManager at oversea-stable/192.168.20.68:8032
18/06/15 14:30:34 INFO mapred.FileInputFormat: Total input files to process : 1
18/06/15 14:30:34 INFO mapreduce.JobSubmitter: number of splits:2
18/06/15 14:30:34 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
18/06/15 14:30:35 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529036356241_0004
18/06/15 14:30:35 INFO impl.YarnClientImpl: Submitted application application_1529036356241_0004
18/06/15 14:30:35 INFO mapreduce.Job: The url to track the job: http://oversea-stable:8088/proxy/application_1529036356241_0004/
18/06/15 14:30:35 INFO mapreduce.Job: Running job: job_1529036356241_0004
18/06/15 14:30:42 INFO mapreduce.Job: Job job_1529036356241_0004 running in uber mode : false
18/06/15 14:30:42 INFO mapreduce.Job:  map 0% reduce 0%
18/06/15 14:30:48 INFO mapreduce.Job:  map 100% reduce 0%
18/06/15 14:30:54 INFO mapreduce.Job:  map 100% reduce 100%
18/06/15 14:30:54 INFO mapreduce.Job: Job job_1529036356241_0004 completed successfully
18/06/15 14:30:54 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=107514
                FILE: Number of bytes written=823175
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=93092
                HDFS: Number of bytes written=58903
                HDFS: Number of read operations=9
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters 
                Launched map tasks=2
                Launched reduce tasks=1
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=7194
                Total time spent by all reduces in occupied slots (ms)=3739
                Total time spent by all map tasks (ms)=7194
                Total time spent by all reduce tasks (ms)=3739
                Total vcore-milliseconds taken by all map tasks=7194
                Total vcore-milliseconds taken by all reduce tasks=3739
                Total megabyte-milliseconds taken by all map tasks=7366656
                Total megabyte-milliseconds taken by all reduce tasks=3828736
        Map-Reduce Framework
                Map input records=1770
                Map output records=5961
                Map output bytes=95511
                Map output materialized bytes=107520
                Input split bytes=174
                Combine input records=0
                Combine output records=0
                Reduce input groups=2366
                Reduce shuffle bytes=107520
                Reduce input records=5961
                Reduce output records=2366
                Spilled Records=11922
                Shuffled Maps =2
                Failed Shuffles=0
                Merged Map outputs=2
                GC time elapsed (ms)=292
                CPU time spent (ms)=4340
                Physical memory (bytes) snapshot=821985280
                Virtual memory (bytes) snapshot=6525067264
                Total committed heap usage (bytes)=548929536
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters 
                Bytes Read=92918
        File Output Format Counters 
                Bytes Written=58903
18/06/15 14:30:54 INFO streaming.StreamJob: Output directory: /out2
[hadoop@oversea-stable hadoop]$ 
[hadoop@oversea-stable hadoop]$ hdfs dfs -ls /out2
Found 2 items
-rw-r--r--   3 hadoop supergroup          0 2018-06-15 14:30 /out2/_SUCCESS
-rw-r--r--   3 hadoop supergroup      58903 2018-06-15 14:30 /out2/part-00000
[hadoop@oversea-stable hadoop]$ 

[hadoop@oversea-stable hadoop]$ cat /opt/map.py 
#!/usr/bin/python
import sys
for line in sys.stdin:
    line = line.strip()
    words = line.split()
    for word in words:
        print "%s\t%s" % (word, 1)
[hadoop@oversea-stable hadoop]$ 
[hadoop@oversea-stable hadoop]$ cat /opt/reduce.py 
#!/usr/bin/python
from operator import itemgetter
import sys

current_word = None
current_count = 0
word = None

for line in sys.stdin:
    line = line.strip()
    word, count = line.split('\t',1)
    try:
        count = int(count)
    except ValueError:
        continue
    if current_word == word:
        current_count += count
    else:
        if current_word:
            print "%s\t%s" % (current_word, current_count)
        current_count = count
        current_word = word

if word == current_word:
    print "%s\t%s" % (current_word, current_count)
[hadoop@oversea-stable hadoop]$
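
Before submitting to the cluster, the two scripts can be tested locally with a pipeline that mimics the streaming shuffle (a sketch; assumes /tmp/notepad.txt exists and the scripts are executable):

chmod +x /opt/map.py /opt/reduce.py
cat /tmp/notepad.txt | /opt/map.py | sort -k1,1 | /opt/reduce.py | head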
