Hadoop 2.6 HA Deployment

Because I needed to deploy a Spark environment, I installed a fresh Hadoop test cluster. The steps are recorded below.

Hardware: four virtual machines, hadoop1 to hadoop4, each with 3 GB of RAM, a 60 GB disk, and 2 CPU cores

Software: CentOS6.5, hadoop-2.6.0-cdh6.8.2, JDK1.7

Deployment plan:

hadoop1 (192.168.0.3): namenode (active), resourcemanager

hadoop2 (192.168.0.4): namenode (standby), journalnode, datanode, nodemanager, historyserver

hadoop3 (192.168.0.5): journalnode, datanode, nodemanager

hadoop4 (192.168.0.6): journalnode, datanode, nodemanager

HDFS HA uses QJM (the Quorum Journal Manager, backed by the journalnodes):

[Figure: hadoop2.6 HA deployment]

I. System preparation

1. Disable SELinux on every machine

#vi /etc/selinux/config

SELINUX=disabled

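2. Disable the firewall on every machine (do not skip this; otherwise formatting HDFS fails with an error that the journalnodes cannot be reached)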

#chkconfig iptables off

#service iptables stop

3. Install JDK 1.7 on every machine

#cd /software

#tar -zxf jdk-7u65-linux-x64.gz -C /opt/

#cd /opt

#ln -s jdk1.7.0_65 java

#vi /etc/profile

export JAVA_HOME=/opt/java

export PATH=$PATH:$JAVA_HOME/bin

4. Create the Hadoop user on every machine and set up mutual SSH trust

#useradd grid

#passwd grid

(The steps for setting up mutual trust are omitted)
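
The omitted trust setup usually amounts to generating a key pair for grid and copying the public key to every host. A minimal sketch, assuming OpenSSH's ssh-keygen and ssh-copy-id are installed and that it is run as grid on each machine; the names setup_trust, run, and the DRY_RUN switch are illustrative and not part of the original procedure:

```shell
# Hypothetical helper: set up passwordless SSH from this host to every cluster node.
# With DRY_RUN=1 the commands are only printed, not executed.
HOSTS="hadoop1 hadoop2 hadoop3 hadoop4"

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_trust() {
  # Create an RSA key pair without a passphrase if none exists yet
  if [ ! -f "$HOME/.ssh/id_rsa" ]; then
    run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  fi
  # Append the public key to authorized_keys on every node
  # (asks for the grid password once per host)
  for h in $HOSTS; do
    run ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "grid@$h"
  done
}
```

Calling setup_trust as grid on each of the four machines gives full mutual trust; DRY_RUN=1 setup_trust shows the commands first.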

5. Create the required directories on every machine

#mkdir -p /hadoop_data/hdfs/name

#mkdir -p /hadoop_data/hdfs/data

#mkdir -p /hadoop_data/hdfs/journal

#mkdir -p /hadoop_data/yarn/local

#chown -R grid:grid /hadoop_data

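II. Hadoop deployment

HDFS HA is mainly a matter of specifying the nameservices (without HDFS federation there is only one ID) and, under that nameservice ID, the two namenodes and their addresses. Here the nameservice is named hadoop-spark.

1. Unpack the Hadoop package on every machine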

#cd /software

#tar -zxf hadoop-2.6.0-cdh6.8.2.tar.gz -C /opt/

#cd /opt

#chown -R grid:grid hadoop-2.6.0-cdh6.8.2

#ln -s hadoop-2.6.0-cdh6.8.2 hadoop

2. Switch to the grid user for the remaining steps

#su - grid

$cd /opt/hadoop/etc/hadoop

3. Configure hadoop-env.sh (only JAVA_HOME needs to be set)

$vi hadoop-env.sh

# The java implementation to use.

export JAVA_HOME=/opt/java

4. Configure hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-spark</value>
<description>
Comma-separated list of nameservices.
</description>
</property>
<property>
<name>dfs.ha.namenodes.hadoop-spark</name>
<value>nn1,nn2</value>
<description>
The prefix for a given nameservice, contains a comma-separated
list of namenodes for a given nameservice (eg EXAMPLENAMESERVICE).
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-spark.nn1</name>
<value>hadoop1:8020</value>
<description>
RPC address for namenode1 of hadoop-spark
</description>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-spark.nn2</name>
<value>hadoop2:8020</value>
<description>
RPC address for namenode2 of hadoop-spark
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-spark.nn1</name>
<value>hadoop1:50070</value>
<description>
The address and the base port where the dfs namenode1 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-spark.nn2</name>
<value>hadoop2:50070</value>
<description>
The address and the base port where the dfs namenode2 web ui will listen on.
</description>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///hadoop_data/hdfs/name</value>
<description>Determines where on the local filesystem the DFS name node
should store the name table(fsimage).  If this is a comma-delimited list
of directories then the name table is replicated in all of the
directories, for redundancy. </description>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop2:8485;hadoop3:8485;hadoop4:8485/hadoop-spark</value>
<description>A directory on shared storage between the multiple namenodes
in an HA cluster. This directory will be written by the active and read
by the standby in order to keep the namespaces synchronized. This directory
does not need to be listed in dfs.namenode.edits.dir above. It should be
left empty in a non-HA cluster.
</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///hadoop_data/hdfs/data</value>
<description>Determines where on the local filesystem a DFS data node
should store its blocks.  If this is a comma-delimited
list of directories, then data will be stored in all named
directories, typically on different devices.
Directories that do not exist are ignored.
</description>
</property>
<!-- Without this setting, HDFS cannot be accessed through the nameservice name; clients would have to use the address of the active namenode directly -->
<property> 
  <name>dfs.client.failover.proxy.provider.hadoop-spark</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>false</value>
<description>
Whether automatic failover is enabled. See the HDFS High
Availability documentation for details on automatic HA
configuration.
</description>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hadoop_data/hdfs/journal</value>
</property>
</configuration>
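
The property names above embed the nameservice ID and the namenode IDs, so a typo in either one quietly breaks HA. A small sanity check, as a sketch only: the helper names get_prop and check_ha_config are hypothetical, and the awk parsing assumes each <name> appears before its <value>, as in the file above. It is not a general XML parser.

```shell
# Print the <value> that follows the <name> equal to $2 in config file $1
get_prop() {
  awk -v key="$2" '
    /<name>/ { n = $0; gsub(/.*<name>|<\/name>.*/, "", n); found = (n == key) }
    found && /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v); print v; exit }
  ' "$1"
}

# Check that every namenode listed for the nameservice has an RPC address
check_ha_config() {
  conf="$1"
  ns=$(get_prop "$conf" dfs.nameservices)
  nns=$(get_prop "$conf" "dfs.ha.namenodes.$ns")
  for nn in $(echo "$nns" | tr ',' ' '); do
    addr=$(get_prop "$conf" "dfs.namenode.rpc-address.$ns.$nn")
    if [ -z "$addr" ]; then
      echo "missing rpc-address for $nn"
      return 1
    fi
    echo "$ns/$nn -> $addr"
  done
}
```

Run it as check_ha_config /opt/hadoop/etc/hadoop/hdfs-site.xml; it lists one line per namenode and returns non-zero at the first namenode without an address.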

5. Configure core-site.xml (set fs.defaultFS to the HA nameservice name)

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-spark</value>
<description>The name of the default file system.  A URI whose
scheme and authority determine the FileSystem implementation.  The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class.  The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>
</property>
</configuration>

6. Configure mapred-site.xml

<configuration>
<!-- MR YARN Application properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
<description>The runtime framework for executing MapReduce jobs.
Can be one of local, classic or yarn.
</description>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>

7. Configure yarn-site.xml

<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Resource Manager Configs -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<description>fair-scheduler conf location</description>
<name>yarn.scheduler.fair.allocation.file</name>
<value>${yarn.home.dir}/etc/hadoop/fairscheduler.xml</value>
</property>
<property>
<description>List of directories to store localized files in. An
application's localized file directory will be found in:
${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
Individual containers' work directories, called container_${contid}, will
be subdirectories of this.
</description>
<name>yarn.nodemanager.local-dirs</name>
<value>/hadoop_data/yarn/local</value>
</property>
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<description>Number of CPU cores that can be allocated
for containers.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>2</value>
</property>
<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

8. Configure slaves

hadoop2

hadoop3

hadoop4

9. Configure fairscheduler.xml

<?xml version="1.0"?>
<allocations>
<queue name="common">
<minResources>0 mb, 0 vcores</minResources>
<maxResources>6144 mb, 6 vcores</maxResources>
<maxRunningApps>50</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>grid</aclSubmitApps>
</queue>
</allocations>
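
The maxResources value of the common queue appears to equal the total capacity that the three nodemanagers (hadoop2 to hadoop4) offer under the yarn-site.xml settings above. The arithmetic, as a quick shell check (the variable names are only for illustration):

```shell
# Per-node limits from yarn-site.xml and the nodemanager count from the deployment plan
NODEMANAGERS=3
MEM_MB_PER_NODE=2048
VCORES_PER_NODE=2

TOTAL_MEM_MB=$((NODEMANAGERS * MEM_MB_PER_NODE))
TOTAL_VCORES=$((NODEMANAGERS * VCORES_PER_NODE))
echo "${TOTAL_MEM_MB} mb, ${TOTAL_VCORES} vcores"
# prints: 6144 mb, 6 vcores
```

If nodes are added or the per-node limits change, the queue ceiling would need to be recomputed the same way.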

10. Sync the configuration files to every node

$cd /opt/hadoop/etc

$scp -r hadoop hadoop2:/opt/hadoop/etc/

$scp -r hadoop hadoop3:/opt/hadoop/etc/

$scp -r hadoop hadoop4:/opt/hadoop/etc/

III. Start the cluster (format the file system)

1. Set the environment variables

$vi ~/.bash_profile

export HADOOP_HOME=/opt/hadoop

export YARN_HOME_DIR=/opt/hadoop

export HADOOP_CONF_DIR=/opt/hadoop/etc/hadoop

export YARN_CONF_DIR=/opt/hadoop/etc/hadoop

2. Start HDFS

Start the journalnodes first, on hadoop2 to hadoop4:

$cd /opt/hadoop/

$sbin/hadoop-daemon.sh start journalnode

Format HDFS, then start the namenode. On hadoop1:

$bin/hdfs namenode -format

$sbin/hadoop-daemon.sh start namenode

Sync the other namenode and start it. On hadoop2:

$bin/hdfs namenode -bootstrapStandby

$sbin/hadoop-daemon.sh start namenode

At this point both namenodes are in standby state. Switch hadoop1 to active (hadoop1 corresponds to nn1 in hdfs-site.xml):

$bin/hdfs haadmin -transitionToActive nn1

Start the datanodes. On hadoop1 (the active namenode):

$sbin/hadoop-daemons.sh start datanode

Note: for later startups, sbin/start-dfs.sh is enough. However, because ZooKeeper-based failover is not configured, HA can only be switched manually. Every time HDFS is started, run bin/hdfs haadmin -transitionToActive nn1 to make the namenode on hadoop1 active.
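
Since the manual transition has to be repeated after every start, it can be wrapped in a small helper that asks for the current state first. A sketch, assuming the /opt/hadoop install path used above; the function name ensure_active and the HDFS_BIN override are illustrative. haadmin -getServiceState and -transitionToActive are the standard haadmin subcommands.

```shell
# Hypothetical helper: make the given namenode ID active unless it already is
ensure_active() {
  hdfs_bin="${HDFS_BIN:-/opt/hadoop/bin/hdfs}"
  target="$1"
  # Prints "active" or "standby" for the given namenode ID
  state=$("$hdfs_bin" haadmin -getServiceState "$target")
  if [ "$state" = "active" ]; then
    echo "$target is already active"
  else
    "$hdfs_bin" haadmin -transitionToActive "$target" &&
      echo "$target transitioned to active"
  fi
}
```

After sbin/start-dfs.sh, ensure_active nn1 brings hadoop1 back to the active role and does nothing if it already holds it.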

3. Start YARN

On hadoop1 (the resourcemanager):

$sbin/start-yarn.sh

————————————————————————————————————————————

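The HDFS HA configured above does not fail over automatically. To enable automatic failover, add the following steps (stop the cluster first):

1. Deploy ZooKeeper on hadoop2, hadoop3, and hadoop4 and start it (steps omitted)

2. Add to hdfs-site.xml (dfs.ha.automatic-failover.enabled already exists there with the value false, so change that entry rather than adding a second one):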

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/exampleuser/.ssh/id_rsa</value>
</property>

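See the official documentation for details. This sets the fencing method to SSH into the previously active node and kill the process listening on the service port. It requires that the two namenodes can SSH to each other, and the private key path should point to the key of the user that runs the namenode (grid in this setup).

There is another way to configure it: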

<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>

<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/path/to/my/script.sh arg1 arg2 ...)</value>
</property>

This configuration uses a shell script to isolate the port and the process. If no real action is wanted, dfs.ha.fencing.methods can be set to shell(/bin/true).
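
For the shell method, Hadoop's HA documentation describes environment variables such as $target_host and $target_port that identify the node to be fenced, and treats exit code 0 as successful fencing. A sketch of a script body that only logs the request and reports success, in effect a logged variant of shell(/bin/true); the function name fence_log_only and the FENCE_LOG path are hypothetical:

```shell
# Hypothetical no-op fencing hook: records which node was to be fenced, then reports success.
# target_host and target_port are provided by Hadoop when the shell fencing method runs.
fence_log_only() {
  log="${FENCE_LOG:-/tmp/hdfs-fence.log}"
  echo "$(date '+%F %T') fence requested for ${target_host:-unknown}:${target_port:-unknown}" >> "$log"
  return 0   # status 0 tells the failover controller that fencing succeeded
}
```

Like shell(/bin/true), this performs no real isolation. That is tolerable mainly because QJM lets only one namenode write to the journalnodes at a time; a real fencing script would stop the old namenode process instead.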

3. Add to core-site.xml:

<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>

4. Initialize zkfc (run on one of the namenodes)

$bin/hdfs zkfc -formatZK

5. Start the cluster

___________________________________________________________________________________________________

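zkfc: runs on every namenode; it is a ZooKeeper client and is responsible for automatic failover

zk: an odd number of nodes; maintains the consistency lock and elects the active node

journalnode: an odd number of nodes, used to synchronize data between the active and standby namenodes. The active node writes data to these nodes, and the standby node reads it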
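
The preference for an odd number of nodes follows from majority voting: both ZooKeeper and the journalnodes need more than half of the nodes to agree, so N nodes tolerate (N - 1) / 2 failures, and an even N adds a node without adding tolerance. As quick arithmetic (quorum_info is just an illustrative name):

```shell
# Majority size and tolerated failures for a given ensemble size (integer division)
quorum_info() {
  n="$1"
  echo "$n nodes: majority $(( n / 2 + 1 )), tolerates $(( (n - 1) / 2 )) failure(s)"
}

for size in 3 4 5; do
  quorum_info "$size"
done
# prints:
# 3 nodes: majority 2, tolerates 1 failure(s)
# 4 nodes: majority 3, tolerates 1 failure(s)
# 5 nodes: majority 3, tolerates 2 failure(s)
```

With the three journalnodes and three ZooKeeper nodes used here, the cluster keeps working with one of them down.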

————————————————————————————————————————————

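Switching to resourcemanager HA:

Choose hadoop2 as the other RM node

1. Set up SSH trust from hadoop2 to the other nodes

2. Edit yarn-site.xml and sync it to the other machines

3. Copy fairscheduler.xml to hadoop2

4. Start the RM

5. Start the other RM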
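
The yarn-site.xml edit in step 2 is not spelled out. In Hadoop 2.6, resourcemanager HA is typically enabled with properties along the following lines. This is a sketch, not the configuration actually used: the cluster-id value yarn-spark and the IDs rm1 and rm2 are placeholders, and the ZooKeeper quorum is assumed to be the one deployed for HDFS failover. The hypothetical helper prints the fragment to paste inside <configuration>:

```shell
# Hypothetical helper: print the RM HA properties for yarn-site.xml
print_rm_ha_props() {
  cat <<'EOF'
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn-spark</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hadoop1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hadoop2</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hadoop2:2181,hadoop3:2181,hadoop4:2181</value>
</property>
EOF
}
```

With this in place, steps 4 and 5 would be sbin/start-yarn.sh on hadoop1 and sbin/yarn-daemon.sh start resourcemanager on hadoop2, and bin/yarn rmadmin -getServiceState rm1 reports which RM is currently active.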
