Responsibilities of a Hadoop Administrator (Translation)

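I recently read an English document about Hadoop, which is actually an excerpt from a book. I thought it was good: it lays out the basic responsibilities of a Hadoop administrator. If you work with Hadoop day to day, take a look and ask yourself whether you already have the knowledge and skills it describes.
Translation:
Responsibilities of a Hadoop administrator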

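As interest in big data and the insight it offers keeps growing, organizations are actively planning or building their big data teams. To start working with their data, they need a good, solid infrastructure.
Once that infrastructure is in place, they need controls and policies for maintaining, managing, and troubleshooting the cluster.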

Market demand for Hadoop administrators keeps growing, because their work (setting up and maintaining clusters) is what makes data analysis genuinely possible.

A Hadoop administrator needs strong system operations skills across networking, operating systems, and storage, along with substantial knowledge of computer hardware and how it operates in a complex network environment.

Apache Hadoop runs mainly on Linux, so solid Linux skills such as monitoring, troubleshooting, configuration, and security management are a must.

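Setting up nodes for a cluster involves a lot of repetitive work, so a Hadoop administrator should bring servers into service quickly and efficiently with configuration management tools such as Puppet, Chef, and CFEngine.
Beyond these tools, the administrator should also have good planning skills for designing and sizing clusters.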

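Many nodes in a cluster need their data duplicated. For example, the fsimage file of the namenode daemon can be configured to be written to different disks on the same node, or to a different node.
A Hadoop administrator therefore needs to understand NFS mount points and how to set them up for the cluster. The administrator may also be asked to configure RAID for the disks on specific nodes.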

Because all Hadoop services and daemons are built on Java, a basic knowledge of the JVM (Java Virtual Machine) and an understanding of Java exceptions are very useful.
This knowledge helps the administrator pinpoint problems quickly.

A Hadoop administrator should have benchmarking skills and be able to test cluster performance under high-traffic scenarios.

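Clusters run continuously and process large amounts of data, so they are prone to failures. To monitor the health of the cluster, the administrator needs to deploy monitoring tools such as Nagios and Ganglia.
The administrator also needs to configure alerts and monitors for critical nodes so that problems can be anticipated before they occur.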

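Good knowledge of a scripting language such as Python, Ruby, or Shell greatly helps a Hadoop administrator.
Administrators are often asked to stage files from an external source into HDFS on a schedule. Scripting skills let them automate this work by running scripts.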

Most importantly, a Hadoop administrator should thoroughly understand the Apache Hadoop architecture and its inner workings.

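The following are some of the key Hadoop operations that a Hadoop administrator must master:
Planning the cluster: estimating the amount of data it will handle and deciding the number of nodes accordingly.
Installing and upgrading Apache Hadoop on the cluster.
Configuring and tuning Hadoop through its various configuration files.
Understanding all the Hadoop daemons, along with their roles and responsibilities in the cluster.
Knowing how to read and interpret Hadoop logs.
Adding and removing nodes in the cluster.
Rebalancing nodes in the cluster.
Enabling security with an authentication and authorization system such as Kerberos.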

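Almost all organizations follow some policy for backing up their data, and performing those backups is the Hadoop administrator's responsibility.
A Hadoop administrator should therefore be familiar with server backup and recovery operations.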

Original text:
Responsibilities of a Hadoop administrator

With the increase in the interest to derive insight on their big data,
organizations are now planning and building their big data teams aggressively.
To start working on their data, they need to have a good solid infrastructure.
Once they have this setup, they need several controls and system policies in place to maintain, manage, and troubleshoot their cluster.

There is an ever-increasing demand for Hadoop Administrators in the market
as their function (setting up and maintaining Hadoop clusters) is what makes analysis really possible.

The Hadoop administrator needs to be very good at system operations, networking, operating systems, and storage.
They need to have a strong knowledge of computer hardware and their operations, in a complex network.

Apache Hadoop mainly runs on Linux, so having good Linux skills such as monitoring, troubleshooting, configuration, and security is a must.

Setting up nodes for clusters involves a lot of repetitive tasks
and the Hadoop administrator should use quicker and more efficient ways to bring up these servers using configuration management tools
such as Puppet, Chef, and CFEngine.
Apart from these tools, the administrator should also have good capacity planning skills to design and plan clusters.
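
The capacity planning mentioned above can be sketched as a back-of-the-envelope calculation. This is only a rough illustration: the function name and the default values for the replication factor, temporary-space overhead, and usable disk per node are assumptions chosen for the example, not figures from the text.

```python
import math


def estimate_datanodes(raw_data_tb, replication=3, temp_overhead=0.25,
                       usable_tb_per_node=24.0):
    """Estimate how many storage nodes are needed for raw_data_tb of data.

    All defaults are illustrative assumptions:
      replication        - copies kept of every block
      temp_overhead      - extra fraction reserved for intermediate data
      usable_tb_per_node - disk space per node available for data
    """
    if usable_tb_per_node <= 0:
        raise ValueError("usable_tb_per_node must be positive")
    # Total space required once replication and scratch space are included.
    required_tb = raw_data_tb * replication * (1 + temp_overhead)
    # Round up: a fraction of a node still means one more machine.
    return math.ceil(required_tb / usable_tb_per_node)


nodes = estimate_datanodes(100)
print(nodes)
```

A real sizing exercise would also account for data growth, compression, and compute requirements, which this sketch leaves out.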

There are several nodes in a cluster that would need duplication of data,
for example, the fsimage file of the namenode daemon can be configured to write to two different disks on the same node
or on a disk on a different node.
An understanding of NFS mount points and how to set it up within a cluster is required.
The administrator may also be asked to set up RAID for disks on specific nodes.
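
As a hedged illustration of the fsimage duplication described above: in Hadoop 2 and later, the `dfs.namenode.name.dir` property in `hdfs-site.xml` accepts a comma-separated list of directories, and the name table is written to each of them. The sketch below only generates that property fragment. The three directory paths, including the NFS mount, are hypothetical.

```python
import xml.etree.ElementTree as ET


def name_dir_property(directories):
    """Build the <property> element listing NameNode metadata directories."""
    if not directories:
        raise ValueError("at least one directory is required")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "dfs.namenode.name.dir"
    # The NameNode keeps a full copy of its metadata in every listed directory.
    ET.SubElement(prop, "value").text = ",".join(directories)
    return prop


# Hypothetical layout: two local disks plus one NFS mount.
dirs = ["/data/1/dfs/nn", "/data/2/dfs/nn", "/mnt/nfs/dfs/nn"]
xml_text = ET.tostring(name_dir_property(dirs), encoding="unicode")
print(xml_text)
```

The generated fragment would sit inside the `<configuration>` element of `hdfs-site.xml`.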

As all Hadoop services/daemons are built on Java,
a basic knowledge of the JVM along with the ability to understand Java exceptions would be very useful.
This helps administrators identify issues quickly.
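
One way to put that knowledge to work is to tally which Java exception classes show up in a log, so the most frequent failure stands out. The following is a minimal sketch. The regular expression is a simplifying assumption about how fully qualified exception names appear, and the sample lines are made up for the example. They are not real Hadoop output.

```python
import re
from collections import Counter

# Matches dotted class names ending in "Exception" or "Error",
# for example java.io.IOException.
EXCEPTION_RE = re.compile(
    r"\b((?:[A-Za-z_]\w*\.)+[A-Z]\w*(?:Exception|Error))\b"
)


def count_exceptions(lines):
    """Count occurrences of each fully qualified exception class name."""
    counts = Counter()
    for line in lines:
        for name in EXCEPTION_RE.findall(line):
            counts[name] += 1
    return counts


# Synthetic sample lines (hypothetical, for illustration only).
sample = [
    "2024-01-01 00:00:01 ERROR demo: java.io.IOException: example failure",
    "    at com.example.Demo.run(Demo.java:42)",
    "Caused by: java.net.ConnectException: example refusal",
    "2024-01-01 00:00:02 ERROR demo: java.io.IOException: example failure",
]
counts = count_exceptions(sample)
for name, n in counts.most_common():
    print(n, name)
```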

The Hadoop administrator should possess the skills to benchmark the cluster to test performance under high-traffic scenarios.

Clusters are prone to failures as they are up all the time and are processing large amounts of data regularly.
To monitor the health of the cluster, the administrator should deploy monitoring tools such as Nagios and Ganglia
and should configure alerts and monitors for critical nodes of the cluster to foresee issues before they occur.
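
To make the idea of alerts on critical nodes concrete, here is a small sketch of a disk-usage check that follows the Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL), and 3 (UNKNOWN). The 80% and 90% thresholds and the checked path are assumptions for illustration.

```python
import shutil

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_usage(used_percent, warn=80.0, crit=90.0):
    """Map a disk usage percentage to a Nagios-style status code."""
    if used_percent >= crit:
        return CRITICAL
    if used_percent >= warn:
        return WARNING
    return OK


def check_path(path, warn=80.0, crit=90.0):
    """Return (status, used_percent) for the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return UNKNOWN, 0.0
    used_percent = 100.0 * usage.used / usage.total
    return classify_usage(used_percent, warn, crit), used_percent


status, used = check_path(".")
print(f"disk usage {used:.1f}% -> status {status}")
```

When run as an actual plugin, the script would finish with `sys.exit(status)` so that the monitoring server can read the result from the exit code.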

Knowledge of a good scripting language such as Python, Ruby, or Shell would greatly help the function of an administrator.
Often, administrators are asked to set up some kind of a scheduled file staging from an external source to HDFS.
The scripting skills help them execute these requests by building scripts and automating them.
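
A scheduled staging job of this kind might look roughly like the sketch below, which builds `hdfs dfs -put` commands for every matching file in a local directory. The function names, the `*.csv` pattern, and the paths are hypothetical. The script defaults to a dry run, so nothing is executed unless `dry_run=False` is passed on a machine where the `hdfs` client is installed.

```python
import subprocess
from pathlib import Path


def build_put_command(local_file, hdfs_dir):
    """Return the argument list that copies one local file into HDFS.

    -f overwrites the destination file if it already exists.
    """
    return ["hdfs", "dfs", "-put", "-f", str(local_file), hdfs_dir]


def stage_files(local_dir, hdfs_dir, pattern="*.csv", dry_run=True):
    """Build (and optionally run) one put command per matching file."""
    files = sorted(Path(local_dir).glob(pattern))
    commands = [build_put_command(f, hdfs_dir) for f in files]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if the copy fails.
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical file and target directory, shown without executing anything.
example = build_put_command("/tmp/a.csv", "/landing")
print(" ".join(example))
```

A scheduler such as cron could then call the script at the required interval.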

Above all, the Hadoop administrator should have a very good understanding of the Apache Hadoop architecture and its inner workings.

The following are some of the key Hadoop-related operations that the Hadoop administrator should know:

Planning the cluster, deciding on the number of nodes based on the estimated amount of data the cluster is going to serve.

Installing and upgrading Apache Hadoop on a cluster.

Configuring and tuning Hadoop using the various configuration files available within Hadoop.

An understanding of all the Hadoop daemons along with their roles and responsibilities in the cluster.

The administrator should know how to read and interpret Hadoop logs.

Adding and removing nodes in the cluster.

Rebalancing nodes in the cluster.

Employ security using an authentication and authorization system such as Kerberos.
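
For the rebalancing item in the list above, the HDFS balancer (`hdfs balancer -threshold <percent>`, default 10) treats a node as balanced when its utilization is within the threshold of the cluster-wide utilization. The sketch below reproduces only that comparison to show which nodes a rebalance would target. The node names and figures are invented, and the real tool does considerably more than this.

```python
def classify_datanodes(nodes, threshold=10.0):
    """Classify nodes against cluster-wide utilization.

    nodes maps a node name to a (used, capacity) pair in the same unit.
    Returns (cluster_percent, {name: "over" | "under" | "balanced"}).
    """
    total_used = sum(used for used, _ in nodes.values())
    total_capacity = sum(capacity for _, capacity in nodes.values())
    cluster_pct = 100.0 * total_used / total_capacity
    result = {}
    for name, (used, capacity) in nodes.items():
        node_pct = 100.0 * used / capacity
        if node_pct > cluster_pct + threshold:
            result[name] = "over"
        elif node_pct < cluster_pct - threshold:
            result[name] = "under"
        else:
            result[name] = "balanced"
    return cluster_pct, result


# Hypothetical three-node cluster, values in TB.
cluster_pct, states = classify_datanodes(
    {"dn1": (90, 100), "dn2": (50, 100), "dn3": (40, 100)}
)
print(cluster_pct, states)
```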

Almost all organizations follow the policy of backing up their data
and it is the responsibility of the administrator to perform this activity.
So, an administrator should be well versed in the backup and recovery operations of servers.
