HBase數(shù)據(jù)導(dǎo)入ImportTsv-創(chuàng)新互聯(lián)

ImportTsv 工具是通過map reduce 完成的。所以要啟動yarn. 工具要使用jar包，所以注意配置classpath。ImportTsv默認(rèn)是通過hbase api 插入數(shù)據(jù)的

公司專注于為企業(yè)提供成都網(wǎng)站建設(shè)、成都做網(wǎng)站、微信公眾號開發(fā)、商城系統(tǒng)網(wǎng)站開發(fā)，微信小程序開發(fā)，軟件按需求定制設(shè)計等一站式互聯(lián)網(wǎng)企業(yè)服務(wù)。憑借多年豐富的經(jīng)驗，我們會仔細(xì)了解各客戶的需求而做出多方面的分析、設(shè)計、整合，為客戶設(shè)計出具風(fēng)格及創(chuàng)意性的商業(yè)解決方案，成都創(chuàng)新互聯(lián)更提供一系列網(wǎng)站制作和網(wǎng)站推廣的服務(wù)。
[hadoop-user@rhel work]$ cat /home/hadoop-user/.bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH
JAVA_HOME=/usr/java/jdk1.8.0_171-amd64
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
HADOOP_HOME=/home/hadoop-user/hadoop-2.8.0
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib
HBASE_HOME=/home/hadoop-user/hbase-2.0.0
PATH=$PATH:$HBASE_HOME/bin
CLASSPATH=$CLASSPATH:$HBASE_HOME/lib
ZOOKEEPER_HOME=/home/hadoop-user/zookeeper-3.4.12
PATH=$PATH:$ZOOKEEPER_HOME/bin
PHOENIX_HOME=/home/hadoop-user/apache-phoenix-5.0.0-alpha-HBase-2.0-bin
PATH=$PATH:$PHOENIX_HOME/bin
export PATH
創(chuàng)建表

hbase(main):033:0> create 'test','cf'
創(chuàng)建要導(dǎo)入的文件
[hadoop-user@rhel work]$ cat /home/hadoop-user/work/sample1.csv
row10,"mjj10"
row11,"mjj11"
row12,"mjj12"
row14,"mjj13"
將文件放入hdfs
[hadoop-user@rhel work]$ hdfs dfs -put /home/hadoop-user/work/sample1.csv /sample1.csv
ImportTsv導(dǎo)入命令
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv
注: HBASE_ROW_KEY表示文件rowid的位置，后面是列的定義。這里意思是導(dǎo)入的列為列族為cf,列名為a。要導(dǎo)入的文件是hdfs中/sample1.csv
幫助中的解釋
Usage: importtsv -Dimporttsv.columns=a,b,c <tablename> <inputdir>
Imports the given input directory of TSV data into the specified table.
The column names of the TSV data must be specified using the -Dimporttsv.columns
option. This option takes the form of comma-separated column names, where each
column name is either a simple column family, or a columnfamily:qualifier. The special
column name HBASE_ROW_KEY is used to designate that this column should be used
as the row key for each imported record. You must specify exactly one column
to be the row key, and you must specify a column name for every column that exists in the
input data. Another special columnHBASE_TS_KEY designates that this column should be
used as timestamp for each record. Unlike HBASE_ROW_KEY, HBASE_TS_KEY is optional.
You must specify at most one column as timestamp key for each imported record.
Record with invalid timestamps (blank, non-numeric) will be treated as bad record.
Note: if you use this option, then 'importtsv.timestamp' option will be ignored.
注意: ImportTsv導(dǎo)入的內(nèi)容，phoenix看不到。事實上，hbase創(chuàng)建的表，Phoenix看不到。phoenix創(chuàng)建的表，hbase能看到，但是內(nèi)容是編碼后的內(nèi)容。
importtsv 工具默認(rèn)使用hbase put api導(dǎo)數(shù)據(jù).當(dāng)使用選項 -Dimporttsv.bulk.output時，將會先生成HFILE文件的內(nèi)部格式的文件。
The importtsv tool, by default, uses the HBase Put API to insert data into the HBase
table using TableOutputFormat in its map phase. But when the -Dimporttsv.bulk.output option is specified, it instead generates HBase internal format (HFile) files on HDFS
by using HFileOutputFormat. Therefore, we can then use the completebulkload tool to load the generated files into a running cluster. The following steps are to use the bulk output and load tools:
生成HFILE格式的文件命令
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.separator="," -Dimporttsv.bulk.output=/hfiles_tsv -Dimporttsv.columns=HBASE_ROW_KEY,cf:a test /sample1.csv
注: 生成hfile格式的文件，存放于hdfs中的/hfile_tsv目錄中，目錄會由命令自己創(chuàng)建。
[hadoop-user@rhel work]$ hdfs dfs -ls /hfiles_tsv/cf
18/06/28 10:49:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r-- 1 hadoop-user supergroup 5125 2018-06-28 10:40 /hfiles_tsv/cf/0e466616d42a4a128fb60caa7dbe075a
注: 0e466616d42a4a128fb60caa7dbe075a的命名格式跟WEB中region的命名格式很像
通過

hadoop jar hbase-server-2.0.0.jar completebulkload /hfiles_tsv 'test'
出現(xiàn)異常； Exception in thread "main" java.lang.ClassNotFoundException: completebulkload
HBASE文檔中兩種導(dǎo)入方式:
There are two ways to invoke this utility, with explicit classname and via the driver:
Explicit Classname
$ bin/hbase org.apache.hadoop.hbase.tool.LoadIncrementalHFiles <hdfs://storefileoutput> <tablename>
Driver
HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` ${HADOOP_HOME}/bin/hadoop jar ${HBASE_HOME}/hbase-server-VERSION.jar completebulkload <hdfs://storefileoutput> <tablename>

另外有需要云服務(wù)器可以了解下創(chuàng)新互聯(lián)scvps.cn，海內(nèi)外云服務(wù)器15元起步，三天無理由+7*72小時售后在線，公司持有idc許可證，提供“云服務(wù)器、裸金屬服務(wù)器、高防服務(wù)器、香港服務(wù)器、美國服務(wù)器、虛擬主機、免備案服務(wù)器”等云主機租用服務(wù)以及企業(yè)上云的綜合解決方案，具有“安全穩(wěn)定、簡單易用、服務(wù)可用性高、性價比高”等特點與優(yōu)勢，專為企業(yè)上云打造定制，能夠滿足用戶豐富、多元化的應(yīng)用場景需求。

當(dāng)前標(biāo)題：HBase數(shù)據(jù)導(dǎo)入ImportTsv-創(chuàng)新互聯(lián)
本文路徑：http://muchs.cn/article18/ijhgp.html

成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián)，為您提供小程序開發(fā)、網(wǎng)站排名、網(wǎng)站收錄、網(wǎng)站設(shè)計公司、關(guān)鍵詞優(yōu)化、域名注冊

廣告

聲明：本網(wǎng)站發(fā)布的內(nèi)容（圖片、視頻和文字）以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主，如果涉及侵權(quán)請盡快告知，我們將會在第一時間刪除。文章觀點不代表本網(wǎng)站立場，如需處理請聯(lián)系客服。電話：028-86922220；郵箱：631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載，或轉(zhuǎn)載時需注明來源：創(chuàng)新互聯(lián)

猜你還喜歡下面的內(nèi)容

什么是地址郵箱電子郵件的地址怎么寫呢？-創(chuàng)新互聯(lián)
msvcr110.dll丟失怎么辦-創(chuàng)新互聯(lián)
國產(chǎn)高性能RK3568四核|Cortex-A55核心板-創(chuàng)新互聯(lián)
什么是jsp源碼-創(chuàng)新互聯(lián)
web前端開發(fā)和后端開發(fā)有哪些區(qū)別-創(chuàng)新互聯(lián)
怎么在php中使用array_keys函數(shù)返回數(shù)組的鍵名-創(chuàng)新互聯(lián)
怎么使用CSS和D3實現(xiàn)用文字組成的心形動畫效果-創(chuàng)新互聯(lián)