Sample MapReduce Code in Hadoop

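This article presents a sample MapReduce program for Hadoop: a word count job made up of a mapper, a reducer, and a runner class that submits the job, followed by the console output from running it on a cluster.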


package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {

        // Get the content of one line of the input file
        String line = value.toString();
        // Split the line into an array of words
        String[] words = StringUtils.split(line, " ");
        // Emit a <word, 1> pair for each word
        for (String word : words) {
            context.write(new Text(word), new LongWritable(1));
        }
    }
}

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

    // key: hello, values: {1, 1, 1, 1, 1, ...}
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {

        // Running total for this word
        long count = 0;
        for (LongWritable value : values) {
            count += value.get();
        }

        // Emit the <word, count> key-value pair
        context.write(key, new LongWritable(count));
    }
}

package cn.itheima.bigdata.hadoop.mr.wordcount;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Describes a job (which mapper class and reducer class to use, where the input files are,
 * where the output goes, and so on) and then submits the job to the Hadoop cluster.
 * @author duanhaitao@itcast.cn
 */
//cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
public class WordCountRunner {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Job.getInstance copies conf, so settings made on conf after this call do not reach the job
        Job wcjob = Job.getInstance(conf);

        // Set the jar that holds the job's classes by locating the jar containing this class
        wcjob.setJarByClass(WordCountRunner.class);

        // Mapper class used by wcjob
        wcjob.setMapperClass(WordCountMapper.class);
        // Reducer class used by wcjob
        wcjob.setReducerClass(WordCountReducer.class);

        // Key/value types emitted by the mapper
        wcjob.setMapOutputKeyClass(Text.class);
        wcjob.setMapOutputValueClass(LongWritable.class);

        // Key/value types emitted by the reducer
        wcjob.setOutputKeyClass(Text.class);
        wcjob.setOutputValueClass(LongWritable.class);

        // Path of the raw input data to process
        FileInputFormat.setInputPaths(wcjob, "hdfs://192.168.88.155:9000/wc/srcdata");

        // Path where the results are written
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://192.168.88.155:9000/wc/output"));

        // Submit the job, print progress, and block until it finishes
        boolean res = wcjob.waitForCompletion(true);

        System.exit(res ? 0 : 1);
    }
}
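
To see what the mapper and reducer above compute without a cluster, the data flow can be imitated in plain Java. The following is only a rough sketch under stated assumptions: the class name LocalWordCount and its methods are hypothetical, no Hadoop classes are involved, and the shuffle is approximated by grouping values in a sorted in-memory map. It uses String.split, which differs from StringUtils.split in that it can yield empty tokens, so those are skipped explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free imitation of the word count data flow.
public class LocalWordCount {

    // Map step: split one line on spaces and emit a (word, 1) pair per token.
    // The shuffle is imitated by appending each 1 to the list stored under its word.
    static void map(String line, Map<String, List<Long>> shuffled) {
        for (String word : line.split(" ")) {
            if (word.isEmpty()) {
                continue; // String.split yields empty tokens for repeated spaces
            }
            shuffled.computeIfAbsent(word, k -> new ArrayList<>()).add(1L);
        }
    }

    // Reduce step: sum all values that were grouped under one key.
    static long reduce(String key, Iterable<Long> values) {
        long count = 0;
        for (long value : values) {
            count += value;
        }
        return count;
    }

    // Runs map over every line, then reduce over every key, in sorted key order.
    static Map<String, Long> wordCount(List<String> lines) {
        Map<String, List<Long>> shuffled = new TreeMap<>();
        for (String line : lines) {
            map(line, shuffled);
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : shuffled.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getKey(), entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        System.out.println(wordCount(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

On a real cluster the grouping is done by the framework between the map and reduce phases, across machines; the in-memory map here only stands in for that step.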

Package the classes as mr.jar, place it on the Hadoop server, and run the job:

[root@hadoop02 ~]# hadoop jar /root/Desktop/mr.jar cn.itheima.bigdata.hadoop.mr.wordcount.WordCountRunner
Java HotSpot(TM) Client VM warning: You have loaded library /home/hadoop/hadoop-2.6.0/lib/native/libhadoop.so.1.0.0 which might have disabled stack guard. The VM will try to fix the stack guard now.
It's highly recommended that you fix the library with 'execstack -c <libfile>', or link it with '-z noexecstack'.
15/12/05 06:07:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/12/05 06:07:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop02/192.168.88.155:8032
15/12/05 06:07:08 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/12/05 06:07:09 INFO input.FileInputFormat: Total input paths to process : 1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: number of splits:1
15/12/05 06:07:09 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1449322432664_0001
15/12/05 06:07:10 INFO impl.YarnClientImpl: Submitted application application_1449322432664_0001
15/12/05 06:07:10 INFO mapreduce.Job: The url to track the job: http://hadoop02:8088/proxy/application_1449322432664_0001/
15/12/05 06:07:10 INFO mapreduce.Job: Running job: job_1449322432664_0001
15/12/05 06:07:22 INFO mapreduce.Job: Job job_1449322432664_0001 running in uber mode : false
15/12/05 06:07:22 INFO mapreduce.Job:  map 0% reduce 0%
15/12/05 06:07:32 INFO mapreduce.Job:  map 100% reduce 0%
15/12/05 06:07:39 INFO mapreduce.Job:  map 100% reduce 100%
15/12/05 06:07:40 INFO mapreduce.Job: Job job_1449322432664_0001 completed successfully
15/12/05 06:07:41 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=635
                FILE: Number of bytes written=212441
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=338
                HDFS: Number of bytes written=223
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=7463
                Total time spent by all reduces in occupied slots (ms)=4688
                Total time spent by all map tasks (ms)=7463
                Total time spent by all reduce tasks (ms)=4688
                Total vcore-seconds taken by all map tasks=7463
                Total vcore-seconds taken by all reduce tasks=4688
                Total megabyte-seconds taken by all map tasks=7642112
                Total megabyte-seconds taken by all reduce tasks=4800512
        Map-Reduce Framework
                Map input records=10
                Map output records=41
                Map output bytes=547
                Map output materialized bytes=635
                Input split bytes=114
                Combine input records=0
                Combine output records=0
                Reduce input groups=30
                Reduce shuffle bytes=635
                Reduce input records=41
                Reduce output records=30
                Spilled Records=82
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=211
                CPU time spent (ms)=1350
                Physical memory (bytes) snapshot=221917184
                Virtual memory (bytes) snapshot=722092032
                Total committed heap usage (bytes)=137039872
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=224
        File Output Format Counters
                Bytes Written=223
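
The counters in this report can be cross-checked against each other. For this particular job, which has no combiner (Combine input records=0) and whose reducer writes exactly one record per key, Map output records should equal Reduce input records (41 and 41 above) and Reduce input groups should equal Reduce output records (30 and 30). The sketch below is a hypothetical helper, not part of Hadoop: the class CounterCheck and its methods are assumptions made for illustration, and it simply parses "name=value" lines copied from the console output.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper that parses counter lines from the console report.
public class CounterCheck {

    // Parses lines of the form "Name=value" into a map.
    // Lines without '=' (section headers such as "Job Counters") are skipped.
    // Assumes every value after '=' is a whole number, as in the report above.
    static Map<String, Long> parse(String report) {
        Map<String, Long> counters = new HashMap<>();
        for (String line : report.split("\n")) {
            int eq = line.lastIndexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            counters.put(name, Long.parseLong(value));
        }
        return counters;
    }

    // Checks the two equalities expected for a combiner-free job whose
    // reducer emits one record per key; other jobs may legitimately differ.
    static boolean isConsistent(Map<String, Long> c) {
        return c.get("Map output records").equals(c.get("Reduce input records"))
                && c.get("Reduce input groups").equals(c.get("Reduce output records"));
    }

    public static void main(String[] args) {
        String report = String.join("\n",
                "        Map-Reduce Framework",
                "                Map input records=10",
                "                Map output records=41",
                "                Reduce input groups=30",
                "                Reduce input records=41",
                "                Reduce output records=30");
        System.out.println(isConsistent(parse(report))); // prints true
    }
}
```

Read this way, the report says that the 10 input lines produced 41 words in total, of which 30 were distinct.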
