Hadoop Comprehensive Project: Statistical Analysis of Second-Hand Housing (MapReduce Part)

Contents
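  • Hadoop Comprehensive Project: Statistical Analysis of Second-Hand Housing (MapReduce Part)
    • 0. Preface
    • 1. MapReduce Statistical Analysis
      • 1.1 Maximum and minimum total prices in the four tier-1 cities
      • 1.2 Counting second-hand listings per city with a custom partitioner
      • 1.3 Counting listings by publication date, sorted by date
      • 1.4 Top 5 total prices in each of the four tier-1 cities
      • 1.5 Total sort of second-hand house prices with a custom partitioner
      • 1.6 Secondary sort by build year and total price
      • 1.7 Counting listings per location with a custom class
      • 1.8 Proportion of each listing tag per city
    • 2. Data and Source Code
    • 3. Summary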




0. Preface
  • Windows version: Windows 10
  • Linux version: Ubuntu Kylin 16.04
  • JDK version: Java 8
  • Hadoop version: Hadoop 2.7.1
  • Hive version: Hive 1.2.2
  • IDE: IDEA 2020.2.3
  • IDE: PyCharm 2021.1.3
  • IDE: Eclipse 3.8
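1. MapReduce Statistical Analysis

This part uses MapReduce to analyse the data in eight ways: extrema, per-city counts with a custom partitioner, sorting, TopN, total sorting with a custom partitioner, secondary sorting, custom classes, and proportions.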

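1.1 Maximum and minimum total prices in the four tier-1 cities
  • Purpose:

The highest and lowest second-hand house prices are an important reflection of a city's economy, and one of the factors buyers weigh when purchasing.

  • Code:

Driver: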

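import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxMinTotalPriceByCityDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "MaxMinTotalPriceByCity");
        job.setJarByClass(MaxMinTotalPriceByCityDriver.class);
        job.setMapperClass(MaxMinTotalPriceByCityMapper.class);
        job.setReducerClass(MaxMinTotalPriceByCityReducer.class);
        job.setMapOutputKeyClass(Text.class);          // city
        job.setMapOutputValueClass(IntWritable.class); // total price
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);           // formatted "max, min" string
        FileInputFormat.setInputPaths(job, new Path("datas/tb_house.txt"));
        FileOutputFormat.setOutputPath(job, new Path("MapReduce/out/MaxMinTotalPriceByCity"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}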
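Mapper:
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (city, total price) for every listing line.
public class MaxMinTotalPriceByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable();
    @Override
    protected void map(Object key, Text value, Context out) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        outk.set(data[1]);                          // column 1: city
        outv.set(Integer.parseInt(data[6]));        // column 6: total price (unit: 10k CNY)
        out.write(outk, outv);
    }
}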

Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxMinTotalPriceByCityReducer extends Reducer<Text, IntWritable, Text, Text> {
    private final Text outv = new Text();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Track max and min in a single pass instead of buffering and sorting every price.
        int max = Integer.MIN_VALUE;
        int min = Integer.MAX_VALUE;
        for (IntWritable v : values) {
            max = Math.max(max, v.get());
            min = Math.min(min, v.get());
        }
        outv.set("Max total price: " + max + ", min total price: " + min + " (unit: 10k CNY)");
        context.write(key, outv);
    }
}
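To check the map and reduce logic of 1.1 without a cluster, the grouping can be simulated in plain Java. This is a minimal sketch, assuming the tab-separated layout the Mapper reads (column 1 = city, column 6 = integer total price); the class name `MinMaxByCity` is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Hadoop needed) of what job 1.1 computes.
// Assumes tab-separated lines where column 1 is the city and column 6
// is the total price as an integer, the layout the Mapper above reads.
final class MinMaxByCity {

    // Returns city -> {min, max}. The TreeMap plays the role of the shuffle:
    // it groups prices by city and iterates cities in sorted order.
    static Map<String, int[]> compute(List<String> lines) {
        Map<String, int[]> result = new TreeMap<>();
        for (String line : lines) {
            String[] data = line.split("\t");
            String city = data[1];
            int total = Integer.parseInt(data[6]);
            int[] mm = result.get(city);
            if (mm == null) {
                result.put(city, new int[] {total, total});
            } else {
                mm[0] = Math.min(mm[0], total);
                mm[1] = Math.max(mm[1], total);
            }
        }
        return result;
    }
}
```

As in a single reduce task, each city is seen once with all of its prices, and cities come out in sorted key order.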
  • Run:

  • Result:

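1.2 Counting second-hand listings per city with a custom partitioner
  • Purpose:

The number of listings is one of the basic dimensions of the housing data; to some extent, it reflects how popular the housing in a city is.

  • Code:

Driver: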

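import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class HouseCntByCityDriver {
    public static void main(String[] args) throws Exception {
        // HDFS input/output paths used in the article (overrides any command-line arguments)
        args = new String[] { "/input/datas/tb_house.txt", "/output/HouseCntByCity" };
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://node01:9000");
        Job job = Job.getInstance(conf, "HouseCntByCity");
        job.setJarByClass(HouseCntByCityDriver.class);
        job.setMapperClass(HouseCntByCityMapper.class);
        job.setReducerClass(HouseCntByCityReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setPartitionerClass(CityPartitioner.class);   // custom partitioner: routes records by city
        job.setNumReduceTasks(4);                         // one reduce task per tier-1 city
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}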

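Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class HouseCntByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        outk.set(data[1]);               // city; reuses outk instead of allocating a new Text
        context.write(outk, outv);       // (city, 1)
    }
}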

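Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Sums the 1s emitted for each city.
public class HouseCntByCityReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) sum += val.get();
        context.write(key, new IntWritable(sum));
    }
}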
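The 1.2 Driver registers a `CityPartitioner` whose source is not shown here. A plausible version of its partition rule, written as plain Java so it runs without Hadoop, might look like the sketch below; `CityPartitionRule` and the order of the cities are assumptions. In a real job the body of `partitionOf` would go inside `getPartition(Text key, IntWritable value, int numPartitions)` of a class extending `org.apache.hadoop.mapreduce.Partitioner<Text, IntWritable>`, called with `key.toString()`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the CityPartitioner used by the 1.2 Driver.
// Known cities get a fixed partition index; any other key falls back to
// the formula used by Hadoop's default HashPartitioner.
final class CityPartitionRule {
    private final Map<String, Integer> indexByCity = new HashMap<>();

    // The list order decides which partition (and part-r-0000N file) each city gets.
    CityPartitionRule(List<String> cities) {
        for (int i = 0; i < cities.size(); i++) {
            indexByCity.put(cities.get(i), i);
        }
    }

    int partitionOf(String city, int numPartitions) {
        Integer idx = indexByCity.get(city);
        if (idx != null && idx < numPartitions) {
            return idx;
        }
        // Non-negative hash modulo the number of partitions, as in HashPartitioner.
        return (city.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
}
```

With `setNumReduceTasks(4)` and the four city names listed as they appear in column 1 of the data, each city's count would end up in its own output file; the hash fallback keeps an unexpected city from producing an out-of-range partition number.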
  • Run:

  • Result:

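1.3 Counting listings by publication date, sorted by date
  • Purpose:

The publication date of a listing is another basic dimension of the data; buyers tend to favour the most recent listings.

  • Code:

Driver: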

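import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AcessHousePubTimeSortDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "AcessHousePubTimeSort");
        job.setJarByClass(AcessHousePubTimeSortDriver.class);
        job.setMapperClass(AcessHousePubTimeSortMapper.class);
        job.setReducerClass(AcessHousePubTimeSortReducer.class);
        job.setMapOutputKeyClass(Text.class);          // publication date, yyyy-MM-dd
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path("datas/tb_house.txt"));
        FileOutputFormat.setOutputPath(job, new Path("MapReduce/out/AcessHousePubTimeSort"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}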

Mapper:

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Derives each listing's publication date (yyyy-MM-dd) from the crawl time (column 9)
// and the relative "published N days/months ago" text in column 4, then emits (date, 1).
public class AcessHousePubTimeSortMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] data = value.toString().split("\t");
        String crawlerTime = data[9], followInfo = data[4];
        String ct = crawlerTime.substring(0, 10);                  // date part of the crawl time
        int idx1 = followInfo.indexOf("|"), idx2 = followInfo.indexOf("发");
        if (idx1 < 0 || idx2 <= idx1) return;                       // skip lines without the expected text
        String timeStr = followInfo.substring(idx1 + 1, idx2).trim(); // relative time between "|" and "发"
        try {
            outk.set(getPubDate(ct, timeStr));
            context.write(outk, outv);
        } catch (ParseException | NumberFormatException e) {
            e.printStackTrace();                                     // skip records whose date cannot be derived
        }
    }
    public String getPubDate(String ct, String timeStr) throws ParseException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(sdf.parse(ct));
        if (!timeStr.equals("今天")) {                               // "today": no shift
            int i = 0;
            while (i < timeStr.length() && Character.isDigit(timeStr.charAt(i))) i++;
            int size = Integer.parseInt(timeStr.substring(0, i));
            // a "天" (day) unit means N days ago; otherwise N months ago
            calendar.add(timeStr.contains("天") ? Calendar.DAY_OF_MONTH : Calendar.MONTH, -size);
        }
        return sdf.format(calendar.getTime());
    }
}

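Reducer:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Counts listings per publication date; keys arrive sorted, so dates come out in order.
public class AcessHousePubTimeSortReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) sum += val.get();
        context.write(key, new IntWritable(sum));
    }
}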
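The date arithmetic in `getPubDate` can also be written with `java.time`, which avoids the mutable `Calendar` and the non-thread-safe `SimpleDateFormat`. A minimal sketch, assuming the crawl time begins with an ISO `yyyy-MM-dd` date and the relative text is either "today" (今天) or starts with a number followed by a day (天) or month unit; `PubDateCalc` is a hypothetical name. The Chinese literals are written as Unicode escapes so the file compiles regardless of source encoding.

```java
import java.time.LocalDate;

// Sketch of getPubDate rewritten with java.time.
final class PubDateCalc {
    private static final String TODAY = "\u4eca\u5929"; // "today"
    private static final String DAY = "\u5929";         // "day"

    static LocalDate pubDate(String crawlTime, String relative) {
        LocalDate crawled = LocalDate.parse(crawlTime.substring(0, 10));
        String s = relative.trim();
        if (s.equals(TODAY)) {
            return crawled;
        }
        int n = leadingNumber(s);
        // A day unit means N days ago; anything else is treated as N months ago.
        return s.contains(DAY) ? crawled.minusDays(n) : crawled.minusMonths(n);
    }

    // Parses the digits at the start of s and fails loudly if there are none.
    private static int leadingNumber(String s) {
        int i = 0;
        while (i < s.length() && Character.isDigit(s.charAt(i))) i++;
        if (i == 0) throw new IllegalArgumentException("no leading number: " + s);
        return Integer.parseInt(s.substring(0, i));
    }
}
```

Because the result is formatted as `yyyy-MM-dd`, sorting the keys as text (which is what the shuffle does with `Text` keys) also sorts them chronologically, so the 1.3 output comes out in date order without a custom comparator.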
  • Run:

  • Result:

1.4 Top 5 total prices in each of the four tier-1 cities
  • Purpose:

TopN is one of the most common and essential MapReduce examples.

  • Code:

Driver:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TotalPriceTop5ByCityDriver {
    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // Local test paths used in the article
            args = new String[] { "datas/tb_house.txt", "MapReduce/out/TotalPriceTop5ByCity" };
        }
        if (args.length != 2) {
            System.err.println("Usage: TotalPriceTop5ByCity <in> <out>");
            System.exit(2);
        }
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(TotalPriceTop5ByCityDriver.class);
        job.setMapperClass(TotalPriceTop5ByCityMapper.class);
        job.setReducerClass(TotalPriceTop5ByCityReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits (city, total price) for every listing line.
public class TotalPriceTop5ByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable();
    @Override
    public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String[] data = value.toString().split("\t");
        outk.set(data[1]);                        // city
        outv.set(Integer.parseInt(data[6]));      // total price
        context.write(outk, outv);
    }
}

Reducer:

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TotalPriceTop5ByCityReducer extends Reducer<Text, IntWritable, Text, Text> {
    private final Text outv = new Text();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        List<Integer> totalPriceList = new ArrayList<>();
        for (IntWritable v : values) totalPriceList.add(v.get());   // copy out: Hadoop reuses the IntWritable
        totalPriceList.sort(Collections.reverseOrder());              // largest first
        int n = Math.min(5, totalPriceList.size());                   // a city may have fewer than 5 listings
        StringBuilder top5 = new StringBuilder("Top 5 total prices (unit: 10k CNY): ");
        for (int i = 0; i < n; i++) {
            if (i > 0) top5.append(", ");
            top5.append(totalPriceList.get(i));
        }
        outv.set(top5.toString());
        context.write(key, outv);
    }
}
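Sorting the whole value list, as the Reducer above does, keeps every price of a city in memory. A common alternative for TopN is a bounded min-heap: keep at most N values and evict the smallest whenever a larger one arrives. A minimal sketch with a hypothetical `TopN` helper:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Bounded min-heap TopN: memory stays O(n) per key instead of O(number of values).
final class TopN {
    static List<Integer> largest(Iterable<Integer> values, int n) {
        List<Integer> out = new ArrayList<>();
        if (n <= 0) return out;
        PriorityQueue<Integer> heap = new PriorityQueue<>(); // smallest kept value on top
        for (int v : values) {
            if (heap.size() < n) {
                heap.offer(v);
            } else if (v > heap.peek()) {
                heap.poll();        // drop the current smallest
                heap.offer(v);
            }
        }
        out.addAll(heap);
        out.sort(Collections.reverseOrder()); // largest first, like the Reducer output
        return out;
    }
}
```

Inside `reduce()` one would pass `v.get()` for each `IntWritable` rather than the object itself, because Hadoop reuses the same instance while iterating over the values.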
  • Run:

  • Result:

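1.5 Total sort of second-hand house prices with a custom partitioner
  • Purpose:

A total sort with a custom partitioner produces output that differs from the default sort: with several reducers and the default hash partitioner, each part file is sorted only on its own, whereas TotalOrderPartitioner gives each reducer a contiguous key range, so the part files read in order form one globally sorted result.

  • Code: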
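import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.InputSampler;
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class TotalOrderingPartition extends Configured implements Tool {
    // KeyValueTextInputFormat splits each line at the first tab: the key is the text
    // before it (parsed here as the integer to sort), the value is the rest of the line.
    static class SimpleMapper extends Mapper<Text, Text, Text, IntWritable> {
        @Override
        protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
            context.write(key, new IntWritable(Integer.parseInt(key.toString())));
        }
    }
    // Writes each value back out; keys arrive sorted within each partition.
    static class SimpleReducer extends Reducer<Text, IntWritable, IntWritable, NullWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            for (IntWritable value : values) {
                context.write(value, NullWritable.get());
            }
        }
    }
    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        Job job = Job.getInstance(conf, "TotalOrderingPartition");
        job.setJarByClass(TotalOrderingPartition.class);
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setNumReduceTasks(3);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(NullWritable.class);
        // Sample input keys, pick 2 split points for 3 reducers and store them in args[2].
        // Keys are Text, so key ranges and sorting are lexicographic, not numeric.
        TotalOrderPartitioner.setPartitionFile(job.getConfiguration(), new Path(args[2]));
        InputSampler.Sampler<Text, Text> sampler = new InputSampler.SplitSampler<>(5000, 10);
        InputSampler.writePartitionFile(job, sampler);
        job.setPartitionerClass(TotalOrderPartitioner.class);
        job.setMapperClass(SimpleMapper.class);
        job.setReducerClass(SimpleReducer.class);
        return job.waitForCompletion(true) ? 0 : 1;
    }
    public static void main(String[] args) throws Exception {
        // input file, output directory, partition file
        args = new String[] { "datas/tb_house.txt", "MapReduce/out/TotalOrderingPartition/outPartition1", "MapReduce/out/TotalOrderingPartition/outPartition2" };
        int exitCode = ToolRunner.run(new TotalOrderingPartition(), args);
        System.exit(exitCode);
    }
}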
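The idea behind `TotalOrderPartitioner`, as we understand it, can be modelled in a few lines: the sampler writes R-1 sorted split points for R reducers, and each key is routed to the partition whose index equals the number of split points less than or equal to it, so every key in partition i is smaller than every key in partition i+1. The sketch below (hypothetical `TotalOrderRule`) uses `int` keys for clarity; it also shows zero-padding, one way to make the lexicographic `Text` ordering used by the job above agree with numeric order.

```java
import java.util.Arrays;
import java.util.Locale;

final class TotalOrderRule {
    // Returns the partition index for key, given R-1 split points sorted ascending.
    static int partitionFor(int[] sortedSplitPoints, int key) {
        int pos = Arrays.binarySearch(sortedSplitPoints, key);
        // Found at index i: the key goes to partition i + 1.
        // Not found: binarySearch returns -(insertionPoint) - 1; the insertion
        // point is the number of split points smaller than the key.
        return pos >= 0 ? pos + 1 : -(pos + 1);
    }

    // Text keys compare as strings, so "9" sorts after "10". Left-padding
    // non-negative numbers to a fixed width makes both orders agree.
    static String padKey(int value, int width) {
        return String.format(Locale.ROOT, "%0" + width + "d", value);
    }
}
```

For example, with split points {100, 500} and three reducers, prices below 100 go to partition 0, prices from 100 up to 499 to partition 1, and the rest to partition 2.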
  • Run:

  • Result:

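1.6 Secondary sort by build year and total price
  • Purpose:

Sometimes ordering by a single field is not satisfying; a secondary sort addresses this by ordering on a second field whenever the first one ties.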
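The core of a secondary sort is a composite key whose comparison looks at the second field only when the first one ties. A plain-Java sketch, assuming build year ascending and then total price descending (the direction is a hypothetical choice, as is the name `YearPriceKey`); in Hadoop the same `compareTo` would live in a class implementing `WritableComparable`, together with `write` and `readFields`.

```java
// Composite key for a secondary sort: primary field build year, secondary field total price.
final class YearPriceKey implements Comparable<YearPriceKey> {
    final int buildYear;
    final int totalPrice;

    YearPriceKey(int buildYear, int totalPrice) {
        this.buildYear = buildYear;
        this.totalPrice = totalPrice;
    }

    @Override
    public int compareTo(YearPriceKey o) {
        int byYear = Integer.compare(buildYear, o.buildYear);
        if (byYear != 0) return byYear;                  // primary: year ascending
        return Integer.compare(o.totalPrice, totalPrice); // secondary: price descending
    }

    // Kept consistent with compareTo; Hadoop's default HashPartitioner relies on hashCode().
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof YearPriceKey)) return false;
        YearPriceKey k = (YearPriceKey) o;
        return buildYear == k.buildYear && totalPrice == k.totalPrice;
    }

    @Override
    public int hashCode() {
        return 31 * buildYear + totalPrice;
    }

    @Override
    public String toString() {
        return buildYear + "\t" + totalPrice;
    }
}
```

With more than one reducer, jobs of this kind usually also partition (and often group) by the year alone, so that all records of one year reach the same reducer; the `hashCode` above covers both fields and is only suitable when that is not required.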

  • Code:

Driver:

Mapper:

Reducer:

  • Run:

  • Result:

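1.7 Counting listings per location with a custom class
  • Purpose:

Some statistics cannot be produced directly with MapReduce's built-in types; a custom class implementing Writable makes them possible, for example by carrying a "city-position" label together with its count.

  • Code:

Custom class: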

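import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;

public class HouseCntByPositionTopListBean implements Writable {
    // Initialised so Hadoop can create the bean with the no-arg constructor and then call readFields().
    private Text info = new Text();              // "city-position"
    private IntWritable cnt = new IntWritable(); // number of listings
    public Text getInfo() {
        return info;
    }
    public void setInfo(Text info) {
        this.info = info;
    }
    public IntWritable getCnt() {
        return cnt;
    }
    public void setCnt(IntWritable cnt) {
        this.cnt = cnt;
    }
    @Override
    public void readFields(DataInput in) throws IOException {
        // Read the same fields, in the same order, as write().
        info.readFields(in);
        cnt.readFields(in);
    }
    @Override
    public void write(DataOutput out) throws IOException {
        info.write(out);
        cnt.write(out);
    }
    @Override
    public String toString() {
        String infoStr = info.toString();
        int idx = infoStr.indexOf("-");
        if (idx < 0) return infoStr + "#" + cnt;   // no "city-position" separator
        String city = infoStr.substring(0, idx);
        String position = infoStr.substring(idx + 1);
        return city + "#" + "[" + position + "]" + "#" + cnt;
    }
}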
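A `Writable` must write and read the same fields in the same order; a field that `write()` skips comes back empty (or null) once the object has been serialized and read back, which is what happens to map output on its way through the shuffle. The contract can be checked with a round trip through plain `java.io` streams. A sketch with a hypothetical `PositionCount` class that uses `writeUTF`/`writeInt` in place of `Text`/`IntWritable` so it runs without Hadoop:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Plain-Java model of the Writable contract for a (label, count) pair.
final class PositionCount {
    String info = "";  // "city-position"
    int cnt;

    void write(DataOutputStream out) throws IOException {
        out.writeUTF(info);
        out.writeInt(cnt);
    }

    void readFields(DataInputStream in) throws IOException {
        info = in.readUTF();   // same order as write()
        cnt = in.readInt();
    }

    // Serialize, then deserialize into a fresh object.
    static PositionCount roundTrip(PositionCount src) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(bytes)) {
                src.write(out);
            }
            PositionCount copy = new PositionCount();
            try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If `write()` omitted `info`, `readFields()` would read the wrong bytes or hit the end of the stream, which is the kind of failure the round trip exposes early.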

Driver:

Mapper:

Reducer:

  • Run:

  • Result:

1.8 Proportion of each listing tag per city
  • Purpose:

Proportion analysis is another common kind of MapReduce statistic.

  • Code:

Driver:

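import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TagRatioByCityDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // Local input/output paths used in the article (overrides any command-line arguments)
        args = new String[] {"datas/tb_house.txt", "MapReduce/out/TagRatioByCity" };
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(TagRatioByCityDriver.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setMapperClass(TagRatioByCityMapper.class);
        job.setReducerClass(TagRatioByCityReducer.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}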

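Mapper:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Emits ("city-tag", 1) for every listing.
public class TagRatioByCityMapper extends Mapper<Object, Text, Text, IntWritable> {
    private Text outk = new Text();
    private IntWritable outv = new IntWritable(1);
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        String[] data = line.split("\t");
        String city = data[1], tag = data[8];
        if ("".equals(tag)) tag = "unknown";      // listings without a tag
        outk.set(city + "-" + tag);
        context.write(outk, outv);
    }
}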

Reducer:

import java.io.IOException;
import java.text.DecimalFormat;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TagRatioByCityReducer extends Reducer<Text, IntWritable, Text, Text> {
    private final Text outv = new Text();
    private final DecimalFormat df = new DecimalFormat("0.00");
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int cnt = 0;
        for (IntWritable value : values) {
            cnt += value.get();
        }
        // Hard-coded total number of listings per city in this dataset.
        String s = key.toString();
        int sum;
        if (s.contains("上海")) {
            sum = 2995;            // Shanghai
        } else if (s.contains("北京")) {
            sum = 2972;            // Beijing
        } else if (s.contains("广州")) {
            sum = 2699;            // Guangzhou
        } else {
            sum = 2982;            // Shenzhen
        }
        outv.set(df.format((double) cnt / sum * 100) + "%");
        context.write(key, outv);
    }
}
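The Reducer above hard-codes each city's total listing count. One way to avoid that is to derive the denominators from the same counts: in MapReduce this could mean keying the map output by city alone, so a single `reduce()` call sees every tag of a city and can compute both the per-tag counts and the total. The arithmetic, sketched in plain Java with a hypothetical `TagRatio` helper and assuming keys of the form `city-tag` (no `-` inside the city name):

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Tag proportions per city, with totals computed from the counts themselves.
final class TagRatio {
    static Map<String, String> ratios(Map<String, Integer> countsByCityTag) {
        // First pass: total listings per city.
        Map<String, Integer> totalByCity = new TreeMap<>();
        for (Map.Entry<String, Integer> e : countsByCityTag.entrySet()) {
            totalByCity.merge(cityOf(e.getKey()), e.getValue(), Integer::sum);
        }
        // Second pass: each count as a percentage of its city's total, two decimals.
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : countsByCityTag.entrySet()) {
            double pct = 100.0 * e.getValue() / totalByCity.get(cityOf(e.getKey()));
            out.put(e.getKey(), String.format(Locale.ROOT, "%.2f%%", pct));
        }
        return out;
    }

    private static String cityOf(String cityTag) {
        return cityTag.substring(0, cityTag.indexOf('-'));
    }
}
```

The design choice is simply that the denominator comes from the data rather than from a constant, so the percentages stay correct if the dataset changes.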
  • Run:

  • Result:

2. Data and Source Code
  • Github

  • Gitee

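3. Summary

MapReduce statistical analysis calls for care. Section 1.3 (counting listings by publication date) relies on Java's date classes SimpleDateFormat and Date and takes some debugging to get right. Computing extrema and proportions is not difficult: the main work is counting each category and the overall total, then dividing one by the other. Secondary sort and custom classes are harder, but they can be built step by step.

The end!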

Page link: http://muchs.cn/article12/dcjddc.html
