這篇文章主要介紹了MySQL位圖索引如何解決用戶畫像問題,具有一定借鑒價(jià)值,感興趣的朋友可以參考下,希望大家閱讀完這篇文章之后大有收獲,下面讓小編帶著大家一起了解一下。
在甘泉等地區(qū),都構(gòu)建了全面的區(qū)域性戰(zhàn)略布局,加強(qiáng)發(fā)展的系統(tǒng)性、市場(chǎng)前瞻性、產(chǎn)品創(chuàng)新能力,以專注、極致的服務(wù)理念,為客戶提供成都網(wǎng)站建設(shè)、網(wǎng)站設(shè)計(jì) 網(wǎng)站設(shè)計(jì)制作定制網(wǎng)站建設(shè),公司網(wǎng)站建設(shè),企業(yè)網(wǎng)站建設(shè),成都品牌網(wǎng)站建設(shè),成都全網(wǎng)營(yíng)銷推廣,外貿(mào)營(yíng)銷網(wǎng)站建設(shè),甘泉網(wǎng)站建設(shè)費(fèi)用合理。
用戶畫像的原始表,有一億記錄,100多個(gè)維度(100多個(gè)列),比如年齡,性別,愛好,是否有車,是否有房什么的.
測(cè)試環(huán)境800w數(shù)據(jù),大概在5G左右
需要解決的問題一 :在100列中任選N列,過濾查詢,執(zhí)行時(shí)間小于一秒。實(shí)際上N一般在5到10
即類似
select * from 畫像表 where 性別=‘男’ and 年齡 between 20 and 30 and 有車='yes' and 有房='yes' and 已婚='no'
問題二:全體數(shù)據(jù)的隨意聚合,執(zhí)行時(shí)間小于5秒
比如
select 年齡,性別,count(*) from 畫像表 group by 年齡,性別
在數(shù)據(jù)庫(kù)解決這個(gè)問題有一些麻煩,傳統(tǒng)建索引優(yōu)化的方式不起作用了。
100多個(gè)列隨意選擇幾列查詢,索引不可能提前建出這么多.
先看測(cè)試數(shù)據(jù)
CREATE TABLE `o_huaxiang_big` ( `id` bigint(20) NOT NULL AUTO_INCREMENT, `user_id` bigint(20) DEFAULT NULL, `umc_sex` varchar(20) DEFAULT NULL, `age` varchar(30) DEFAULT NULL, PRIMARY KEY (`id`) ) ;
處理這個(gè)問題,我自然想到模擬一個(gè)位圖.
一般畫像數(shù)據(jù)有幾種類型
1.數(shù)值類型
2.日期類型
3.日期時(shí)間類型
4.字符串類型
其中 日期和字符串類型可以作為離散值,
日期時(shí)間類型也可以轉(zhuǎn)化為日期類型,作為離散值處理。
數(shù)值類型比較麻煩,需要人為介入判斷是否是離散值,如果不是還需要?jiǎng)澐址秶?/p>
總之,所有的值都要映射為離散值
然后以上圖前5個(gè)數(shù)據(jù)為例,將離散值映射為位圖
男 0 0 0 0 1
未知 1 0 0 1 0
女 0 1 1 0 0
一個(gè)bigint 是8字節(jié)的,為了取整,我存放60個(gè)記錄的位信息。
然后建位圖表如下
CREATE TABLE `bitmap20` ( `table_name` varchar(32) NOT NULL DEFAULT '' comment '位圖表記錄的原始表名稱', `column_name` varchar(32) NOT NULL DEFAULT '' comment '列名稱', `min_id` int(11) DEFAULT NULL comment '起始ID', `max_id` int(11) DEFAULT NULL comment '終止ID', `gid` int(11) NOT NULL DEFAULT '0' comment '分組ID,每組1200記錄' , `grouped` varchar(32) NOT NULL DEFAULT '' comment '離散值', `total` bigint(21) NOT NULL DEFAULT '0' comment '總數(shù)', `c20` bigint(20) NOT NULL DEFAULT '0', `c19` bigint(20) NOT NULL DEFAULT '0', `c18` bigint(20) NOT NULL DEFAULT '0', `c17` bigint(20) NOT NULL DEFAULT '0', `c16` bigint(20) NOT NULL DEFAULT '0', `c15` bigint(20) NOT NULL DEFAULT '0', `c14` bigint(20) NOT NULL DEFAULT '0', `c13` bigint(20) NOT NULL DEFAULT '0', `c12` bigint(20) NOT NULL DEFAULT '0', `c11` bigint(20) NOT NULL DEFAULT '0', `c10` bigint(20) NOT NULL DEFAULT '0', `c9` bigint(20) NOT NULL DEFAULT '0', `c8` bigint(20) NOT NULL DEFAULT '0', `c7` bigint(20) NOT NULL DEFAULT '0', `c6` bigint(20) NOT NULL DEFAULT '0', `c5` bigint(20) NOT NULL DEFAULT '0', `c4` bigint(20) NOT NULL DEFAULT '0', `c3` bigint(20) NOT NULL DEFAULT '0', `c2` bigint(20) NOT NULL DEFAULT '0', `c1` bigint(20) NOT NULL DEFAULT '0', PRIMARY KEY (`column_name`,`gid`,`grouped`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED comment '位圖表';
c1-c20,一共20個(gè)bigint類型的字段,每個(gè)bigint記錄60個(gè)位信息。
也就是位圖表每行存儲(chǔ)1200個(gè)原始記錄的位圖信息,并且位圖表啟用了壓縮。
測(cè)試環(huán)境
4C 8G內(nèi)存(innodb buffer 2G) SSD硬盤
800萬(wàn)原始畫像數(shù)據(jù),占用硬盤5G
初始化位圖表
insert into bitmap20 select 'o_huaxiang_big' table_name, 'umc_sex' column_name, ((g1200-1)*60)*20 min_id, ((g1200-1)*60)*20+1200 max_id, v2.* from ( select g1200, grouped, sum(total) total, ifnull(max(case when abs((g1200-1)*20-g60)=20 then bitmap else null end),0) c20, ifnull(max(case when abs((g1200-1)*20-g60)=19 then bitmap else null end),0) c19, ifnull(max(case when abs((g1200-1)*20-g60)=18 then bitmap else null end),0) c18, ifnull(max(case when abs((g1200-1)*20-g60)=17 then bitmap else null end),0) c17, ifnull(max(case when abs((g1200-1)*20-g60)=16 then bitmap else null end),0) c16, ifnull(max(case when abs((g1200-1)*20-g60)=15 then bitmap else null end),0) c15, ifnull(max(case when abs((g1200-1)*20-g60)=14 then bitmap else null end),0) c14, ifnull(max(case when abs((g1200-1)*20-g60)=13 then bitmap else null end),0) c13, ifnull(max(case when abs((g1200-1)*20-g60)=12 then bitmap else null end),0) c12, ifnull(max(case when abs((g1200-1)*20-g60)=11 then bitmap else null end),0) c11, ifnull(max(case when abs((g1200-1)*20-g60)=10 then bitmap else null end),0) c10, ifnull(max(case when abs((g1200-1)*20-g60)=9 then bitmap else null end),0) c9, ifnull(max(case when abs((g1200-1)*20-g60)=8 then bitmap else null end),0) c8, ifnull(max(case when abs((g1200-1)*20-g60)=7 then bitmap else null end),0) c7, ifnull(max(case when abs((g1200-1)*20-g60)=6 then bitmap else null end),0) c6, ifnull(max(case when abs((g1200-1)*20-g60)=5 then bitmap else null end),0) c5, ifnull(max(case when abs((g1200-1)*20-g60)=4 then bitmap else null end),0) c4, ifnull(max(case when abs((g1200-1)*20-g60)=3 then bitmap else null end),0) c3, ifnull(max(case when abs((g1200-1)*20-g60)=2 then bitmap else null end),0) c2, ifnull(max(case when abs((g1200-1)*20-g60)=1 then bitmap else null end),0) c1 from ( SELECT CEIL(id / 60) g60, CEIL(id / 1200) g1200, umc_sex grouped, COUNT(*) total, BIT_OR(1 << (MOD(id, 60))) bitmap FROM o_huaxiang_big o GROUP BY g1200 , g60 , umc_sex ) v1 group by g1200,grouped ) v2; insert into bitmap20 select 'o_huaxiang_big' table_name, 'age' column_name, ((g1200-1)*60)*20 min_id, ((g1200-1)*60)*20+1200 max_id, v2.* from ( select g1200, grouped, sum(total) total, ifnull(max(case when abs((g1200-1)*20-g60)=20 then bitmap else null end),0) c20, ifnull(max(case when abs((g1200-1)*20-g60)=19 then bitmap else null end),0) c19, ifnull(max(case when abs((g1200-1)*20-g60)=18 then bitmap else null end),0) c18, ifnull(max(case when abs((g1200-1)*20-g60)=17 then bitmap else null end),0) c17, ifnull(max(case when abs((g1200-1)*20-g60)=16 then bitmap else null end),0) c16, ifnull(max(case when abs((g1200-1)*20-g60)=15 then bitmap else null end),0) c15, ifnull(max(case when abs((g1200-1)*20-g60)=14 then bitmap else null end),0) c14, ifnull(max(case when abs((g1200-1)*20-g60)=13 then bitmap else null end),0) c13, ifnull(max(case when abs((g1200-1)*20-g60)=12 then bitmap else null end),0) c12, ifnull(max(case when abs((g1200-1)*20-g60)=11 then bitmap else null end),0) c11, ifnull(max(case when abs((g1200-1)*20-g60)=10 then bitmap else null end),0) c10, ifnull(max(case when abs((g1200-1)*20-g60)=9 then bitmap else null end),0) c9, ifnull(max(case when abs((g1200-1)*20-g60)=8 then bitmap else null end),0) c8, ifnull(max(case when abs((g1200-1)*20-g60)=7 then bitmap else null end),0) c7, ifnull(max(case when abs((g1200-1)*20-g60)=6 then bitmap else null end),0) c6, ifnull(max(case when abs((g1200-1)*20-g60)=5 then bitmap else null end),0) c5, ifnull(max(case when abs((g1200-1)*20-g60)=4 then bitmap else null end),0) c4, ifnull(max(case when abs((g1200-1)*20-g60)=3 then bitmap else null end),0) c3, ifnull(max(case when abs((g1200-1)*20-g60)=2 then bitmap else null end),0) c2, ifnull(max(case when abs((g1200-1)*20-g60)=1 then bitmap else null end),0) c1 from ( SELECT CEIL(id / 60) g60, CEIL(id / 1200) g1200, age grouped, COUNT(*) total, BIT_OR(1 << (MOD(id, 60))) bitmap FROM o_huaxiang_big o GROUP BY g1200 , g60 , age ) v1 group by g1200,grouped ) v2;
性別和年齡的初始化分別耗時(shí)36秒和49秒
兩個(gè)維度的索引占用磁盤40M
聚合查詢,800萬(wàn)數(shù)據(jù)耗時(shí)1.7秒.因?yàn)槭荂PU密集型操作,IO非常小,所以可以通過多線程再優(yōu)化.
select t1p,t2p,sum(total) from ( select t1.grouped t1p, t2.grouped t2p, bit_count(t1.c1&t2.c1) + bit_count(t1.c2&t2.c2) + bit_count(t1.c3&t2.c3) + bit_count(t1.c4&t2.c4) + bit_count(t1.c5&t2.c5) + bit_count(t1.c6&t2.c6) + bit_count(t1.c7&t2.c7) + bit_count(t1.c8&t2.c8) + bit_count(t1.c9&t2.c9) + bit_count(t1.c10&t2.c10) + bit_count(t1.c11&t2.c11) + bit_count(t1.c12&t2.c12) + bit_count(t1.c13&t2.c13) + bit_count(t1.c14&t2.c14) + bit_count(t1.c15&t2.c15) + bit_count(t1.c16&t2.c16) + bit_count(t1.c17&t2.c17) + bit_count(t1.c18&t2.c18) + bit_count(t1.c19&t2.c19) + bit_count(t1.c20&t2.c20) total from bitmap20 t1 inner join bitmap20 t2 on(t1.gid=t2.gid) where t1.column_name='umc_sex' and t2.column_name='age' ) t3 where total>0 group by t1p,t2p
還有一個(gè)問題是過濾
select max_id , concat( concat(right(c20,1),left(c20,59)) , concat(right(c19,1),left(c19,59)) , concat(right(c18,1),left(c18,59)) , concat(right(c17,1),left(c17,59)) , concat(right(c16,1),left(c16,59)) , concat(right(c15,1),left(c15,59)) , concat(right(c14,1),left(c14,59)) , concat(right(c13,1),left(c13,59)) , concat(right(c12,1),left(c12,59)) , concat(right(c11,1),left(c11,59)) , concat(right(c10,1),left(c10,59)) , concat(right(c9,1),left(c9,59)) , concat(right(c8,1),left(c8,59)) , concat(right(c7,1),left(c7,59)) , concat(right(c6,1),left(c6,59)) , concat(right(c5,1),left(c5,59)) , concat(right(c4,1),left(c4,59)) , concat(right(c3,1),left(c3,59)) , concat(right(c2,1),left(c2,59)) , concat(right(c1,1),left(c1,59)) ) c from ( select gid,min_id,max_id, lpad(conv(bit_and(c20),10,2),60,'0') c20, lpad(conv(bit_and(c19),10,2),60,'0') c19, lpad(conv(bit_and(c18),10,2),60,'0') c18, lpad(conv(bit_and(c17),10,2),60,'0') c17, lpad(conv(bit_and(c16),10,2),60,'0') c16, lpad(conv(bit_and(c15),10,2),60,'0') c15, lpad(conv(bit_and(c14),10,2),60,'0') c14, lpad(conv(bit_and(c13),10,2),60,'0') c13, lpad(conv(bit_and(c12),10,2),60,'0') c12, lpad(conv(bit_and(c11),10,2),60,'0') c11, lpad(conv(bit_and(c10),10,2),60,'0') c10, lpad(conv(bit_and(c9),10,2),60,'0') c9, lpad(conv(bit_and(c8),10,2),60,'0') c8, lpad(conv(bit_and(c7),10,2),60,'0') c7, lpad(conv(bit_and(c6),10,2),60,'0') c6, lpad(conv(bit_and(c5),10,2),60,'0') c5, lpad(conv(bit_and(c4),10,2),60,'0') c4, lpad(conv(bit_and(c3),10,2),60,'0') c3, lpad(conv(bit_and(c2),10,2),60,'0') c2, lpad(conv(bit_and(c1),10,2),60,'0') c1 from bitmap20 where ( (column_name='umc_sex' and grouped='未知') or (column_name='age' and grouped='117')) group by gid,min_id,max_id having count(distinct column_name)=2 ) v1
用max_id 減去 ‘1’在c字符串的位置,就是原始的ID
感謝你能夠認(rèn)真閱讀完這篇文章,希望小編分享的“MySQL位圖索引如何解決用戶畫像問題”這篇文章對(duì)大家有幫助,同時(shí)也希望大家多多支持創(chuàng)新互聯(lián),關(guān)注創(chuàng)新互聯(lián)行業(yè)資訊頻道,更多相關(guān)知識(shí)等著你來學(xué)習(xí)!
當(dāng)前名稱:MySQL位圖索引如何解決用戶畫像問題
鏈接分享:http://muchs.cn/article12/gecsdc.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供關(guān)鍵詞優(yōu)化、靜態(tài)網(wǎng)站、網(wǎng)站收錄、網(wǎng)站策劃、小程序開發(fā)、營(yíng)銷型網(wǎng)站建設(shè)
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來源: 創(chuàng)新互聯(lián)