小編給大家分享一下Java如何實(shí)現(xiàn)決策樹算法,相信大部分人都還不怎么了解,因此分享這篇文章給大家參考一下,希望大家閱讀完這篇文章后大有收獲,下面讓我們一起去了解一下吧!
創(chuàng)新互聯(lián)成立于2013年,先為盤州等服務(wù)建站,盤州等地企業(yè),進(jìn)行企業(yè)商務(wù)咨詢服務(wù)。為盤州企業(yè)網(wǎng)站制作PC+手機(jī)+微官網(wǎng)三網(wǎng)同步一站式服務(wù)解決您的所有建站問(wèn)題。
具體如下:
決策樹算法是一種逼近離散函數(shù)值的方法。它是一種典型的分類方法,首先對(duì)數(shù)據(jù)進(jìn)行處理,利用歸納算法生成可讀的規(guī)則和決策樹,然后使用決策對(duì)新數(shù)據(jù)進(jìn)行分析。本質(zhì)上決策樹是通過(guò)一系列規(guī)則對(duì)數(shù)據(jù)進(jìn)行分類的過(guò)程。
決策樹構(gòu)造可以分兩步進(jìn)行。第一步,決策樹的生成:由訓(xùn)練樣本集生成決策樹的過(guò)程。一般情況下,訓(xùn)練樣本數(shù)據(jù)集是根據(jù)實(shí)際需要有歷史的、有一定綜合程度的,用于數(shù)據(jù)分析處理的數(shù)據(jù)集。第二步,決策樹的剪枝:決策樹的剪枝是對(duì)上一階段生成的決策樹進(jìn)行檢驗(yàn)、校正和修下的過(guò)程,主要是用新的樣本數(shù)據(jù)集(稱為測(cè)試數(shù)據(jù)集)中的數(shù)據(jù)校驗(yàn)決策樹生成過(guò)程中產(chǎn)生的初步規(guī)則,將那些影響預(yù)衡準(zhǔn)確性的分枝剪除。
java實(shí)現(xiàn)代碼如下:
package demo; import java.util.HashMap; import java.util.LinkedList; import java.util.List; import java.util.Map; import java.util.Map.Entry; import java.util.Set; public class DicisionTree { public static void main(String[] args) throws Exception { System.out.print("創(chuàng)新互聯(lián)測(cè)試結(jié)果:"); String[] attrNames = new String[] { "AGE", "INCOME", "STUDENT", "CREDIT_RATING" }; // 讀取樣本集 Map<Object, List<Sample>> samples = readSamples(attrNames); // 生成決策樹 Object decisionTree = generateDecisionTree(samples, attrNames); // 輸出決策樹 outputDecisionTree(decisionTree, 0, null); } /** * 讀取已分類的樣本集,返回Map:分類 -> 屬于該分類的樣本的列表 */ static Map<Object, List<Sample>> readSamples(String[] attrNames) { // 樣本屬性及其所屬分類(數(shù)組中的最后一個(gè)元素為樣本所屬分類) Object[][] rawData = new Object[][] { { "<30 ", "High ", "No ", "Fair ", "0" }, { "<30 ", "High ", "No ", "Excellent", "0" }, { "30-40", "High ", "No ", "Fair ", "1" }, { ">40 ", "Medium", "No ", "Fair ", "1" }, { ">40 ", "Low ", "Yes", "Fair ", "1" }, { ">40 ", "Low ", "Yes", "Excellent", "0" }, { "30-40", "Low ", "Yes", "Excellent", "1" }, { "<30 ", "Medium", "No ", "Fair ", "0" }, { "<30 ", "Low ", "Yes", "Fair ", "1" }, { ">40 ", "Medium", "Yes", "Fair ", "1" }, { "<30 ", "Medium", "Yes", "Excellent", "1" }, { "30-40", "Medium", "No ", "Excellent", "1" }, { "30-40", "High ", "Yes", "Fair ", "1" }, { ">40 ", "Medium", "No ", "Excellent", "0" } }; // 讀取樣本屬性及其所屬分類,構(gòu)造表示樣本的Sample對(duì)象,并按分類劃分樣本集 Map<Object, List<Sample>> ret = new HashMap<Object, List<Sample>>(); for (Object[] row : rawData) { Sample sample = new Sample(); int i = 0; for (int n = row.length - 1; i < n; i++) sample.setAttribute(attrNames[i], row[i]); sample.setCategory(row[i]); List<Sample> samples = ret.get(row[i]); if (samples == null) { samples = new LinkedList<Sample>(); ret.put(row[i], samples); } samples.add(sample); } return ret; } /** * 構(gòu)造決策樹 */ static Object generateDecisionTree( Map<Object, List<Sample>> categoryToSamples, String[] attrNames) { // 如果只有一個(gè)樣本,將該樣本所屬分類作為新樣本的分類 if (categoryToSamples.size() == 1) return categoryToSamples.keySet().iterator().next(); // 如果沒有供決策的屬性,則將樣本集中具有最多樣本的分類作為新樣本的分類,即投票選舉出分類 if (attrNames.length == 0) { int max = 0; Object maxCategory = null; for (Entry<Object, List<Sample>> entry : categoryToSamples .entrySet()) { int cur = entry.getValue().size(); if (cur > max) { max = cur; maxCategory = entry.getKey(); } } return maxCategory; } // 選取測(cè)試屬性 Object[] rst = chooseBestTestAttribute(categoryToSamples, attrNames); // 決策樹根結(jié)點(diǎn),分支屬性為選取的測(cè)試屬性 Tree tree = new Tree(attrNames[(Integer) rst[0]]); // 已用過(guò)的測(cè)試屬性不應(yīng)再次被選為測(cè)試屬性 String[] subA = new String[attrNames.length - 1]; for (int i = 0, j = 0; i < attrNames.length; i++) if (i != (Integer) rst[0]) subA[j++] = attrNames[i]; // 根據(jù)分支屬性生成分支 @SuppressWarnings("unchecked") Map<Object, Map<Object, List<Sample>>> splits = /* NEW LINE */(Map<Object, Map<Object, List<Sample>>>) rst[2]; for (Entry<Object, Map<Object, List<Sample>>> entry : splits.entrySet()) { Object attrValue = entry.getKey(); Map<Object, List<Sample>> split = entry.getValue(); Object child = generateDecisionTree(split, subA); tree.setChild(attrValue, child); } return tree; } /** * 選取最優(yōu)測(cè)試屬性。最優(yōu)是指如果根據(jù)選取的測(cè)試屬性分支,則從各分支確定新樣本 * 的分類需要的信息量之和最小,這等價(jià)于確定新樣本的測(cè)試屬性獲得的信息增益最大 * 返回?cái)?shù)組:選取的屬性下標(biāo)、信息量之和、Map(屬性值->(分類->樣本列表)) */ static Object[] chooseBestTestAttribute( Map<Object, List<Sample>> categoryToSamples, String[] attrNames) { int minIndex = -1; // 最優(yōu)屬性下標(biāo) double minValue = Double.MAX_VALUE; // 最小信息量 Map<Object, Map<Object, List<Sample>>> minSplits = null; // 最優(yōu)分支方案 // 對(duì)每一個(gè)屬性,計(jì)算將其作為測(cè)試屬性的情況下在各分支確定新樣本的分類需要的信息量之和,選取最小為最優(yōu) for (int attrIndex = 0; attrIndex < attrNames.length; attrIndex++) { int allCount = 0; // 統(tǒng)計(jì)樣本總數(shù)的計(jì)數(shù)器 // 按當(dāng)前屬性構(gòu)建Map:屬性值->(分類->樣本列表) Map<Object, Map<Object, List<Sample>>> curSplits = /* NEW LINE */new HashMap<Object, Map<Object, List<Sample>>>(); for (Entry<Object, List<Sample>> entry : categoryToSamples .entrySet()) { Object category = entry.getKey(); List<Sample> samples = entry.getValue(); for (Sample sample : samples) { Object attrValue = sample .getAttribute(attrNames[attrIndex]); Map<Object, List<Sample>> split = curSplits.get(attrValue); if (split == null) { split = new HashMap<Object, List<Sample>>(); curSplits.put(attrValue, split); } List<Sample> splitSamples = split.get(category); if (splitSamples == null) { splitSamples = new LinkedList<Sample>(); split.put(category, splitSamples); } splitSamples.add(sample); } allCount += samples.size(); } // 計(jì)算將當(dāng)前屬性作為測(cè)試屬性的情況下在各分支確定新樣本的分類需要的信息量之和 double curValue = 0.0; // 計(jì)數(shù)器:累加各分支 for (Map<Object, List<Sample>> splits : curSplits.values()) { double perSplitCount = 0; for (List<Sample> list : splits.values()) perSplitCount += list.size(); // 累計(jì)當(dāng)前分支樣本數(shù) double perSplitValue = 0.0; // 計(jì)數(shù)器:當(dāng)前分支 for (List<Sample> list : splits.values()) { double p = list.size() / perSplitCount; perSplitValue -= p * (Math.log(p) / Math.log(2)); } curValue += (perSplitCount / allCount) * perSplitValue; } // 選取最小為最優(yōu) if (minValue > curValue) { minIndex = attrIndex; minValue = curValue; minSplits = curSplits; } } return new Object[] { minIndex, minValue, minSplits }; } /** * 將決策樹輸出到標(biāo)準(zhǔn)輸出 */ static void outputDecisionTree(Object obj, int level, Object from) { for (int i = 0; i < level; i++) System.out.print("|-----"); if (from != null) System.out.printf("(%s):", from); if (obj instanceof Tree) { Tree tree = (Tree) obj; String attrName = tree.getAttribute(); System.out.printf("[%s = ?]\n", attrName); for (Object attrValue : tree.getAttributeValues()) { Object child = tree.getChild(attrValue); outputDecisionTree(child, level + 1, attrName + " = " + attrValue); } } else { System.out.printf("[CATEGORY = %s]\n", obj); } } /** * 樣本,包含多個(gè)屬性和一個(gè)指明樣本所屬分類的分類值 */ static class Sample { private Map<String, Object> attributes = new HashMap<String, Object>(); private Object category; public Object getAttribute(String name) { return attributes.get(name); } public void setAttribute(String name, Object value) { attributes.put(name, value); } public Object getCategory() { return category; } public void setCategory(Object category) { this.category = category; } public String toString() { return attributes.toString(); } } /** * 決策樹(非葉結(jié)點(diǎn)),決策樹中的每個(gè)非葉結(jié)點(diǎn)都引導(dǎo)了一棵決策樹 * 每個(gè)非葉結(jié)點(diǎn)包含一個(gè)分支屬性和多個(gè)分支,分支屬性的每個(gè)值對(duì)應(yīng)一個(gè)分支,該分支引導(dǎo)了一棵子決策樹 */ static class Tree { private String attribute; private Map<Object, Object> children = new HashMap<Object, Object>(); public Tree(String attribute) { this.attribute = attribute; } public String getAttribute() { return attribute; } public Object getChild(Object attrValue) { return children.get(attrValue); } public void setChild(Object attrValue, Object child) { children.put(attrValue, child); } public Set<Object> getAttributeValues() { return children.keySet(); } } }
運(yùn)行結(jié)果:
以上是“Java如何實(shí)現(xiàn)決策樹算法”這篇文章的所有內(nèi)容,感謝各位的閱讀!相信大家都有了一定的了解,希望分享的內(nèi)容對(duì)大家有所幫助,如果還想學(xué)習(xí)更多知識(shí),歡迎關(guān)注創(chuàng)新互聯(lián)行業(yè)資訊頻道!
當(dāng)前題目:Java如何實(shí)現(xiàn)決策樹算法
網(wǎng)站地址:http://muchs.cn/article26/ghiijg.html
成都網(wǎng)站建設(shè)公司_創(chuàng)新互聯(lián),為您提供網(wǎng)站維護(hù)、搜索引擎優(yōu)化、自適應(yīng)網(wǎng)站、網(wǎng)站制作、小程序開發(fā)、移動(dòng)網(wǎng)站建設(shè)
聲明:本網(wǎng)站發(fā)布的內(nèi)容(圖片、視頻和文字)以用戶投稿、用戶轉(zhuǎn)載內(nèi)容為主,如果涉及侵權(quán)請(qǐng)盡快告知,我們將會(huì)在第一時(shí)間刪除。文章觀點(diǎn)不代表本網(wǎng)站立場(chǎng),如需處理請(qǐng)聯(lián)系客服。電話:028-86922220;郵箱:631063699@qq.com。內(nèi)容未經(jīng)允許不得轉(zhuǎn)載,或轉(zhuǎn)載時(shí)需注明來(lái)源: 創(chuàng)新互聯(lián)