(Discussion) ambiguity in decision tree construction

https://preview.redd.it/0013142so4jd1.png?width=707&format=png&auto=webp&s=8764b6772fdabe809e7a4db449b501a6d5df201c

I am trying to work through the numerical calculations of the decision tree for the brand example shown in Figure 1.
When I construct the decision tree by hand using entropy, I get:

https://preview.redd.it/zdllqlbzo4jd1.png?width=1106&format=png&auto=webp&s=15deb539b6ed7643df0041b09d0dd880c6fb0ee6

However, when I construct the decision tree with the code below:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

data = pd.DataFrame({
    "Day": ['D1', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9', 'D10', 'D11', 'D12', 'D13', 'D14'],
    "Outlook": ['Sunny', 'Sunny', 'Overcast', 'Rain', 'Rain', 'Rain', 'Overcast', 'Sunny', 'Sunny', 'Rain', 'Sunny', 'Overcast', 'Overcast', 'Rain'],
    "Temp": ['Hot', 'Hot', 'Hot', 'Mild', 'Cool', 'Cool', 'Cool', 'Mild', 'Cool', 'Mild', 'Mild', 'Mild', 'Hot', 'Mild'],
    "Humidity": ['High', 'High', 'High', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'Normal', 'Normal', 'High', 'Normal', 'High'],
    "Wind": ['Weak', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Weak', 'Weak', 'Strong', 'Strong', 'Weak', 'Strong'],
    "PlayTennis": ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']
})

# Encode the categorical features as numbers and the target as 0/1
encoder = OrdinalEncoder()
data_encoded = data.copy()
data_encoded[['Outlook', 'Temp', 'Humidity', 'Wind']] = encoder.fit_transform(data[['Outlook', 'Temp', 'Humidity', 'Wind']])
data_encoded['PlayTennis'] = data_encoded['PlayTennis'].map({'No': 0, 'Yes': 1})

X = data_encoded.drop(['PlayTennis', 'Day'], axis=1)
y = data_encoded['PlayTennis']

# Fit a decision tree using entropy as the split criterion and plot it
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X, y)

plt.figure(figsize=(12, 8))
plot_tree(clf, feature_names=list(X.columns), class_names=['No', 'Yes'], filled=True, rounded=True)
plt.show()

I get this tree:

https://preview.redd.it/r1g3x10ep4jd1.png?width=950&format=png&auto=webp&s=bde00bf06025f3a5847717045528128c8283b817
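
To sanity-check the hand numbers against the same data, here is a quick sketch that computes the information gain of each attribute directly (it reuses the data frame from the code above; the entropy and information_gain helpers are my own names, not sklearn functions):

import numpy as np

def entropy(labels):
    # Shannon entropy (in bits) of a pandas Series of class labels
    probs = labels.value_counts(normalize=True)
    return -(probs * np.log2(probs)).sum()

def information_gain(df, feature, target='PlayTennis'):
    # entropy of the target minus the weighted entropy of the subsets created by splitting on `feature`
    total = entropy(df[target])
    weighted = sum(
        len(subset) / len(df) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return total - weighted

for feature in ['Outlook', 'Temp', 'Humidity', 'Wind']:
    print(feature, round(information_gain(data, feature), 3))

Outlook should come out highest (roughly 0.247 against a base entropy of about 0.940), which is why the hand-built ID3 tree puts it at the root.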

Am I doing something wrong in the manual calculations, or am I using the code incorrectly? Can someone explain the discrepancy?
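
In case it helps with the comparison, the fitted sklearn tree can also be dumped as plain text with export_text (a small sketch, assuming the clf and X from the code above):

from sklearn.tree import export_text

# thresholds refer to the ordinal-encoded values (categories are encoded in alphabetical order),
# so e.g. a split like 'Outlook <= 1.5' would separate {Overcast, Rain} from {Sunny}
print(export_text(clf, feature_names=list(X.columns)))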

submitted by /u/jiraiya1729