Decision Tree Score seems overfitted | Sololearn: Learn to code for FREE!
+ 1

Decision Tree Score seems overfitted

I have a large dataset with a label column. I am using DecisionTreeClassifier (from sklearn.tree import DecisionTreeClassifier) to build a tree and score it with .score(x, y). Before scoring, I extract the label from the dataset and encode all of the remaining columns as boolean indicator columns with get_dummies(). After doing all this, the model seems overfitted, because I get an accuracy score of 100. No matter what I change, I always get an accuracy score of 100. Is this normal?
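A minimal sketch of the workflow described in the question, for reference. The original code and dataset are not shown in the thread, so the data here is synthetic and the column names (color, size, weight, label) are hypothetical. One common cause of a constant 100 (an assumption, not something the thread confirms) is calling .score() on the same rows the tree was fitted on, so the sketch reports training and held-out accuracy separately.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real dataset (hypothetical columns).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "color": rng.choice(["red", "green", "blue"], size=n),
    "size": rng.choice(["S", "M", "L"], size=n),
    "weight": rng.normal(size=n),
})
# Noisy label: it depends on weight, but not perfectly.
df["label"] = ((df["weight"] + rng.normal(scale=1.0, size=n)) > 0).astype(int)

# 1) Extract the label first, so it cannot leak into the features.
y = df["label"]
# 2) One-hot encode the remaining categorical columns.
X = pd.get_dummies(df.drop(columns=["label"]), dtype=float)

# 3) Hold out rows that the tree never sees during fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# 4) Fit an unrestricted tree and score it on both sets.
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X_train, y_train)
train_acc = tree.score(X_train, y_train)
test_acc = tree.score(X_test, y_test)

print(f"train accuracy: {train_acc:.3f}")
print(f"test accuracy:  {test_acc:.3f}")
```

An unrestricted tree can usually memorize its training rows, so a near-perfect training score says little on its own; the held-out score is the one to watch. If the held-out score is also 100 on real data, it may be worth checking whether a column derived from the label is still among the features.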

2nd Sep 2017, 7:51 PM
Sura Wankam
6 Answers
+ 4
Hmm.. decision trees tend to score really high, but getting a perfect score in *all* cases surely indicates overfitting. Could you share the code? Is the dataset split into train/validate/test sets? Maybe you should shuffle the data or run a proper cross-validation?
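A rough sketch of the shuffled cross-validation suggested here, assuming a synthetic dataset from make_classification in place of the real one (the parameters below are illustrative choices, not values from the thread):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with 20% of labels flipped, so a perfect score is not achievable
# on unseen rows.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)

# Shuffle before splitting, and keep class proportions equal across folds.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Each fold fits on 4/5 of the rows and scores on the remaining 1/5.
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print("fold accuracies:", scores.round(3))
print("mean accuracy:  ", scores.mean().round(3))
```

Because every fold is scored on rows the tree did not see while fitting, the mean is a more honest estimate than a single .score() call on the training data.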
2nd Sep 2017, 8:05 PM
Kuba Siekierzyński
+ 4
I'll add a comment in the code section in a while..
2nd Sep 2017, 8:49 PM
Kuba Siekierzyński
+ 1
Here is my code. If you have the dataset, you will see that the accuracy score is always 100, which seems very abnormal to me. https://code.sololearn.com/cLlY2KmwlZr5/?ref=app
2nd Sep 2017, 8:17 PM
Sura Wankam
+ 1
The model is almost certainly overfitted, as 100% accuracy on a large dataset is very unlikely. You can fix it by: 1) Pruning 2) Using a different classifier
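As a hedged illustration of the pruning option, the sketch below compares an unrestricted tree with two restricted ones on synthetic data. The values max_depth=3 and ccp_alpha=0.01 are arbitrary picks for the example and would need tuning on real data, for instance via cross-validation.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic noisy data standing in for the real dataset.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Unrestricted tree: grows until the training rows are separated.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Pre-pruning: cap the depth while the tree is being grown.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Post-pruning: minimal cost-complexity pruning removes weak branches.
pruned = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0).fit(X_train, y_train)

for name, model in [("full", full), ("max_depth=3", shallow), ("ccp_alpha=0.01", pruned)]:
    print(
        f"{name:>15}: leaves={model.get_n_leaves():3d} "
        f"train={model.score(X_train, y_train):.3f} "
        f"test={model.score(X_test, y_test):.3f}"
    )
```

A pruned tree gives up some training accuracy in exchange for a simpler model; whether the held-out accuracy improves depends on the data.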
27th Dec 2022, 1:22 PM
Omanshu
0
For the train/test split, I used train_test_split (from sklearn.model_selection import train_test_split), after calling get_dummies. There seems to be no problem there, so I don't think any further cross-validation is needed. As for my code, please wait a while; I have to insert it. You may need the dataset too. Could you please send me your email or some other way I can send you the file?
2nd Sep 2017, 8:11 PM
Sura Wankam
0
Ah... I forgot one thing. You can download the dataset from the website I mentioned in a comment in the code. Also, when I try other datasets from sklearn.datasets, I always get a 100% accuracy score too. That's why I had to ask.
2nd Sep 2017, 8:20 PM
Sura Wankam