Machine Learning - Split to Achieve Gain
This code works for the first 2 problems but not for the others. Could someone please give me a hint as to why it is not working? Thanks in advance.
S = [int(x) for x in input().split()]
A = [int(x) for x in input().split()]
B = [int(x) for x in input().split()]
Sp =[]
for n in S:
    if n == 1:
        Sp.append(n)
posS = sum(Sp)/len(S)
giniS = float(2*posS*(1-posS))
Ap =[]
for n in A:
    if n == 1:
        Ap.append(n)
posA = sum(Ap)/(len(A))
giniA = float(2*posA*(1-posA))
Bp =[]
for n in B:
    if n == 1:
        Bp.append(n)
posB = sum(Bp)/len(A)
giniB = float(2*posB*(1-posB))
Info_gain = giniS - ((sum(A)/sum(S))*giniA) - ((sum(B)/sum(S))*giniB)
print(float(round(Info_gain, 5)))
3 Answers
+ 6
S = [int(x) for x in input().split()]
A = [int(x) for x in input().split()]
B = [int(x) for x in input().split()]
# Calculate the Gini impurity of the original dataset
Sp = [n for n in S if n == 1]
posS = len(Sp) / len(S)
giniS = 2 * posS * (1 - posS)
# Calculate the Gini impurity of subset A
Ap = [n for n in A if n == 1]
posA = len(Ap) / len(A)
giniA = 2 * posA * (1 - posA)
# Calculate the Gini impurity of subset B
Bp = [n for n in B if n == 1]
posB = len(Bp) / len(B)
giniB = 2 * posB * (1 - posB)
# Calculate the information gain of the split
split_ratio_A = len(A) / len(S)
split_ratio_B = len(B) / len(S)
info_gain = giniS - split_ratio_A * giniA - split_ratio_B * giniB
print(round(info_gain, 5))
+ 5
The first thing I would recommend is to make sure that the input data is correctly formatted and that the code is being called with the correct inputs.
It's also possible that the issue is with the calculation of the information gain. There are a few things to check here:
Make sure that the probability values (posS, posA, posB) are being calculated correctly. Each should be the number of 1's in the set (the size of Sp, Ap, Bp) divided by the total number of elements in that same set (S, A, B). In the posted code, posB divides by len(A) instead of len(B).
Make sure that the Gini impurity values (giniS, giniA, giniB) are being calculated correctly. The formula for Gini impurity is 2 * pos * (1 - pos), where pos is the probability of an element being a 1.
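Check that the final information gain calculation is correct. This should be the difference between the Gini impurity of the original dataset and the weighted average of the Gini impurities of the two subsets, where each weight is the proportion of elements in that subset: len(A)/len(S) and len(B)/len(S). The posted code uses sum(A)/sum(S) and sum(B)/sum(S), which weights by the number of 1's rather than by subset size.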
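The three checks above can be put together in a short sketch. The function names gini and split_gain and the sample lists are made up for illustration and are not part of the original exercise; the empty-list guard is also an assumption about how you would want that case handled.

```python
def gini(labels):
    """Gini impurity 2*p*(1-p) of a list of binary (0/1) labels."""
    if not labels:  # assumption: treat an empty subset as pure
        return 0.0
    p = sum(labels) / len(labels)  # proportion of 1's in this list
    return 2 * p * (1 - p)


def split_gain(S, A, B):
    """Impurity of S minus the size-weighted impurities of A and B."""
    weight_A = len(A) / len(S)  # weights use subset sizes, not sums
    weight_B = len(B) / len(S)
    return gini(S) - weight_A * gini(A) - weight_B * gini(B)


# Hypothetical sample data: a perfect split of a balanced set
S = [1, 0, 1, 0, 1, 0]
A = [1, 1, 1]
B = [0, 0, 0]
print(round(split_gain(S, A, B), 5))  # prints 0.5
```

Here S has impurity 2 * 0.5 * 0.5 = 0.5 and both subsets are pure, so the gain equals the full impurity of S. Computing the impurity in one helper also avoids copy-paste slips such as dividing by the wrong list's length.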
0
Thank you for your help....