Machine Learning-Split to Achieve Gain

This code works for the first two test cases but not for the others. Could someone please give me a hint as to why it is not working? Thanks in advance.

S = [int(x) for x in input().split()]
A = [int(x) for x in input().split()]
B = [int(x) for x in input().split()]

Sp = []
for n in S:
    if n == 1:
        Sp.append(n)
posS = sum(Sp)/len(S)
giniS = float(2*posS*(1-posS))

Ap = []
for n in A:
    if n == 1:
        Ap.append(n)
posA = sum(Ap)/(len(A))
giniA = float(2*posA*(1-posA))

Bp = []
for n in B:
    if n == 1:
        Bp.append(n)
posB = sum(Bp)/len(A)
giniB = float(2*posB*(1-posB))

Info_gain = giniS - ((sum(A)/sum(S))*giniA) - ((sum(B)/sum(S))*giniB)
print(float(round(Info_gain, 5)))

learning trees machine decision

27th Dec 2022, 5:15 PM

Catherine

3 Answers

+ 6

S = [int(x) for x in input().split()]
A = [int(x) for x in input().split()]
B = [int(x) for x in input().split()]

# Calculate the Gini impurity of the original dataset
Sp = [n for n in S if n == 1]
posS = len(Sp) / len(S)
giniS = 2 * posS * (1 - posS)

# Calculate the Gini impurity of subset A
Ap = [n for n in A if n == 1]
posA = len(Ap) / len(A)
giniA = 2 * posA * (1 - posA)

# Calculate the Gini impurity of subset B
Bp = [n for n in B if n == 1]
posB = len(Bp) / len(B)
giniB = 2 * posB * (1 - posB)

# Calculate the information gain of the split
split_ratio_A = len(A) / len(S)
split_ratio_B = len(B) / len(S)
info_gain = giniS - split_ratio_A * giniA - split_ratio_B * giniB
print(round(info_gain, 5))
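Compared with the code in the question, two things change in effect here: posB is divided by len(B) instead of len(A), and the split weights use len(A)/len(S) and len(B)/len(S) instead of sum(A)/sum(S) and sum(B)/sum(S). The sum-based weights measure the share of 1s rather than the share of elements, and they would also divide by zero if S contained no 1s. The sketch below illustrates the difference on made-up data. The helper names gini and info_gain and the sample lists are assumptions for illustration, not part of the original exercise.

```python
def gini(labels):
    """Gini impurity 2*p*(1-p) of a list of 0/1 labels."""
    p = labels.count(1) / len(labels)
    return 2 * p * (1 - p)


def info_gain(S, A, B):
    """Impurity of S minus the size-weighted impurities of A and B."""
    return gini(S) - len(A) / len(S) * gini(A) - len(B) / len(S) * gini(B)


# Hypothetical data, chosen so that A and B have different lengths.
S = [1, 1, 1, 0, 0, 0, 0, 0]
A = [1, 1, 0]
B = [1, 0, 0, 0, 0]

# Weights as used in the answer: fraction of elements in each subset.
size_weights = (len(A) / len(S), len(B) / len(S))
# Weights as used in the question: fraction of the 1s in each subset.
sum_weights = (sum(A) / sum(S), sum(B) / sum(S))

print(size_weights)
print(sum_weights)
print(round(info_gain(S, A, B), 5))
```

When A and B happen to have the same length and the same number of 1s, both versions give the same result, which may be why only some test cases failed.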

27th Dec 2022, 6:42 PM

Sadaam Linux

+ 5

The first thing I would recommend is to make sure that the input data is correctly formatted and that the code is being called with the correct inputs.

It's also possible that the issue is with the calculation of the information gain. There are a few things to check here:

Make sure that the probability values (posS, posA, posB) are being calculated correctly. Each should be the number of 1s in its own list (S, A, B) divided by the total number of elements in that same list.

Make sure that the Gini impurity values (giniS, giniA, giniB) are being calculated correctly. The formula for Gini impurity is 2 * pos * (1 - pos), where pos is the probability of an element being a 1.

Check that the final information gain calculation is correct. This should be the difference between the Gini impurity of the original dataset and the weighted average of the Gini impurities of the two subsets, where each weight is the proportion of the original elements that fall in that subset.
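Written out as formulas, the three checks above amount to the following. The notation is my own shorthand rather than anything from the exercise: |X| is the number of elements in list X, and p_X is the fraction of elements of X that equal 1.

```latex
\[
p_X = \frac{\#\{x \in X : x = 1\}}{|X|},
\qquad
G(X) = 2\,p_X\,(1 - p_X)
\]
\[
\text{gain} = G(S) - \frac{|A|}{|S|}\,G(A) - \frac{|B|}{|S|}\,G(B)
\]
```

Note that every denominator is a list length, never a sum of the labels.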

27th Dec 2022, 6:41 PM

Sadaam Linux

Thank you for your help....

28th Dec 2022, 10:20 AM

Catherine