How to normalise only specific columns of a pandas Dataframe? | Sololearn: Learn to code for FREE!
0

How to normalise only specific columns of a pandas Dataframe?

I am currently working on a Pandas dataframe in which I have a total of 52 columns (the features). I added a 53rd column which is my "Y" or the output column which contains numerical values. When applying the MinMax normalisation to the dataframe, I don't want it to apply to the "Y" column. How can we do that?

6th Mar 2020, 12:17 AM
Amarjeet Singh Khera
11 Answers
+ 2
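Amarjeet Singh, try this one. I forgot that the index can't be copied that way for a DataFrame. Try this code:

import pandas as pd
from sklearn import preprocessing

# the first 52 columns (positions 0-51) are the features; 0:53 would also include the target
x = df.iloc[:, 0:52]
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
dataset = pd.DataFrame(x_scaled)
# this assignment aligns on the index, so it relies on df having a default 0..n-1 index
dataset["Your 53rd column name"] = df["Your 53rd column name"]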
7th Mar 2020, 5:20 PM
Maninder $ingh
+ 2
Amarjeet Singh, I don't know why you are getting these errors; in my case it works perfectly. You can do one thing: first drop your 53rd column from the dataset and save that column's data in another CSV file. Then apply min-max normalisation, or whatever else you want, to the remaining data. Once the data is normalised, add your Y column (the 53rd column) back to the normalised data.
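
The steps above can be sketched roughly as follows. This is only a minimal illustration: the toy frame, the column names "f1", "f2" and "Y", and the non-default index are all made up for the example, and it keeps the target in memory instead of writing it to a CSV file.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frame standing in for the 52 features plus the target.
df = pd.DataFrame(
    {"f1": [1.0, 2.0, 3.0], "f2": [10.0, 20.0, 40.0], "Y": [5, 7, 9]},
    index=[10, 11, 12],
)
target = "Y"  # assumed name of the output column

# 1. Set the target aside and drop it from the features.
y = df[target]
features = df.drop(columns=[target])

# 2. Scale the remaining columns, keeping their names and the original index.
scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(features),
    columns=features.columns,
    index=features.index,
)

# 3. Put the untouched target back as the last column.
df_norm = scaled.join(y)
print(df_norm)
```

Because the scaled frame reuses the original index, the join lines the target up row by row without any manual filling.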
8th Mar 2020, 8:33 AM
Maninder $ingh
+ 1
Try this code:

import pandas as pd
from sklearn import preprocessing

x = df.iloc[:, 0:-1]  # every column except the last; still a DataFrame
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)  # returns a NumPy array
df = pd.DataFrame(x_scaled)  # holds only the scaled feature columns
6th Mar 2020, 2:47 AM
Maninder $ingh
+ 1
Tibor Santa Okay, so I have two separate datasets for testing and training, and this is the code I am using to normalise them:

import pandas as pd
from sklearn.preprocessing import minmax_scale

# train_df is my training dataframe.
# train_norm is my required new normalised train dataframe.
# labels is a list that includes the names of the 52+1 columns in my dataframe.
# I only want to normalise the first 52 columns and leave the 53rd column as it is, but I don't want to drop it because that is my target or "Y" column.
# I am aware that the code below normalises all 53 columns, but I have tried many variations and nothing is working.
# I am also aware that I can add the 53rd column after normalisation, but that would be a tedious process as I would have to fill in all the rows of the last column.
# train_df has 10580 rows x 53 columns.

train_norm = minmax_scale(train_df, feature_range=(0, 1), axis=0)
train_norm = pd.DataFrame(train_norm)
train_norm.columns = labels
train_norm
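
For the train/test setup described here, one possible variation is to scale only the feature columns in place on a copy of each frame. The sketch below is not from the thread: the tiny frames and the column names are invented, and fitting the scaler on the training features and reusing it on the test features is a common convention that is assumed here, not something the question states.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical small stand-ins for train_df and a matching test frame.
train_df = pd.DataFrame(
    {"a": [0.0, 5.0, 10.0], "b": [1.0, 2.0, 3.0], "Y": [100, 200, 300]}
)
test_df = pd.DataFrame({"a": [2.5, 7.5], "b": [1.5, 2.5], "Y": [150, 250]})

# Every column except the last one (the target) is treated as a feature.
feature_cols = list(train_df.columns[:-1])

scaler = MinMaxScaler(feature_range=(0, 1))

# Work on copies so the original frames stay unchanged.
train_norm = train_df.copy()
test_norm = test_df.copy()

# Learn min/max from the training features, then reuse them for the test set.
train_norm[feature_cols] = scaler.fit_transform(train_df[feature_cols])
test_norm[feature_cols] = scaler.transform(test_df[feature_cols])

print(train_norm)
print(test_norm)
```

Since only the feature columns are overwritten, the "Y" column, the column names and the index all stay exactly as they were.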
7th Mar 2020, 12:36 PM
Amarjeet Singh Khera
+ 1
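Try this code:

import pandas as pd
from sklearn import preprocessing

x = df.iloc[:, 0:53]  # note: 0:53 covers all 53 columns, target included
min_max_scaler = preprocessing.MinMaxScaler()
x_scaled = min_max_scaler.fit_transform(x)
dataset = pd.DataFrame(x_scaled)
# note: df.iloc[53] selects the row at position 53, not the 53rd column
dataset["Your 53rd column name"] = df.iloc[53]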
7th Mar 2020, 4:34 PM
Maninder $ingh
0
Maninder $ingh hello, I appreciate your effort to help, but the code you posted results in the final two columns getting dropped from the dataframe. I don't want to drop any columns. I need all 53 of them; I simply don't want to normalise the last (53rd) column.
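
One way to keep every column while scaling only some of them might be scikit-learn's ColumnTransformer with remainder="passthrough". This is an alternative that nobody in the thread proposed, shown only as a sketch; the toy frame and the names "f1", "f2" and "Y" are assumptions for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy frame: two features and a target that must stay as it is.
df = pd.DataFrame(
    {"f1": [1.0, 3.0, 5.0], "f2": [0.0, 50.0, 100.0], "Y": [7.0, 8.0, 9.0]}
)
feature_cols = ["f1", "f2"]  # columns to scale
target = "Y"                 # column to pass through unchanged

ct = ColumnTransformer(
    transformers=[("minmax", MinMaxScaler(), feature_cols)],
    remainder="passthrough",  # columns not listed above are kept unscaled
)

# The output array holds the scaled columns first, then the passed-through ones.
out = ct.fit_transform(df)
df_norm = pd.DataFrame(out, columns=feature_cols + [target], index=df.index)
print(df_norm)
```

Note that the passed-through columns are moved to the end of the output, so the column names have to be rebuilt in that order.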
6th Mar 2020, 11:08 PM
Amarjeet Singh Khera
0
Amarjeet Singh how are you applying the MinMax normalization? Share your code, otherwise it will be more difficult to help you.
7th Mar 2020, 6:48 AM
Tibor Santa
0
Nope. Doesn't work. Says, "cannot copy index"
7th Mar 2020, 5:02 PM
Amarjeet Singh Khera
0
It says "cannot reindex from a duplicate axis"
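
A likely cause of this message, assuming the original frame has repeated index labels (for example after concatenating several files), is that pandas tries to align the assigned column on the index. The sketch below reproduces that situation with made-up data and shows two possible workarounds; the frame, its index and the hand-written x_scaled array are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index contains a repeated label.
df = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "Y": [5, 7, 9]}, index=[0, 0, 1])

# Stand-in for the array a scaler would return for the feature column.
x_scaled = np.array([[0.0], [0.5], [1.0]])

# A frame built from a bare array gets a fresh 0..n-1 index, so assigning
# df["Y"] makes pandas reindex a Series that has duplicate labels.
dataset = pd.DataFrame(x_scaled, columns=["f1"])
try:
    dataset["Y"] = df["Y"]
except ValueError as err:
    print("alignment failed:", err)

# Workaround 1: assign the raw values, which skips index alignment.
dataset["Y"] = df["Y"].to_numpy()

# Workaround 2: build the scaled frame on the original index.
same_index = pd.DataFrame(x_scaled, columns=["f1"], index=df.index)
same_index["Y"] = df["Y"]

print(dataset)
print(same_index)
```

Both workarounds rely on the scaled rows being in the same order as the original rows, which holds for the output of fit_transform.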
7th Mar 2020, 5:26 PM
Amarjeet Singh Khera
0
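import pandas as pd
from sklearn.preprocessing import MinMaxScaler

X = df.iloc[:, 0:52]  # the 52 feature columns; 0:53 would also take in the target
min_max_scaler = MinMaxScaler()
X_scaled = min_max_scaler.fit_transform(X)
# keep the original column names and index so the target lines up row by row
df_norm = pd.DataFrame(X_scaled, columns=X.columns, index=X.index)
df_norm["name of 53rd column"] = df["name of 53rd column"]

This will work.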
1st Mar 2023, 4:31 PM
A Hashmi