Assistance with replicating the code from data science>linear regression>exploratory data analysis in Google Co Lab? | Sololearn: Learn to code for FREE!
+ 1

Assistance with replicating the code from Data Science > Linear Regression > Exploratory Data Analysis in Google Colab?

I am trying to replicate this bit of code, taken from a SoloLearn module, in Google Colab:

    import pandas as pd
    from sklearn.datasets import load_boston
    boston_dataset = load_boston()
    boston = pd.DataFrame(boston_dataset.data, columns = boston_dataset.feature_names)
    boston['MEDV']=boston_dataset.target
    print(boston.shape)

However, it outputs an error message that I am unsure what to make of, as I am but a lowly learner on this app. The people in the comments recommend using Google Colab to replicate much of the code in the Data Science module (pretty much everything past the data visualization part).

This is the error that I have no idea what to make of:

    ---------------------------------------------------------------------------
    ImportError                               Traceback (most recent call last)
    <ipython-input-1-e3316386af48> in <cell line: 2>()
          1 import pandas as pd
    ----> 2 from sklearn.datasets import load_boston
          3 boston_dataset = load_boston()
          4 boston = pd.DataFrame(boston_dataset.data,
          5

    /usr/local/lib/python3.10/dist-packages/sklearn/datasets/__init__.py in __getattr__(name)
        154             """
        155         )
    --> 156         raise ImportError(msg)
        157     try:
        158         return globals()[name]

    ImportError: `load_boston` has been removed from scikit-learn since version 1.2.

    The Boston housing prices dataset has an ethical problem: as investigated in [1], the authors of this dataset engineered a non-invertible variable "B" assuming that racial self-segregation had a positive impact on house prices [2]. Furthermore the goal of the research that led to the creation of this dataset was to study the impact of air quality but it did not give adequate demonstration of the validity of this assumption.

    The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning.

    In this special case, you can fetc

17th Sep 2023, 7:58 PM
Sean Pohlod
3 Answers
+ 1
I figured it out! The answer was literally within the error message that was being printed. It was before my very eyes and I can't believe that I did not see it. This is what it said:

#They provide this little note

The scikit-learn maintainers therefore strongly discourage the use of this dataset unless the purpose of the code is to study and educate about ethical issues in data science and machine learning. In this special case, you can fetch the dataset from the original source::

"""They provide this little snippet of code. It did not initially click in my head that I was supposed to copy and paste this into my code."""

    import pandas as pd
    import numpy as np
    data_url = "http://lib.stat.cmu.edu/datasets/boston"
    raw_df = pd.read_csv(data_url, sep="\s+", skiprows=22, header=None)
    data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
    target = raw_df.values[1::2, 2]

"""And then they recommend alternative data sources, which also appear to be formatted in such a way that I can copy and paste them into my code. That was pretty cool of them. Makes life easier."""

Alternative datasets include the California housing dataset and the Ames housing dataset. You can load the datasets as follows::

    from sklearn.datasets import fetch_california_housing
    housing = fetch_california_housing()

for the California housing dataset and::

    from sklearn.datasets import fetch_openml
    housing = fetch_openml(name="house_prices", as_frame=True)

for the Ames housing dataset.

#Lastly they provide what appear to be citations, but it could be usable code. I am unsure because I am still learning.

[1] M Carlisle. "Racist data destruction?" <https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8>
[2] Harrison Jr, David, and Daniel L. Rubinfeld. "Hedonic housing prices and the demand for clean air." Journal of environmental economics and management 5.1 (1978): 81-102. <https://www.researchgate.net/publication/4974606_Hedonic_housing_prices_and_the_demand_for_clean_air>
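For anyone who wants to get from the snippet in the error message back to the `boston` DataFrame that the module's code builds, here is a rough sketch of how the two might be glued together. The function names `build_boston_frame` and `load_boston_frame` are made up for this post and are not part of scikit-learn. The column names are the ones the old `load_boston().feature_names` used to return, typed out by hand here because that attribute no longer exists. The separator is written as a raw string (`r"\s+"`), which is the same pattern but avoids an escape-sequence warning on newer Python versions.

```python
import numpy as np
import pandas as pd

# Assumption: these are the feature names the removed load_boston() exposed.
FEATURE_NAMES = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
                 "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT"]


def build_boston_frame(raw_df):
    """Turn the raw two-lines-per-record table into one row per record.

    Hypothetical helper: it wraps the reshaping from the error message
    and then adds the column names and the MEDV target column.
    """
    values = raw_df.values
    # Even rows hold the first 11 features; odd rows hold the last
    # 2 features followed by the target value.
    data = np.hstack([values[::2, :], values[1::2, :2]])
    target = values[1::2, 2]
    boston = pd.DataFrame(data, columns=FEATURE_NAMES)
    boston["MEDV"] = target
    return boston


def load_boston_frame(data_url="http://lib.stat.cmu.edu/datasets/boston"):
    """Download the raw file (needs internet access) and build the frame."""
    raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
    return build_boston_frame(raw_df)
```

In a Colab cell, `boston = load_boston_frame()` followed by `print(boston.shape)` should then stand in for the last lines of the original module code, assuming the file at that URL is still laid out the way the error message expects.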
17th Sep 2023, 9:00 PM
Sean Pohlod
0
Eh don't sweat it. Sometimes long error messages can be daunting, especially to us newbies lol.
17th Sep 2023, 11:11 PM
Aaron Lee