Data Science tools
I am studying Data Science, and I am constantly looking for new skills / tools to learn. So far, I know Python (to mention but a few: the scikit-learn & Keras libraries for models, the requests & Selenium libraries for scraping, plus Pandas, NumPy and the basic Python libraries), as well as a bit of R. I have seen people use Scala, Hadoop and other tools I am not yet familiar with, but I am not sure which ones are particularly useful for a Data Scientist, and therefore which tools / Python libraries I should learn first. Any thoughts? Thanks
2 Answers
+ 1
You should definitely learn Matplotlib.
0
The next step is to layer tools strategically, based on where you want to go in data science. Here’s a structured roadmap:
1️⃣ Data Wrangling & Analysis (Advanced)
SQL (critical for any data role — PostgreSQL, MySQL, BigQuery)
Polars (faster DataFrame operations than Pandas in some cases)
Dask (parallel computing for big datasets in Python)
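If you want a feel for SQL before installing a database server, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names and sample rows are made up for illustration, and PostgreSQL, MySQL and BigQuery each have their own dialect details, but a basic GROUP BY like this one carries over.

```python
import sqlite3

# Hypothetical toy data: (customer, amount) rows standing in for a real orders table.
ORDERS = [
    ("alice", 30.0),
    ("bob", 12.5),
    ("alice", 20.0),
    ("carol", 7.5),
    ("bob", 2.5),
]


def total_per_customer(rows):
    """Load rows into an in-memory SQLite database and aggregate them with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer ORDER BY total DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


totals = total_per_customer(ORDERS)
print(totals)
```

The same query can be pointed at a Pandas workflow later: pandas.read_sql accepts a query string and a connection, so SQL and DataFrames combine well.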
2️⃣ Machine Learning / AI
XGBoost / LightGBM (powerful gradient boosting libraries)
PyTorch (lower-level than Keras, so it gives you more control over custom deep learning models)
Hugging Face Transformers (for NLP, chatbots, embeddings)
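To get an intuition for what gradient boosting libraries like XGBoost and LightGBM are doing, here is a rough pure-Python sketch of the core idea: start from the mean, then repeatedly fit a tiny tree (a one-split "stump") to the current residuals and add a shrunken copy of it to the model. This is not how those libraries are implemented (they add regularization, multi-feature trees and much faster split finding), and the data and function names are invented for the example. It assumes a single numeric feature with at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on one feature that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]


def predict(model, x):
    """Base prediction plus the shrunken contribution of every stump."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_mean, right_mean in stumps:
        total += learning_rate * (left_mean if x <= threshold else right_mean)
    return total


def fit_boosting(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the residuals of the model so far."""
    stumps = []
    model = (sum(ys) / len(ys), learning_rate, stumps)
    for _ in range(n_rounds):
        residuals = [y - predict(model, x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return model


def mse(model, xs, ys):
    return sum((y - predict(model, x)) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Made-up data: a noisy step from about 1 to about 3.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

baseline = (sum(ys) / len(ys), 0.5, [])  # predicts the mean everywhere
model = fit_boosting(xs, ys)
print(mse(baseline, xs, ys), mse(model, xs, ys))
```

In practice you would call the library's scikit-learn style estimators instead of writing this yourself; the sketch is only meant to show why adding many weak trees drives the training error down.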
3️⃣ Big Data & Distributed Processing
Apache Spark (via PySpark — must-have for big data jobs)
Hadoop (less trendy now, but still used in large data pipelines)
Scala (if you’re deep into Spark, Scala gives more control, but PySpark often suffices)
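Before learning Spark or Hadoop, it helps to understand the map / shuffle / reduce model they are built around. Below is a single-process sketch of that model as a word count in plain Python. It is an illustration only, not the Spark or Hadoop API, and the function names and sample documents are invented for the example; the real tools run these phases in parallel across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Group all values by key, as the framework would do between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Combine the values of each key into a single count."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


# Made-up input: each string stands in for a file or partition.
documents = ["big data tools", "data science tools", "Data pipelines"]

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle_phase(mapped))
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over a cluster, which is the whole point of these frameworks.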
4️⃣ Data Engineering Foundations
Airflow (workflow orchestration)
Kafka (real-time streaming data)
AWS / GCP / Azure data tools (S3, BigQuery, Redshift, etc.)
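Workflow orchestration boils down to running tasks in an order that respects their dependencies. Here is a minimal sketch of that idea with the standard library's graphlib module (Python 3.9+). It is not Airflow code: the task names and the run_pipeline helper are hypothetical, and a real orchestrator adds scheduling, retries and monitoring on top.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"clean", "train"},
}


def run_pipeline(dag, actions):
    """Run every task after all of its dependencies and collect the results."""
    order = list(TopologicalSorter(dag).static_order())
    log = [actions[task]() for task in order]
    return order, log


# Stand-in actions; real tasks would load data, train a model, and so on.
actions = {name: (lambda name=name: f"ran {name}") for name in PIPELINE}

order, log = run_pipeline(PIPELINE, actions)
print(order)
```

If the dependencies contain a cycle, TopologicalSorter raises graphlib.CycleError, which is the same class of mistake an orchestrator rejects when you define a pipeline.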
5️⃣ Visualization & Storytelling
Plotly / Dash (interactive dashboards)
Tableau or Power BI (corporate-friendly dashboards)
Matplotlib / Seaborn (you probably know them already, but it is worth mastering them)
6️⃣ MLOps (optional but valuable)
MLflow (model tracking & deployment)
Docker + Kubernetes (deployment skills)