Python Data Engineering - Learn Data Engineering with Python: visualize, manage and manipulate data at scale
Intermediate | 18h 44min | Last updated 08/2025
16 Skills | 1 AI Sim
Embark on the Python Data Engineering skill path, tailored for software engineers who want to master data management. This course takes you from the fundamentals of Python to advanced techniques for manipulating and managing data. You'll learn to process and analyze large datasets efficiently with pandas and NumPy, and to visualize your insights with Matplotlib. The curriculum covers every phase of working with data, from initial data wrangling to the final presentation of your findings. It is ideal for software engineers who want to strengthen their data engineering skills and work effectively with data at scale.
Content (39)
What is Data Engineering
In this chapter, you will learn what data engineering is and what it involves. You will learn about the tasks and core competencies of data engineers, and we will introduce a few key concepts, such as ETL (extract, transform, load).
Data engineering skills
Python Data Tools: how to manage data at scale with Python
This chapter focuses entirely on the Python tools you can use to manage large datasets. You will learn how to use Jupyter notebooks, NumPy, and pandas.
Jupyter Notebook
NumPy
Pandas
Python
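As a taste of the tools in this chapter, here is a minimal sketch, assuming NumPy and pandas are installed; the data, the column names, and the flat 20% tax rate are hypothetical. It shows the two ideas that make these libraries suited to large datasets: vectorized array operations in NumPy and column-wise aggregation in pandas.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 100,000 simulated order amounts.
rng = np.random.default_rng(seed=42)
amounts = rng.uniform(5.0, 500.0, size=100_000)

# NumPy: a vectorized operation applies to the whole array at once,
# with no explicit Python loop.
amounts_with_tax = amounts * 1.2  # assumed flat 20% tax rate

# pandas: wrap the arrays in a DataFrame and aggregate by a column.
df = pd.DataFrame({
    "region": rng.choice(["north", "south", "east", "west"], size=amounts.size),
    "amount": amounts_with_tax,
})
summary = df.groupby("region")["amount"].agg(["count", "mean", "sum"])
print(summary)
```

In a Jupyter notebook you would typically run each step in its own cell and inspect `summary` directly instead of printing it.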
Extracting data with Python and other tools
In this chapter, we dive deep into data extraction using various techniques. We will use Python and SQL in a variety of environments and use cases.
Data Mapping and Extraction
Microsoft Excel
Selenium
SQL
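To illustrate combining Python and SQL for extraction, here is a minimal sketch, assuming pandas is installed. An in-memory SQLite database with a hypothetical `customers` table stands in for a real source system; the query pushes filtering into SQL so only the needed rows reach Python.

```python
import sqlite3
import pandas as pd

# Hypothetical in-memory database standing in for a real source system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
conn.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ada", "UK"), ("Linus", "FI"), ("Grace", "US"), ("Alan", "UK")],
)
conn.commit()

# Filter in SQL with a parameterized query, then load the result into a DataFrame.
query = "SELECT name, country FROM customers WHERE country = ? ORDER BY id"
uk_customers = pd.read_sql_query(query, conn, params=("UK",))
conn.close()
print(uk_customers)
```

Using a `?` placeholder instead of string formatting keeps the query safe from SQL injection when the filter value comes from user input.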
Exploratory Data Analysis: understand your data and their characteristics
Exploratory Data Analysis (EDA) is an approach to analyzing datasets by summarizing their main characteristics, often with visual methods. It is a critical first step in data analysis that allows analysts to discover patterns and more.
Data Mapping
Exploratory Analysis
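A first EDA pass often consists of a handful of summary calls. The sketch below assumes pandas is installed and uses a small hypothetical dataset; it covers column types, descriptive statistics, missing values, category frequencies, and correlations, leaving the visual side of EDA to the visualization chapter.

```python
import pandas as pd

# Hypothetical dataset; in practice this would come from the extraction step.
df = pd.DataFrame({
    "price": [12.5, 8.0, 15.0, None, 9.5, 30.0],
    "quantity": [3, 10, 1, 4, 7, 1],
    "category": ["books", "food", "books", "toys", "food", "toys"],
})

types = df.dtypes                                # type of each column
stats = df.describe()                            # count, mean, std, min, quartiles, max (numeric columns)
missing = df.isna().sum()                        # number of missing values per column
counts = df["category"].value_counts()           # frequency of each category
correlation = df[["price", "quantity"]].corr()   # pairwise correlation of numeric columns

print(types, stats, missing, counts, correlation, sep="\n\n")
```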
Data organization and cleaning
In this chapter, you will learn how to clean data before working with it. This usually involves removing columns, records, and other data you do not need.
Data Cleaning and Preprocessing
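Here is a minimal cleaning sketch, assuming pandas is installed; the raw records and column names (including the unneeded `internal_note` column) are hypothetical. It chains the most common steps: dropping columns, removing duplicates, discarding rows that lack a required field, and fixing types and stray whitespace.

```python
import pandas as pd

# Hypothetical raw extract with typical problems.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "customer": [" Ada", "Linus", "Linus", "Grace ", None],
    "amount": ["10.5", "20", "20", "n/a", "7.25"],
    "internal_note": ["x", "y", "y", "z", "w"],  # not needed downstream
})

clean = (
    raw.drop(columns=["internal_note"])   # remove columns you do not need
       .drop_duplicates()                 # remove exact duplicate rows
       .dropna(subset=["customer"])       # drop rows missing a required field
       .assign(
           customer=lambda d: d["customer"].str.strip(),                  # trim stray whitespace
           amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"),  # "n/a" becomes NaN
       )
       .reset_index(drop=True)
)
print(clean)
```

Chaining the steps keeps the raw DataFrame untouched, so you can compare before and after while deciding which rules to apply.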
Reorganizing data can be part of exploratory data analysis or of the data organization phase. For a deeper look, this article outlines key methods for reshaping your DataFrames to suit your data.
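Three common reorganization moves are pivoting, melting, and sorting. The sketch below assumes pandas is installed and uses hypothetical monthly sales data; it is one way to reshape a DataFrame, not the only one.

```python
import pandas as pd

# Hypothetical long-format sales data: one row per (month, region) pair.
sales = pd.DataFrame({
    "month": ["Jan", "Jan", "Feb", "Feb"],
    "region": ["north", "south", "north", "south"],
    "revenue": [100, 80, 120, 90],
})

# Long to wide: one row per month, one column per region.
wide = sales.pivot(index="month", columns="region", values="revenue")

# Wide back to long with melt, e.g. to feed a grouping or plotting step.
long_again = wide.reset_index().melt(id_vars="month", var_name="region", value_name="revenue")

# Sorting is another simple reorganization.
by_revenue = sales.sort_values("revenue", ascending=False)

print(wide)
```

Use `pivot_table` instead of `pivot` when the same index/column pair can appear more than once and the values need aggregating.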
Data visualization with Matplotlib: a complete Python library for data visualization
Now that you have cleaned your data, it's time to visualize it. This chapter is all about learning how to use a few Python libraries to visualize data and build a wide range of charts.
Data Visualization
Python
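The basic Matplotlib workflow is to create a figure with one or more axes, draw on each axes, label it, and save or show the result. This minimal sketch assumes Matplotlib and pandas are installed; the monthly figures and the output filename `monthly_summary.png` are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical cleaned data ready for plotting.
monthly = pd.DataFrame({
    "month": ["Jan", "Feb", "Mar", "Apr"],
    "revenue": [100, 120, 90, 150],
    "orders": [10, 14, 9, 16],
})

# One figure with two side-by-side axes.
fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: good for trends over time.
ax_line.plot(monthly["month"], monthly["revenue"], marker="o")
ax_line.set_title("Revenue by month")
ax_line.set_xlabel("Month")
ax_line.set_ylabel("Revenue")

# Bar chart: good for comparing discrete quantities.
ax_bar.bar(monthly["month"], monthly["orders"])
ax_bar.set_title("Orders by month")
ax_bar.set_xlabel("Month")
ax_bar.set_ylabel("Orders")

fig.tight_layout()
fig.savefig("monthly_summary.png")
```

In a Jupyter notebook you can drop the `matplotlib.use("Agg")` line and the chart renders inline.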
Building a data pipeline
A data pipeline automates the flow of data from sources to storage and analysis. It is the culmination of everything covered in this path: it combines extraction, transformation, and loading (ETL) processes to prepare and move data efficiently.
Data engineering skills
Data Manipulation
Data Mapping
Data Modeling
Data Pipelines
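The ETL stages can be sketched as three small functions composed into one pipeline. This is a minimal illustration, assuming pandas is installed; the records, the `orders` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are hypothetical, and an in-memory SQLite database stands in for a real storage target.

```python
import sqlite3
import pandas as pd


def extract() -> pd.DataFrame:
    """Extract: hypothetical raw records; a real pipeline would read files, APIs, or databases."""
    return pd.DataFrame({
        "order_id": [1, 2, 2, 3],
        "amount": ["10.0", "25.5", "25.5", "bad"],
        "country": ["uk", "us", "us", "fi"],
    })


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: deduplicate, fix types, drop invalid rows, normalize values."""
    out = df.drop_duplicates().copy()
    out["amount"] = pd.to_numeric(out["amount"], errors="coerce")  # "bad" becomes NaN
    out = out.dropna(subset=["amount"])
    out["country"] = out["country"].str.upper()
    return out.reset_index(drop=True)


def load(df: pd.DataFrame, conn: sqlite3.Connection) -> int:
    """Load: write the data to a target table and return the number of stored rows."""
    df.to_sql("orders", conn, if_exists="replace", index=False)
    return conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def run_pipeline(conn: sqlite3.Connection) -> int:
    """Run the three stages in order."""
    return load(transform(extract()), conn)


conn = sqlite3.connect(":memory:")
rows_loaded = run_pipeline(conn)
print(rows_loaded)
```

Keeping each stage as a separate function makes it easy to test stages in isolation and, later, to hand them to a scheduler or orchestration tool.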
Introduction
Chris Koch: Instructional designer, Instructor and Sr. Software Engineer
This path has been curated by the Anthropos team in collaboration with Chris Koch
Skill objectives