Description
In the “Intermediate Python for Data Engineers” training, you'll learn how to perform common Data Engineering tasks in Python: from loading common file formats to accessing APIs and saving and later reloading Python objects (such as trained Machine Learning models). Afterwards, you can use Python effectively to write data processing scripts that run anywhere, for example in Databricks or Azure Functions.
By the end of the course, you will be able to write Python scripts that access and process data from various sources. The focus is on reading from and writing to more complex sources, APIs and file formats.
Audience
The “Intermediate Python for Data Engineers” training is aimed at Data Engineers, Data Analysts and Data Scientists who want to be able to process data effectively. In terms of cloud use, we focus on Azure, but the ways of working are not Azure-specific: participants who work more on-premises, in private clouds or on other public clouds (e.g. AWS, GCP or Oracle Cloud) will also benefit from this training.
Experience with Python is required for this training. We expect you have already mastered at least the following:
- Reading simple CSV files.
- Loading and using Python modules.
- Performing simple data manipulations with DataFrames, for example in pandas, Koalas, or PySpark.
Methods
Contact us for all information about this course.
Contents
Over the course of two days, we work with many hands-on assignments in Python. Afterwards, you will have achieved the following learning objectives:
- Processing more complex files, such as nested JSON, XML, and Parquet
- Understanding how file systems differ between Windows and Linux environments
- Copying and moving files
- Knowing when to run a task in Python and when it is better done in a shell environment
- Using Pickle to store Python objects, such as trained ML models or processed DataFrames, on a Data Lake or on disk
- Reading from and writing to an Azure Data Lake using the Azure modules
- Calling APIs, and knowing smart ways to do this at a larger scale
- Using logging to monitor the progress of your program in a structured way while it runs, and to connect to existing logging solutions
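To give a feel for the kind of task covered, here is a minimal sketch that combines three of the objectives above: flattening nested JSON, storing the result with Pickle, and logging progress. It is an illustration only and not taken from the course material. The sample data, the `flatten` helper and the logger name `course_sketch` are all made up for this example, and only the Python standard library is used.

```python
import json
import logging
import pickle
import tempfile
from pathlib import Path

# Structured log lines with a timestamp and level (format chosen for illustration).
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("course_sketch")  # hypothetical logger name


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the key path along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Made-up sample data standing in for a nested JSON file.
raw = """
[
  {"id": 1, "customer": {"name": "Ada", "address": {"city": "Utrecht"}}},
  {"id": 2, "customer": {"name": "Bob", "address": {"city": "Delft"}}}
]
"""

records = [flatten(item) for item in json.loads(raw)]
logger.info("Flattened %d records", len(records))

# Store the processed objects on disk and load them back again.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "records.pkl"
    with target.open("wb") as handle:
        pickle.dump(records, handle)
    logger.info("Stored records at %s", target)

    with target.open("rb") as handle:
        restored = pickle.load(handle)

logger.info("Restored %d records", len(restored))
```

Note that Pickle files should only be loaded from sources you trust, because unpickling can execute arbitrary code.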
Certification
Participation certificate: At the end of the training, participants will receive a certificate confirming that they have completed this course.