Data · Blended · 11 weeks
Data Pipelines with Python
₩1,680,000
Design batch and streaming pipelines with pandas, Polars, and orchestration basics. Emphasis on data quality checks and reproducible notebooks that hiring managers can audit.
Modules & labs
- Schema validation with pydantic models
- Airflow-style DAG concepts (local runner)
- Data quality dashboards
- Parquet and columnar storage labs
- Privacy-aware sampling techniques
- Mentor review of pipeline diagrams
Outcomes
- Document a pipeline with failure recovery steps
- Implement validation gates before warehouse loads
- Present lineage maps for a capstone dataset
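For a sense of what a validation gate before a warehouse load can look like, here is a minimal sketch in plain Python. It is an illustration only, not course material: the field names, the `validation_gate` helper, and the 5% rejection threshold are all hypothetical, and the course labs use pydantic models rather than hand-written checks.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration: field name -> required Python type.
REQUIRED_FIELDS = {"order_id": int, "amount": float, "currency": str}


@dataclass
class GateResult:
    passed: list    # rows that satisfied every check
    rejected: list  # (row, reason) pairs kept for later review


def validate_row(row):
    """Return None if the row is valid, otherwise a short reason string."""
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in row or row[name] is None:
            return f"missing field: {name}"
        if not isinstance(row[name], expected_type):
            return f"wrong type for {name}: {type(row[name]).__name__}"
    if row["amount"] < 0:
        return "negative amount"
    return None


def validation_gate(rows, max_reject_ratio=0.05):
    """Split rows into passed/rejected; raise if too many rows fail."""
    result = GateResult(passed=[], rejected=[])
    for row in rows:
        reason = validate_row(row)
        if reason is None:
            result.passed.append(row)
        else:
            result.rejected.append((row, reason))
    total = len(result.passed) + len(result.rejected)
    # Block the load outright when the rejection ratio exceeds the threshold.
    if total and len(result.rejected) / total > max_reject_ratio:
        raise ValueError(
            f"gate closed: {len(result.rejected)} of {total} rows rejected"
        )
    return result
```

Only `result.passed` would go on to the warehouse load. Keeping the rejected rows with their reasons, instead of dropping them, is what makes failures visible rather than silent.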
Soyeon Lee
Data engineer mentoring on reliable ingestion patterns.
FAQ
Basic SQL knowledge (SELECT/JOIN) helps. The course includes a two-week SQL refresher module.
Participant notes
Our team adopted the validation checklist from week 5 on a legacy CSV import — fewer silent failures.
Polars section moved fast; office hours helped me catch up on lazy evaluation.