I will build a production-ready ETL data pipeline using AWS, Airflow, and PySpark
Data Engineer, AWS, Apache Airflow, Spark, PostgreSQL, ETL
About this service
Are you drowning in raw data with no reliable way to process it?
I build production-grade data pipelines that run automatically, scale with your data, and never break silently. No spaghetti scripts. No manual steps. Just clean, reliable data exactly where you need it.
What I Build
- ETL pipelines using Python and PySpark: extract, transform, load, done
- Apache Airflow DAGs for fully automated, scheduled workflows
- Medallion Architecture pipelines (Bronze → Silver → Gold) with data quality at every layer
- AWS data platforms: S3 data lake, Glue, EMR on EKS, IAM, Terraform
- Cloud ingestion pipelines from any source into PostgreSQL, MySQL, ClickHouse, or Supabase
- Fully containerised setups with Docker and Docker Compose
- One-command deployments with CI/CD: no manual SSH, no runbooks
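
As a rough illustration of the Medallion idea in the list above (not the code delivered with any order), the sketch below passes a few hypothetical order records through Bronze, Silver, and Gold stages in plain Python, with a data quality check before the final layer. The field names, the sample rows, the `orders_api` source label, and every function name are assumptions made up for this example; a real pipeline would more likely run the same steps in PySpark and schedule them with Airflow.

```python
from collections import defaultdict

# Hypothetical raw records, as they might land in the Bronze layer.
RAW_ORDERS = [
    {"order_id": "1", "customer": "acme", "amount": "120.50"},
    {"order_id": "2", "customer": "acme", "amount": "oops"},    # bad amount
    {"order_id": "3", "customer": "", "amount": "15.00"},       # missing customer
    {"order_id": "4", "customer": "globex", "amount": "80.00"},
    {"order_id": "1", "customer": "acme", "amount": "120.50"},  # duplicate
]


def to_bronze(raw_rows):
    """Bronze: keep the data as received, only tagging each row with its source."""
    return [dict(row, _source="orders_api") for row in raw_rows]


def to_silver(bronze_rows):
    """Silver: validate, type-cast, and de-duplicate; rejected rows are returned too."""
    clean, rejected, seen = [], [], set()
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            rejected.append(row)
            continue
        if not row["customer"] or row["order_id"] in seen:
            rejected.append(row)
            continue
        seen.add(row["order_id"])
        clean.append({
            "order_id": int(row["order_id"]),
            "customer": row["customer"],
            "amount": amount,
        })
    return clean, rejected


def check_quality(rows, required_fields):
    """Fail loudly instead of passing incomplete rows to the next layer."""
    for row in rows:
        missing = [f for f in required_fields if row.get(f) in (None, "")]
        if missing:
            raise ValueError(f"row {row} is missing {missing}")
    return True


def to_gold(silver_rows):
    """Gold: business-level aggregate, here total revenue per customer."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


bronze = to_bronze(RAW_ORDERS)
silver, rejected = to_silver(bronze)
check_quality(silver, ["order_id", "customer", "amount"])
gold = to_gold(silver)
print(gold)
```

Keeping the rejected rows rather than dropping them is one way to avoid a pipeline that breaks silently: bad records stay visible and can be reviewed or replayed later.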
My portfolio
FAQ
Q: What information do you need to get started?
A: Your data source (S3, API, database, CSV), your target destination, transformation requirements, and how often the pipeline should run.
Q: Can you work with my existing infrastructure?
A: Yes. Send me the details of your current setup and I will assess compatibility before we start.
Q: Do I need an AWS account?
A: For AWS-based work, yes: you will need your own account. I can guide you through the setup if needed.
Q: Will I own the code?
A: Completely. All source code is handed over to you on delivery.
Q: Can you handle large datasets?
A: Yes. I use PySpark and EMR on EKS specifically because they are built for large-scale data processing.
Q: What if something breaks after delivery?
A: I offer post-delivery support. Message me and I will fix it.

