
Ansar H
Data Engineering, Azure, AWS, Databricks, Lakehouse, Spark, Fabric
Skills

View my services


Portfolio
Work Experience
Senior Data Engineer
DIS • Full time
Sep 2023 - Jan 2026 • 2 yrs 4 mos
- Led Databricks platform onboarding, including a cost analysis comparing AWS Glue and Databricks compute.
- Designed and managed end-to-end data pipelines using Databricks and AWS Glue.
- Used Apache Spark for batch and stream processing, ensuring data quality and consistency.
- Integrated data from PostgreSQL and Amazon S3; performed ETL/ELT operations and maintained Delta Lake-based data lakes.
- Developed and deployed a DIS Analytics alerting process leveraging Amazon S3 event triggers, AWS Lambda, and Amazon SNS to automatically detect and notify on duplicate records and missing source files during data ingestion in Databricks.
- Built a conversion layer using Databricks Genie AI.
- Exposed data via the Databricks SQL API and Delta Sharing with secure access controls (RBAC, authentication).
- Used Spark OCR to extract data from financial invoice PDFs stored in S3 and loaded the results into Delta tables.
- Implemented data governance practices with Unity Catalog.
- Optimized Spark job performance, reducing costs by 50%; monitored and tuned clusters for performance and reliability.
- Integrated Databricks with Power BI for reporting and dashboards.
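The alerting flow above (S3 event trigger, Lambda, SNS) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the actual DIS implementation: the expected file names, the event fields, and the `publish` stand-in for SNS are all assumptions.

```python
# Illustrative sketch of an S3 -> Lambda -> SNS ingestion check.
# EXPECTED_FILES and the event shape are hypothetical, not the real DIS setup.
EXPECTED_FILES = {"orders.csv", "customers.csv"}  # assumed daily source files

def find_issues(arrived_files, record_keys):
    """Return alert messages for missing source files and duplicate records."""
    issues = []
    missing = EXPECTED_FILES - set(arrived_files)
    if missing:
        issues.append(f"Missing source files: {sorted(missing)}")
    seen, dupes = set(), set()
    for key in record_keys:
        if key in seen:
            dupes.add(key)
        seen.add(key)
    if dupes:
        issues.append(f"Duplicate record keys: {sorted(dupes)}")
    return issues

def lambda_handler(event, context=None, publish=print):
    """Entry point invoked by the S3 event trigger.

    `publish` stands in for boto3's sns.publish so the logic runs locally;
    record_keys would normally be read from the landed files themselves.
    """
    arrived = [r["s3"]["object"]["key"] for r in event.get("Records", [])]
    issues = find_issues(arrived, event.get("record_keys", []))
    for msg in issues:
        publish(msg)  # in AWS this would be an SNS topic publish
    return {"issues": len(issues)}
```

In a real deployment the handler would read the landed objects via boto3 and publish to an SNS topic ARN; here those calls are stubbed so the detection logic stands alone.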
Senior Data Consultant
Systems Limited (Regeneron) • Full time
Jul 2019 - Aug 2023 • 4 yrs 1 mo
- Delivered big data solutions to various clients using AWS data platform services.
- Ingested data from SharePoint, SFTP, and cloud storage into the data lake using Apache NiFi and PySpark.
- Performed data transformations using PySpark; deployed scripts via Jenkins and scheduled DAGs in Airflow, running on EMR clusters.
- Collaborated with the Databricks team to implement solutions using AWS Databricks E2.
- Provisioned S3 buckets for audit logs and Terraform state files using Terraform scripts; built CI/CD pipelines integrated with Bitbucket.
- Created and managed Databricks workspaces and policies; configured Unravel with Databricks for Hive and S3 access control.
- Worked with Informatica Cloud to move data into Amazon Redshift.
- Developed Kafka producer/consumer applications on clusters managed with ZooKeeper; leveraged Kafka APIs and connectors (MySQL, PostgreSQL, MongoDB) for smooth message processing.
- Used Alteryx to build ETL pipelines; created functional diagrams and documented data flows in Confluence.
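The Kafka consumer side of the work above can be sketched as follows. This is a hypothetical example assuming the kafka-python client; the topic name, message schema, and routing rule are illustrative, not the actual applications delivered.

```python
import json

def parse_event(raw_bytes):
    """Deserialize one Kafka message and tag it for downstream routing.

    The field names ('source', 'payload') are assumed, not a real schema;
    connector-originated events are routed separately from app events.
    """
    event = json.loads(raw_bytes)
    sources = {"mysql", "postgresql", "mongodb"}  # the connectors mentioned above
    event["route"] = "cdc" if event.get("source") in sources else "app"
    return event

def run_consumer(bootstrap_servers, topic):
    """Hypothetical wiring: requires kafka-python and a reachable broker."""
    from kafka import KafkaConsumer  # imported lazily; not needed to test parse_event
    consumer = KafkaConsumer(topic,
                             bootstrap_servers=bootstrap_servers,
                             auto_offset_reset="earliest")
    for msg in consumer:
        event = parse_event(msg.value)
        # hand off to the appropriate sink, e.g. a Redshift staging area or S3
        print(event["route"], event.get("payload"))
```

Separating the pure `parse_event` step from the broker wiring keeps the message-processing logic testable without a running cluster.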
Data Engineer
The ENTERTAINER • Full time
Jul 2019 - Oct 2021 • 2 yrs 3 mos
- Worked with structured, semi-structured, and unstructured data.
- Designed and implemented a scalable Big Data architecture using Databricks, Data Lake, and Delta Lake, incorporating automated data quality and governance controls.
- Delivered end-to-end data engineering solutions leveraging Databricks, Delta Lake, and Azure Synapse Analytics to support enterprise analytics needs.
- Developed high-performance batch data pipelines using PySpark and Spark SQL to ingest and transform data from MongoDB into the enterprise data lake.
- Built, scheduled, and orchestrated ETL/ELT pipelines in Azure Data Factory for Azure Synapse Analytics environments.
- Designed and implemented real-time streaming data pipelines using Azure Event Hubs, enabling near real-time ingestion into Delta Lake.
- Created executive and operational dashboards using Tableau Desktop and Tableau Server, supporting automated and scheduled reporting.
- Utilized advanced MongoDB aggregation and indexing techniques to support ad hoc and analytical reporting use cases.
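The MongoDB aggregation work above can be illustrated with a small pipeline builder. This is a sketch under assumed names: the collection, the `redeemed_at` and `merchant_id` fields, and the "redemptions per merchant per month" metric are hypothetical, chosen only to show the aggregation shape.

```python
from datetime import datetime, timezone

def monthly_redemptions_pipeline(year):
    """Build a MongoDB aggregation pipeline counting redemptions per merchant
    per month for one year. Field names (redeemed_at, merchant_id) are assumed.
    """
    start = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    return [
        # restrict to the target year; an index on redeemed_at keeps this fast
        {"$match": {"redeemed_at": {"$gte": start, "$lt": end}}},
        # one bucket per (merchant, month) pair
        {"$group": {"_id": {"merchant": "$merchant_id",
                            "month": {"$month": "$redeemed_at"}},
                    "redemptions": {"$sum": 1}}},
        {"$sort": {"redemptions": -1}},
    ]

# usage (hypothetical, with pymongo):
#   db.redemptions.aggregate(monthly_redemptions_pipeline(2021))
```

Building the pipeline as plain Python dicts keeps it inspectable and unit-testable independently of a live database connection.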