Giridhar Reddy Tatiparthi
About Candidate
IBM Certified Data Engineer with 1+ years of experience building, optimizing, and supporting large-scale real-time and batch ETL
pipelines in financial data environments. Hands-on expertise in Apache Airflow, Kafka, Spark, Python, SQL, and Google Cloud.
Proven track record of resolving complex ETL failures, implementing data quality frameworks, and improving pipeline reliability.
Actively seeking new opportunities to contribute to scalable data platforms and analytics-driven solutions.
Location
Education
Work & Experience
EBS – Fintech | FDLH & FCP Teams | Hybrid GCP + Azure
• Engineered and maintained production-grade Spark-based ETL pipelines on Google Dataproc, processing daily store and eCommerce sales transaction data from upstream SAS systems landed in GCS, directly supporting finance reporting and reconciliation workflows.
• Designed partition-based incremental processing using batch_id- and batch_date-driven ingestion logic with rolling 7-day timestamp windows, storing transformed outputs in Parquet format partitioned by business date, reducing full dataset scans and improving query performance for downstream consumers (illustrated in the sketch after this role).
• Built and maintained a layered data processing architecture (Raw → Curated → Enriched → Consumption), applying business transformation rules across FDLH and FCP aligned to downstream finance requirements.
• Supported a hybrid cloud data architecture (GCP + Azure), enabling downstream consumption through Azure Data Factory (ADF) and Azure Databricks, while operating within Automic for SLA-driven batch orchestration and touch-file-based data arrival validation.
• Proactively raised and tracked Jira tickets for pipeline failures, job execution errors, data quality issues, schema changes, and SLA breaches, coordinating resolution across upstream SAS teams and downstream reporting stakeholders to minimize business impact.
• Performed data validation and reconciliation, including record count verification, data mismatch analysis, and root cause investigation, ensuring data accuracy across finance-critical reporting pipelines.
• Monitored end-to-end production pipelines supporting the BigQuery-based consumption layer, triaged and resolved job failures, and maintained consistent reporting SLA adherence across high-volume daily batch processing cycles.
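A minimal PySpark sketch of the partition-based incremental load pattern described above. The bucket paths, column names (transaction_id, transaction_ts, batch_id, batch_date), keys, and the scheduler-supplied run date are hypothetical placeholders for illustration, not details taken from the actual pipelines.

# Illustrative sketch only: hypothetical GCS paths, column names, and run parameters.
from datetime import date, timedelta

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental_sales_load").getOrCreate()
# Overwrite only the business_date partitions touched by this batch.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")

batch_date = date.today()                      # hypothetical run date supplied by the scheduler
window_start = batch_date - timedelta(days=7)  # rolling 7-day reprocessing window

# Read only the batch_date partitions inside the window instead of scanning the full dataset.
raw = (
    spark.read.parquet("gs://example-bucket/raw/sales/")  # hypothetical landing path
    .filter(F.col("batch_date").between(F.lit(str(window_start)), F.lit(str(batch_date))))
)

# Minimal curation: drop duplicate records and derive the business date for partitioning.
curated = (
    raw.dropDuplicates(["transaction_id", "batch_id"])     # hypothetical keys
       .withColumn("business_date", F.to_date("transaction_ts"))
)

# Write Parquet partitioned by business date for downstream consumers.
(
    curated.write.mode("overwrite")
    .partitionBy("business_date")
    .parquet("gs://example-bucket/curated/sales/")          # hypothetical curated path
)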
• Supported end-to-end ETL workflows in Hive and BigQuery for reporting and analytics teams.
• Maintained incremental and historical datasets using Apache Hudi, ensuring accuracy and preventing data duplication across datasets (see the upsert sketch below).
• Troubleshot Spark jobs and Kafka stream failures to reduce downtime for critical analytics pipelines.
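A minimal sketch of the Hudi-based deduplicating upsert pattern noted above, assuming a Spark session already configured with the Hudi bundle; the table name, record key, precombine field, and paths are hypothetical placeholders.

# Illustrative sketch only: assumes a Hudi-enabled Spark session; names and paths are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hudi_incremental_upsert").getOrCreate()

increments = spark.read.parquet("gs://example-bucket/curated/sales/")  # hypothetical incremental input

hudi_options = {
    "hoodie.table.name": "sales_transactions",                    # hypothetical table name
    "hoodie.datasource.write.recordkey.field": "transaction_id",  # key used to deduplicate records
    "hoodie.datasource.write.precombine.field": "updated_ts",     # latest version of a record wins
    "hoodie.datasource.write.operation": "upsert",                 # update existing keys, insert new ones
}

# Upsert so reprocessed batches update existing records instead of duplicating them.
(
    increments.write.format("hudi")
    .options(**hudi_options)
    .mode("append")
    .save("gs://example-bucket/hudi/sales_transactions/")         # hypothetical Hudi base path
)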


