Machine Learning Data Engineer

Sushma Vunnam.

Building data infrastructure that moves 500M+ records daily — Spark, Kafka, Delta Lake, and GenAI at enterprise scale.

📄 Resume Get in Touch LinkedIn ↗

Portfolio Pipeline

🧑‍💻 About Me Sushma Vunnam

⚙️ Tech Stack Spark · Kafka · dbt · AWS

💼 Experience Walmart · UCode · Accenture

🚀 Projects Pipelines · RAG · GenAI

🎓 Education GMU · SASTRA · Certs

Apache Spark · Apache Kafka · Delta Lake · dbt · AWS Bedrock · Databricks · Python · PySpark · Snowflake · Apache Airflow · Vector DB · RAG Pipelines · Claude 3 · Java 17

Impact by the numbers

500M+ Records per day

80% Spark perf. boost

6B+ Records per batch

About me

I've always been drawn to bringing structure to complexity.

Walmart Global Tech
Dec 2024 – Present

Whether I'm organizing an event where every detail needs to fall into place or building a data pipeline where every record needs to land exactly where it should, the mindset is the same. It started at Accenture, where I found that I genuinely enjoyed not just the technical side of handling large datasets, but also the idea that clean, well-structured data helps people make better decisions.

That realization pushed me to pursue a Master's in Data Analytics Engineering at George Mason University. Today at Walmart Global Tech, I design and maintain the pipelines that supply chain, forecasting, and personalization teams depend on — processing 100M+ daily records across the full stack: ingestion, transformation, validation, and delivery.

What drives me is the moment a stakeholder finally understands something that used to feel opaque — or another team moves faster because a pipeline I built just works. That intersection of solid infrastructure and real business impact is where I do my best work.

What I work with

ML Data Engineer.

⚡Apache Spark / PySpark 95%

📨Apache Kafka 90%

☁️AWS / Azure / Databricks 88%

🐍Python / FastAPI 92%

🧠GenAI / RAG Pipelines 82%

🗄️dbt / Delta Lake / Snowflake 87%

⚡Spark

📨Kafka

☁️AWS

🔷Azure

🧱Delta Lake

🔄dbt

🐍Python

☕Java

❄️Snowflake

🌀Airflow

🤖Bedrock

🧠RAG

🐬Databricks

🐳Docker

📊Grafana

🔭Milvus

Career

Work experience.

Dec 2024 – Present

Walmart Global Tech

Data Engineer

Designed real-time marketplace pipeline handling 50M+ events/day via Apache Kafka + Java consumers, powering live pricing and inventory decisions
Achieved 80% Spark performance boost via AQE tuning, broadcast joins, and dynamic partition strategy
Processed 6B+ records per batch on Harmony Platform with Delta Lake — Bronze → Silver → Gold lakehouse
Modeled star schemas with SCD Type 2 dimensions for downstream BI and ML teams
Built observability with Prometheus + Grafana for real-time SLA monitoring

Kafka Spark Delta Lake Java Prometheus Grafana
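
For readers unfamiliar with SCD Type 2, here is a minimal sketch of the idea in plain Python. It is an illustration only, not the production implementation: the `DimRow` fields, the `apply_scd2` helper, and the sample SKU and prices are all hypothetical. The point it shows is that a changed attribute closes out the old row and appends a new current row instead of overwriting history.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DimRow:
    # Hypothetical dimension row; column names are illustrative only.
    item_id: str
    price: float
    valid_from: date
    valid_to: Optional[date] = None  # None marks the open-ended current version
    is_current: bool = True


def apply_scd2(history: List[DimRow], item_id: str, price: float, as_of: date) -> List[DimRow]:
    """Return a new history with the change applied as an SCD Type 2 update."""
    current = next((r for r in history if r.item_id == item_id and r.is_current), None)
    if current is not None and current.price == price:
        return list(history)  # attribute unchanged: no new version needed
    result = []
    for row in history:
        if row is current:
            # Close out the old version instead of overwriting it.
            result.append(replace(row, valid_to=as_of, is_current=False))
        else:
            result.append(row)
    # Append the new current version.
    result.append(DimRow(item_id, price, valid_from=as_of))
    return result


# Hypothetical sample data: one SKU whose price changes once.
history = apply_scd2([], "sku-1", 9.99, date(2025, 1, 1))
history = apply_scd2(history, "sku-1", 11.49, date(2025, 3, 1))
```

After the two calls, `history` holds the closed-out 9.99 row and the current 11.49 row, so downstream BI queries can reconstruct the price as of any date. At warehouse scale the same logic is typically expressed as a merge statement rather than a Python loop.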

Jun – Nov 2024

UCode Technologies

Software Trainee (Intern)

Built backend data processing services with Python (Flask/FastAPI) for analytics and automation
Developed ETL pipelines integrating MySQL, internal APIs, and preprocessing modules
Supported CI/CD automation for backend deployment pipelines

Python FastAPI MySQL CI/CD

Feb 2021 – Jun 2022

Accenture

Associate Software Engineer

Designed ETL pipelines processing 10M+ daily records from SAP, Oracle, and flat files using Python + Spark
Built data validation frameworks with schema checks, null handling, and business rule enforcement
Optimized pipeline performance via partitioning, caching, and query tuning
Supported data migration from legacy systems with mapping, transformation, and validation

PySpark Python SQL SAP Airflow
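
A validation framework of the kind described above (schema checks, null handling, business rule enforcement) can be sketched roughly as follows. This is a simplified, standard-library illustration under stated assumptions, not the framework built at Accenture: the `SCHEMA` columns, the `RULES` predicates, and the sample records are hypothetical.

```python
from typing import Any, Callable, Dict, List

# Hypothetical expected schema: column name -> required Python type.
SCHEMA: Dict[str, type] = {"order_id": str, "quantity": int, "amount": float}

# Hypothetical business rules: rule name -> predicate over a record.
RULES: Dict[str, Callable[[Dict[str, Any]], bool]] = {
    "quantity_positive": lambda r: r["quantity"] > 0,
    "amount_non_negative": lambda r: r["amount"] >= 0,
}


def validate(record: Dict[str, Any]) -> List[str]:
    """Return a list of violations; an empty list means the record passes."""
    errors: List[str] = []
    for column, expected in SCHEMA.items():
        if column not in record:
            errors.append(f"missing column: {column}")
        elif record[column] is None:
            errors.append(f"null value: {column}")
        elif not isinstance(record[column], expected):
            errors.append(f"wrong type: {column}")
    # Only evaluate business rules on structurally valid records.
    if not errors:
        errors.extend(
            f"rule failed: {name}" for name, rule in RULES.items() if not rule(record)
        )
    return errors


# Hypothetical sample records.
good = {"order_id": "A1", "quantity": 2, "amount": 19.5}
bad = {"order_id": "A2", "quantity": None, "amount": 5.0}
negative = {"order_id": "A3", "quantity": -1, "amount": 5.0}
```

Returning a list of violations, rather than raising on the first failure, lets a pipeline route bad records to a quarantine table along with the reasons they were rejected.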

Built by me

Featured projects.

⚡ Big Data

Enterprise PySpark Pipeline

Scalable ETL processing 50M+ records end-to-end with 45% throughput improvement via AQE and partition strategy on Walmart's Harmony Platform.

PySpark AQE Delta Lake Python

🤖 Generative AI

Generative AI Chatbot — AWS Bedrock

Multi-model benchmarking on AWS Bedrock — Claude 3 Sonnet vs. Titan with zero-shot / few-shot prompt engineering. Claude 3 outperformed across all evaluation metrics.

AWS Bedrock Claude 3 Prompt Eng. Python
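
The difference between zero-shot and few-shot prompting can be sketched as below. This is a hypothetical illustration, not the project's code: the `build_prompt` helper, the `Q:`/`A:` layout, and the sample questions are assumptions, and no Bedrock call is made here. The resulting string would be sent to whichever model is being benchmarked.

```python
from typing import Sequence, Tuple


def build_prompt(instruction: str, question: str,
                 examples: Sequence[Tuple[str, str]] = ()) -> str:
    """Build a zero-shot prompt (no examples) or a few-shot prompt (with examples)."""
    parts = [instruction.strip()]
    for example_q, example_a in examples:
        # Each worked example shows the model the expected answer style.
        parts.append(f"Q: {example_q}\nA: {example_a}")
    # The final question is left unanswered for the model to complete.
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Answer briefly.", "What is Delta Lake?")
few_shot = build_prompt(
    "Answer briefly.",
    "What is Delta Lake?",
    examples=[("What is Kafka?", "A distributed event streaming platform.")],
)
```

Keeping the instruction and question identical across both variants means any difference in model output can be attributed to the examples alone, which is what makes a side-by-side comparison meaningful.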

🧠 ML Infrastructure

Kafka → Milvus RAG Pipeline

Streaming Kafka topics into Milvus vector database for low-latency RAG within enterprise network boundaries. Targeting sub-100ms retrieval at scale.

Kafka Milvus RAG Vector DB

🏗️ Data Architecture

Cloud Data Lakehouse Platform

Multi-layer lakehouse (Bronze → Silver → Gold) on AWS S3 + Delta Lake with dbt quality checks and Airflow orchestration. Reduced reporting latency by 70% versus the legacy system.

Delta Lake dbt AWS S3 Airflow

Academic

Education & certs.

2022 – 2024

MS, Data Analytics Engineering

George Mason University, USA

GPA: 3.70 / 4.0

2017 – 2021

B.Tech, Electrical & Electronics Engineering

SASTRA University, India

GPA: 7.0 / 10

Certifications

🤖

Leveraging AI & DE for Sustainable Solutions

LinkedIn Learning

In Progress

⚡

Claude Code 101

Anthropic

In Progress

🏢

Completion of Stream Training

Accenture India

Feb 2021

My Story

The engineer behind the pipeline.

I've always been drawn to bringing structure to complexity — whether that's organizing an event where every detail needs to fall into place, or building a data pipeline where every record needs to land exactly where it should.

Outside of work, I enjoy planning and hosting events — decorating, organizing, making sure everything comes together smoothly. There's something deeply satisfying about starting with a blank space and a lot of moving pieces, and ending with something that just works. I didn't realize it at the time, but that's exactly the mindset that makes a good data engineer.

It started at Accenture, where my work involved handling large datasets and cleaning messy, inconsistent data. I found myself genuinely enjoying it — not just the technical side, but the idea that clean, well-structured data actually helps people make better decisions. That realization pushed me to pursue a Master's in Data Analytics Engineering at George Mason University, where I built a deeper foundation in data modeling, distributed systems, and applied machine learning.

Today, at Walmart Global Tech, I design and maintain the pipelines that teams across supply chain, forecasting, and personalization depend on. I work across the full stack — ingestion, transformation, validation, and delivery — processing 100M+ daily records with a focus on reliability and accuracy.

What drives me isn't just the engineering. It's the moment a stakeholder looks at a dashboard and finally understands something that used to feel opaque. It's another team moving faster because a pipeline I built just works. That intersection of solid infrastructure and real business impact is where I do my best work.

I'm currently open to full-time Data Engineer roles in the US. If you're building something that needs reliable, scalable data systems behind it — I'd love to connect.

Open to new roles

Let's talk.

Open to Data Engineer & ML Engineer roles. Let's build something that scales.

LinkedIn · GitHub · Email · Resume