Machine Learning Data Engineer

Sushma Vunnam.

Building data infrastructure that moves 500M+ records daily — Spark, Kafka, Delta Lake, and GenAI at enterprise scale.

Impact by the numbers

500M+ Records per day
0 Years experience
80% Spark performance boost
6B+ Records per batch

About me

I've always been drawn to bringing structure to complexity.

Sushma Vunnam
Walmart Global Tech
Dec 2024 – Present

Whether organizing an event where every detail needs to fall into place, or building a data pipeline where every record needs to land exactly where it should — the mindset is the same. It started at Accenture, where I found myself genuinely enjoying not just the technical side of handling large datasets, but the idea that clean, well-structured data actually helps people make better decisions.

That realization pushed me to pursue a Master's in Data Analytics Engineering at George Mason University. Today at Walmart Global Tech, I design and maintain the pipelines that supply chain, forecasting, and personalization teams depend on — processing 100M+ daily records across the full stack: ingestion, transformation, validation, and delivery.

What drives me is the moment a stakeholder finally understands something that used to feel opaque — or another team moves faster because a pipeline I built just works. That intersection of solid infrastructure and real business impact is where I do my best work.

What I work with

ML Data Engineer.

Apache Spark / PySpark 95%
📨 Apache Kafka 90%
☁️ AWS / Azure / Databricks 88%
🐍 Python / FastAPI 92%
🧠 GenAI / RAG Pipelines 82%
🗄️ dbt / Delta Lake / Snowflake 87%

Career

Work experience.

Dec 2024 – Present

Walmart Global Tech

Data Engineer

  • Designed a real-time marketplace pipeline handling 50M+ events/day via Apache Kafka + Java consumers, powering live pricing and inventory decisions
  • Achieved an 80% Spark performance boost through Adaptive Query Execution (AQE) tuning, broadcast joins, and a dynamic partitioning strategy
  • Processed 6B+ records per batch on the Harmony Platform using a Delta Lake Bronze → Silver → Gold lakehouse
  • Modeled star schemas with SCD Type 2 dimensions for downstream BI and ML teams
  • Built observability with Prometheus + Grafana for real-time SLA monitoring
Kafka · Spark · Delta Lake · Java · Prometheus · Grafana

Jun – Nov 2024

UCode Technologies

Software Trainee (Intern)

  • Built backend data processing services with Python (Flask/FastAPI) for analytics and automation
  • Developed ETL pipelines integrating MySQL, internal APIs, and preprocessing modules
  • Supported CI/CD automation for backend deployment pipelines
Python · FastAPI · MySQL · CI/CD

Feb 2021 – Jun 2022

Accenture

Associate Software Engineer

  • Designed ETL pipelines processing 10M+ daily records from SAP, Oracle, and flat files using Python + Spark
  • Built data validation frameworks with schema checks, null handling, and business rule enforcement
  • Optimized pipeline performance via partitioning, caching, and query tuning
  • Supported data migration from legacy systems with mapping, transformation, and validation
PySpark · Python · SQL · SAP · Airflow

Built by me

Featured projects.

Big Data

Enterprise PySpark Pipeline

A scalable ETL pipeline processing 50M+ records end to end, with a 45% throughput improvement from AQE and partitioning strategy on Walmart's Harmony Platform.

PySpark · AQE · Delta Lake · Python

🤖 Generative AI

Generative AI Chatbot — AWS Bedrock

Multi-model benchmarking on AWS Bedrock comparing Claude 3 Sonnet and Amazon Titan with zero-shot and few-shot prompt engineering. Claude 3 Sonnet outperformed Titan across all evaluation metrics.

AWS Bedrock · Claude 3 · Prompt Engineering · Python

🧠 ML Infrastructure

Kafka → Milvus RAG Pipeline

Streams Kafka topics into a Milvus vector database for low-latency RAG within enterprise network boundaries, targeting sub-100 ms retrieval at scale.

Kafka · Milvus · RAG · Vector DB

🏗️ Data Architecture

Cloud Data Lakehouse Platform

A multi-layer lakehouse (Bronze → Silver → Gold) on AWS S3 + Delta Lake, with dbt quality checks and Airflow orchestration. Reduced reporting latency by 70% compared with the legacy system.

Delta Lake · dbt · AWS S3 · Airflow

Academic

Education & certs.

2022 – 2024

MS, Data Analytics Engineering

George Mason University, USA

GPA: 3.70 / 4.0

2017 – 2021

B.Tech, Electrical & Electronics Engineering

SASTRA University, India

GPA: 7.0 / 10

Certifications

🤖

Leveraging AI & DE for Sustainable Solutions

LinkedIn Learning

In Progress

Claude Code 101

Anthropic

In Progress

🏢

Completion of Stream Training

Accenture India

Feb 2021

My Story

The engineer behind the pipeline.

I've always been drawn to bringing structure to complexity — whether that's organizing an event where every detail needs to fall into place, or building a data pipeline where every record needs to land exactly where it should.

Outside of work, I enjoy planning and hosting events — decorating, organizing, making sure everything comes together smoothly. There's something deeply satisfying about starting with a blank space and a lot of moving pieces, and ending with something that just works. I didn't realize it at the time, but that's exactly the mindset that makes a good data engineer.

It started at Accenture, where my work involved handling large datasets and cleaning messy, inconsistent data. I found myself genuinely enjoying it — not just the technical side, but the idea that clean, well-structured data actually helps people make better decisions. That realization pushed me to pursue a Master's in Data Analytics Engineering at George Mason University, where I built a deeper foundation in data modeling, distributed systems, and applied machine learning.

Today, at Walmart Global Tech, I design and maintain the pipelines that teams across supply chain, forecasting, and personalization depend on. I work across the full stack — ingestion, transformation, validation, and delivery — processing 100M+ daily records with a focus on reliability and accuracy.

What drives me isn't just the engineering. It's the moment a stakeholder looks at a dashboard and finally understands something that used to feel opaque. It's another team moving faster because a pipeline I built just works. That intersection of solid infrastructure and real business impact is where I do my best work.

I'm currently open to full-time Data Engineer roles in the US. If you're building something that needs reliable, scalable data systems behind it — I'd love to connect.

Open to new roles

Let's talk.

Open to Data Engineer & ML Engineer roles. Let's build something that scales.