Open to full-time AI / ML Engineer roles · US

Raghav Upadhyay

I build production-grade LLM systems — from retrieval to evaluation to deployment.

RAG pipelines with hybrid retrieval and cross-encoder reranking, hallucination detection, LLM-as-a-judge evaluation wired into CI, and deep learning for predictive maintenance. M.S. Data Science, University of Arizona (May 2026).

// 01 — about

Engineer, not just experimenter

I'm Raghav Upadhyay, an AI Engineer focused on the unglamorous part of LLM work: making systems reliable. I care about retrieval quality, grounded answers, measurable evaluation, and shipping things people can actually use.

My flagship project, AskMyDocs, is a production-grade RAG app with hybrid retrieval, cross-encoder reranking, hallucination detection, and an LLM-as-a-judge harness that fails the CI build when answer quality drops below threshold. Beyond LLMs, I've built uncertainty-aware deep learning for predictive maintenance and run large cross-cultural studies of LLM behavior.

Currently finishing my M.S. in Data Science at the University of Arizona (May 2026) and open to full-time AI / ML Engineer roles across the US.

📍 United States 🎓 MS Data Science · UArizona ⚡ Available May 2026

// 02 — projects

Things I've built & shipped

★ Featured · Production RAG

AskMyDocs — Production-Grade RAG with CI-Gated Evaluation

End-to-end RAG app that answers questions over uploaded PDFs with citation-grounded responses. Hybrid retrieval (ChromaDB vector 60% + BM25 40%) is re-scored by a cross-encoder reranker, uncited answers go through hallucination detection, and an LLM-as-a-judge harness in GitHub Actions fails the build when faithfulness or relevance falls below 0.70 or citation rate falls below 0.80. Deployed live on Hugging Face Spaces.
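A minimal sketch of how a 60/40 vector / BM25 weighting like the one above could be applied. This is an illustration, not the AskMyDocs code: the function names are hypothetical, and min-max normalization before mixing is an assumption (the two retrievers score on different scales, so some rescaling is needed, but the project may do it differently).

```python
from typing import Dict, List, Tuple

# Weights taken from the project description: 60% vector, 40% BM25.
VECTOR_WEIGHT = 0.6
BM25_WEIGHT = 0.4


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable.

    Assumption: min-max scaling; the real pipeline may normalize differently.
    """
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so treat every document as equally relevant.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
) -> List[Tuple[str, float]]:
    """Blend normalized scores and return (doc_id, score) pairs, best first.

    A document missing from one retriever contributes 0.0 for that retriever.
    """
    v = min_max_normalize(vector_scores)
    b = min_max_normalize(bm25_scores)
    fused = {
        doc_id: VECTOR_WEIGHT * v.get(doc_id, 0.0) + BM25_WEIGHT * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for four chunks; the IDs and values are illustrative only.
example_vector = {"a": 0.9, "b": 0.5, "c": 0.1}
example_bm25 = {"b": 12.0, "c": 3.0, "d": 7.5}
example_ranking = fuse_scores(example_vector, example_bm25)
```

In a pipeline of this shape, the fused list would be the candidate set handed to the cross-encoder reranker.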

60/40 vector / BM25

≥0.70 faithfulness gate

≥0.80 citation rate

LLaMA (Groq) · ChromaDB · BM25 · Cross-Encoder · GitHub Actions · Gradio
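The CI gate can be sketched roughly as follows. The thresholds come from the project description; everything else (the metric key names, the function names, treating a missing metric as a failure) is an assumption for illustration rather than the actual harness.

```python
from typing import Dict, List

# Minimum acceptable values, as stated in the project description.
THRESHOLDS: Dict[str, float] = {
    "faithfulness": 0.70,
    "relevance": 0.70,
    "citation_rate": 0.80,
}


def failed_gates(metrics: Dict[str, float]) -> List[str]:
    """Return the names of metrics that fall below their threshold.

    Assumption: a metric missing from `metrics` counts as 0.0, so it fails.
    """
    return [
        name
        for name, minimum in THRESHOLDS.items()
        if metrics.get(name, 0.0) < minimum
    ]


def gate_exit_code(metrics: Dict[str, float]) -> int:
    """Print one line per failed gate and return a process exit code."""
    failures = failed_gates(metrics)
    for name in failures:
        print(f"FAIL {name}: {metrics.get(name, 0.0):.2f} < {THRESHOLDS[name]:.2f}")
    return 1 if failures else 0


# Hypothetical aggregate scores produced by the judge over an evaluation set.
example_metrics = {"faithfulness": 0.82, "relevance": 0.75, "citation_rate": 0.78}
example_code = gate_exit_code(example_metrics)
```

A script like this would end with `sys.exit(gate_exit_code(metrics))`; a GitHub Actions step fails whenever its command exits with a non-zero code, which is what turns a quality regression into a failed build.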

LLM Research

LLM-Generated Social Networks — Cross-Cultural Study (Capstone)

Replicated and extended an ICWSM 2025 paper on whether LLMs generate structurally realistic social networks. Built a full generation–analysis–benchmark pipeline across a 4×4×4×3 experimental matrix (prompting × cultures × languages × GPT-4.1 tiers, 96+ verified conditions), quantifying inter-model divergence and identifying political affiliation as the dominant homophily dimension.
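For a sense of scale, a 4×4×4×3 design can be enumerated as a full cross product. The sketch below assumes placeholder level names, since only the factor names and sizes are given here; it is not the capstone's pipeline code.

```python
from itertools import product
from typing import Dict, List

# Placeholder levels: only the factor names and their sizes come from the text.
FACTORS: Dict[str, List[str]] = {
    "prompting": [f"prompt_{i}" for i in range(1, 5)],
    "culture": [f"culture_{i}" for i in range(1, 5)],
    "language": [f"language_{i}" for i in range(1, 5)],
    "model_tier": [f"tier_{i}" for i in range(1, 4)],
}


def build_conditions(factors: Dict[str, List[str]]) -> List[Dict[str, str]]:
    """Return one dict per cell of the full factorial grid."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


conditions = build_conditions(FACTORS)
grid_size = len(conditions)  # 4 * 4 * 4 * 3 cells in the full cross product
```

The full cross product has 192 cells. The "96+ verified conditions" figure is taken from the description above and is not derived by this sketch.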

GPT-4.1 · NetworkX · pandas · Experiment Design

Deep Learning

RUL Prediction with LSTM — Uncertainty-Aware Predictive Maintenance

Benchmarked four LSTM architectures (Vanilla, Stacked BiLSTM, LSTM-Attention, CNN-LSTM) on NASA C-MAPSS for turbofan Remaining Useful Life. Added Monte Carlo Dropout and Deep Ensembles for uncertainty quantification (mean ± 2σ risk-adjusted decisions), gradient-based XAI with temporal attention heatmaps, and an ipywidgets dashboard with tiered maintenance alerts.
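A small sketch of what a mean ± 2σ risk-adjusted decision can look like. It assumes the model has already produced several RUL predictions for one engine (from repeated MC Dropout passes or from ensemble members). The alert thresholds, tier names, and use of the sample standard deviation are hypothetical choices for illustration, not the project's settings.

```python
from statistics import mean, stdev
from typing import List, Tuple


def rul_interval(samples: List[float], k: float = 2.0) -> Tuple[float, float, float]:
    """Return (mean, lower, upper) where the bounds are mean -/+ k * sigma.

    Assumption: sigma is the sample standard deviation of the predictions.
    """
    mu = mean(samples)
    sigma = stdev(samples) if len(samples) > 1 else 0.0
    return mu, mu - k * sigma, mu + k * sigma


def alert_tier(samples: List[float], critical: float = 20.0, warning: float = 50.0) -> str:
    """Pick a maintenance tier from the pessimistic (lower) RUL bound.

    `critical` and `warning` are hypothetical thresholds in engine cycles.
    """
    _, lower, _ = rul_interval(samples)
    if lower < critical:
        return "critical"
    if lower < warning:
        return "warning"
    return "ok"


# Hypothetical predictions (in cycles) from five stochastic forward passes.
confident = [60.0, 62.0, 58.0, 61.0, 59.0]
uncertain = [30.0, 40.0, 50.0, 20.0, 60.0]
tiers = (alert_tier(confident), alert_tier(uncertain))
```

Deciding on the lower bound rather than the mean means a wide predictive spread escalates the alert even when the average prediction looks safe.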

PyTorch · TensorFlow · Uncertainty · XAI

NLP

Commonsense Reasoning with Pre-trained Language Models

Benchmarked RoBERTa-MNLI and OPT-1.3B on commonsense reasoning datasets, analyzing zero-shot and fine-tuned performance across PIQA and related tasks with Hugging Face Transformers.

RoBERTa · OPT-1.3B · PyTorch · Hugging Face

ML / NLP

Hate Speech Detection

Text classification pipeline detecting hate speech in online content with Logistic Regression and Random Forest, reaching 92% accuracy on a Kaggle-sourced dataset to support content-moderation use cases.

Python · NLP · scikit-learn

// 03 — experience

Where I've worked

Data Analyst Intern

Jan 2025 – Aug 2025

Sudhir Mehrotra & Associates, Chartered Accountants · Bareilly, India (Hybrid)

Built Python ETL pipelines that automated financial workflows, cutting manual processing effort by 30%.
Developed a time-series cash-flow forecasting module that improved estimation accuracy by 15% over baseline.
Automated recurring Excel reporting with Python and VBA, eliminating multi-hour weekly manual tasks for the audit team.
Designed structured data-reporting systems for audit and compliance teams, improving traceability across reviews.

Python · ETL · Time-Series Forecasting · Excel / VBA

// 04 — skills

Tools of the trade

🤖

LLMs & RAG

OpenAI GPT-4.1 · LLaMA (Groq) · RAG pipelines · Hybrid search · Cross-encoder rerank · ChromaDB · sentence-transformers · Prompt engineering · Citation grounding

⚖️

LLM Eval & Reliability

LLM-as-a-judge · Hallucination detection · Faithfulness / relevance · Citation metrics · CI-gated thresholds · Versioned prompts

🔥

Deep Learning

PyTorch · TensorFlow · Keras · HF Transformers · LSTM · Attention · MC Dropout · Deep ensembles · Uncertainty · XAI

</>

Languages

Python · SQL · R · Bash · JavaScript

📊

Data & Viz

pandas · NumPy · NetworkX · Matplotlib · Seaborn · Jupyter · ipywidgets · Gradio

⚙️

MLOps & Tools

Git · GitHub Actions · AWS · Linux · HF Spaces · REST APIs · RAPIDS (GPU)

🗄️

Databases

MySQL · PostgreSQL · MongoDB · ChromaDB (vector)

// 05 — certifications

Verified credentials

Fundamentals of Accelerated Data Science

NVIDIA · Issued Oct 2025

GPU-Accelerated Computing (RAPIDS Ecosystem)
Core Data Science Foundations
Applied Machine Learning
Accelerated Ecosystem Integration

AWS Academy Cloud Operations

Amazon Web Services · Issued Nov 2022

Cloud infrastructure operations
Monitoring & management on AWS
Deployment & automation fundamentals

// 06 — education

Academic background

2024 – 2026

M.S. in Data Science

University of Arizona

GPA: 3.77 / 4.00

2020 – 2024

B.Tech, Computer Software Engineering

SRM University, Chennai

Bachelor of Technology

// 08 — contact

Let's build something

Open to full-time AI / ML Engineer roles in the US (on-site, hybrid, or remote). If you're hiring — or just want to talk LLMs — reach out.

Email: raghav0408upadhyay@gmail.com
LinkedIn: raghavupadhya2002
GitHub: raghav-upadhyay2002
Hugging Face: raghavupadhyay
