I'm an ex-Apple ML engineer building production AI systems: from on-device NLP features used by 65M+ users to model evaluation infrastructure, RAG tooling, and multimodal data pipelines.

Interests

Production ML · LLM evaluation · Applied AI · Agentic workflows · Multimodal perception · On-device intelligence · AI safety

What I Build

ML systems end-to-end: model training, evaluation pipelines, RAG tooling, on-device inference, and the CI/CD infrastructure that gets it all to production reliably. At Apple, I shipped production ML across multiple iOS releases and built Python-based CI/CD systems for dataset validation, model regression testing, and release quality gates.


AI Evaluation + Multimodal Systems

  • Multi-turn LLM safety evaluation, co-advised by Prof. Rosanna Bellini and Prof. Damon McCoy: built an automated harness to collect and evaluate chatbot behavior across thousands of multi-turn conversations, using LLM-as-judge scoring validated at a Cohen's kappa of 0.80–0.85.
  • Multimodal video and data pipelines for robotic policy learning with Prof. Lerrel Pinto, NYU CILVR: 3× dataset generation via diffusion-based augmentation, cross-modal grounding with CLIP and VLMs, imitation learning workflows in JAX.

How I Work

I care about the full path from model behavior to user impact: where latency hides, how failures are caught, and what it takes to ship AI outside a notebook.


Education

NYU Tandon

Master's, Computer Science

May 2026 · New York

NIT Surathkal

Bachelor's, Computer Science & Engineering

May 2021 · India


Outside the Code

Basketball, table tennis, and photography, preferably in a city I've never been to before.

New York, USA

Languages

Python · Swift · SQL · C++ · Java

ML / AI

PyTorch · JAX · HuggingFace · CoreML · LangChain · OpenAI API · Google ADK

Infra / Tools

Git · MLflow · CI/CD · Spark · FastAPI · Jupyter

Up for talking ML systems, AI evaluation, or interesting research problems.

Contact Me
Abha Wadjikar