Full Stack AI Engineer · Author of the Mastering GenAI Series
I've done it all across the AI stack: trained models, built evaluation datasets, designed benchmarks the community relies on, and written 100+ articles before distilling it all into 4 books on production GenAI.
Four comprehensive guides on the most critical topics in modern AI engineering: RAG, agents, evaluation and LLM judges. All free.
Part of the Galileo Mastering GenAI Series · All books available free online
I'm a full stack AI engineer at Galileo (employee #26, Series A), leading open-source evaluations and developer relations. I built the Agent Leaderboard, an open benchmark for AI agents in real-world tasks, along with the Hallucination Index and the BRAG model series for RAG evaluation.
Before Galileo, I was a founding engineer at Enterpret (employee #6, pre-seed), building NLP systems for customer feedback at scale, and a principal data scientist at TaskHuman, designing semantic search using transformers. Earlier, I launched end-to-end AI initiatives at Morningstar as their first quantitative research hire.
My RAG evaluation research was featured in Andrew Ng's newsletter, I was a guest on the Latent Space Podcast discussing agent evaluation, and I was named one of the Top AI Developers to Watch in 2023.
Models, datasets, and benchmarks built in the open. Because evaluation should never be locked away behind a paywall.
An open, live benchmark ranking LLMs as enterprise support agents across real-world tasks and domains. The community's go-to reference for production agent evaluation. 445+ likes on HuggingFace.
A multi-domain evaluation dataset powering the Agent Leaderboard, covering diverse enterprise support scenarios to stress-test agent reliability, consistency, and safety across domains.
A series of state-of-the-art open-source LLMs fine-tuned for retrieval-augmented generation. BRAG models outperform much larger models on RAG evaluation benchmarks while staying production-friendly in size and cost.
A living index tracking factual reliability and hallucination rates across foundation models. Gives engineering teams the data they need to pick the right model for accuracy-critical production workloads.
From the Latent Space podcast to PyData to community livestreams. Speaking forces you to clarify your thinking in ways writing cannot.
Leading open-source evaluations and developer relations. Built the Agent Leaderboard (open benchmark for AI agents), the Hallucination Index, and the BRAG model series. Authored 4 books on GenAI published under the Mastering GenAI Series.
Founded and grew Maxpool, an open community of AI professionals building production GenAI systems. Built Max, an AI research assistant for the Discord community. Organize talks, workshops, and knowledge sharing across the global AI engineering community.
Built semantic search, reranking, text generation, MLOps, and NLP pipelines to categorize 50K+ topics of customer feedback. Extensive work on auto-labeling and fine-tuning transformer models for enterprise NLP.
Contributed to Jina, the open-source neural search framework powering multi-modal search at scale.
Led development of core transformer-based semantic search for autocompletion and search. Developed and deployed an unsupervised deep learning recommendation engine. Designed a personalized notification system.
First hire in the new quantitative research team. Led full-stack NLP with deep learning: financial document extraction, sentiment analysis (BERT/ULMFiT), fund rating with ML, and search/recommendation with Elasticsearch. 1st prize at Morningstar Hackathon.
Founded in 2019, Maxpool is a thriving community of AI engineers building production systems. Join practitioners from around the world sharing research, tools, and real-world lessons.
I write about AI engineering, evaluation, and the business of AI. My Substack newsletter "Pratik's Pakodas" covers tech and business insights. Deep technical dives live on galileo.ai/blog.
Whether it is AI evaluation, production systems, speaking, or community, I am always up for a good conversation.