Currently building at Galileo

Pratik Bhavsar

Full Stack AI Engineer · Author of the Mastering GenAI Series

I've done it all across the AI stack: trained models, built evaluation datasets, designed benchmarks the community ships to, and written 100+ articles before distilling it all into 4 books on production GenAI.

25K+ LinkedIn · 5K+ Substack · 100+ Blogs · 4 Books · 4 Models · 4 Benchmarks

Guides on Building Production AI

Four comprehensive guides on the most critical topics in modern AI engineering: RAG, agents, evaluation, and LLM judges. All free.

Part of the Galileo Mastering GenAI Series · All books available free online

My AI adventures so far

I'm a full stack AI engineer at Galileo (employee #26, Series A), leading open-source evaluations and developer relations. I built the Agent Leaderboard, an open benchmark for AI agents in real-world tasks, along with the Hallucination Index and the BRAG model series for RAG evaluation.

Before Galileo, I was a founding engineer at Enterpret (employee #6, pre-seed), building NLP systems for customer feedback at scale, and a principal data scientist at TaskHuman, designing semantic search with transformers. Earlier, I launched end-to-end AI initiatives at Morningstar as their first quantitative research hire.

My RAG evaluation research was featured in Andrew Ng's newsletter, I was a guest on the Latent Space Podcast discussing agent evaluation, and I was named one of the Top AI Developers to Watch in 2023.

🎓
IIT Bombay M.Tech · NIT Surat B.Tech Computer Science & Engineering

Tools the AI community actually uses

Models, datasets, and benchmarks built in the open. Because evaluation should never be locked away behind a paywall.

Live Benchmark
🏆

Agent Leaderboard

An open, live benchmark ranking LLMs as enterprise support agents across real-world tasks and domains. The community's go-to reference for production agent evaluation. 445+ likes on Hugging Face.

Dataset
🗂️

Agent Eval Dataset

A multi-domain evaluation dataset powering the Agent Leaderboard, covering diverse enterprise support scenarios to stress-test agent reliability, consistency, and safety across domains.

SoTA Models

BRAG: SoTA RAG Models

A series of state-of-the-art open-source LLMs fine-tuned for retrieval-augmented generation. BRAG models outperform much larger models on RAG evaluation benchmarks while staying production-friendly in size and cost.

Research Index
🔬

Hallucination Index

A living index tracking factual reliability and hallucination rates across foundation models. Gives engineering teams the data they need to pick the right model for accuracy-critical production workloads.

Sharing what I learn at scale

From the Latent Space Podcast to PyData to community livestreams. Speaking forces you to clarify your thinking in ways writing cannot.

From pre-seed to enterprise, applying AI at the edge

Galileo
AI Engineer
June 2023 – Present · Employee #26
Current

Leading open-source evaluations and developer relations. Built the Agent Leaderboard (open benchmark for AI agents), the Hallucination Index, and the BRAG model series. Authored 4 books on GenAI published under the Mastering GenAI Series.

Agent Evaluation · RAG · Open Source · Developer Relations · Series A
Maxpool
Founder
January 2019 – Present

Founded and grew Maxpool, an open community of AI professionals building production GenAI systems. Built Max, an AI research assistant for the Discord community. Organize talks, workshops, and knowledge sharing across the global AI engineering community.

Community Building · GenAI · Discord
Enterpret
Founding Engineer, NLP Scientist
Feb 2021 – May 2023 · Employee #6 · Pre-seed

Built semantic search, reranking, text generation, MLOps, and NLP pipelines to categorize 50K+ topics of customer feedback. Extensive work on auto-labeling and fine-tuning transformer models for enterprise NLP.

Semantic Search · Fine-tuning · MLOps · NLP Pipelines
Jina AI
AI Engineer
Sep 2020 – Dec 2020 · Employee #6 · Seed

Contributed to Jina, the open-source neural search framework powering multi-modal search at scale.

Neural Search · Open Source
TaskHuman
Principal Data Scientist
Oct 2019 – Sep 2020 · Employee #15 · Seed

Led development of core semantic search, using transformers for autocompletion and search. Developed and deployed an unsupervised deep learning recommendation engine. Designed a personalized notification system.

Transformers · Elasticsearch · FAISS · AWS
Morningstar
Senior Data Scientist
Sep 2017 – Oct 2019 · 1st hire in quant research

First hire on the new quantitative research team. Led full-stack NLP with deep learning: financial document extraction, sentiment analysis (BERT/ULMFiT), fund rating with ML, and search/recommendation with Elasticsearch. Won 1st prize at the Morningstar Hackathon.

BERT · PyTorch · Elasticsearch · Finance NLP

Building in public, learning in public

Maxpool Community

Founded in 2019, Maxpool is a thriving community of AI engineers building production systems. Join practitioners from around the world sharing research, tools, and real-world lessons.

✍️

Writing & Newsletter

I write about AI engineering, evaluation, and the business of AI. My Substack newsletter "Pratik's Pakodas" covers tech and business insights. Deep technical dives live on galileo.ai/blog.

Let's build something worth evaluating

Whether it is AI evaluation, production systems, speaking, or community, I am always up for a good conversation.