Why I Built SentinelFinance

Tags: AI Systems, Financial AI, RAG, LangGraph

Most financial AI demos sound smart until you ask a question that actually matters. They can summarize a tax article, explain what a SIP (systematic investment plan) is, or talk confidently about diversification. But the moment you ask, "Can I afford this loan based on my salary, expenses, and current obligations?" the cracks show. Generic chatbots are great at sounding informed; they are much less reliable at grounding advice in your real numbers.

SentinelFinance started from that gap. I wanted to build an AI personal financial adviser that could do three things well at the same time: understand a user's question, retrieve evidence from both financial knowledge bases and the user's own uploaded documents, and compute outputs like EMI (equated monthly installment), SIP growth, tax impact, or ROI deterministically instead of guessing. That sounds straightforward until you try to make all three happen in one system without the answer becoming vague, inconsistent, or invented.

The result is SentinelFinance: an AI-powered financial adviser built on LangGraph, with a FastAPI backend, MySQL for users and chat history, FAISS for retrieval, Tavily and yfinance for current data, and a sandboxed Python math tool for calculations that should never be left to model intuition.

The Core Problem: Financial Advice Needs More Than Fluent Text

The main design constraint behind this project was simple: in finance, a polished answer is not enough. A useful system has to know when to retrieve, when to calculate, when to ask for clarification, and when it does not have enough evidence yet. That is why I did not want a single prompt doing everything in one pass.

In practice, user questions mix several jobs together. A prompt like "Should I prepay my home loan or increase my SIP?" requires knowledge retrieval, personalized context, calculations, and a final recommendation that compares tradeoffs. If one model tries to do all of that at once, it tends to either over-calculate with bad assumptions or over-explain without actually deciding anything.

To make that work reliably, I split the system into a small pipeline of specialized agents. Each one handles a specific part of the problem, and together they turn a broad financial question into something the system can research, calculate, and answer more cleanly.

What Each Agent Actually Does

Router. The router analyzes intent and decides whether the query needs research, a calculation, or both. It then hands the request to the matching agents, so the rest of the system can focus on specialized work.

Researcher. The researcher handles RAG and web search. It searches the user's uploaded documents and the financial knowledge base through FAISS, and it reaches live information through Tavily for web search and yfinance for prices and market data.

Analyst. The analyst handles the calculation side. The LLM writes the formula, and a Python math REPL executes it exactly, for things like EMI, SIP future value, tax liability, and ROI.

Strategist. The strategist combines research, calculations, and the user's profile into the final advice. This is the part of the system that turns all of that evidence into a direct, opinionated recommendation using the user's actual numbers.
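
To make the handoffs concrete, here is a minimal plain-Python sketch of the control flow described above. It is not the project's LangGraph code: the state fields, the keyword-based router, and the placeholder agent bodies are all assumptions made for illustration, and in the real system each step would be an LLM-backed graph node.

```python
import re
from typing import TypedDict


class AdvisorState(TypedDict, total=False):
    """Shared state passed between agents (field names are hypothetical)."""
    question: str
    needs_research: bool
    needs_calculation: bool
    research: list[str]
    calculations: dict[str, float]
    advice: str


def router(state: AdvisorState) -> AdvisorState:
    # Stand-in for the LLM intent classifier: a crude whole-word keyword check.
    words = set(re.findall(r"[a-z]+", state["question"].lower()))
    state["needs_calculation"] = bool(words & {"emi", "sip", "afford", "prepay"})
    state["needs_research"] = bool(words & {"rate", "price", "should", "tax"})
    return state


def researcher(state: AdvisorState) -> AdvisorState:
    # Placeholder: a real researcher would query the indexes and the web.
    state["research"] = ["(retrieved passages would go here)"]
    return state


def analyst(state: AdvisorState) -> AdvisorState:
    # Placeholder: a real analyst would run model-written code in the math tool.
    state["calculations"] = {"placeholder": 0.0}
    return state


def strategist(state: AdvisorState) -> AdvisorState:
    # Combine whatever evidence the earlier agents produced.
    n_sources = len(state.get("research", []))
    n_calcs = len(state.get("calculations", {}))
    state["advice"] = f"Advice grounded in {n_sources} sources and {n_calcs} calculations."
    return state


def run_pipeline(question: str) -> AdvisorState:
    state: AdvisorState = {"question": question}
    state = router(state)
    if state["needs_research"]:
        state = researcher(state)
    if state["needs_calculation"]:
        state = analyst(state)
    return strategist(state)


print(run_pipeline("Should I prepay my home loan or increase my SIP?")["advice"])
```

The point of the shape is that every agent reads and writes one shared state object, so each handoff can be inspected on its own.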

Why the Math Tool Matters So Much

This is probably the most important architectural decision in the entire project. Large language models are strong at interpreting intent and generating explanations, but finance is full of places where approximate reasoning is unacceptable. If a system miscalculates loan affordability, tax impact, or investment growth, the answer is not just slightly wrong. It becomes operationally misleading.

SentinelFinance handles this by running calculations in a restricted Python execution environment. The math tool supports safe built-ins and exposes modules like math, with optional support for numpy and pandas when available. That means the LLM can generate code such as an EMI formula or SIP projection, but the execution happens in a controlled runtime that returns deterministic output and captures failures cleanly.
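
A rough sketch of the idea, assuming a whitelist of built-ins and a convention that the generated snippet stores its answer in a variable named `result` (both are my assumptions for this example, not the project's exact tool). Restricting built-ins like this limits what generated code can reach, but it is not a full security boundary on its own.

```python
import math

# Whitelisted built-ins; anything else (open, __import__, ...) is unavailable.
SAFE_BUILTINS = {
    "abs": abs, "min": min, "max": max, "sum": sum,
    "round": round, "pow": pow, "range": range, "len": len,
    "float": float, "int": int,
}


def run_math(code: str) -> dict:
    """Execute model-written code with a restricted set of names.

    Assumed convention: the snippet assigns its answer to `result`.
    """
    env = {"__builtins__": SAFE_BUILTINS, "math": math}
    try:
        exec(code, env)
    except Exception as exc:  # report failures instead of crashing the agent
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    if "result" not in env:
        return {"ok": False, "error": "snippet did not set `result`"}
    return {"ok": True, "result": env["result"]}


# Example of code the LLM might generate for a standard EMI calculation:
# EMI = P * r * (1 + r)^n / ((1 + r)^n - 1)
emi_code = """
principal = 1_000_000      # loan amount
r = 0.09 / 12              # monthly interest rate (9% per year)
n = 240                    # number of monthly payments (20 years)
growth = (1 + r) ** n
result = round(principal * r * growth / (growth - 1), 2)
"""

print(run_math(emi_code))
print(run_math("import os"))  # fails cleanly: imports are not whitelisted
```

Because the tool returns a structured success or error value either way, the analyst can retry or report the failure instead of letting the model paper over it.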

I think this pattern is useful beyond finance too. Whenever an AI system needs to combine language understanding with arithmetic, simulation, or structured logic, it is safer to let the model choose the operation while delegating the actual computation to a tool that can be inspected and tested.

Using User Data

Another part of the project I cared about was personalization. A lot of AI finance products claim to be personalized, but they are really just templated advice with a few variables inserted. SentinelFinance is built to work from a user's actual profile, uploaded documents, and prior context.

The knowledge side uses FAISS with Hugging Face embeddings to retrieve general finance information such as tax rules, investment principles, and planning frameworks. On top of that, each user can have their own indexed document set. So the answer can combine general financial knowledge with specific evidence from that user's salary slip, tax return, or expense sheet.
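
The two-index idea can be sketched without any dependencies. In the toy version below, word-count vectors and cosine similarity stand in for Hugging Face embeddings and FAISS, and the sample documents are invented; only the structure (query the shared index and the per-user index separately, then label each hit with its origin) reflects the design described above.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a dense model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index: list[str], query: str, k: int) -> list[tuple[float, str]]:
    """Return the top-k (score, document) pairs for the query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in index]
    return sorted(scored, reverse=True)[:k]


def retrieve(query: str, knowledge_base: list[str], user_docs: list[str], k: int = 2):
    """Query both indexes separately and label each hit with its origin."""
    hits = [("knowledge_base", s, d) for s, d in search(knowledge_base, query, k)]
    hits += [("user_document", s, d) for s, d in search(user_docs, query, k)]
    return [h for h in hits if h[1] > 0]  # drop documents with no overlap


# Invented sample data, for illustration only.
knowledge_base = [
    "An emergency fund usually covers several months of expenses.",
    "Home loan interest may qualify for a tax deduction.",
]
user_docs = [
    "Salary slip: net monthly salary 85000",
    "Expense sheet: monthly rent 22000",
]

for origin, score, doc in retrieve(
    "monthly salary and home loan tax deduction", knowledge_base, user_docs
):
    print(origin, round(score, 2), doc)
```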

This also creates a better interaction model. Instead of asking the user to re-enter all of their numbers every time, the system can remember profile information, inspect stored documents, and only ask follow-up questions when something is genuinely missing.

Real-Time Research

Finance is one of those domains where static knowledge is never enough. Some questions depend on enduring principles, but others depend on live context: gold rates, stock prices, current deposit rates, or product availability. SentinelFinance handles that through a search tool that uses yfinance for market data and Tavily for broader web search.

What I like about this setup is that the system does not treat retrieval as one monolithic thing. Personal documents, financial knowledge, and live market information are all different evidence sources with different trust profiles. Keeping them separate inside the workflow makes it easier to reason about what the final answer is actually based on.

What Was Harder Than Expected

The hardest part was not wiring up an LLM or even getting LangGraph running. The difficult part was the handoffs. Multi-agent systems look clean on diagrams, but in code every transition creates a quality problem. Did the router send the query down the right path? Did the researcher retrieve signal or just more text? Did the analyst calculate the right thing, or just something that looked plausible? Did the strategist stay grounded in evidence instead of smoothing over gaps with polished language?

I also found that getting reliable JSON out of the model matters more than people think. Several nodes depend on parsing structured outputs from the model, so helper logic to extract and parse JSON from responses becomes part of the real system, not just prompt polish.
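
A helper along these lines is one way to do it. This is a sketch rather than the project's actual function; the name `extract_json` and the "first valid object or array wins" policy are my assumptions. It scans the reply for the first position where a JSON value can be decoded, which tolerates chatty preambles, trailing text, and code fences around the payload.

```python
import json
import re


def extract_json(text: str):
    """Return the first JSON object or array found in an LLM reply, or None."""
    decoder = json.JSONDecoder()
    # Try every '{' or '[' as a possible start of a JSON value.
    for match in re.finditer(r"[{\[]", text):
        try:
            value, _end = decoder.raw_decode(text, match.start())
        except json.JSONDecodeError:
            continue  # not valid JSON from this position; keep scanning
        return value
    return None


reply = (
    "Sure! Here is the routing decision:\n"
    '{"needs_research": true, "needs_calculation": false}\n'
    "Let me know if you need anything else."
)
print(extract_json(reply))
```

Returning `None` instead of raising keeps the decision about what to do next (retry, re-prompt, or fall back) in the calling node.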

Retrieval quality was another real challenge. Pulling from a knowledge base, personal documents, and live market sources sounds powerful, but it also means the system can gather uneven evidence very quickly. That is why an evidence-scoring step and the overall workflow structure matter so much: they create a buffer between raw retrieval and the final recommendation.
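
To illustrate what such a buffer could look like, here is a hypothetical scorer. The trust weights, the threshold, and the `Evidence` fields are invented for this example and are not taken from the project; the idea is only that each piece of evidence carries its source, and weak items are dropped before the strategist sees them.

```python
from dataclasses import dataclass

# Hypothetical trust weights per source type; not taken from the project.
SOURCE_TRUST = {"user_document": 1.0, "knowledge_base": 0.8, "web": 0.5}


@dataclass
class Evidence:
    source: str       # one of the SOURCE_TRUST keys
    text: str
    relevance: float  # retriever similarity, assumed to lie in [0, 1]


def score(e: Evidence) -> float:
    """Weight retriever relevance by how much the source is trusted."""
    return SOURCE_TRUST.get(e.source, 0.0) * e.relevance


def filter_evidence(items: list[Evidence], threshold: float = 0.4) -> list[Evidence]:
    """Keep evidence scoring at or above the threshold, strongest first."""
    kept = [e for e in items if score(e) >= threshold]
    return sorted(kept, key=score, reverse=True)


items = [
    Evidence("web", "forum post about loan prepayment", 0.7),
    Evidence("user_document", "salary slip, net monthly pay", 0.6),
    Evidence("knowledge_base", "prepayment vs. investing framework", 0.9),
]
for e in filter_evidence(items):
    print(e.source, round(score(e), 2), e.text)
```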

What This Project Taught Me

Building SentinelFinance changed how I think about AI products. I came away with a much stronger belief that useful AI systems are really coordination systems. The model matters, but the bigger difference often comes from how you structure retrieval, tools, state, and control flow around it.

It also reinforced a practical lesson: if the answer has real-world consequences, the system should expose its reasoning ingredients. In this project that means keeping calculations, research results, and tool calls visible enough that you can debug where an answer came from.

If you want to explore the implementation, the project is on GitHub.