Turning complex real estate data into natural language analysis

Programming Languages: Python
Natural Language Processing: LangChain, LangGraph
Large Language Models (LLMs): Gemini
LLM Evaluation & Observability: Arize Phoenix, DeepEval, LangSmith
Infrastructure: GCP
Databases and Data Warehouses: PostgreSQL, BigQuery
Containerization: Docker, Docker Compose, Kubernetes
OS: Unix

Real estate data is valuable, but often hard to work with. We've built an agentic AI interface that lets users explore complex analytics through natural language, without navigating dashboards and tables.

The Client

The client is a business intelligence platform for smarter urban growth. The platform combines location intelligence and large volumes of real estate data to help developers, retailers, and investors analyse real-world situations and make informed decisions around land and commercial property.

The Challenge: High-Value Data, High Barrier to Use

The platform already had powerful analytics and extensive datasets covering real estate markets. However, working with this information required users to navigate complex dashboards, tables and diagrams — effectively thinking like analysts.

For many clients, the main challenge wasn’t access to data, but how difficult it was to explore it. Users had to manually search for relevant indicators, understand multiple interfaces and translate their business questions into analytical queries. As a result, the barrier to entry was high, and many insights remained hard to reach without deep expertise.

The goal was to make complex real estate analytics accessible through natural language, allowing users to interact with the platform the same way they would with a real estate consultant — without simplifying the underlying logic or compromising accuracy.

Why a Chat Interface Wasn’t Enough

A conversational UI alone would not solve the problem. In this context, free-form text generation without control over logic and data access would introduce unacceptable risks for financial decision-making.

Traditional BI interfaces exposed data, but couldn’t support the full range of analytical paths users needed to explore — especially when questions spanned multiple datasets and dimensions.

The solution had to reliably translate natural language into structured analytical actions, not just generate convincing answers.

Our Approach: An Agentic Analytical Interface

We approached the task as an engineering problem, starting with a deep analysis of the client’s data ecosystem, analytics services and internal tooling.

Based on this, we designed an LLM-powered agent that acts as an intelligent interface between natural language input and the platform’s analytical core.

Key aspects of the implementation included:

Intent and parameter extraction

The agent interprets user queries, extracts structured parameters (location, property type, size, time range) and determines which analytical tools to invoke.

Tool selection and execution

Instead of responding directly, the agent selects and calls the appropriate analytical services, ensuring answers are grounded in actual data.

Controlled reasoning flow

Each step in the agent’s decision process is constrained, preventing irrelevant tool usage or incorrect data paths.
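
The three aspects above can be illustrated with a minimal Python sketch. It does not reproduce the production agent: the `AnalyticsQuery` fields, the `average_rent` tool, the stubbed return value and the `ALLOWED_TOOLS` table are all hypothetical names chosen for illustration, and the LLM extraction step is replaced by a hand-built object.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class AnalyticsQuery:
    """Structured parameters the agent is assumed to extract from a question."""
    intent: str
    location: str
    property_type: str
    size_m2: Optional[float] = None


def average_rent(query: AnalyticsQuery) -> dict:
    """Hypothetical analytical tool; a real one would call the analytics service."""
    # Stubbed placeholder so the sketch runs without a database.
    return {"tool": "average_rent", "location": query.location, "value": None}


# Registry of tools the agent may call (hypothetical).
TOOLS: Dict[str, Callable[[AnalyticsQuery], dict]] = {"average_rent": average_rent}

# Constrained flow: each intent maps to exactly one permitted tool.
ALLOWED_TOOLS: Dict[str, str] = {"rent_lookup": "average_rent"}


def run_agent_step(query: AnalyticsQuery) -> dict:
    """Select and execute a tool, rejecting anything outside the allowed path."""
    tool_name = ALLOWED_TOOLS.get(query.intent)
    if tool_name is None or tool_name not in TOOLS:
        raise ValueError(f"No permitted tool for intent {query.intent!r}")
    return TOOLS[tool_name](query)


# In a real agent the LLM would produce this object from the user's question.
query = AnalyticsQuery("rent_lookup", "Helsinki", "apartment", 62.0)
result = run_agent_step(query)
```

Keeping the intent-to-tool mapping in plain data rather than in the prompt is one way to make the reasoning flow auditable: an unexpected intent fails loudly instead of reaching an unrelated tool.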

Evaluation and Production Stability

Accuracy was a critical requirement. A response that sounds correct is not sufficient in a real estate analytics context.

To address this, we built a dedicated evaluation pipeline that validates the agent’s behaviour at multiple levels:

correctness of parameter extraction,
correctness of tool selection,
consistency of the final analytical output.

Synthetic validation datasets were generated to simulate realistic user queries and edge cases, allowing the system to be tested statistically rather than anecdotally. This made it possible to detect errors in the reasoning flow even when the final answer looked plausible, and to recognise answers as consistent when they were phrased differently but contained the same information.
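
As a rough illustration of statistical rather than anecdotal testing, the sketch below scores the first two levels (parameter extraction and tool selection) over a small set of cases. The `EvalCase` structure, the tool names and the two sample cases are invented for this example and are not taken from the actual pipeline. The third level, consistency of the final output, would need a semantic comparison and is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalCase:
    """One synthetic test case with expected and observed agent behaviour."""
    question: str
    expected_params: dict
    expected_tool: str
    actual_params: dict
    actual_tool: str


def evaluate(cases: List[EvalCase]) -> Dict[str, float]:
    """Return the share of cases that pass each check."""
    if not cases:
        raise ValueError("At least one evaluation case is required")
    total = len(cases)
    params_ok = sum(c.actual_params == c.expected_params for c in cases)
    tool_ok = sum(c.actual_tool == c.expected_tool for c in cases)
    return {
        "parameter_accuracy": params_ok / total,
        "tool_accuracy": tool_ok / total,
    }


# Two made-up cases: the second one selects the wrong tool.
rent_params = {"location": "Helsinki", "property_type": "apartment", "size_m2": 62}
office_params = {"location": "Helsinki", "property_type": "office"}
cases = [
    EvalCase("What is the average rent for a 62 m² apartment in Helsinki?",
             rent_params, "average_rent", rent_params, "average_rent"),
    EvalCase("What is the average rent for offices in Helsinki?",
             office_params, "average_rent", office_params, "price_trend"),
]
scores = evaluate(cases)
```

Scoring each level separately shows where a failure originates: here the parameters are extracted correctly in both cases, while tool selection fails in one of them.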

Security and Data Isolation

Because the platform operates in a corporate environment, the agent includes safeguards against prompt injection and agent-level attacks. These protections prevent users from accessing internal system details, competitor data or restricted analytical tools outside their intended scope.
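
The article does not describe how these safeguards are implemented. One common pattern, sketched here under that caveat, is to enforce tool scope in ordinary code outside the model, so that a successful prompt injection still cannot reach a restricted tool. The role names, tool names and `ROLE_TOOLS` table are hypothetical.

```python
from typing import Dict, FrozenSet

# Hypothetical mapping from user role to the tools that role may invoke.
ROLE_TOOLS: Dict[str, FrozenSet[str]] = {
    "external_user": frozenset({"average_rent"}),
    "internal_analyst": frozenset({"average_rent", "portfolio_report"}),
}


def is_tool_allowed(role: str, tool_name: str) -> bool:
    """Deny by default: unknown roles and unlisted tools are rejected."""
    return tool_name in ROLE_TOOLS.get(role, frozenset())


def authorize_tool_call(role: str, tool_name: str) -> None:
    """Raise before execution if the requested tool is outside the caller's scope."""
    if not is_tool_allowed(role, tool_name):
        raise PermissionError(f"Role {role!r} may not call {tool_name!r}")
```

Calling `authorize_tool_call` immediately before every tool execution means the check does not depend on the model following its instructions.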

The Solution: Conversational Analysis, Not Predictions

The resulting system enables users to analyse existing real estate data through natural language. For example, a user can ask:

What is the average rent for a 62 m² apartment in Helsinki?

The agent interprets the intent, extracts the relevant parameters, queries the underlying datasets and returns a structured explanation — without exposing raw tables or requiring analytical expertise.

The system focuses on analysis of current market conditions, not speculative forecasting, keeping outputs explainable and aligned with real-world decision-making.

Outcome

The agentic interface significantly lowered the barrier to working with complex real estate data. Both external users and internal teams gained a flexible way to explore information, ask nuanced questions and access insights that previously required advanced analytical skills.

The solution is live in production and is currently being expanded into a multi-agent system to support additional real estate use cases and user groups as the platform scales.

Smartificial