Integrating Looker and LangChain with langchain-looker-agent

We’ve just released a prototype of a new LangChain tool that enables LangChain applications and agentic workflows to connect to Looker via its Open SQL Interface.

In the past, when we’ve worked on AI-powered chatbots, dashboard narrative generators and data apps that use LangChain to orchestrate and connect LLMs such as OpenAI GPT-4o and Google Gemini to data in our data warehouse, we would ideally have liked to route that connection through our Looker semantic layer rather than directly to the warehouse table schema.

When AI systems query databases directly, they’re working with raw schemas without business context. This means different applications might interpret the same data differently, leading to inconsistent results across your organization. More importantly, AI agents have to infer business logic from technical schemas, which can produce results that are technically correct but meaningless in business terms.

Looker’s LookML semantic layer, in contrast, provides pre-validated business logic and standardized metric definitions. Instead of asking an AI to figure out what “revenue” means from database field names, the semantic layer provides explicit business definitions and calculations, reducing the risk of hallucinations and of answering the user’s query using the wrong set of metrics and dimension values.

Integrating Looker with LangChain

The prototype langchain-looker-agent we’ve developed is designed to be as familiar as possible to anyone who’s used other LangChain SQL toolkits, as shown in the code example below.

 
import os

from langchain_looker_agent import (
    LookerSQLDatabase,
    LookerSQLToolkit,
    create_looker_sql_agent,
)
from langchain_openai import ChatOpenAI

# Initialize the Looker connection via the Open SQL Interface (JDBC)
looker_db = LookerSQLDatabase(
    looker_instance_url=os.getenv('LOOKER_BASE_URL'),
    lookml_model_name=os.getenv('LOOKML_MODEL_NAME'),
    client_id=os.getenv('LOOKER_CLIENT_ID'),
    client_secret=os.getenv('LOOKER_CLIENT_SECRET'),
    jdbc_driver_path=os.getenv('JDBC_DRIVER_PATH')
)

# Create the agent: the toolkit bundles the database interaction tools
llm = ChatOpenAI(model="gpt-4o", temperature=0)
toolkit = LookerSQLToolkit(db=looker_db, llm=llm)
agent = create_looker_sql_agent(llm=llm, toolkit=toolkit)
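
Once the agent has been created you can ask it questions in natural language. Here’s a minimal sketch of how you might invoke it, assuming the function returns a standard LangChain AgentExecutor and that conversation memory is handled for you, as described below:

# Ask an initial question, then a follow-up that relies on conversation memory
response = agent.invoke({"input": "What were our top five customers by revenue last quarter?"})
print(response["output"])

follow_up = agent.invoke({"input": "How does that compare to the quarter before?"})
print(follow_up["output"])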

The langchain-looker-agent tool maintains conversation memory, supports follow-up questions, and includes Explore discovery capabilities. Importantly, all responses are grounded in your validated business logic rather than in AI interpretations of raw data.

Key components within the langchain-looker-agent package are:

  • LookerSQLDatabase: A custom class that wraps the Looker JDBC connection, mimicking the interface of LangChain’s SQLDatabase utility. It handles connecting to Looker, fetching metadata (Explores as tables, Fields as columns) and executing SQL queries. It’s specifically designed to work with Looker’s SQL dialect (Calcite), including backticked identifiers and AGGREGATE() for measures; see the sketch after this list.

  • LookerSQLToolkit: A LangChain toolkit that bundles database interaction tools (list tables/Explores, get schema, query database) using the LookerSQLDatabase.

  • create_looker_sql_agent: A factory function to create a LangChain ReAct agent pre-configured with the LookerSQLToolkit and a specialized system prompt guiding the LLM on Looker’s specific SQL syntax and data structures.
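
Because LookerSQLDatabase mimics LangChain’s SQLDatabase interface, you can also use it directly without an agent. A minimal sketch, reusing the looker_db connection created earlier, assuming the class exposes the same get_usable_table_names(), get_table_info() and run() methods as the utility it mimics, and using a hypothetical “orders” Explore:

# List the Explores exposed as tables, and inspect the schema of one of them
print(looker_db.get_usable_table_names())
print(looker_db.get_table_info(["orders"]))

# Query using Looker's Calcite SQL dialect: backticked identifiers, and
# AGGREGATE() to reference LookML measures rather than raw column aggregations
result = looker_db.run(
    'SELECT `orders.customer_name`, AGGREGATE(`orders.total_revenue`) '
    'FROM `orders` GROUP BY 1 LIMIT 10'
)
print(result)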

The GitHub repo that accompanies the PyPI package comes with quickstart example Python code and two Jupyter notebooks with further code examples to help you get started. We’ve also put together an example Looker chatbot web application in another GitHub repo to show the tool being used in a more complex use-case.

How Does the Integration Work?

Because the Looker Open SQL Interface uses JDBC to connect to Looker, it requires a Java JDK to be downloaded and installed, along with the Avatica/Calcite JDBC driver JAR file that’s hosted for download on the Looker open-source GitHub repository.

 
# Install a Java JDK (required by the JDBC connection to Looker)
sudo apt-get update
sudo apt-get install -y openjdk-11-jdk --no-install-recommends
java -version # Verify installation

# Set JAVA_HOME from the installed JDK
export JAVA_HOME=$(java -XshowSettings:properties -version 2>&1 > /dev/null | grep 'java.home' | awk '{print $3}')
echo $JAVA_HOME

# Download the Avatica/Calcite JDBC driver JAR
mkdir drivers
cd drivers
wget https://github.com/looker-open-source/calcite-avatica/releases/download/avatica-1.26.0-looker/avatica-1.26.0-looker.jar

# Point the JDBC_DRIVER_PATH variable (used in the Python example above) at the JAR
export JDBC_DRIVER_PATH=$(pwd)/avatica-1.26.0-looker.jar
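
Before constructing a LookerSQLDatabase it’s worth checking that both dependencies are visible to your Python process. A minimal sketch, using the environment variable names from the examples above:

import os
from pathlib import Path

# The JVM that the JDBC bridge will start, and the Avatica driver JAR downloaded above
java_home = os.getenv("JAVA_HOME")
driver_path = Path(os.getenv("JDBC_DRIVER_PATH", "drivers/avatica-1.26.0-looker.jar"))

if not java_home:
    raise RuntimeError("JAVA_HOME is not set; install a Java JDK and export JAVA_HOME first")
if not driver_path.exists():
    raise RuntimeError(f"JDBC driver JAR not found at {driver_path}")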

An important consideration with this dependency on a Java JDK is that it precludes the use of serverless environments such as Google Cloud Run functions, due to the restrictions they place on installing system-level dependencies such as a Java runtime.

Our recommended approach instead is to use Cloud Run, which provides containerized environments in which you can install multiple language runtimes within a single container. This allows you to create a Docker image that includes both Python for your application code and Java for the JDBC driver requirements, as we do when deploying AI apps that use the langchain-looker-agent package:

 
# Python base image, plus a Java runtime: both are needed in the same container
FROM python:3.10-slim

# Install the Java runtime required by the Avatica/Calcite JDBC driver
RUN apt-get update && \
    apt-get install -y wget openjdk-17-jre-headless && \
    rm -rf /var/lib/apt/lists/*
ENV JAVA_HOME=/usr/lib/jvm/java-17-openjdk-amd64

WORKDIR /app

# Install the Python dependencies, including langchain-looker-agent
COPY cloud-function/requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt

# Download the Looker Avatica JDBC driver into the image
RUN mkdir drivers && \
    cd drivers && \
    wget https://github.com/looker-open-source/calcite-avatica/releases/download/avatica-1.26.0-looker/avatica-1.26.0-looker.jar
ENV LOOKER_JDBC_DRIVER_PATH=/app/drivers/avatica-1.26.0-looker.jar

# Copy the application code and start the Functions Framework HTTP server
COPY cloud-function/ .
ENV PORT=8080
CMD ["functions-framework", "--target=hello_http", "--host", "0.0.0.0", "--port", "8080"]
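
For reference, here’s a minimal sketch of what the cloud-function entry point targeted by the CMD line above might look like. The hello_http handler name comes from the Dockerfile; the request shape and response format are illustrative assumptions rather than the actual app’s code:

import os

import functions_framework
from langchain_looker_agent import LookerSQLDatabase, LookerSQLToolkit, create_looker_sql_agent
from langchain_openai import ChatOpenAI

@functions_framework.http
def hello_http(request):
    # Read the user's question from the JSON request body
    payload = request.get_json(silent=True) or {}
    question = payload.get("question", "")

    # Connect to Looker via the Open SQL Interface, using the env vars baked into the image
    looker_db = LookerSQLDatabase(
        looker_instance_url=os.environ["LOOKER_BASE_URL"],
        lookml_model_name=os.environ["LOOKML_MODEL_NAME"],
        client_id=os.environ["LOOKER_CLIENT_ID"],
        client_secret=os.environ["LOOKER_CLIENT_SECRET"],
        jdbc_driver_path=os.environ["LOOKER_JDBC_DRIVER_PATH"],
    )

    # Build the agent and answer the question
    llm = ChatOpenAI(model="gpt-4o", temperature=0)
    toolkit = LookerSQLToolkit(db=looker_db, llm=llm)
    agent = create_looker_sql_agent(llm=llm, toolkit=toolkit)
    result = agent.invoke({"input": question})
    return {"answer": result["output"]}

In a real deployment you’d typically create the database connection and agent once at startup rather than per-request, to avoid reconnecting to Looker on every call.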

Why Connect Looker to LangChain and its Ecosystem?

Like the official Conversational Analytics API coming soon from Google, this new AI integration for Looker enables businesses to build AI applications that maintain conversation memory across interactions and handle cyclical reasoning patterns, going far beyond simple question-and-answer scenarios.

But by integrating with LangChain’s extensive library of over 700 integrations with external tools, databases and APIs, along with enterprise-grade features such as LangSmith for observability and evaluation and LangGraph for more complex agentic workflows, it also opens up use-cases such as:

  • Multi-Agent Analytical Workflows: Build sophisticated data analysis agents, for example using the open-source agentic orchestration tool n8n, that coordinate multiple specialized sub-agents (data validators, trend analyzers, report generators) working together on complex analytical tasks, rather than limiting interactions to single-query responses

  • Autonomous Decision-Making Pipelines: Create agents that automatically trigger business actions based on Looker insights, such as inventory reordering systems that analyze sales trends and automatically generate purchase orders through connected ERP systems

  • Cross-Platform Data Orchestration: Combine Looker’s semantic layer with data from CRM systems, external APIs and real-time feeds to create comprehensive business intelligence agents that provide holistic insights rather than siloed analytics

  • Enterprise-Grade Scalability: Leverage LangChain’s production-ready infrastructure for deploying conversational analytics across thousands of users while maintaining governance, monitoring and evaluation capabilities through LangSmith

Interested? Find Out More

The langchain-looker-agent is available now as a preview/prototype release suitable for testing and non-production use-cases, shipped as a pip-installable package on PyPI (pip install langchain-looker-agent) with comprehensive documentation on GitHub. You’ll need Looker API credentials, the JDBC driver and your LookML models properly configured.

The source code for the langchain-looker-agent tool is available for access and cloning on our GitHub site, along with the source code for a demo chatbot app built using Python, Replit and the langchain-looker-agent tool.

If you’re interested in implementing AI-powered analytics with semantic layer integration using Looker or Cube, we’d be happy to help. Rittman Analytics is a boutique data analytics consultancy that helps ambitious, digital-native businesses scale-up their approach to data, analytics and generative AI.

We’re authorised delivery partners for Google Cloud along with Oracle, Segment, Cube, Dagster, Preset, dbt Labs and Fivetran, and we’re experts at helping you design an analytics solution that’s right for your organisation’s needs, use-cases and budget, and at working with you and your data team to implement it successfully.

If you’re looking for help with your AI initiative, or would just like to talk shop and share ideas on what’s going on in your organisation and the wider data analytics world, contact us now to organise a 100% free, no-obligation call — we’d love to hear from you!

Mark Rittman

CEO of Rittman Analytics, host of the Drill to Detail Podcast, ex-product manager and twice company founder.

https://rittmananalytics.com