
SQL has been around for decades, but on Databricks it is more powerful than ever.
Today, SQL is not just a language for reporting. It’s the foundation for building data pipelines, powering AI-ready analytics, managing governance through Unity Catalog, and even performing geospatial analysis, all within the Databricks Lakehouse.
Whether you are an analyst, engineer, or business user, you already hold the key to building the foundation of your data platform. You don’t need to wait months or years to start your journey; you just need to use the SQL skills you and your team already have.
Why SQL Still Matters
In most organisations, SQL is still the most widely understood language. But too often, it is limited to static dashboards and ad-hoc reports. That is a missed opportunity, because modern platforms like Databricks have elevated what SQL can do.
On Databricks, SQL can:
- Build ETL and ELT pipelines with Delta Live Tables (DLT)
- Transform streaming and batch data in the same syntax
- Enforce governance and lineage with Unity Catalog
- Power AI/BI Dashboards and Genie natural language queries
- Support geospatial analysis natively
- Leverage AI SQL functions
The beauty is that you don’t need to learn a new language to do any of this: Databricks expands SQL’s capabilities into every stage of the data lifecycle.
The Medallion Mindset
At the heart of Databricks lies the Medallion Architecture, a layered approach to building reliable, reusable, and scalable data pipelines.
Each layer has a purpose:
🥉 Bronze Layer:
Raw, unfiltered data from multiple sources.
Use SQL ingestion or Auto Loader to bring in structured and semi-structured data from APIs, S3, or Blob Storage (see the ingestion sketch below).
🥈 Silver Layer:
Cleansed and standardised data.
Use SQL in Delta Live Tables to apply business rules, deduplicate records, and create consistent schema definitions.
🥇 Gold Layer:
Aggregated and enriched data for business consumption.
Use SQL queries to create curated datasets, UC Metric Views, and AI/BI Dashboards that drive real-time insights.
Every step, from raw ingestion to refined analytics, can be written, tested, and deployed using SQL.
The Medallion Architecture isn’t about complexity. It’s about flow, and SQL is the thread that connects every layer.
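For the Bronze layer, ingestion itself can be expressed in SQL. Here is a minimal sketch using a streaming table backed by Auto Loader’s read_files; the bucket path and table name are hypothetical:
CREATE OR REFRESH STREAMING TABLE bronze_sales
AS SELECT *
FROM STREAM read_files(
  's3://my-bucket/raw/sales/',  -- hypothetical source path
  format => 'json'              -- Auto Loader infers the schema
);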
From Queries to Pipelines
If you can write a SELECT statement, you can build a pipeline.
Using Databricks Delta Live Tables, data engineers and analysts can define transformations directly in SQL:
-- Declare a managed DLT dataset; the LIVE keyword tells DLT to resolve
-- dependencies between pipeline tables automatically.
CREATE LIVE TABLE silver_sales
AS SELECT
  s.id,
  s.amount,
  c.region,
  current_timestamp() AS last_updated  -- audit column for freshness
FROM LIVE.bronze_sales s
JOIN LIVE.customers c ON s.customer_id = c.id
WHERE s.amount > 0;  -- business rule: keep only positive sales
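The data-quality rules mentioned for the Silver layer can be declared in the same place. A hedged sketch using DLT expectations; the table name is hypothetical:
CREATE LIVE TABLE silver_sales_validated (
  -- drop offending rows instead of failing the whole pipeline
  CONSTRAINT positive_amount EXPECT (amount > 0) ON VIOLATION DROP ROW
)
AS SELECT * FROM LIVE.silver_sales;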
SQL Meets Geospatial
Spatial analytics is no longer the domain of niche GIS systems.
With Databricks, geospatial data becomes first-class: directly queryable, visualisable, and scalable using SQL.
Databricks supports:
- Native GEOMETRY and GEOGRAPHY data types
- 80+ Spatial SQL functions such as ST_Intersects, ST_Distance, ST_Buffer, and ST_Contains
- H3 indexing for global hexagonal aggregation
- Integration with Felt for interactive geospatial visualisation
This means you can run advanced spatial analysis directly in SQL:
-- Aggregate repairs that fall inside a boundary polygon
-- (the POLYGON WKT is elided here)
SELECT region,
       COUNT(*) AS repairs,
       AVG(repair_cost) AS avg_cost
FROM repairs
WHERE ST_Within(location, ST_GeomFromText('POLYGON(...)'))
GROUP BY region;
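The H3 indexing mentioned above is also plain SQL. A minimal sketch, assuming the repairs table carries longitude and latitude columns:
SELECT h3_longlatash3(longitude, latitude, 7) AS h3_cell,  -- resolution-7 hexagons
       COUNT(*) AS repairs
FROM repairs
GROUP BY h3_cell;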
Governance Through SQL
With the introduction of Unity Catalog and UC Metric Views, governance and trust are no longer barriers; they’re enablers.
SQL can now define consistent business metrics directly in Unity Catalog, ensuring that everyone, from data scientists to executives, uses the same logic when referencing KPIs like:
- Revenue
- Conversion rate
- Repair cost per region
- Average service time
These metric views are reusable, governed, and versioned, ensuring “your numbers match my numbers” across the entire organisation.
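As an illustration, here is a minimal sketch of a metric view defined in SQL with a YAML body; the catalog, schema, and source table are hypothetical:
CREATE VIEW main.gold.sales_metrics
WITH METRICS
LANGUAGE YAML
AS $$
version: 0.1
source: main.gold.sales
dimensions:
  - name: region
    expr: region
measures:
  - name: total_revenue
    expr: SUM(amount)
$$;
Once defined, total_revenue carries one calculation for every consumer, from data scientists to executives.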
And because Databricks tracks lineage, permissions, and audit logs automatically, teams can move fast without sacrificing compliance.
Real-Time Insight with Familiar Tools
SQL connects naturally with the tools business users already know:
- Databricks AI/BI Dashboards for real-time monitoring
- Genie for conversational queries (“Show me repairs in the North within 10 km of the depot”)
- AI SQL functions to analyse sentiment and summarise documents
- Excel live connections for governed, up-to-date reporting
- Felt for interactive spatial visualisation
What used to take hours of waiting for IT or data teams can now happen instantly, within a secure, governed environment.
Building the Future with Familiar Skills
On Databricks, SQL isn’t a legacy skill; it is a launchpad for data engineering, data science, and AI integration.
You already speak the language of the Lakehouse, now it’s time to use it to build your next-generation data platform.
You don’t need to reinvent yourself to thrive in the AI era. You just need to reimagine what SQL can do.
AI + SQL: From Queries to Cognitive Intelligence
The evolution of SQL on Databricks doesn’t stop at joins and aggregates; it now speaks the language of AI.
With Databricks AI Functions, you can run sentiment analysis, text summarisation, classification, and entity extraction directly in SQL. No Python notebooks, API integrations, or external LLM orchestration required.
That means analysts, marketers, or operations teams can enrich and interpret data directly where it lives, in governed Delta tables.
Sentiment Analysis using SQL
SELECT customer_id,
       feedback_text,
       ai_analyze_sentiment(feedback_text) AS sentiment,
       -- ai_extract pulls labelled entities; the 'emotion' label is illustrative
       ai_extract(feedback_text, array('emotion')) AS emotions
FROM bronze_customer_feedback;
Imagine a housing association automatically detecting rising dissatisfaction in a specific area before it becomes a complaint spike.
Summarising Text at Scale
SELECT claim_id,
       ai_summarize(claim_description) AS claim_summary,
       -- ai_classify requires a label set; these categories are illustrative
       ai_classify(claim_description, array('water damage', 'fire', 'theft', 'other')) AS claim_category
FROM silver_insurance_claims;
For operations teams, it means faster triage.
For housing or insurance workflows, it means instant prioritisation and routing.
Spatial Meets Semantic
AI and geospatial analysis can now work together.
SELECT region,
       ai_analyze_sentiment(comments) AS sentiment,
       COUNT(*) AS feedback_count
FROM silver_feedback
WHERE ST_Within(location, ST_GeomFromText('POLYGON(...)'))
GROUP BY region, sentiment;
In retail, it identifies underperforming stores.
In housing, it highlights neighbourhoods needing intervention.
In healthcare, it surfaces areas where patients express higher anxiety or satisfaction.
Executive Summaries at the Speed of Conversation
SELECT ai_summarize(
         CONCAT_WS(' ', report_title, report_content)
       ) AS executive_summary
FROM gold_performance_reports;
The board no longer waits for analysts to brief them; the Lakehouse becomes the analyst.
AI SQL functions mark a fundamental shift in how teams interact with data.
It’s no longer just about querying information, but about understanding and reasoning with it.
By embedding LLM-powered functions natively into the Lakehouse, Databricks bridges the gap between structured analytics and unstructured intelligence, unifying them under the same governance, lineage, and scalability model.
Your existing SQL workflows can now:
- Detect sentiment in customer feedback
- Summarise documents, reports, and claims
- Classify issues and extract entities automatically
- Combine location and emotion data for richer insights
- Feed results directly into AI/BI Dashboards, Genie, or downstream ML models
All through simple, declarative SQL, the same syntax your teams already use every day.
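And when a built-in function isn’t enough, ai_query can call a Model Serving endpoint from the same SQL. A hedged sketch; the endpoint name and columns are hypothetical:
SELECT feedback_id,
       ai_query(
         'my-llm-endpoint',  -- hypothetical Model Serving endpoint
         CONCAT('Suggest a next action for this feedback: ', feedback_text)
       ) AS suggested_action
FROM silver_feedback;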
AI + SQL = the new language of intelligent analytics.
The Takeaway: SQL Is Your Superpower
You don’t need to learn ten new frameworks; you need to rediscover the one that’s been powering data for decades.
