Enterprises pay an estimated $4 billion per year in data egress fees, moving information between cloud providers so their applications can see it all in one place. Google just made that problem optional. The Agentic Data Cloud announced at Google Cloud Next '26 lets BigQuery query tables sitting in AWS S3 without moving a single byte, using Apache Iceberg as the shared format and Google's own Cross-Cloud Interconnect as the network layer. The announcement was underreported the week it dropped, buried under flashier Gemini demos. That was a mistake.
What Actually Happened
At Google Cloud Next '26, Google unveiled the Agentic Data Cloud as a new architectural framework built specifically for the speed and scale that AI agents require. It has two core components: the Cross-Cloud Lakehouse and the Knowledge Catalog. The Cross-Cloud Lakehouse is standardized on Apache Iceberg and integrates directly with Google's Cross-Cloud Interconnect at the network level, providing dedicated, high-speed private connections between Google Cloud and other cloud environments. The practical result: BigQuery and Managed Service for Apache Spark can query AWS Iceberg tables at scale with low latency, without paying data egress fees, and without copying data into Google Cloud. Azure support is on the roadmap for H2 2026.
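To make the claim concrete, here is a minimal sketch of what a cross-cloud query might look like from the Python side, using the standard google-cloud-bigquery client. The project, dataset, and table names are hypothetical, and the sketch assumes the S3-backed Iceberg table has already been registered with BigQuery as an external table.

```python
# Minimal sketch: querying an Iceberg table that physically lives in AWS S3
# from BigQuery. All project/dataset/table names here are hypothetical.
from google.cloud import bigquery

client = bigquery.Client(project="acme-analytics")  # hypothetical project

sql = """
    SELECT customer_id, SUM(order_total) AS lifetime_value
    FROM `acme-analytics.aws_lakehouse.orders_iceberg`  -- Iceberg table backed by S3
    GROUP BY customer_id
    ORDER BY lifetime_value DESC
    LIMIT 10
"""

# The query runs without copying the underlying S3 data into Google Cloud;
# only results come back over the interconnect.
for row in client.query(sql).result():
    print(row.customer_id, row.lifetime_value)
```

The notable part is what is absent: no extract job, no transfer pipeline, no second copy of the table to keep in sync.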
The integrations list is comprehensive and deliberate. Google has pre-built connectors to Databricks, Palantir, Salesforce Data360, SAP, ServiceNow, Snowflake, and Workday, covering the major enterprise data platforms where production data actually lives. The architecture uses the Apache Iceberg REST Catalog as the interoperability layer, meaning any system that speaks Iceberg can participate without bespoke integration work. For enterprises that have spent years building multi-cloud data architectures out of necessity rather than preference, this is the first credible offer of unified query access without the cost of data centralization.
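What "any system that speaks Iceberg" means in practice: the REST Catalog protocol is a published spec, so a generic client can attach to the same tables. A short sketch using pyiceberg, the reference Python client for that protocol; the endpoint and credentials below are placeholders, not Google's actual catalog URI.

```python
# Minimal sketch: connecting to an Iceberg REST Catalog. The URI and token
# are placeholders; any REST-Catalog-compliant endpoint works the same way.
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "lakehouse",
    **{
        "type": "rest",
        "uri": "https://catalog.example.com/iceberg",  # hypothetical endpoint
        "token": "REPLACE_WITH_OAUTH_TOKEN",
    },
)

# Load table metadata and inspect the schema; no data moves, only metadata.
table = catalog.load_table("sales.orders")
print(table.schema())
```

That protocol-level neutrality is what lets Snowflake, Databricks, and Spark jobs all see the same tables without bespoke connectors.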
The Knowledge Catalog is the second pillar and the less-discussed one. It auto-catalogs all Google Cloud and third-party data assets, tagging entities and mapping relationships across the entire enterprise data estate without manual curation. The catalog provides what Google calls "grounded enterprise truth" for AI agents, meaning an agent querying company data doesn't just get a result back; it gets a result with the company-specific context, terminology, and data lineage required to act on that result correctly. A Data Agent Kit for Gemini-powered data science authoring rounds out the Agentic Data Cloud, giving analysts a way to build data agents that operate across the full lakehouse without writing infrastructure code.
Why This Matters More Than People Think
Enterprise AI agents are failing in production at a rate that is not making headlines. The failure mode is not hallucination, though that gets the coverage. The dominant failure mode is data blindness: agents that can only see the slice of company data that lives in the system they were deployed in, producing outputs that are locally coherent but globally wrong because they can't see what's in the other three data stores. A customer service agent that can't see the CRM record because the CRM is in Salesforce and the agent is running on AWS. A financial agent that misses the SAP ledger because it can only query BigQuery. The cross-cloud lakehouse directly addresses this problem by making the data boundary between cloud providers a routing decision rather than a hard wall.
The egress fee economics reinforce why this matters. A company running 10 petabytes of queries per month between AWS and Google Cloud can spend $20 million to $40 million annually in egress fees alone, depending on query patterns and transfer volumes. Those fees don't buy any capability. They're a tax on having data in more than one place, a description that fits essentially every large enterprise since the mid-2010s. Eliminating that tax at the network level changes the build-versus-consolidate calculus for enterprise data teams. Teams that were considering expensive and risky data migration projects to centralize everything on one cloud can now simply query across clouds, which is both cheaper and less operationally risky than moving petabytes of production data.
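The arithmetic is worth making explicit. A back-of-envelope model, with the per-GB rate as a stated assumption, since list prices and negotiated discounts vary widely across providers and tiers:

```python
# Back-of-envelope egress cost model. The per-GB rates are assumptions:
# published internet egress at the major clouds generally falls in the
# $0.05-$0.09/GB range at volume, and negotiated rates vary widely.
def annual_egress_cost(pb_per_month: float, usd_per_gb: float) -> float:
    gb_per_month = pb_per_month * 1_000_000  # 1 PB = 1,000,000 GB (decimal)
    return gb_per_month * usd_per_gb * 12

for rate in (0.05, 0.09):
    print(f"10 PB/month at ${rate}/GB: ${annual_egress_cost(10, rate):,.0f}/year")
# -> roughly $6M to $10.8M/year at those list rates. Query workloads often
# scan and transfer multiples of the stored volume, which is how effective
# annual costs climb into the tens of millions.
```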
The Competitive Landscape
The multi-cloud data access market has been contested for several years, and Google is not the first to plant a flag here. Snowflake launched Polaris Catalog in 2024, also built on Apache Iceberg, as an open catalog that competes directly with Google's cross-cloud approach. Databricks has Unity Catalog with Delta Lake, plus growing Apache Iceberg compatibility, and its Lakehouse Federation feature already enables queries across Snowflake, Redshift, BigQuery, MySQL, and PostgreSQL without data movement. Microsoft Fabric's OneLake architecture also offers multi-cloud access, though with a stronger gravitational pull toward Azure. The market Google is entering is genuinely competitive, and several of the named partners, Snowflake and Databricks in particular, sit on Google's integration list while competing with it in the data platform market.
The strategic difference between Google's approach and the field is the network layer. Snowflake's Polaris Catalog and Databricks' Lakehouse Federation both rely on the public internet or customer-managed VPN connections for cross-cloud data access. Google's Cross-Cloud Interconnect is a dedicated, private fiber network with guaranteed bandwidth and latency, which is why the lakehouse can promise low-latency query performance rather than just theoretical connectivity. This network advantage is real and hard to replicate: neither Snowflake nor Databricks owns physical network infrastructure between cloud regions. Google, AWS, and Microsoft do. The question is whether Google will use that network advantage to lock customers into Google infrastructure or to genuinely enable portability, and those two outcomes are harder to distinguish from the outside than the marketing language suggests.
Hidden Insight: The Catalog Is the Moat
The Knowledge Catalog is the announcement that will matter most in three years, and almost nobody is writing about it. Here is the reason: an AI agent operating on enterprise data is only as good as its understanding of what that data means. Column names don't explain business logic. Table schemas don't capture the difference between a "closed" opportunity that a sales rep marked closed in the CRM and one that finance actually recognized as revenue. That semantic context, meaning the relationships, the terminology, the lineage, and the business rules that determine what the numbers mean, is the hardest part of enterprise AI deployment, and it currently requires months of manual curation by data engineering teams before an AI agent can be trusted with any consequential query.
Knowledge Catalog automates this. It builds the semantic map of the enterprise data estate automatically, using Google's AI to infer relationships and context from the data itself. If this works at the accuracy Google is claiming, it compresses the most expensive and time-consuming phase of enterprise AI deployment from months to days. Every AI agent deployed on top of the Agentic Data Cloud inherits this context layer automatically, which means the value compounds: each new agent built on the catalog benefits from the semantic work done for every previous agent. The company that controls the semantic layer for an enterprise's data controls what its AI agents can understand, which is arguably a stronger position than controlling the data storage itself.
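Google has not published the catalog's internal schema, but a hypothetical sketch of what a single entry would need to capture makes the value concrete, using the closed-opportunity example from above. Every field name here is illustrative, not Google's API.

```python
# Hypothetical sketch of what a semantic catalog entry might capture.
# Knowledge Catalog's actual schema is unpublished; these fields and the
# example are illustrative only.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    asset: str                       # fully qualified table or column
    business_definition: str         # what the numbers actually mean
    lineage: list[str] = field(default_factory=list)        # upstream sources
    related_assets: list[str] = field(default_factory=list) # cross-system links
    caveats: list[str] = field(default_factory=list)        # rules agents must respect

closed_won = CatalogEntry(
    asset="salesforce.opportunities.stage",
    business_definition=(
        "'Closed Won' reflects the rep's claim; revenue exists only once "
        "finance posts it to the SAP ledger"
    ),
    lineage=["salesforce.opportunities"],
    related_assets=["sap.fi_ledger.revenue_recognized"],
    caveats=["Do not report 'Closed Won' sums as recognized revenue"],
)
```

Building entries like this by hand, across thousands of tables, is the months-long curation work the catalog claims to automate.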
The historical parallel is search. Google built its consumer dominance on the semantic understanding of web content, not just the crawling and indexing of it. PageRank was about understanding relationships between documents. Knowledge Catalog is attempting something similar for enterprise data: not just cataloging what exists, but understanding how the pieces relate to each other and what they mean in context. If Google succeeds in making the Knowledge Catalog the default semantic layer for enterprise AI, it creates a data network effect that becomes harder to displace with every new data source connected and every new agent deployed.
What to Watch Next
The Azure support announcement, promised for H2 2026, is the first critical milestone. AWS cross-cloud access is valuable for enterprises that are AWS-primary, but Azure dominates among large enterprises, with a client base that skews toward Fortune 500 companies with the largest data estates and the highest AI investment budgets. When Azure support ships, the Agentic Data Cloud becomes relevant to the majority of enterprise data conversations rather than a subset of them. Watch whether Microsoft responds by accelerating Fabric OneLake's cross-cloud access features, which would signal that the market has accepted cross-cloud query access as table stakes rather than a differentiator.
Skeptics point out, however, that Google's cross-cloud solution still routes data through Google's Cross-Cloud Interconnect infrastructure, which is not free. The egress fee elimination is real for data movement, but enterprises that build on the Cross-Cloud Lakehouse are still paying for Google's interconnect bandwidth, which Google prices and controls. The risk is that enterprises adopt the platform believing they've achieved true cloud neutrality, only to find they've shifted their vendor dependency from AWS egress pricing to Google interconnect pricing. Watch the pricing disclosures that accompany the Azure support launch for evidence of whether Google is pricing interconnect bandwidth competitively or using its market position to extract the same economics that cloud egress fees currently represent. Also monitor whether Snowflake and Databricks build native Apache Iceberg REST Catalog interoperability with Google's cross-cloud system in the next 180 days: if they do, it signals the open-standard ecosystem is winning. If they don't, it signals that the market expects Google's approach to remain proprietary despite the open-standard framing.
Google didn't solve the multi-cloud data problem by making clouds disappear: it made cloud boundaries invisible at query time, which is the only solution enterprises will actually deploy.
Key Takeaways
- Cross-Cloud Lakehouse uses Apache Iceberg REST Catalog over Google's Cross-Cloud Interconnect to enable zero-copy BigQuery queries against AWS S3 Iceberg tables, with Azure support coming H2 2026.
- Named integrations cover the major enterprise data platforms: Databricks, Palantir, Salesforce Data360, SAP, ServiceNow, Snowflake, and Workday, making the lakehouse relevant to most large enterprise data architectures without bespoke integration work.
- Knowledge Catalog auto-catalogs data relationships across the entire estate, eliminating the months of manual curation that currently block enterprise AI agent deployment and creating a compounding semantic advantage with each additional data source.
- Enterprise data egress fees can run $20 million to $40 million annually for large data estates, making the cross-cloud query capability a direct cost reduction play as well as an AI capability enabler for finance and procurement teams.
- The Gemini Enterprise Agent Platform provides the orchestration layer on top of the Agentic Data Cloud; with the Data Agent Kit, data scientists can build agents that query across the full lakehouse without writing infrastructure code.
Questions Worth Asking
- If Knowledge Catalog builds the semantic map of an enterprise's data automatically, who owns that map, and what happens to it if the enterprise decides to leave Google Cloud?
- Google's cross-cloud access eliminates egress fees but routes through Google's interconnect infrastructure: is that a genuine reduction in vendor dependency, or a restructuring of it?
- For enterprises that have built significant data engineering investment around Snowflake's Polaris Catalog or Databricks' Unity Catalog, what would it take for the Agentic Data Cloud to be worth a platform migration?