Integration Testing

This guide covers the automated integration test scenarios that validate the playbooks documented in docs/playbook.md, the setup_test_data.py helper, and the run_integration_tests.py test runner.

Recommended Testing Workflow

One command for CI/CD

For automated pipelines (PR checks, nightly runs), use make test-ci:

make test-ci

This single command runs the full pipeline in order:

Unit tests — fast Python-only checks, no cloud resources needed
Provision — fresh isolated Databricks workspace + metastore (~10–15 min)
Integration tests — all scenarios (~90 min)
Teardown — always runs, even if tests fail, so no cloud resources are left behind

The exit code is non-zero if any phase fails.
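The ordering and always-teardown behaviour described above can be sketched in a few lines of Python. This is only an illustration of the control flow, not the actual `make test-ci` implementation; `run_pipeline` and the phase callables are hypothetical names introduced here.

```python
from typing import Callable, Sequence, Tuple

# A phase is a (name, callable) pair; the callable returns True on success.
Phase = Tuple[str, Callable[[], bool]]


def run_pipeline(phases: Sequence[Phase], teardown: Callable[[], bool]) -> int:
    """Run phases in order, stop at the first failure, and always run teardown."""
    exit_code = 0
    try:
        for name, phase in phases:
            if not phase():
                print(f"phase failed: {name}")
                exit_code = 1
                break  # later phases depend on earlier ones
    finally:
        # Runs even when a phase returned False or raised an exception.
        if not teardown():
            print("phase failed: teardown")
            exit_code = 1
    return exit_code
```

The `finally` clause is what guarantees that a failed integration phase cannot leave cloud resources behind.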

Options:

# Use a custom credentials file (useful for CI secrets injection)
make test-ci ACCOUNT_ADMIN_ENV=/path/to/credentials.env

# Run only one scenario (faster iteration)
make test-ci SCENARIO=quickstart

# Pin a specific SQL warehouse
make test-ci WAREHOUSE_ID=abc123

GitHub Actions example:

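# Write the credentials file from a secret, then pass its path to make.
# CREDENTIALS_ENV_FILE is a placeholder secret name; use whatever your repository defines.
- name: Write credentials file
  run: printf '%s\n' "$CREDENTIALS" > "$RUNNER_TEMP/account-admin.env"
  env:
    CREDENTIALS: ${{ secrets.CREDENTIALS_ENV_FILE }}
- name: Integration tests
  run: make test-ci
  env:
    ACCOUNT_ADMIN_ENV: ${{ runner.temp }}/account-admin.env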

Manual step-by-step (for local development)

Use unit tests first to catch logic bugs quickly (< 1 second, no credentials required), then provision a fresh isolated environment for integration tests to avoid stale quota counter issues in a shared metastore.

make test-unit                              # fast — pure Python, no LLM/Terraform/Databricks
       ↓ (all pass)
python scripts/provision_test_env.py provision   # ~10-15 min — creates fresh workspace + metastore
       ↓
python scripts/run_integration_tests.py          # slow, deploys real resources (~90 min for all scenarios)
       ↓
python scripts/provision_test_env.py teardown    # wipe the environment when done

Why provision a fresh environment?

Databricks metastore-wide FGAC policy quotas use an eventually consistent counter that can lag behind actual policy deletions by several minutes. In a long-lived shared metastore, the counter can accumulate drift and incorrectly block new policy creation even when no policies actually exist. Provisioning a fresh workspace + metastore for each test run gives a clean counter that always starts at zero.

Unit Tests

The tests/ directory contains pytest-based unit tests for the core Python functions — all autofix functions in generate_abac.py and all validation functions in validate_abac.py.

Run:

# Install deps once (if not already installed)
pip install pytest python-hcl2

# Run all 60+ unit tests (~1 second, no Databricks connection needed)
make test-unit

# Or invoke pytest directly for richer output
python3 -m pytest tests/ -v
python3 -m pytest tests/test_generate_abac.py -v   # autofix functions only
python3 -m pytest tests/test_validate_abac.py -v   # validation functions only
python3 -m pytest tests/ -k "TagPolicies" -v        # filter by name

What is tested:

Test file	Functions covered
`tests/test_generate_abac.py`	`fix_hcl_syntax`, `autofix_tag_policies`, `autofix_invalid_tag_values`, `autofix_undefined_tag_refs`, `autofix_missing_fgac_policies`, `autofix_fgac_policy_count`
`tests/test_validate_abac.py`	`validate_groups`, `validate_tag_policies`, `validate_tag_assignments`, `validate_fgac_policies`, `parse_sql_functions`, `parse_sql_function_arg_counts`, `_condition_matches_tags`
`tests/test_schema_drift.py`	PII column pattern regex, env file parsing (both `uc_tables` and `genie_spaces` shapes), governed-key resolution (4-level fallback), delta merge/dedup, delta validation (reject unknown keys/values), stale assignment removal

Unit tests catch the most common failure categories without incurring the cost of a full LLM + Terraform run:

LLM output contains missing commas between HCL objects → fix_hcl_syntax
LLM uses a tag value not in the allowed list → autofix_tag_policies
LLM generates an assignment with a typo'd value → autofix_invalid_tag_values
LLM references a tag key that was never defined → autofix_undefined_tag_refs
A sensitive column is left without an FGAC policy → autofix_missing_fgac_policies
Too many FGAC policies for one catalog → autofix_fgac_policy_count
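To make the first failure category concrete, here is a minimal sketch of repairing a missing comma between adjacent HCL objects. It is not the real `fix_hcl_syntax` from generate_abac.py, whose signature and behaviour are not shown in this guide; `insert_missing_commas` is a hypothetical helper, and the regex assumes each object closes on its own line.

```python
import re

# Matches a closing brace followed (after a line break) by an opening brace,
# which is what two list items look like when the comma between them is missing.
_MISSING_COMMA = re.compile(r"\}([ \t]*\n[ \t]*)\{")


def insert_missing_commas(hcl_text: str) -> str:
    """Insert a comma between '}' and a following '{' on the next line."""
    return _MISSING_COMMA.sub(r"},\1{", hcl_text)


broken = 'items = [\n  { key = "a" }\n  { key = "b" }\n]\n'
fixed = insert_missing_commas(broken)
```

Because an existing `},` is never matched, running the helper twice leaves the text unchanged, which is the property a unit test for an autofix function would typically assert.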

Provisioning a Fresh Integration-Test Environment

scripts/provision_test_env.py creates a brand-new serverless Databricks workspace and Unity Catalog metastore specifically for integration testing, then writes all auth.auto.tfvars files so the test runner uses that environment.

One-time setup

Run make setup from the aws/ or azure/ directory — it automatically copies the matching example file to scripts/account-admin.<cloud>.env if it does not yet exist. Then fill in your credentials:

# AWS
vi scripts/account-admin.aws.env

# Azure
vi scripts/account-admin.azure.env

The file has two sections — shared Databricks credentials and cloud-specific credentials.

Section 1 — Databricks credentials (both clouds)

Key	Where to find it
`DATABRICKS_ACCOUNT_ID`	Account Console → top-right menu → Account ID
`DATABRICKS_CLIENT_ID`	Account Console → User Management → Service Principals → `<SP>` → Application ID
`DATABRICKS_CLIENT_SECRET`	Same SP → OAuth Secrets → Generate Secret

Note: The Account Console URL differs by cloud:

AWS: https://accounts.cloud.databricks.com

Azure: https://accounts.azuredatabricks.net
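Before provisioning, it can help to check that the three shared keys are filled in. The sketch below assumes the credentials file uses plain `KEY=VALUE` lines with `#` comments, which this guide does not state explicitly; `parse_env_text` and `missing_keys` are hypothetical helpers, not part of provision_test_env.py.

```python
from typing import Dict, Iterable, List

REQUIRED_DATABRICKS_KEYS = (
    "DATABRICKS_ACCOUNT_ID",
    "DATABRICKS_CLIENT_ID",
    "DATABRICKS_CLIENT_SECRET",
)


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: Dict[str, str],
                 required: Iterable[str] = REQUIRED_DATABRICKS_KEYS) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]
```

For example, `missing_keys(parse_env_text(open("scripts/account-admin.aws.env").read()))` would list any keys still left blank.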

Section 2 — AWS credentials (`account-admin.aws.env` only)

Key	Where to find it
`DATABRICKS_AWS_REGION`	AWS region for the new workspace (e.g. `ap-southeast-2`)
`AWS_ACCESS_KEY_ID`	AWS credentials with IAM + S3 write permissions (see below)
`AWS_SECRET_ACCESS_KEY`	Same IAM user or role
`AWS_SESSION_TOKEN`	Only needed for temporary STS credentials (see recommendation below)

AWS credential type recommendation:

Use long-lived IAM user credentials (AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY, no AWS_SESSION_TOKEN) whenever possible. Temporary credentials expire after anywhere from 15 minutes to 12 hours depending on how they were issued (see the table below), which can cause teardown to fail if the full test run (~90 min + review time) outlasts the token lifetime.

Credential type	`AWS_SESSION_TOKEN` required	Expires	Recommended for
IAM user access keys	No	Never	Local dev and CI/CD
AWS SSO / `aws sso login`	Yes (auto-set by CLI)	1–8 h	Interactive use only
STS `AssumeRole`	Yes	15 min – 12 h	Short-lived pipelines

Required IAM permissions:

iam:CreateRole, iam:DeleteRole, iam:PutRolePolicy, iam:DeleteRolePolicy, iam:ListRolePolicies, iam:ListAttachedRolePolicies, iam:DetachRolePolicy, iam:UpdateAssumeRolePolicy, sts:GetCallerIdentity

Required S3 permissions:

s3:CreateBucket, s3:DeleteBucket, s3:PutPublicAccessBlock, s3:ListBucketVersions, s3:DeleteObject, s3:DeleteObjectVersion

Auto-created AWS resources:

The provision script auto-creates an S3 bucket named genie-uc-test-<aws-account-id> in the configured region. The bucket is reused across test runs and only deleted on teardown if the script created it.

Step	What the script creates
`provision`	A unique S3 prefix inside the bucket: `s3://<bucket>/genie-test-<run-id>/`
`provision`	An AWS IAM role (`genie-test-uc-role-<run-id>`) scoped to that prefix
`provision`	A Databricks storage credential backed by the IAM role
`provision`	A Databricks External Location covering `s3://<bucket>/genie-test-<run-id>/`
Integration tests	Each catalog gets its own subfolder: `.../genie-test-<run-id>/<catalog-name>/`
`teardown`	Deletes the IAM role; the metastore deletion cascades to catalogs/schemas/policies
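The naming scheme in the table can be written out as a handful of string helpers. The patterns are taken from this section; the function names are hypothetical and the real script may build these values differently.

```python
def bucket_name(aws_account_id: str) -> str:
    """Shared test bucket, reused across runs."""
    return f"genie-uc-test-{aws_account_id}"


def role_name(run_id: str) -> str:
    """IAM role scoped to a single run's prefix."""
    return f"genie-test-uc-role-{run_id}"


def run_prefix(bucket: str, run_id: str) -> str:
    """Root prefix for one test run, covered by the External Location."""
    return f"s3://{bucket}/genie-test-{run_id}/"


def catalog_location(bucket: str, run_id: str, catalog: str) -> str:
    """Per-catalog subfolder under the run prefix."""
    return f"{run_prefix(bucket, run_id)}{catalog}/"
```

Since every path embeds the run id, objects from different runs never share a prefix.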

S3 objects written during the test are not deleted by teardown — the metastore and workspace are destroyed at the Databricks layer, but the underlying S3 prefixes remain. They are cheap (a few MB of small Delta files) and isolated by run-id. Clean them up periodically with:

aws s3 rm s3://<your-bucket>/ --recursive --exclude "*" --include "genie-test-*"

Section 2 — Azure credentials (`account-admin.azure.env` only)

Key	Where to find it
`AZURE_SUBSCRIPTION_ID`	Azure Portal → Subscriptions → Subscription ID
`AZURE_RESOURCE_GROUP`	Azure Portal → Resource Groups → name (must already exist)
`AZURE_REGION`	Must match the workspace region (e.g. `eastus2`, `australiaeast`, `westeurope`)
`AZURE_TENANT_ID`	Azure Portal → Microsoft Entra ID → Overview → Tenant ID
`AZURE_CLIENT_ID`	Azure Portal → Microsoft Entra ID → App registrations → Application (client) ID
`AZURE_CLIENT_SECRET`	Same App registration → Certificates & secrets → New client secret

Note: Azure client secrets expire (default 6 months or 2 years). If teardown fails with an auth error, generate a new secret and re-run.

Required Azure RBAC roles on the resource group:

Role	Why it's needed
`Contributor`	Create/delete storage accounts, access connectors
`Storage Blob Data Contributor`	Manage blob data in ADLS Gen2 containers
`User Access Administrator` (optional)	Assign managed-identity roles; if absent, the script falls back to your local `az login` for role assignments

Assign roles with:

az role assignment create \
  --assignee <AZURE_CLIENT_ID> \
  --role "Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"

az role assignment create \
  --assignee <AZURE_CLIENT_ID> \
  --role "Storage Blob Data Contributor" \
  --scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"

Auto-created Azure resources:

Step	What the script creates
`provision`	An ADLS Gen2 storage account (`genietest<run-id>`) with a blob container
`provision`	A Databricks Access Connector with a managed identity
`provision`	A Databricks storage credential backed by the Access Connector
`provision`	A Databricks External Location pointing to the ADLS container
`teardown`	Deletes all Azure resources (storage account, access connector, role assignments)

Provision

python scripts/provision_test_env.py provision

This will:

Look up the SP's SCIM identity in the Databricks account.
Create a serverless workspace (genie-test-<id>).
Create a fresh Unity Catalog metastore with a unique storage path.
Assign the metastore to the workspace.
Create cloud-specific storage infrastructure:
- AWS: An IAM role scoped to the test S3 prefix, registered as a storage credential
- Azure: An ADLS Gen2 storage account + Access Connector with managed identity, registered as a storage credential
Create an External Location so catalogs can be created without a metastore root.
Set the SP as metastore admin and workspace admin.
Write auth.auto.tfvars for all env directories (dev, bu2, prod, account).
Save a state file (scripts/.test_env_state.json) for teardown.
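The last step matters because teardown needs to know what provision created. A minimal sketch of such a state file round trip follows; the JSON layout and the field names in the example are assumptions, since this guide does not document the actual contents of scripts/.test_env_state.json.

```python
import json
from pathlib import Path
from typing import Any, Dict, Optional


def save_state(path: Path, state: Dict[str, Any]) -> None:
    """Write the provisioned resource identifiers as pretty-printed JSON."""
    path.write_text(json.dumps(state, indent=2, sort_keys=True))


def load_state(path: Path) -> Optional[Dict[str, Any]]:
    """Return the saved state, or None when nothing has been provisioned."""
    if not path.exists():
        return None
    return json.loads(path.read_text())
```

A teardown routine would call `load_state` first and exit early on `None`, so running it twice is harmless.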

Workspace provisioning typically takes 10–15 minutes.

Run tests

# Check what environment is provisioned
python scripts/provision_test_env.py status

# Run all scenarios (warehouse is auto-detected from the workspace)
python scripts/run_integration_tests.py

# Run a specific scenario
python scripts/run_integration_tests.py --scenario quickstart

Tear down

python scripts/provision_test_env.py teardown

This deletes the cloud-specific resources, the workspace, the metastore (and all catalogs/schemas/policies inside it), and the admin group, then removes the generated auth.auto.tfvars files.

AWS: Deletes the IAM role created during provisioning. S3 objects remain (see cleanup note above).
Azure: Deletes the storage account, access connector, and any role assignments.

If teardown fails with an auth error:

AWS — "ExpiredToken": Your STS session token expired during the test run. Export fresh credentials and re-run:
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...   # omit if using long-lived keys
python scripts/provision_test_env.py teardown
Azure — "ClientSecretExpired" or "InvalidAuthenticationToken": Your Azure client secret has expired. Generate a new secret in the Azure Portal (Microsoft Entra ID → App registrations → Certificates & secrets), update account-admin.azure.env, and re-run teardown.

The Databricks workspace and metastore are always deleted by teardown regardless of whether the cloud resource cleanup succeeds.

Options

Flag	Description
`--env-file PATH`	Path to credentials file (default: `scripts/account-admin.<cloud>.env`)
`--dry-run`	Print what would happen without creating/deleting anything
`--force`	With `provision`: overwrite an existing provisioned environment
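For readers who want to see how these subcommands and flags fit together, the following is a sketch of an equivalent `argparse` definition. It mirrors only what is documented in this guide (the `provision`, `status`, and `teardown` commands plus the three flags) and is not copied from provision_test_env.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented commands and flags."""
    parser = argparse.ArgumentParser(prog="provision_test_env.py")
    parser.add_argument("command", choices=["provision", "status", "teardown"])
    parser.add_argument("--env-file", metavar="PATH", default=None,
                        help="path to the credentials file")
    parser.add_argument("--dry-run", action="store_true",
                        help="print actions without creating or deleting anything")
    parser.add_argument("--force", action="store_true",
                        help="with provision: overwrite an existing environment")
    return parser


args = build_parser().parse_args(["provision", "--dry-run"])
```

Here `args.command` is `"provision"`, `args.dry_run` is `True`, and `args.env_file` stays `None` so the caller can fall back to the per-cloud default path.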

Scenarios

scripts/run_integration_tests.py runs each playbook.md scenario end-to-end with full data setup, LLM generation, Terraform apply, assertions, and teardown. Each scenario is isolated — state from a previous run is destroyed and cleaned before the next one starts.

Scenario	playbook.md section	What it validates
quickstart	§ 1	Single Genie Space backed by a single UC catalog (`dev_fin`)
multi-catalog	§ 1 (multi-catalog)	One Genie Space drawing tables from two catalogs (`dev_fin` + `dev_clinical`)
multi-space	§ 1 (multi-space)	Two Genie Spaces with separate catalogs — Finance Analytics + Clinical Analytics
per-space	§ 4	Add Clinical Analytics incrementally without touching Finance Analytics (isolation guarantee)
promote	§ 5	Full dev → prod promotion with catalog remapping across both spaces
multi-env	§ 6	Two independent envs on the same account: `dev` (Finance), `bu2` (Clinical)
attach-promote	§ 3	Import a Genie Space already configured in the UI — govern it, then promote to prod
self-service-genie	§ 7	Central governance team + two BU Genie teams self-serve; second BU isolation check; BU promote to prod via `apply-genie`; governance state verified unchanged throughout
abac-only	§ 2	ABAC governance only (no Genie Space) + §2→§4 upgrade path: add Genie Space later without disturbing governance
multi-space-import	§ 3 (multi-space)	Import two UI-configured Genie Spaces in one `make generate`; assert both configs present, Terraform creates no new spaces
schema-drift	—	Detects and classifies new columns after initial ABAC deployment; tests `make audit-schema` and `make generate-delta` across ADD/DROP/RENAME COLUMN scenarios
genie-only	§ 7 (genie_only)	Minimal-privilege SP (workspace USER + SQL entitlement) creates Genie Space with `genie_only=true`; no account-level resources
genie-import-no-abac	§ 3 + § 7	Import an existing Genie Space and deploy to prod without any ABAC governance — validates the genie-only import-to-prod workflow when a separate team manages ABAC centrally
country-overlay	—	Country/region overlays (ANZ, IN, SEA) — full cycle per region + multi-region generation
industry-overlay	—	Industry overlays (financial_services, healthcare, retail) — full cycle per industry + multi-industry + country+industry composition (COUNTRY=ANZ INDUSTRY=healthcare)
aus-bank-demo	—	Australian bank demo — champion flow (ANZ + financial_services, import + promote with `dev_bank`→`prod_bank` catalog remap)
india-bank-demo	—	India bank demo — champion flow (IN + financial_services, Aadhaar/PAN/GSTIN/UPI masking, import + promote with `dev_lakshmi`→`prod_lakshmi`)
asean-bank-demo	—	ASEAN bank demo — champion flow (SEA + financial_services, 6 nullable national ID columns, multi-currency, import + promote with `dev_asean_bank`→`prod_asean_bank`)

Test Catalogs

setup_test_data.py creates the following Unity Catalog resources:

Dev

Catalog	Schema	Table	Rows	Sensitive data
`dev_fin`	`finance`	`customers`	10	SSN, DOB, email, phone (PII)
`dev_fin`	`finance`	`transactions`	15	AML flag, risk score (AML)
`dev_fin`	`finance`	`credit_cards`	10	card number, CVV (PCI)
`dev_clinical`	`clinical`	`patients`	10	SSN, DOB, insurance ID (PHI)
`dev_clinical`	`clinical`	`encounters`	12	diagnosis, treatment notes (PHI)

Prod (mirror schema, different data)

Catalog	Schema	Table	Rows
`prod_fin`	`finance`	`customers`	10
`prod_fin`	`finance`	`transactions`	15
`prod_fin`	`finance`	`credit_cards`	10
`prod_clinical`	`clinical`	`patients`	10
`prod_clinical`	`clinical`	`encounters`	12

Quick Start

Prerequisites

envs/dev/auth.auto.tfvars configured with workspace credentials
A SQL warehouse available in the workspace (or pass WAREHOUSE_ID=<id> to avoid cold-start delay)

Run all scenarios

# Run all scenarios sequentially (full teardown after each)
make test-all

# Keep data and Terraform resources after the run for inspection
make test-all KEEP_DATA=1

# Pin a warehouse to avoid cold-start delay
make test-all WAREHOUSE_ID=abc123ef

Run a single scenario

make test-quickstart
make test-multi-catalog
make test-multi-space
make test-per-space
make test-promote
make test-multi-env
make test-attach-promote
make test-self-service-genie
make test-abac-only
make test-multi-space-import
make test-genie-only
make test-genie-import-no-abac
make test-country-overlay
make test-industry-overlay
make test-aus-bank-demo
make test-india-bank-demo
make test-asean-bank-demo

# All targets accept WAREHOUSE_ID= and KEEP_DATA=1
make test-promote WAREHOUSE_ID=abc123ef KEEP_DATA=1

Run directly with Python

# List available scenarios
python scripts/run_integration_tests.py --list

# Run all
python scripts/run_integration_tests.py

# Run one scenario
python scripts/run_integration_tests.py --scenario quickstart
python scripts/run_integration_tests.py --scenario per-space --keep-data

# Pin a warehouse
python scripts/run_integration_tests.py --warehouse-id abc123ef

# Non-default auth file
python scripts/run_integration_tests.py --auth-file envs/dev/auth.auto.tfvars

Scenario Details

1. quickstart — Single space, single catalog

Validates the core quickstart from docs/playbook.md § 1 with a single Genie Space backed by dev_fin.

Steps:

Step	Action
1	Create `dev_fin` test catalogs and sample data
2	Configure `dev` env: one space "Finance Analytics" with `dev_fin.*` tables
3	`make generate ENV=dev` — LLM generates ABAC config and masking functions
4	Assert `generated/abac.auto.tfvars` and `generated/spaces/finance_analytics/` created
5	`make apply ENV=dev` — deploys account, data_access, workspace layers
6	Assert `.genie_space_id_finance_analytics` file exists
7	`setup_test_data.py --verify` — row counts, column tags, column masks
8	Teardown data + Terraform resources

Key assertions:

generated/abac.auto.tfvars contains Finance Analytics genie_space_configs entry
generated/spaces/finance_analytics/abac.auto.tfvars exists (per-space dir bootstrapped)
.genie_space_id_* file created after apply
Row counts ≥ expected for all dev_fin tables
Column tags and masking policies applied
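The file and directory names in these assertions follow from the space name. The sketch below shows one plausible mapping, inferred only from the examples in this guide ("Finance Analytics" becomes `finance_analytics`); `space_slug` is a hypothetical helper and the tool's real normalisation rule may differ for names with punctuation.

```python
import re


def space_slug(space_name: str) -> str:
    """Lowercase the name and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^a-z0-9]+", "_", space_name.lower()).strip("_")


def space_id_filename(space_name: str) -> str:
    """Name of the marker file written after a space is created."""
    return f".genie_space_id_{space_slug(space_name)}"


def space_generated_dir(space_name: str) -> str:
    """Per-space directory under generated/."""
    return f"generated/spaces/{space_slug(space_name)}/"
```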

2. multi-catalog — One space spanning two catalogs

Validates the "single space spanning multiple catalogs" pattern from playbook.md § 1. One space ("Combined Analytics") draws tables from both dev_fin and dev_clinical.

Key assertions:

generated/abac.auto.tfvars contains Combined Analytics and references both dev_fin and dev_clinical
Only one Genie Space deployed
Column tags applied across both catalogs

3. multi-space — Two spaces, separate catalogs

Validates the two-space multi-catalog flow from playbook.md § 1. Finance Analytics uses dev_fin; Clinical Analytics uses dev_clinical. This is the core of the original make integration-test flow.

Key assertions:

generated/abac.auto.tfvars contains both Finance Analytics and Clinical Analytics entries
generated/spaces/finance_analytics/ and generated/spaces/clinical_analytics/ both bootstrapped
Two .genie_space_id_* files created
Row counts and ABAC verified for both catalogs

4. per-space — Incremental space addition (isolation test)

Validates the per-space generation isolation guarantee from playbook.md § 4.

Phase 1: Deploy Finance Analytics only.

Phase 2: Add Clinical Analytics using make generate SPACE="Clinical Analytics" — without triggering a full LLM re-run over Finance Analytics.

Key assertions:

After full generate: Finance Analytics in assembled output, Clinical Analytics absent
generated/spaces/finance_analytics/abac.auto.tfvars content is byte-for-byte unchanged after the per-space generate for Clinical Analytics
Assembled generated/abac.auto.tfvars contains both spaces after merge
generated/spaces/clinical_analytics/abac.auto.tfvars created by SPACE= generate
Both Genie Spaces deployed after final apply
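A byte-for-byte check like the one in the second assertion is typically done by hashing the file before and after the operation. The following is a sketch of that technique with hypothetical helper names; it does not reproduce the test runner's own code.

```python
import hashlib
from pathlib import Path
from typing import Callable


def file_digest(path: Path) -> str:
    """SHA-256 of the raw file bytes, so any change at all is detected."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def unchanged_after(path: Path, action: Callable[[], None]) -> bool:
    """Run action and report whether the file content is identical afterwards."""
    before = file_digest(path)
    action()
    return file_digest(path) == before
```

In the per-space scenario, `path` would be the Finance Analytics tfvars file and `action` would run the `SPACE="Clinical Analytics"` generate.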

5. promote — dev → prod cross-env promotion

Validates the full dev → prod promotion from playbook.md § 5.

Catalog mapping: dev_fin → prod_fin, dev_clinical → prod_clinical

Steps:

Step	Action
1	Create dev + prod test catalogs
2	Configure dev: two spaces (Finance + Clinical)
3	`make generate ENV=dev` + `make apply ENV=dev`
4	Verify dev data + ABAC
5	`make promote SOURCE_ENV=dev DEST_ENV=prod DEST_CATALOG_MAP=dev_fin=prod_fin,dev_clinical=prod_clinical`
6	`make apply ENV=prod`
7	`setup_test_data.py --verify-prod`
8	Teardown both envs

Key assertions:

envs/prod/env.auto.tfvars written by promote with prod_fin catalog references
envs/prod/generated/abac.auto.tfvars contains remapped prod catalog names
Prod column tags and masking policies applied
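The `DEST_CATALOG_MAP` value is a comma-separated list of `source=destination` pairs. The sketch below parses that format and applies it to text containing catalog references; it illustrates the remapping idea only and is not the implementation behind `make promote`. Both function names are hypothetical.

```python
import re
from typing import Dict


def parse_catalog_map(spec: str) -> Dict[str, str]:
    """Parse 'dev_fin=prod_fin,dev_clinical=prod_clinical' into a dict."""
    mapping: Dict[str, str] = {}
    for pair in spec.split(","):
        pair = pair.strip()
        if not pair:
            continue
        src, sep, dst = pair.partition("=")
        if not sep or not src.strip() or not dst.strip():
            raise ValueError(f"invalid catalog mapping: {pair!r}")
        mapping[src.strip()] = dst.strip()
    return mapping


def remap_catalogs(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word catalog names; 'dev_fin_x' is left untouched."""
    if not mapping:
        return text
    names = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], text)
```

Word boundaries keep a name such as `dev_fin_archive` from being rewritten by the `dev_fin` rule, because the underscore counts as a word character.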

6. multi-env — Two independent envs (BU scenario)

Validates the independent second environment from playbook.md § 6.

dev env: Finance Analytics backed by dev_fin
bu2 env: Clinical Analytics backed by dev_clinical
Both envs use the same Databricks workspace and account
Each has its own make generate + make apply cycle with completely separate generated config and Terraform state

Key assertions:

dev/generated/abac.auto.tfvars contains Finance Analytics, not Clinical Analytics
bu2/generated/abac.auto.tfvars contains Clinical Analytics, not Finance Analytics
envs/dev/terraform.tfstate and envs/bu2/terraform.tfstate exist and differ
Finance Analytics Genie Space deployed in dev, Clinical Analytics deployed in bu2

7. attach-promote — Attach to UI-created space and promote to prod

Validates the "Import an existing Genie Space" flow from playbook.md § 3, combined with a dev → prod promotion. This is the adoption story: a data team already built a Genie Space in the Databricks UI and now wants to bring it under ABAC governance.

Phase 1 — Simulate UI configuration:

A Finance Analytics Genie Space is created directly via the Genie REST API (POST /api/2.0/genie/spaces) with dev_fin tables. This represents the space a data team built in the UI before this tool was adopted.

Phase 2 — Attach with explicit uc_tables:

env.auto.tfvars is configured with genie_space_id and explicit uc_tables. The Genie API's serialized_space field is not immediately available for newly created spaces (async processing, can take several minutes), so the test provides the table list directly rather than relying on auto-discovery. This simulates the playbook.md manual step: the user runs make generate (which logs discovered tables), then copies them into data_access/env.auto.tfvars.

make generate imports the space's config (instructions, benchmarks, sample questions) verbatim from the Genie API response — not re-generated by the LLM. The ABAC governance (groups, tag policies, masking functions) is generated fresh from the table DDLs.

Phase 3 — Apply:

make apply deploys ABAC governance (group ACLs, column tags, masking functions, FGAC policies) without creating or deleting the Genie Space. Terraform operates only on existing_spaces resources — no genie_space_create provisioner runs.

Phase 4 — Promote:

make promote SOURCE_ENV=dev DEST_ENV=prod DEST_CATALOG_MAP=dev_fin=prod_fin followed by make apply ENV=prod applies the same governance to prod.

Steps:

Step	Action
1	Create `dev_fin` + `prod_fin` test catalogs
2	Create a Genie Space via `POST /api/2.0/genie/spaces` (simulating UI setup)
3	Configure `dev` env: `genie_space_id = "<id>"`, no `uc_tables`
4	`make generate ENV=dev` — discovers tables from Genie API, generates ABAC
5	Assert generated config references `dev_fin` catalog and `Finance Analytics`
6	Update `env.auto.tfvars` with discovered `uc_tables` (simulating playbook.md manual step)
7	`make apply ENV=dev` — applies ABAC; no new space created
8	Assert no `.genie_space_id_*` file was created (space not created by Terraform)
9	`setup_test_data.py --verify` — row counts, column tags, masks
10	`make promote ... DEST_CATALOG_MAP=dev_fin=prod_fin`
11	`make apply ENV=prod`
12	`setup_test_data.py --verify-prod`
13	Delete the UI-created space via `DELETE /api/2.0/genie/spaces/{id}` (teardown)

Key assertions:

generated/abac.auto.tfvars contains dev_fin catalog references (tables discovered from API)
generated/abac.auto.tfvars contains Finance Analytics genie_space_configs entry
No .genie_space_id_* file exists after make apply — Terraform did not create a new space
envs/prod/generated/abac.auto.tfvars contains prod_fin after promote
Column tags and masking policies applied in both dev and prod

8. self-service-genie — Central governance + BU teams self-serve Genie

Validates the self-service Genie pattern from playbook.md § 7 and self-service-genie.md.

Phase 1 — Governance team:

A governance env is set up with both dev_fin + dev_clinical table references and no genie_spaces block. make generate MODE=governance is run — only ABAC content is generated. make apply-governance applies the account and data_access layers without touching the workspace layer.

Phase 2 — BU Finance team:

A bu_fin env is set up with a Finance Analytics space pointing at dev_fin tables. make generate MODE=genie is run — only genie_space_configs is generated (no ABAC, no masking SQL). make apply-genie applies only the workspace layer and creates the Genie Space.

Phase 3 — Adding a second BU (isolation check):

A bu_clin env is set up with a Clinical Analytics space. make generate MODE=genie + make apply-genie runs for bu_clin. The test then asserts that governance/data_access/terraform.tfstate is byte-for-byte unchanged — proving that adding a second BU team has zero effect on governance state.

Phase 4 — BU Finance team promote to prod:

make promote SOURCE_ENV=bu_fin DEST_ENV=bu_fin_prod DEST_CATALOG_MAP=dev_fin=prod_fin followed by make apply-genie ENV=bu_fin_prod (not make apply). Asserts bu_fin_prod has a .genie_space_id_* file but no data_access/terraform.tfstate, and that the governance state remains unmodified.

Steps:

Step	Action
1	Create `dev_fin` + `dev_clinical` + `prod_fin` test catalogs
2	Configure `governance` env with both dev catalogs' tables (no `genie_spaces`)
3	`make generate ENV=governance MODE=governance`
4	Assert `tag_assignments` + `fgac_policies` present; `genie_space_configs` absent
5	`make apply-governance ENV=governance`
6	Assert `data_access/terraform.tfstate` exists; no `.genie_space_id_*` file
7	Configure `bu_fin` env with Finance Analytics space
8	`make generate ENV=bu_fin MODE=genie`
9	Assert `genie_space_configs` present; `tag_assignments` + `fgac_policies` absent; no `masking_functions.sql`
10	`make apply-genie ENV=bu_fin`
11	Assert `.genie_space_id_finance_analytics` file exists; no `data_access/terraform.tfstate`
12	Snapshot `governance/data_access/terraform.tfstate` content
13	Configure `bu_clin` env with Clinical Analytics space; `make generate MODE=genie` + `make apply-genie`
14	Assert `bu_clin` has `.genie_space_id_clinical_analytics`; governance state byte-for-byte unchanged
15	`make promote SOURCE_ENV=bu_fin DEST_ENV=bu_fin_prod DEST_CATALOG_MAP=dev_fin=prod_fin`
16	`make apply-genie ENV=bu_fin_prod`
17	Assert `bu_fin_prod` has `.genie_space_id_*`; no `data_access/terraform.tfstate`; governance state still unchanged
18	Teardown all envs

Key assertions:

governance/generated/abac.auto.tfvars contains tag_assignments and fgac_policies
governance/generated/abac.auto.tfvars does NOT contain genie_space_configs
governance/generated/masking_functions.sql exists
bu_fin/generated/abac.auto.tfvars contains genie_space_configs
bu_fin/generated/abac.auto.tfvars does NOT contain tag_assignments or fgac_policies
bu_fin/generated/masking_functions.sql does NOT exist (governance team owns it)
Cross-layer state isolation: governance has data_access state; BU envs have workspace state only
governance/data_access/terraform.tfstate byte-for-byte unchanged after second BU + BU prod promote
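The "contains" and "does NOT contain" assertions above amount to checking which top-level keys a generated tfvars file declares. A simple way to sketch this is a line-anchored regex, shown below; `declares_key` is a hypothetical helper, and a stricter check could parse the file with python-hcl2 instead.

```python
import re


def declares_key(tfvars_text: str, key: str) -> bool:
    """True if some line starts with 'key =' (commented-out lines do not count)."""
    pattern = rf"^[ \t]*{re.escape(key)}[ \t]*="
    return re.search(pattern, tfvars_text, flags=re.MULTILINE) is not None


governance = 'tag_assignments = []\nfgac_policies = []\n# genie_space_configs = {}\n'
bu_fin = 'genie_space_configs = {}\n'
```

With these samples, the governance text declares `tag_assignments` and `fgac_policies` but not `genie_space_configs`, and the BU text is the reverse. Note that the regex also matches an indented nested attribute of the same name, so it is only suitable for keys that never appear nested.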

9. abac-only — ABAC governance only (no Genie Space) + upgrade path

Validates the "ABAC governance only" flow from playbook.md § 2 and the § 2 → § 4 upgrade path.

Phase 1 — ABAC-only deploy:

env.auto.tfvars is configured with uc_tables only — no genie_spaces block. Plain make generate (no MODE= flag) generates groups, tag policies, tag assignments, FGAC policies, and masking functions, but no genie_space_configs. make apply applies all three layers — account, data_access, and workspace — but creates no Genie Space.

Phase 2 — § 2 → § 4 upgrade path:

A genie_spaces block is added to env.auto.tfvars and make generate SPACE="Finance Analytics" is run (per-space generation, not a full re-generate). make apply then creates the Genie Space. The test asserts that the existing data_access/terraform.tfstate is preserved (governance not disturbed) and ABAC verification still passes.

Key assertions:

generated/abac.auto.tfvars does NOT declare genie_space_configs after Phase 1
generated/masking_functions.sql IS generated (full ABAC mode)
No .genie_space_id_* file after Phase 1 apply
data_access/terraform.tfstate exists after Phase 1 apply (all layers applied)
generated/abac.auto.tfvars contains Finance Analytics after Phase 2 generate
.genie_space_id_finance_analytics file exists after Phase 2 apply
data_access/terraform.tfstate still exists after Phase 2 apply (governance preserved)
Column tags and masks verified after both phases

10. multi-space-import — Import two UI-created Genie Spaces at once

Validates the multi-space import pattern from playbook.md § 3.

Two Genie Spaces are created directly via the Genie REST API (simulating spaces built in the Databricks UI). The env.auto.tfvars is configured with two genie_space_id entries. make generate imports both spaces' configs verbatim from the API and generates shared ABAC governance. make apply attaches to both spaces (applies governance and ACLs) without creating any new spaces.

Steps:

Step	Action
1	Create `dev_fin` + `dev_clinical` test catalogs
2	Create Finance Analytics Genie Space via `POST /api/2.0/genie/spaces`
3	Create Clinical Analytics Genie Space via `POST /api/2.0/genie/spaces`
4	Configure `dev` env: two `genie_space_id` entries, each with explicit `uc_tables`
5	`make generate ENV=dev` — imports both spaces, generates ABAC for both catalogs
6	Assert both `Finance Analytics` and `Clinical Analytics` in `generated/abac.auto.tfvars`
7	`make apply ENV=dev` — applies governance; no new spaces created
8	Assert no `.genie_space_id_*` files (both spaces attached, not created)
9	`setup_test_data.py --verify` — column tags and masks applied across both catalogs
10	Teardown: delete both API-created spaces + destroy Terraform resources

Key assertions:

generated/abac.auto.tfvars contains Finance Analytics and Clinical Analytics genie_space_configs entries
generated/abac.auto.tfvars references both dev_fin and dev_clinical catalogs
No .genie_space_id_* files after apply — Terraform attached, not created
Column tags and masking policies applied across both catalogs

11. schema-drift — Column tag drift detection

Validates the schema evolution workflow: detecting new untagged columns, stale tag assignments for deleted columns, and combined drift from column renames. Tests make audit-schema and make generate-delta.

Phase A — Baseline:

Uses the quickstart setup (Finance Analytics with dev_fin tables). After make generate + make apply, verifies the baseline audit does not report emergency_ssn (the test column that will be added later).

Phase B — Forward drift (ADD COLUMN):

ALTER TABLE dev_fin.finance.customers ADD COLUMN emergency_ssn STRING adds a new PII column. make audit-schema detects it as forward drift (exit code 1). make generate-delta classifies it using the LLM (constrained to existing governed keys/values) and merges the new tag_assignment into generated/abac.auto.tfvars. make apply deploys the tag. Re-running make audit-schema confirms drift is resolved (exit code 0).

Phase C — Reverse drift (DROP COLUMN):

Tags on emergency_ssn are unset, then the column is dropped (ALTER TABLE DROP COLUMN emergency_ssn). make audit-schema detects the stale tag_assignment in config that references the now-deleted column. make generate-delta removes it automatically (no LLM call needed). Re-running make audit-schema confirms the stale assignment is gone.

Phase D — Combined drift (RENAME COLUMN):

Tags on email are unset first, then the column is renamed with ALTER TABLE RENAME COLUMN email TO contact_email. make audit-schema detects both reverse drift (stale email assignment) and forward drift (untagged contact_email). make generate-delta removes the old assignment and classifies the new column. make apply deploys the result, and a final make audit-schema confirms that no drift remains.

Step	Action
1	Quickstart baseline: setup data, generate, apply, verify
2	`make audit-schema` — assert `emergency_ssn` not reported
3	`ALTER TABLE ADD COLUMN emergency_ssn STRING`
4	`make audit-schema` — assert exit 1, `emergency_ssn` in output
5	`make generate-delta` — assert new `tag_assignment` added
6	`make apply` — assert tag applied in `column_tags`
7	`make audit-schema` — assert exit 0
8	Unset tags + `ALTER TABLE DROP COLUMN emergency_ssn`
9	`make audit-schema` — assert exit 1 (stale assignment)
10	`make generate-delta` — assert stale assignment removed
11	`make audit-schema` — assert exit 0
12	Unset tags + `ALTER TABLE RENAME COLUMN email TO contact_email`
13	`make audit-schema` — assert exit 1 (both directions)
14	`make generate-delta` — old removed, new classified
15	`make apply` — assert tag on `contact_email`
16	`make audit-schema` — assert exit 0
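The three drift phases reduce to a set comparison between the columns that exist now and the columns referenced by tag assignments in the config. The sketch below illustrates that idea only. It is not the implementation behind make audit-schema; the function names, the input shapes, the `reviewed_columns` input, and the `customer_id` column are assumptions made for the example.

```python
def classify_drift(live_columns, assigned_columns, reviewed_columns=()):
    """Split schema drift into forward and reverse components.

    live_columns:     fully qualified columns that exist in the catalog now
    assigned_columns: columns referenced by tag_assignments in the config
    reviewed_columns: columns already classified as needing no tag
                      (hypothetical input; keeps untagged non-sensitive
                      columns out of the report)
    """
    live = set(live_columns)
    assigned = set(assigned_columns)
    forward = live - assigned - set(reviewed_columns)  # new, untagged columns
    reverse = assigned - live                          # stale assignments
    return sorted(forward), sorted(reverse)


def audit_exit_code(forward, reverse):
    """Mirror the documented behaviour: exit 1 on any drift, else 0."""
    return 1 if (forward or reverse) else 0


# Phase D: email was renamed to contact_email, so both directions fire.
prefix = "dev_fin.finance.customers."
forward, reverse = classify_drift(
    live_columns=[prefix + "customer_id", prefix + "contact_email"],
    assigned_columns=[prefix + "email"],
    reviewed_columns=[prefix + "customer_id"],  # hypothetical column
)
print(forward, reverse, audit_exit_code(forward, reverse))
```

A rename shows up as one entry in each list, which is why step 13 expects drift in both directions.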

12. genie-import-no-abac — Import Genie Space, deploy to prod without ABAC

Validates the full workflow of importing an existing Genie Space and deploying it to production without generating or managing any ABAC governance. This is a valid use case when a separate governance team manages ABAC centrally.

Steps:

Step	Action
1	Create `dev_fin` + `prod_fin` test catalogs
2	Create a Genie Space via REST API (simulating a UI-configured space)
3	`make setup ENV=import_noabac` — scaffold env
4	Write `env.auto.tfvars` with `genie_only = true` and `genie_space_id` pointing to the API-created space
5	`make generate ENV=import_noabac MODE=genie` — generate genie config only
6	Assert: `genie_space_configs` present, `tag_assignments` / `fgac_policies` absent, no `masking_functions.sql`
7	`make apply-genie ENV=import_noabac` — deploy workspace layer
8	Assert: no `.genie_space_id_*` file (space attached, not created); space accessible via API
9	`make promote SOURCE_ENV=import_noabac DEST_ENV=import_noabac_prod DEST_CATALOG_MAP=dev_fin=prod_fin` — promote remaps genie config or gracefully skips
10	`make apply-genie ENV=import_noabac_prod` — deploy prod workspace
11	Assert: no `data_access/terraform.tfstate`, no account resources, no `masking_functions.sql`, no `tag_assignments` / `fgac_policies`

Key assertions:

Imported space is attached (not created) — no .genie_space_id_* in dev env; space verified via API
make promote either remaps genie config or exits 0 with a skip message (not a hard error)
No governance artifacts are produced at any stage
Only the workspace layer is managed — no account or data_access state
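The "no governance artifacts" assertions can be written as a single check over the generated config and the files present in the env directory. This is a minimal sketch under stated assumptions: the config has already been parsed into a dict (parsing is not shown), a missing key and an empty value both count as absent, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

GOVERNANCE_KEYS = ("tag_assignments", "fgac_policies")


def genie_only_violations(config, env_files):
    """List reasons why a generated env is not genie-only.

    config:    generated tfvars already parsed into a dict
    env_files: POSIX-style relative file paths present in the env directory
    """
    problems = []
    if not config.get("genie_space_configs"):
        problems.append("genie_space_configs missing or empty")
    for key in GOVERNANCE_KEYS:
        # Assumption: a missing key and an empty value both count as absent.
        if config.get(key):
            problems.append(f"{key} should be absent")
    for path in env_files:
        p = PurePosixPath(path)
        if p.name == "masking_functions.sql" or str(p) == "data_access/terraform.tfstate":
            problems.append(f"unexpected file: {path}")
    return problems
```

An empty list means the env satisfies the assertions in steps 6 and 11.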

Verify Checks (setup_test_data.py --verify)

Check	Source	Pass condition
Row counts	`SELECT COUNT(*) FROM <table>`	Actual ≥ expected
Column tags	`system.information_schema.column_tags`	At least 1 tag per catalog
Column masks	`system.information_schema.column_masks`	At least 1 mask per catalog
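The pass conditions in the table can be read as three simple comparisons. The sketch below applies them to counts that have already been collected; running the queries is not shown. It is an illustration of the rules, not the code in setup_test_data.py, and the function name and input shapes are assumptions.

```python
def verify_failures(actual_rows, expected_rows, tag_counts, mask_counts):
    """Apply the three pass conditions and return failure messages.

    actual_rows / expected_rows: {table_name: row_count}
    tag_counts / mask_counts:    {catalog_name: number_of_rows_found}
    """
    failures = []
    for table, expected in expected_rows.items():
        actual = actual_rows.get(table, 0)
        if actual < expected:  # pass condition: actual >= expected
            failures.append(f"{table}: {actual} rows, expected at least {expected}")
    for label, counts in (("tag", tag_counts), ("mask", mask_counts)):
        for catalog, count in counts.items():
            if count < 1:  # pass condition: at least 1 per catalog
                failures.append(f"{catalog}: no column {label} found")
    return failures
```

A caller would exit non-zero whenever the returned list is non-empty, matching the documented behaviour of --verify.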

Using `setup_test_data.py` Standalone

Run from your cloud wrapper root directory (genie/aws/ or genie/azure/).

Setup

# Dev catalogs only
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars

# Dev + prod catalogs (needed before make apply ENV=prod)
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars --prod

Verify (run after `make apply`)

python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars --verify
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars --verify-prod

Teardown

python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars \
  --teardown --teardown-prod

CLI reference

Flag	Description
`--auth-file <path>`	Path to `auth.auto.tfvars` (default: `./auth.auto.tfvars`)
`--prod`	Also create prod catalogs (`prod_fin`, `prod_clinical`)
`--verify`	Assert dev table row counts + ABAC governance; exits non-zero on failure
`--verify-prod`	Same as `--verify` but for prod catalogs
`--teardown`	Drop dev catalogs (`dev_fin`, `dev_clinical`)
`--teardown-prod`	Drop prod catalogs (`prod_fin`, `prod_clinical`)
`--warehouse-id <id>`	Use a specific SQL warehouse instead of auto-selecting
`--dry-run`	Print SQL to stdout without executing
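For readers who want to wrap or mock the script, the flags above map naturally onto an argparse parser. The sketch below is a reconstruction from the table only, not a copy of the parser in setup_test_data.py; the help strings are paraphrased and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser exposing the flags listed in the CLI reference."""
    parser = argparse.ArgumentParser(prog="setup_test_data.py")
    parser.add_argument("--auth-file", default="./auth.auto.tfvars",
                        help="Path to auth.auto.tfvars")
    parser.add_argument("--warehouse-id", default=None,
                        help="SQL warehouse to use instead of auto-selecting")
    boolean_flags = [
        ("--prod", "Also create prod catalogs"),
        ("--verify", "Assert dev row counts and ABAC governance"),
        ("--verify-prod", "Same as --verify, for prod catalogs"),
        ("--teardown", "Drop dev catalogs"),
        ("--teardown-prod", "Drop prod catalogs"),
        ("--dry-run", "Print SQL without executing"),
    ]
    for flag, text in boolean_flags:
        parser.add_argument(flag, action="store_true", help=text)
    return parser


# Equivalent to the Teardown command shown earlier.
args = build_parser().parse_args(
    ["--auth-file", "envs/dev/auth.auto.tfvars", "--teardown", "--teardown-prod"]
)
print(args.teardown, args.teardown_prod, args.verify)
```

argparse converts dashes to underscores, so `--verify-prod` is read as `args.verify_prod`.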

Legacy: `make integration-test`

The original monolithic integration test is still available. It combines the multi-space and promote scenarios (playbook.md § 1 + § 5) into a single pipeline without isolation:

# Full run — destroys everything at the end
make integration-test

# Keep data and deployed resources for inspection
make integration-test KEEP_DATA=1

# Pin a warehouse
make integration-test WAREHOUSE_ID=abc123ef

Pipeline steps:

Step	Command	Purpose
1	`setup_test_data.py --prod`	Create dev + prod UC catalogs and sample data
2	`make setup`	Scaffold env directories
3	`make apply ENV=account`	Deploy groups and tag policies
4	`make generate ENV=dev`	Full LLM generation (both spaces)
5	`make apply ENV=dev`	Deploy dev governance
6	`setup_test_data.py --verify`	Assert dev ABAC governance
7	`make generate SPACE="Finance Analytics"`	Per-space isolation check
8	`make promote ... DEST_CATALOG_MAP=...`	Remap dev → prod catalogs
9	`make apply ENV=prod`	Deploy prod governance
10	`setup_test_data.py --verify-prod`	Assert prod ABAC governance
11	Teardown	Drop data + destroy Terraform (skipped if `KEEP_DATA=1`)

Use make test-all instead for isolated, individually reportable scenarios.

Cleanup

# Drop test data only (leave Terraform resources in place)
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars \
  --teardown --teardown-prod

# Destroy Terraform resources only (leaves UC catalogs in place)
make destroy ENV=prod
make destroy ENV=dev
make destroy ENV=account

# Full cleanup: Terraform first, then data (see the note below)
make destroy ENV=prod && make destroy ENV=dev && make destroy ENV=account && \
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars \
  --teardown --teardown-prod

Note: Always run make destroy before dropping UC catalogs. If catalogs are dropped first, the deploy_masking_functions destroy provisioner will fail with Catalog not found. If this happens, remove the stuck resource with terraform state rm module.data_access.null_resource.deploy_masking_functions in the affected env's data_access/ directory, then re-run make destroy.
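The ordering rule in the note can be encoded once so that a cleanup script cannot get it wrong. The sketch below builds the commands from this section in the safe order and stops at the first failure. The function names and the injectable `runner` parameter are illustrative assumptions; the commands themselves are the ones listed above.

```python
import subprocess


def cleanup_commands(auth_file="envs/dev/auth.auto.tfvars"):
    """Return cleanup commands in the safe order: Terraform first, data last."""
    destroy = [["make", "destroy", f"ENV={env}"] for env in ("prod", "dev", "account")]
    drop_data = ["python", "scripts/setup_test_data.py", "--auth-file", auth_file,
                 "--teardown", "--teardown-prod"]
    return destroy + [drop_data]


def run_cleanup(auth_file="envs/dev/auth.auto.tfvars", runner=subprocess.run):
    """Run each command in order, stopping at the first failure."""
    for cmd in cleanup_commands(auth_file):
        runner(cmd, check=True)  # subprocess.run raises CalledProcessError on non-zero exit
```

Because the catalogs are dropped only after every make destroy has succeeded, the destroy provisioner never runs against a missing catalog.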

Troubleshooting

AWS: IAM role not deleted — `ExpiredToken`

Symptom:

⚠  Could not delete IAM role 'genie-test-uc-role-*': An error occurred (ExpiredToken)
   when calling the ListRolePolicies operation: The security token included in the request is expired

Cause: You are using temporary AWS STS credentials (AWS_SESSION_TOKEN). The integration test suite takes ~90 minutes; if the STS token lifetime is shorter than the total run time (provision + test + teardown), the token expires before teardown can delete the IAM role.

Fix (preferred) — switch to long-lived IAM user keys:

Remove AWS_SESSION_TOKEN from scripts/account-admin.aws.env and replace AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY with permanent IAM user credentials. Long-lived keys never expire and work reliably across the full CI pipeline.

Fix (immediate) — refresh the token and re-run teardown:

# Export fresh credentials in your shell (overrides the stale file values)
export AWS_ACCESS_KEY_ID=ASIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...

# Re-run teardown — it will pick up the fresh env vars
python scripts/provision_test_env.py teardown

Fix (manual) — delete the role directly in AWS:

If you cannot obtain fresh credentials, delete the orphaned role in the AWS Console:

Go to IAM → Roles
Search for genie-test-uc-role-
Select the role → Delete

The Databricks workspace and metastore are always removed by teardown regardless of whether the IAM step succeeds, so only the IAM role requires manual cleanup.

Prevention for make test-ci: If your organisation requires STS tokens, extend the session duration to at least 4 hours before starting the pipeline:

# Request a longer-lived token (the maximum is set by the role's maximum session duration, up to 12 h)
aws sts assume-role --role-arn arn:aws:iam::<account>:role/<role> \
  --role-session-name genie-ci --duration-seconds 14400   # 4 hours
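Before starting a run, it can help to check that the token will still be valid when teardown finishes. The sketch below does that arithmetic. The provision and test durations come from this guide (~15 and ~90 minutes); the teardown duration and the safety margin are assumptions, and `token_covers_run` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def token_covers_run(expires_at, now=None, provision_min=15, tests_min=90,
                     teardown_min=15, margin_min=30):
    """Return True if the STS token is still valid when teardown finishes.

    teardown_min and margin_min are assumed values, not measured ones.
    """
    now = now or datetime.now(timezone.utc)
    needed = timedelta(minutes=provision_min + tests_min + teardown_min + margin_min)
    return expires_at - now >= needed


start = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)  # arbitrary example time
print(token_covers_run(start + timedelta(hours=1), now=start))  # False: 1 h token
print(token_covers_run(start + timedelta(hours=4), now=start))  # True: 4 h token
```

The expiry timestamp is returned in the Credentials.Expiration field of the assume-role response, so the check can run immediately after the token is issued.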

Azure: Storage account or access connector not deleted — auth error

Symptom:

⚠  Could not delete storage account 'genietest*': The client secret has expired.

Cause: Microsoft Entra ID (formerly Azure AD) client secrets have a finite lifetime (the portal recommends 6 months and allows up to 2 years). If the secret expires between provisioning and teardown, Azure API calls fail.

Fix — generate a new client secret and re-run teardown:

Azure Portal → Microsoft Entra ID → App registrations → your app → Certificates & secrets
Generate a new client secret
Update AZURE_CLIENT_SECRET in scripts/account-admin.azure.env

Re-run teardown:

python scripts/provision_test_env.py teardown

Fix (manual) — delete resources directly in Azure Portal:

Go to Resource Groups → your RG
Search for genietest — delete the storage account and access connector
Go to Microsoft Entra ID → Enterprise applications — remove any test managed identities

Prevention: Use a client secret with a longer expiry (2 years), or automate secret rotation in your CI pipeline.

Orphaned workspace or metastore after failed teardown

If teardown fails completely, check the Databricks Account Console:

Workspaces: Account Console → Workspaces → filter by name genie-test-* → Delete
Metastores: Account Console → Data → Unity Catalog → filter by name genie-test-* → Delete (check "Force delete")
Groups: Account Console → User Management → Groups → filter by name genie-test-admins-* → Delete

After manually cleaning up, remove the stale state file so subsequent runs start clean:

rm -f scripts/.test_env_state.json
