This guide covers the automated integration test scenarios that validate the
playbooks documented in docs/playbook.md, the setup_test_data.py helper, and the
run_integration_tests.py test runner.
For automated pipelines (PR checks, nightly runs), use make test-ci:
make test-ciThis single command runs the full pipeline in order:
- Unit tests — fast Python-only checks, no cloud resources needed
- Provision — fresh isolated Databricks workspace + metastore (~10–15 min)
- Integration tests — all scenarios (~90 min)
- Teardown — always runs, even if tests fail, so no cloud resources are left behind
Exit code is non-zero if any phase fails. Teardown is always executed.
Options:
# Use a custom credentials file (useful for CI secrets injection)
make test-ci ACCOUNT_ADMIN_ENV=/path/to/credentials.env
# Run only one scenario (faster iteration)
make test-ci SCENARIO=quickstart
# Pin a specific SQL warehouse
make test-ci WAREHOUSE_ID=abc123GitHub Actions example:
- name: Integration tests
run: make test-ci
env:
# Write the credentials file from a secret, then pass the path
ACCOUNT_ADMIN_ENV: ${{ runner.temp }}/account-admin.envUse unit tests first to catch logic bugs quickly (< 1 second, no credentials required), then provision a fresh isolated environment for integration tests to avoid stale quota counter issues in a shared metastore.
make test-unit # fast — pure Python, no LLM/Terraform/Databricks
↓ (all pass)
python scripts/provision_test_env.py provision # ~10-15 min — creates fresh workspace + metastore
↓
python scripts/run_integration_tests.py # slow — deploys real resources (~hours)
↓
python scripts/provision_test_env.py teardown # wipe the environment when done
Databricks metastore-wide FGAC policy quotas use an eventually consistent counter that can lag behind actual policy deletions by several minutes. In a long-lived shared metastore, the counter can accumulate drift and incorrectly block new policy creation even when no policies actually exist. Provisioning a fresh workspace + metastore for each test run gives a clean counter that always starts at zero.
The tests/ directory contains pytest-based unit tests for the core Python
functions — all autofix functions in generate_abac.py and all validation
functions in validate_abac.py.
Run:
# Install deps once (if not already installed)
pip install pytest python-hcl2
# Run all 60+ unit tests (~1 second, no Databricks connection needed)
make test-unit
# Or invoke pytest directly for richer output
python3 -m pytest tests/ -v
python3 -m pytest tests/test_generate_abac.py -v # autofix functions only
python3 -m pytest tests/test_validate_abac.py -v # validation functions only
python3 -m pytest tests/ -k "TagPolicies" -v # filter by nameWhat is tested:
| Test file | Functions covered |
|---|---|
tests/test_generate_abac.py |
fix_hcl_syntax, autofix_tag_policies, autofix_invalid_tag_values, autofix_undefined_tag_refs, autofix_missing_fgac_policies, autofix_fgac_policy_count |
tests/test_validate_abac.py |
validate_groups, validate_tag_policies, validate_tag_assignments, validate_fgac_policies, parse_sql_functions, parse_sql_function_arg_counts, _condition_matches_tags |
tests/test_schema_drift.py |
PII column pattern regex, env file parsing (both uc_tables and genie_spaces shapes), governed-key resolution (4-level fallback), delta merge/dedup, delta validation (reject unknown keys/values), stale assignment removal |
Unit tests catch the most common failure categories without incurring the cost of a full LLM + Terraform run:
- LLM output contains missing commas between HCL objects →
fix_hcl_syntax - LLM uses a tag value not in the allowed list →
autofix_tag_policies - LLM generates an assignment with a typo'd value →
autofix_invalid_tag_values - LLM references a tag key that was never defined →
autofix_undefined_tag_refs - An uncovered sensitive column is left without an FGAC policy →
autofix_missing_fgac_policies - Too many FGAC policies for one catalog →
autofix_fgac_policy_count
scripts/provision_test_env.py creates a brand-new serverless Databricks
workspace and Unity Catalog metastore specifically for integration testing, then
writes all auth.auto.tfvars files so the test runner uses that environment.
Run make setup from the aws/ or azure/ directory — it automatically copies the matching example file to scripts/account-admin.<cloud>.env if it does not yet exist. Then fill in your credentials:
# AWS
vi scripts/account-admin.aws.env
# Azure
vi scripts/account-admin.azure.envThe file has two sections — shared Databricks credentials and cloud-specific credentials.
| Key | Where to find it |
|---|---|
DATABRICKS_ACCOUNT_ID |
Account Console → top-right menu → Account ID |
DATABRICKS_CLIENT_ID |
Account Console → User Management → Service Principals → <SP> → Application ID |
DATABRICKS_CLIENT_SECRET |
Same SP → OAuth Secrets → Generate Secret |
Note: The Account Console URL differs by cloud:
- AWS:
https://accounts.cloud.databricks.com- Azure:
https://accounts.azuredatabricks.net
| Key | Where to find it |
|---|---|
DATABRICKS_AWS_REGION |
AWS region for the new workspace (e.g. ap-southeast-2) |
AWS_ACCESS_KEY_ID |
AWS credentials with IAM + S3 write permissions (see below) |
AWS_SECRET_ACCESS_KEY |
Same IAM user or role |
AWS_SESSION_TOKEN |
Only needed for temporary STS credentials (see recommendation below) |
AWS credential type recommendation:
Use long-lived IAM user credentials (
AWS_ACCESS_KEY_ID+AWS_SECRET_ACCESS_KEY, noAWS_SESSION_TOKEN) whenever possible. Temporary STS tokens expire after 1–12 hours, which can cause teardown to fail if the full test run (~90 min + review time) outlasts the token lifetime.
| Credential type | AWS_SESSION_TOKEN required |
Expires | Recommended for |
|---|---|---|---|
| IAM user access keys | No | Never | Local dev and CI/CD |
AWS SSO / aws sso login |
Yes (auto-set by CLI) | 1–8 h | Interactive use only |
STS AssumeRole |
Yes | 15 min – 12 h | Short-lived pipelines |
Required IAM permissions:
iam:CreateRole, iam:DeleteRole, iam:PutRolePolicy, iam:DeleteRolePolicy,
iam:ListRolePolicies, iam:ListAttachedRolePolicies, iam:DetachRolePolicy,
iam:UpdateAssumeRolePolicy, sts:GetCallerIdentity
Required S3 permissions:
s3:CreateBucket, s3:DeleteBucket, s3:PutPublicAccessBlock,
s3:ListBucketVersions, s3:DeleteObject, s3:DeleteObjectVersion
Auto-created AWS resources:
The provision script auto-creates an S3 bucket named genie-uc-test-<aws-account-id> in the configured region. The bucket is reused across test runs and only deleted on teardown if the script created it.
| Step | What the script creates |
|---|---|
provision |
A unique S3 prefix inside the bucket: s3://<bucket>/genie-test-<run-id>/ |
provision |
An AWS IAM role (genie-test-uc-role-<run-id>) scoped to that prefix |
provision |
A Databricks storage credential backed by the IAM role |
provision |
A Databricks External Location covering s3://<bucket>/genie-test-<run-id>/ |
| Integration tests | Each catalog gets its own subfolder: .../genie-test-<run-id>/<catalog-name>/ |
teardown |
Deletes the IAM role; the metastore deletion cascades to catalogs/schemas/policies |
S3 objects written during the test are not deleted by teardown — the metastore and workspace are destroyed at the Databricks layer, but the underlying S3 prefixes remain. They are cheap (a few MB of small Delta files) and isolated by run-id. Clean them up periodically with:
aws s3 rm s3://<your-bucket>/ --recursive --exclude "*" --include "genie-test-*"| Key | Where to find it |
|---|---|
AZURE_SUBSCRIPTION_ID |
Azure Portal → Subscriptions → Subscription ID |
AZURE_RESOURCE_GROUP |
Azure Portal → Resource Groups → name (must already exist) |
AZURE_REGION |
Must match the workspace region (e.g. eastus2, australiaeast, westeurope) |
AZURE_TENANT_ID |
Azure Portal → Microsoft Entra ID → Overview → Tenant ID |
AZURE_CLIENT_ID |
Azure Portal → Microsoft Entra ID → App registrations → Application (client) ID |
AZURE_CLIENT_SECRET |
Same App registration → Certificates & secrets → New client secret |
Note: Azure client secrets expire (default 6 months or 2 years). If teardown fails with an auth error, generate a new secret and re-run.
Required Azure RBAC roles on the resource group:
| Role | Why it's needed |
|---|---|
Contributor |
Create/delete storage accounts, access connectors |
Storage Blob Data Contributor |
Manage blob data in ADLS Gen2 containers |
User Access Administrator (optional) |
Assign managed-identity roles; if absent, the script falls back to your local az login for role assignments |
Assign roles with:
az role assignment create \
--assignee <AZURE_CLIENT_ID> \
--role "Contributor" \
--scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"
az role assignment create \
--assignee <AZURE_CLIENT_ID> \
--role "Storage Blob Data Contributor" \
--scope "/subscriptions/<SUBSCRIPTION_ID>/resourceGroups/<RESOURCE_GROUP>"Auto-created Azure resources:
| Step | What the script creates |
|---|---|
provision |
An ADLS Gen2 storage account (genietest<run-id>) with a blob container |
provision |
A Databricks Access Connector with a managed identity |
provision |
A Databricks storage credential backed by the Access Connector |
provision |
A Databricks External Location pointing to the ADLS container |
teardown |
Deletes all Azure resources (storage account, access connector, role assignments) |
python scripts/provision_test_env.py provisionThis will:
- Look up the SP's SCIM identity in the Databricks account.
- Create a serverless workspace (
genie-test-<id>). - Create a fresh Unity Catalog metastore with a unique storage path.
- Assign the metastore to the workspace.
- Create cloud-specific storage infrastructure:
- AWS: An IAM role scoped to the test S3 prefix, registered as a storage credential
- Azure: An ADLS Gen2 storage account + Access Connector with managed identity, registered as a storage credential
- Create an External Location so catalogs can be created without a metastore root.
- Set the SP as metastore admin and workspace admin.
- Write
auth.auto.tfvarsfor all env directories (dev,bu2,prod,account). - Save a state file (
scripts/.test_env_state.json) for teardown.
Workspace provisioning typically takes 10–15 minutes.
# Check what environment is provisioned
python scripts/provision_test_env.py status
# Run all scenarios (warehouse is auto-detected from the workspace)
python scripts/run_integration_tests.py
# Run a specific scenario
python scripts/run_integration_tests.py --scenario quickstartpython scripts/provision_test_env.py teardownThis deletes cloud-specific resources, the workspace, metastore (and all catalogs/schemas/policies inside it), admin group, and removes the generated auth.auto.tfvars files.
- AWS: Deletes the IAM role created during provisioning. S3 objects remain (see cleanup note above).
- Azure: Deletes the storage account, access connector, and any role assignments.
If teardown fails with an auth error:
AWS — "ExpiredToken": Your STS session token expired during the test run. Export fresh credentials and re-run:
export AWS_ACCESS_KEY_ID=... export AWS_SECRET_ACCESS_KEY=... export AWS_SESSION_TOKEN=... # omit if using long-lived keys python scripts/provision_test_env.py teardownAzure — "ClientSecretExpired" or "InvalidAuthenticationToken": Your Azure client secret has expired. Generate a new secret in the Azure Portal (Microsoft Entra ID → App registrations → Certificates & secrets), update
account-admin.azure.env, and re-run teardown.The Databricks workspace and metastore are always deleted by teardown regardless of whether the cloud resource cleanup succeeds.
| Flag | Description |
|---|---|
--env-file PATH |
Path to credentials file (default: scripts/account-admin.<cloud>.env) |
--dry-run |
Print what would happen without creating/deleting anything |
--force |
With provision: overwrite an existing provisioned environment |
scripts/run_integration_tests.py runs each playbook.md scenario end-to-end with
full data setup, LLM generation, Terraform apply, assertions, and teardown. Each
scenario is isolated — state from a previous run is destroyed and cleaned before
the next one starts.
| Scenario | playbook.md section | What it validates |
|---|---|---|
| quickstart | § 1 | Single Genie Space backed by a single UC catalog (dev_fin) |
| multi-catalog | § 1 (multi-catalog) | One Genie Space drawing tables from two catalogs (dev_fin + dev_clinical) |
| multi-space | § 1 (multi-space) | Two Genie Spaces with separate catalogs — Finance Analytics + Clinical Analytics |
| per-space | § 4 | Add Clinical Analytics incrementally without touching Finance Analytics (isolation guarantee) |
| promote | § 5 | Full dev → prod promotion with catalog remapping across both spaces |
| multi-env | § 6 | Two independent envs on the same account: dev (Finance), bu2 (Clinical) |
| attach-promote | § 3 | Import a Genie Space already configured in the UI — govern it, then promote to prod |
| self-service-genie | § 7 | Central governance team + two BU Genie teams self-serve; second BU isolation check; BU promote to prod via apply-genie; governance state verified unchanged throughout |
| abac-only | § 2 | ABAC governance only (no Genie Space) + §2→§4 upgrade path: add Genie Space later without disturbing governance |
| multi-space-import | § 3 (multi-space) | Import two UI-configured Genie Spaces in one make generate; assert both configs present, Terraform creates no new spaces |
| schema-drift | — | Detects and classifies new columns after initial ABAC deployment; tests make audit-schema and make generate-delta across ADD/DROP/RENAME COLUMN scenarios |
| genie-only | § 7 (genie_only) | Minimal-privilege SP (workspace USER + SQL entitlement) creates Genie Space with genie_only=true; no account-level resources |
| genie-import-no-abac | § 3 + § 7 | Import an existing Genie Space and deploy to prod without any ABAC governance — validates the genie-only import-to-prod workflow when a separate team manages ABAC centrally |
| country-overlay | — | Country/region overlays (ANZ, IN, SEA) — full cycle per region + multi-region generation |
| industry-overlay | — | Industry overlays (financial_services, healthcare, retail) — full cycle per industry + multi-industry + country+industry composition (COUNTRY=ANZ INDUSTRY=healthcare) |
| aus-bank-demo | — | Australian bank demo — champion flow (ANZ + financial_services, import + promote with dev_bank→prod_bank catalog remap) |
| india-bank-demo | — | India bank demo — champion flow (IN + financial_services, Aadhaar/PAN/GSTIN/UPI masking, import + promote with dev_lakshmi→prod_lakshmi) |
| asean-bank-demo | — | ASEAN bank demo — champion flow (SEA + financial_services, 6 nullable national ID columns, multi-currency, import + promote with dev_asean_bank→prod_asean_bank) |
setup_test_data.py creates the following Unity Catalog resources:
| Catalog | Schema | Table | Rows | Sensitive data |
|---|---|---|---|---|
dev_fin |
finance |
customers |
10 | SSN, DOB, email, phone (PII) |
dev_fin |
finance |
transactions |
15 | AML flag, risk score (AML) |
dev_fin |
finance |
credit_cards |
10 | card number, CVV (PCI) |
dev_clinical |
clinical |
patients |
10 | SSN, DOB, insurance ID (PHI) |
dev_clinical |
clinical |
encounters |
12 | diagnosis, treatment notes (PHI) |
| Catalog | Schema | Table | Rows |
|---|---|---|---|
prod_fin |
finance |
customers |
10 |
prod_fin |
finance |
transactions |
15 |
prod_fin |
finance |
credit_cards |
10 |
prod_clinical |
clinical |
patients |
10 |
prod_clinical |
clinical |
encounters |
12 |
envs/dev/auth.auto.tfvarsconfigured with workspace credentials- A SQL warehouse available in the workspace (or pass
WAREHOUSE_ID=<id>to avoid cold-start delay)
# Run all six scenarios sequentially (full teardown after each)
make test-all
# Keep data and Terraform resources after the run for inspection
make test-all KEEP_DATA=1
# Pin a warehouse to avoid cold-start delay
make test-all WAREHOUSE_ID=abc123efmake test-quickstart
make test-multi-catalog
make test-multi-space
make test-per-space
make test-promote
make test-multi-env
make test-attach-promote
make test-self-service-genie
make test-abac-only
make test-multi-space-import
make test-genie-only
make test-genie-import-no-abac
make test-country-overlay
make test-industry-overlay
make test-aus-bank-demo
make test-india-bank-demo
make test-asean-bank-demo
# All targets accept WAREHOUSE_ID= and KEEP_DATA=1
make test-promote WAREHOUSE_ID=abc123ef KEEP_DATA=1# List available scenarios
python scripts/run_integration_tests.py --list
# Run all
python scripts/run_integration_tests.py
# Run one scenario
python scripts/run_integration_tests.py --scenario quickstart
python scripts/run_integration_tests.py --scenario per-space --keep-data
# Pin a warehouse
python scripts/run_integration_tests.py --warehouse-id abc123ef
# Non-default auth file
python scripts/run_integration_tests.py --auth-file envs/dev/auth.auto.tfvarsValidates the core quickstart from docs/playbook.md § 1 with a single Genie
Space backed by dev_fin.
Steps:
| Step | Action |
|---|---|
| 1 | Create dev_fin test catalogs and sample data |
| 2 | Configure dev env: one space "Finance Analytics" with dev_fin.* tables |
| 3 | make generate ENV=dev — LLM generates ABAC config and masking functions |
| 4 | Assert generated/abac.auto.tfvars and generated/spaces/finance_analytics/ created |
| 5 | make apply ENV=dev — deploys account, data_access, workspace layers |
| 6 | Assert .genie_space_id_finance_analytics file exists |
| 7 | setup_test_data.py --verify — row counts, column tags, column masks |
| 8 | Teardown data + Terraform resources |
Key assertions:
generated/abac.auto.tfvarscontainsFinance Analyticsgenie_space_configs entrygenerated/spaces/finance_analytics/abac.auto.tfvarsexists (per-space dir bootstrapped).genie_space_id_*file created after apply- Row counts ≥ expected for all
dev_fintables - Column tags and masking policies applied
Validates the "single space spanning multiple catalogs" pattern from playbook.md § 1.
One space ("Combined Analytics") draws tables from both dev_fin and dev_clinical.
Key assertions:
generated/abac.auto.tfvarscontainsCombined Analyticsand references bothdev_finanddev_clinical- Only one Genie Space deployed
- Column tags applied across both catalogs
Validates the two-space multi-catalog flow from playbook.md § 1. Finance Analytics
uses dev_fin; Clinical Analytics uses dev_clinical. This is the core of the
original make integration-test flow.
Key assertions:
generated/abac.auto.tfvarscontains bothFinance AnalyticsandClinical Analyticsentriesgenerated/spaces/finance_analytics/andgenerated/spaces/clinical_analytics/both bootstrapped- Two
.genie_space_id_*files created - Row counts and ABAC verified for both catalogs
Validates the per-space generation isolation guarantee from playbook.md § 4.
Phase 1: Deploy Finance Analytics only.
Phase 2: Add Clinical Analytics using make generate SPACE="Clinical Analytics" —
without triggering a full LLM re-run over Finance Analytics.
Key assertions:
- After full generate:
Finance Analyticsin assembled output,Clinical Analyticsabsent generated/spaces/finance_analytics/abac.auto.tfvarscontent is byte-for-byte unchanged after the per-space generate for Clinical Analytics- Assembled
generated/abac.auto.tfvarscontains both spaces after merge generated/spaces/clinical_analytics/abac.auto.tfvarscreated by SPACE= generate- Both Genie Spaces deployed after final apply
Validates the full dev → prod promotion from playbook.md § 5.
Catalog mapping: dev_fin → prod_fin, dev_clinical → prod_clinical
Steps:
| Step | Action |
|---|---|
| 1 | Create dev + prod test catalogs |
| 2 | Configure dev: two spaces (Finance + Clinical) |
| 3 | make generate ENV=dev + make apply ENV=dev |
| 4 | Verify dev data + ABAC |
| 5 | make promote SOURCE_ENV=dev DEST_ENV=prod DEST_CATALOG_MAP=dev_fin=prod_fin,dev_clinical=prod_clinical |
| 6 | make apply ENV=prod |
| 7 | setup_test_data.py --verify-prod |
| 8 | Teardown both envs |
Key assertions:
envs/prod/env.auto.tfvarswritten by promote withprod_fincatalog referencesenvs/prod/generated/abac.auto.tfvarscontains remapped prod catalog names- Prod column tags and masking policies applied
Validates the independent second environment from playbook.md § 6.
devenv: Finance Analytics backed bydev_finbu2env: Clinical Analytics backed bydev_clinical- Both envs use the same Databricks workspace and account
- Each has its own
make generate+make applycycle with completely separate generated config and Terraform state
Key assertions:
dev/generated/abac.auto.tfvarscontains Finance Analytics, not Clinical Analyticsbu2/generated/abac.auto.tfvarscontains Clinical Analytics, not Finance Analyticsenvs/dev/terraform.tfstateandenvs/bu2/terraform.tfstateexist and differ- Finance Analytics Genie Space deployed in dev, Clinical Analytics deployed in bu2
Validates the "Import an existing Genie Space" flow from playbook.md § 3, combined with a dev → prod promotion. This is the adoption story: a data team already built a Genie Space in the Databricks UI and now wants to bring it under ABAC governance.
Phase 1 — Simulate UI configuration:
A Finance Analytics Genie Space is created directly via the Genie REST API
(POST /api/2.0/genie/spaces) with dev_fin tables. This represents the space a
data team built in the UI before this tool was adopted.
Phase 2 — Attach with explicit uc_tables:
env.auto.tfvars is configured with genie_space_id and explicit uc_tables. The
Genie API's serialized_space field is not immediately available for newly-created spaces
(async processing, can take several minutes), so the test provides the table list directly
rather than relying on auto-discovery. This simulates the playbook.md manual step: the user
runs make generate (which logs discovered tables), then copies them into
data_access/env.auto.tfvars.
make generate imports the space's config (instructions, benchmarks, sample questions)
verbatim from the Genie API response — not re-generated by the LLM. The ABAC governance
(groups, tag policies, masking functions) is generated fresh from the table DDLs.
Phase 3 — Apply:
make apply deploys ABAC governance (group ACLs, column tags, masking functions,
FGAC policies) without creating or deleting the Genie Space. Terraform operates
only on existing_spaces resources — no genie_space_create provisioner runs.
Phase 4 — Promote:
make promote SOURCE_ENV=dev DEST_ENV=prod DEST_CATALOG_MAP=dev_fin=prod_fin followed
by make apply ENV=prod applies the same governance to prod.
Steps:
| Step | Action |
|---|---|
| 1 | Create dev_fin + prod_fin test catalogs |
| 2 | Create a Genie Space via POST /api/2.0/genie/spaces (simulating UI setup) |
| 3 | Configure dev env: genie_space_id = "<id>", no uc_tables |
| 4 | make generate ENV=dev — discovers tables from Genie API, generates ABAC |
| 5 | Assert generated config references dev_fin catalog and Finance Analytics |
| 6 | Update env.auto.tfvars with discovered uc_tables (simulating playbook.md manual step) |
| 7 | make apply ENV=dev — applies ABAC; no new space created |
| 8 | Assert no .genie_space_id_* file was created (space not created by Terraform) |
| 9 | setup_test_data.py --verify — row counts, column tags, masks |
| 10 | make promote ... DEST_CATALOG_MAP=dev_fin=prod_fin |
| 11 | make apply ENV=prod |
| 12 | setup_test_data.py --verify-prod |
| 13 | Delete the UI-created space via DELETE /api/2.0/genie/spaces/{id} (teardown) |
Key assertions:
generated/abac.auto.tfvarscontainsdev_fincatalog references (tables discovered from API)generated/abac.auto.tfvarscontainsFinance Analyticsgenie_space_configsentry- No
.genie_space_id_*file exists aftermake apply— Terraform did not create a new space envs/prod/generated/abac.auto.tfvarscontainsprod_finafter promote- Column tags and masking policies applied in both dev and prod
Validates the self-service Genie pattern from playbook.md § 7 and self-service-genie.md.
Phase 1 — Governance team:
A governance env is set up with both dev_fin + dev_clinical table references and no genie_spaces block. make generate MODE=governance is run — only ABAC content is generated. make apply-governance applies the account and data_access layers without touching the workspace layer.
Phase 2 — BU Finance team:
A bu_fin env is set up with a Finance Analytics space pointing at dev_fin tables. make generate MODE=genie is run — only genie_space_configs is generated (no ABAC, no masking SQL). make apply-genie applies only the workspace layer and creates the Genie Space.
Phase 3 — Adding a second BU (isolation check):
A bu_clin env is set up with a Clinical Analytics space. make generate MODE=genie + make apply-genie runs for bu_clin. The test then asserts that governance/data_access/terraform.tfstate is byte-for-byte unchanged — proving that adding a second BU team has zero effect on governance state.
Phase 4 — BU Finance team promote to prod:
make promote SOURCE_ENV=bu_fin DEST_ENV=bu_fin_prod DEST_CATALOG_MAP=dev_fin=prod_fin followed by make apply-genie ENV=bu_fin_prod (not make apply). Asserts bu_fin_prod has a .genie_space_id_* file but no data_access/terraform.tfstate, and that the governance state remains unmodified.
Steps:
| Step | Action |
|---|---|
| 1 | Create dev_fin + dev_clinical + prod_fin test catalogs |
| 2 | Configure governance env with both dev catalogs' tables (no genie_spaces) |
| 3 | make generate ENV=governance MODE=governance |
| 4 | Assert tag_assignments + fgac_policies present; genie_space_configs absent |
| 5 | make apply-governance ENV=governance |
| 6 | Assert data_access/terraform.tfstate exists; no .genie_space_id_* file |
| 7 | Configure bu_fin env with Finance Analytics space |
| 8 | make generate ENV=bu_fin MODE=genie |
| 9 | Assert genie_space_configs present; tag_assignments + fgac_policies absent; no masking_functions.sql |
| 10 | make apply-genie ENV=bu_fin |
| 11 | Assert .genie_space_id_finance_analytics file exists; no data_access/terraform.tfstate |
| 12 | Snapshot governance/data_access/terraform.tfstate content |
| 13 | Configure bu_clin env with Clinical Analytics space; make generate MODE=genie + make apply-genie |
| 14 | Assert bu_clin has .genie_space_id_clinical_analytics; governance state byte-for-byte unchanged |
| 15 | make promote SOURCE_ENV=bu_fin DEST_ENV=bu_fin_prod DEST_CATALOG_MAP=dev_fin=prod_fin |
| 16 | make apply-genie ENV=bu_fin_prod |
| 17 | Assert bu_fin_prod has .genie_space_id_*; no data_access/terraform.tfstate; governance state still unchanged |
| 18 | Teardown all envs |
Key assertions:
governance/generated/abac.auto.tfvarscontainstag_assignmentsandfgac_policiesgovernance/generated/abac.auto.tfvarsdoes NOT containgenie_space_configsgovernance/generated/masking_functions.sqlexistsbu_fin/generated/abac.auto.tfvarscontainsgenie_space_configsbu_fin/generated/abac.auto.tfvarsdoes NOT containtag_assignmentsorfgac_policiesbu_fin/generated/masking_functions.sqldoes NOT exist (governance team owns it)- Cross-layer state isolation: governance has data_access state; BU envs have workspace state only
governance/data_access/terraform.tfstatebyte-for-byte unchanged after second BU + BU prod promote
Validates the "ABAC governance only" flow from playbook.md § 2 and the § 2 → § 4 upgrade path.
Phase 1 — ABAC-only deploy:
env.auto.tfvars is configured with uc_tables only — no genie_spaces block. Plain make generate (no MODE= flag) generates groups, tag policies, tag assignments, FGAC policies, and masking functions, but no genie_space_configs. make apply applies all three layers — account, data_access, and workspace — but creates no Genie Space.
Phase 2 — § 2 → § 4 upgrade path:
A genie_spaces block is added to env.auto.tfvars and make generate SPACE="Finance Analytics" is run (per-space generation, not a full re-generate). make apply then creates the Genie Space. The test asserts that the existing data_access/terraform.tfstate is preserved (governance not disturbed) and ABAC verification still passes.
Key assertions:
generated/abac.auto.tfvarsdoes NOT declaregenie_space_configsafter Phase 1generated/masking_functions.sqlIS generated (full ABAC mode)- No
.genie_space_id_*file after Phase 1 apply data_access/terraform.tfstateexists after Phase 1 apply (all layers applied)generated/abac.auto.tfvarscontainsFinance Analyticsafter Phase 2 generate.genie_space_id_finance_analyticsfile exists after Phase 2 applydata_access/terraform.tfstatestill exists after Phase 2 apply (governance preserved)- Column tags and masks verified after both phases
Validates the multi-space import pattern from playbook.md § 3.
Two Genie Spaces are created directly via the Genie REST API (simulating spaces built in the Databricks UI). The env.auto.tfvars is configured with two genie_space_id entries. make generate imports both spaces' configs verbatim from the API and generates shared ABAC governance. make apply attaches to both spaces (applies governance and ACLs) without creating any new spaces.
Steps:
| Step | Action |
|---|---|
| 1 | Create dev_fin + dev_clinical test catalogs |
| 2 | Create Finance Analytics Genie Space via POST /api/2.0/genie/spaces |
| 3 | Create Clinical Analytics Genie Space via POST /api/2.0/genie/spaces |
| 4 | Configure dev env: two genie_space_id entries, each with explicit uc_tables |
| 5 | make generate ENV=dev — imports both spaces, generates ABAC for both catalogs |
| 6 | Assert both Finance Analytics and Clinical Analytics in generated/abac.auto.tfvars |
| 7 | make apply ENV=dev — applies governance; no new spaces created |
| 8 | Assert no .genie_space_id_* files (both spaces attached, not created) |
| 9 | setup_test_data.py --verify — column tags and masks applied across both catalogs |
| 10 | Teardown: delete both API-created spaces + destroy Terraform resources |
Key assertions:
generated/abac.auto.tfvarscontainsFinance AnalyticsandClinical Analyticsgenie_space_configsentriesgenerated/abac.auto.tfvarsreferences bothdev_finanddev_clinicalcatalogs- No
.genie_space_id_*files after apply — Terraform attached, not created - Column tags and masking policies applied across both catalogs
Validates the schema evolution workflow: detecting new untagged columns, stale tag assignments for deleted columns, and combined drift from column renames. Tests make audit-schema and make generate-delta.
Phase A — Baseline:
Uses the quickstart setup (Finance Analytics with dev_fin tables). After make generate + make apply, verifies the baseline audit does not report emergency_ssn (the test column that will be added later).
Phase B — Forward drift (ADD COLUMN):
ALTER TABLE dev_fin.finance.customers ADD COLUMN emergency_ssn STRING adds a new PII column. make audit-schema detects it as forward drift (exit code 1). make generate-delta classifies it using the LLM (constrained to existing governed keys/values) and merges the new tag_assignment into generated/abac.auto.tfvars. make apply deploys the tag. Re-running make audit-schema confirms drift is resolved (exit code 0).
Phase C — Reverse drift (DROP COLUMN):
Tags are unset, then ALTER TABLE DROP COLUMN emergency_ssn. make audit-schema detects the stale tag_assignment in config that references the now-deleted column. make generate-delta removes it automatically (no LLM call needed). Re-running make audit-schema confirms the stale assignment is gone.
Phase D — Combined drift (RENAME COLUMN):
Tags are unset on email, then ALTER TABLE RENAME COLUMN email TO contact_email. make audit-schema detects both reverse drift (stale email assignment) and forward drift (untagged contact_email). make generate-delta removes the old and classifies the new. make apply deploys. Audit confirms clean.
| Step | Action |
|---|---|
| 1 | Quickstart baseline: setup data, generate, apply, verify |
| 2 | make audit-schema — assert emergency_ssn not reported |
| 3 | ALTER TABLE ADD COLUMN emergency_ssn STRING |
| 4 | make audit-schema — assert exit 1, emergency_ssn in output |
| 5 | make generate-delta — assert new tag_assignment added |
| 6 | make apply — assert tag applied in column_tags |
| 7 | make audit-schema — assert exit 0 |
| 8 | Unset tags + ALTER TABLE DROP COLUMN emergency_ssn |
| 9 | make audit-schema — assert exit 1 (stale assignment) |
| 10 | make generate-delta — assert stale assignment removed |
| 11 | make audit-schema — assert exit 0 |
| 12 | Unset tags + ALTER TABLE RENAME COLUMN email TO contact_email |
| 13 | make audit-schema — assert exit 1 (both directions) |
| 14 | make generate-delta — old removed, new classified |
| 15 | make apply — assert tag on contact_email |
| 16 | make audit-schema — assert exit 0 |
Validates the full workflow of importing an existing Genie Space and deploying it to production without generating or managing any ABAC governance. This is a valid use case when a separate governance team manages ABAC centrally.
Steps:
| Step | Action |
|---|---|
| 1 | Create dev_fin + prod_fin test catalogs |
| 2 | Create a Genie Space via REST API (simulating a UI-configured space) |
| 3 | make setup ENV=import_noabac — scaffold env |
| 4 | Write env.auto.tfvars with genie_only = true and genie_space_id pointing to the API-created space |
| 5 | make generate ENV=import_noabac MODE=genie — generate genie config only |
| 6 | Assert: genie_space_configs present, tag_assignments / fgac_policies absent, no masking_functions.sql |
| 7 | make apply-genie ENV=import_noabac — deploy workspace layer |
| 8 | Assert: no .genie_space_id_* file (space attached, not created); space accessible via API |
| 9 | make promote SOURCE_ENV=import_noabac DEST_ENV=import_noabac_prod DEST_CATALOG_MAP=dev_fin=prod_fin — promote remaps genie config or gracefully skips |
| 10 | make apply-genie ENV=import_noabac_prod — deploy prod workspace |
| 11 | Assert: no data_access/terraform.tfstate, no account resources, no masking_functions.sql, no tag_assignments / fgac_policies |
Key assertions:
- Imported space is attached (not created) — no
.genie_space_id_*in dev env; space verified via API make promoteeither remaps genie config or exits 0 with a skip message (not a hard error)- No governance artifacts are produced at any stage
- Only the workspace layer is managed — no account or data_access state
| Check | Source | Pass condition |
|---|---|---|
| Row counts | SELECT COUNT(*) FROM <table> |
Actual ≥ expected |
| Column tags | system.information_schema.column_tags |
At least 1 tag per catalog |
| Column masks | system.information_schema.column_masks |
At least 1 mask per catalog |
Run from your cloud wrapper root directory (genie/aws/ or genie/azure/).
# Dev catalogs only
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars
# Dev + prod catalogs (needed before make apply ENV=prod)
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars --prodpython scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars --verify
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars --verify-prodpython scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars \
--teardown --teardown-prod| Flag | Description |
|---|---|
--auth-file <path> |
Path to auth.auto.tfvars (default: ./auth.auto.tfvars) |
--prod |
Also create prod catalogs (prod_fin, prod_clinical) |
--verify |
Assert dev table row counts + ABAC governance; exits non-zero on failure |
--verify-prod |
Same as --verify but for prod catalogs |
--teardown |
Drop dev catalogs (dev_fin, dev_clinical) |
--teardown-prod |
Drop prod catalogs (prod_fin, prod_clinical) |
--warehouse-id <id> |
Use a specific SQL warehouse instead of auto-selecting |
--dry-run |
Print SQL to stdout without executing |
The original monolithic integration test is still available. It combines the multi-space and promote scenarios (playbook.md § 1 + § 5) into a single pipeline without isolation:
# Full run — destroys everything at the end
make integration-test
# Keep data and deployed resources for inspection
make integration-test KEEP_DATA=1
# Pin a warehouse
make integration-test WAREHOUSE_ID=abc123efPipeline steps:
| Step | Command | Purpose |
|---|---|---|
| 1 | setup_test_data.py --prod |
Create dev + prod UC catalogs and sample data |
| 2 | make setup |
Scaffold env directories |
| 3 | make apply ENV=account |
Deploy groups and tag policies |
| 4 | make generate ENV=dev |
Full LLM generation (both spaces) |
| 5 | make apply ENV=dev |
Deploy dev governance |
| 6 | setup_test_data.py --verify |
Assert dev ABAC governance |
| 7 | make generate SPACE="Finance Analytics" |
Per-space isolation check |
| 8 | make promote ... DEST_CATALOG_MAP=... |
Remap dev → prod catalogs |
| 9 | make apply ENV=prod |
Deploy prod governance |
| 10 | setup_test_data.py --verify-prod |
Assert prod ABAC governance |
| 11 | Teardown | Drop data + destroy Terraform (skipped if KEEP_DATA=1) |
Use make test-all instead for isolated, individually-reportable scenarios.
# Drop test data only (leave Terraform resources in place)
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars \
--teardown --teardown-prod
# Destroy Terraform resources only (leaves UC catalogs in place)
make destroy ENV=prod
make destroy ENV=dev
make destroy ENV=account
# Full cleanup — data + Terraform
python scripts/setup_test_data.py --auth-file envs/dev/auth.auto.tfvars \
--teardown --teardown-prod && \
make destroy ENV=prod && make destroy ENV=dev && make destroy ENV=accountNote: Always run
make destroybefore dropping UC catalogs. If catalogs are dropped first, thedeploy_masking_functionsdestroy provisioner will fail withCatalog not found. If this happens, remove the stuck resource withterraform state rm module.data_access.null_resource.deploy_masking_functionsin the affected env'sdata_access/directory, then re-runmake destroy.
Symptom:
⚠ Could not delete IAM role 'genie-test-uc-role-*': An error occurred (ExpiredToken)
when calling the ListRolePolicies operation: The security token included in the request is expired
Cause: You are using temporary AWS STS credentials (AWS_SESSION_TOKEN). The integration
test suite takes ~90 minutes; if the STS token lifetime is shorter than the total run time
(provision + test + teardown), the token expires before teardown can delete the IAM role.
Fix (preferred) — switch to long-lived IAM user keys:
Remove AWS_SESSION_TOKEN from scripts/account-admin.aws.env and replace AWS_ACCESS_KEY_ID /
AWS_SECRET_ACCESS_KEY with permanent IAM user credentials. Long-lived keys never expire and
work reliably across the full CI pipeline.
Fix (immediate) — refresh the token and re-run teardown:
# Export fresh credentials in your shell (overrides the stale file values)
export AWS_ACCESS_KEY_ID=ASIA...
export AWS_SECRET_ACCESS_KEY=...
export AWS_SESSION_TOKEN=...
# Re-run teardown — it will pick up the fresh env vars
python scripts/provision_test_env.py teardownFix (manual) — delete the role directly in AWS:
If you cannot obtain fresh credentials, delete the orphaned role in the AWS Console:
- Go to IAM → Roles
- Search for
genie-test-uc-role- - Select the role → Delete
The Databricks workspace and metastore are always removed by teardown regardless of whether the IAM step succeeds, so only the IAM role requires manual cleanup.
Prevention for make test-ci: If your organisation requires STS tokens, extend the session
duration to at least 4 hours before starting the pipeline:
# Request a longer-lived token (max depends on your IAM policy, up to 12 h for roles)
aws sts assume-role --role-arn arn:aws:iam::<account>:role/<role> \
--role-session-name genie-ci --duration-seconds 14400 # 4 hoursSymptom:
⚠ Could not delete storage account 'genietest*': The client secret has expired.
Cause: Azure AD client secrets have a finite lifetime (default 6 months or 2 years). If the secret expires between provisioning and teardown, Azure API calls fail.
Fix — generate a new client secret and re-run teardown:
- Azure Portal → Microsoft Entra ID → App registrations → your app → Certificates & secrets
- Generate a new client secret
- Update
AZURE_CLIENT_SECRETinscripts/account-admin.azure.env - Re-run teardown:
python scripts/provision_test_env.py teardown
Fix (manual) — delete resources directly in Azure Portal:
- Go to Resource Groups → your RG
- Search for
genietest— delete the storage account and access connector - Go to Microsoft Entra ID → Enterprise applications — remove any test managed identities
Prevention: Use a client secret with a longer expiry (2 years), or automate secret rotation in your CI pipeline.
If teardown fails completely, check the Databricks Account Console:
- Workspaces: Account Console → Workspaces → filter by name
genie-test-*→ Delete - Metastores: Account Console → Data → Unity Catalog → filter by name
genie-test-*→ Delete (check "Force delete") - Groups: Account Console → User Management → Groups → filter by name
genie-test-admins-*→ Delete
After manually cleaning up, remove the stale state file so subsequent runs start clean:
rm -f scripts/.test_env_state.json