Add column_layout: infer columns from whitespace (borderless tables) #377

Merged: JE-Chen merged 1 commit into dev from feat/column-layout-batch on Jun 23, 2026

@JE-Chen (Member) commented Jun 23, 2026

Summary

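Adds detect_borderless_table / column_gutters / assign_columns / vertical_projection, which detect the columns of a borderless table from a vertical whitespace projection. ocr/structure detects a table only when the left-edge x of the cells in every row matches within a tolerance, so it fails on ragged, borderless, or right-aligned numeric columns and on rows with missing cells; edge_lines.find_grid needs ruling lines, and a whitespace-only table has no grid.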

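This feature uses the robust approach common in the layout-analysis literature, which relies on the gaps: project the OCR boxes onto the x-axis (an ink-density profile), read the persistently empty vertical bands as column gaps (gutters), assign each box a column index, cluster the boxes into rows by vertical spacing, and emit the borderless table. The projection is a pure standard-library difference array (no numpy), and the code reuses the box-bounds reader from table_grid_fill. Qt-free.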
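The projection and gutter steps described above can be sketched as follows. This is a minimal illustration and not the code merged in this PR: the names `projection_profile` and `find_gutters`, the `(left, top, right, bottom)` box layout, and the `min_width` threshold are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumed box layout for this sketch: (left, top, right, bottom) in pixels,
# with right exclusive. The real reader in table_grid_fill may differ.
Box = Tuple[int, int, int, int]


def projection_profile(boxes: List[Box], width: int) -> List[int]:
    """Count how many boxes cover each x position, using a difference array."""
    diff = [0] * (width + 1)
    for left, _top, right, _bottom in boxes:
        lo = max(0, min(left, width))
        hi = max(0, min(right, width))
        if hi > lo:
            diff[lo] += 1  # coverage starts at lo
            diff[hi] -= 1  # and ends just before hi
    profile, running = [], 0
    for x in range(width):
        running += diff[x]  # prefix sum turns the deltas into coverage counts
        profile.append(running)
    return profile


def find_gutters(profile: List[int], min_width: int = 3) -> List[Tuple[int, int]]:
    """Return interior zero-coverage runs at least min_width wide, as half-open (start, end)."""
    inked = [x for x, value in enumerate(profile) if value > 0]
    if not inked:
        return []
    first, last = inked[0], inked[-1]  # ignore the page margins
    gutters, start = [], None
    for x in range(first, last + 1):
        if profile[x] == 0:
            if start is None:
                start = x
        elif start is not None:
            if x - start >= min_width:
                gutters.append((start, x))
            start = None
    return gutters


# Two rows; the second column is right-aligned, so its left edges differ.
boxes = [(0, 0, 30, 10), (50, 0, 80, 10), (5, 20, 30, 30), (60, 20, 80, 30)]
print(find_gutters(projection_profile(boxes, 100)))  # [(30, 50)]
```

The second column's left edges (50 and 60) do not line up, which is the case a left-edge match would miss, yet the empty band from x=30 to x=50 is still found. The difference array costs two updates per box plus one pass over the width, so no per-pixel loop over each box is needed.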

Five layers

  • Core: utils/column_layout/ (vertical_projection, column_gutters, assign_columns, detect_borderless_table)
  • Facade: exported from je_auto_control + __all__
  • Executor: AC_detect_borderless_table ({found, table}) / AC_column_gutters ({count, gutters}).
  • MCP: ac_detect_borderless_table / ac_column_gutters (read-only).
  • Script Builder: Detect Borderless Table / Column Gutters (whitespace) (OCR).
  • Docs: v165 EN + Zh + toctree. Changelog: root EN + zh-TW + zh-CN.

Tests

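test/unit_test/headless/test_column_layout_batch.py covers: projection with zero-valued gutters, gutter detection, column assignment, an end-to-end 2-column, 3-row table, a single-column non-table returning None, empty input returning None, and wiring + facade. 8 passed. ruff / bandit / radon / float-scan / Qt-free all clean.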
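The end-to-end cases listed above (a 2-column, 3-row table, a single column returning None, empty input returning None) can be illustrated with a self-contained sketch. None of this is the PR's code: `gutter_centers`, `group_rows`, `sketch_borderless_table`, the `(left, top, right, bottom, text)` box shape, and the `min_width` and `row_gap` thresholds are hypothetical stand-ins, and the real functions may take different arguments and return a different structure.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Assumed box shape: (left, top, right, bottom, text), with non-negative
# integer coordinates and left < right. This is an illustration only.
Box = Tuple[int, int, int, int, str]


def gutter_centers(boxes: List[Box], min_width: int = 3) -> List[float]:
    """Midpoints of interior zero-coverage runs in the x-projection."""
    width = max(box[2] for box in boxes)
    diff = [0] * (width + 1)
    for left, _top, right, _bottom, _text in boxes:
        diff[left] += 1
        diff[right] -= 1
    centers, running, start, seen_ink = [], 0, None, False
    for x in range(width):
        running += diff[x]
        if running == 0:
            if seen_ink and start is None:
                start = x  # a gap opens after some ink was seen
        else:
            if start is not None and x - start >= min_width:
                centers.append((start + x) / 2)
            start, seen_ink = None, True
    return centers


def group_rows(boxes: List[Box], row_gap: int = 5) -> List[List[Box]]:
    """Start a new row when a box's top lies more than row_gap below the row's bottom."""
    rows: List[List[Box]] = []
    bottom = None
    for box in sorted(boxes, key=lambda b: b[1]):
        if bottom is None or box[1] - bottom > row_gap:
            rows.append([])
            bottom = box[3]
        else:
            bottom = max(bottom, box[3])
        rows[-1].append(box)
    return rows


def sketch_borderless_table(boxes: List[Box]) -> Optional[List[List[str]]]:
    """Return rows of cell text, or None when there is no multi-column layout."""
    if not boxes:
        return None
    centers = gutter_centers(boxes)
    if not centers:
        return None  # a single column is not treated as a table
    table = []
    for row in group_rows(boxes):
        cells = [""] * (len(centers) + 1)
        for left, _top, right, _bottom, text in sorted(row, key=lambda b: b[0]):
            # The column index is the number of gutters left of the box centre.
            col = bisect_right(centers, (left + right) / 2)
            cells[col] = (cells[col] + " " + text).strip()
        table.append(cells)
    return table


boxes = [
    (0, 0, 40, 10, "Item"), (70, 0, 100, 10, "Price"),
    (0, 20, 30, 30, "Tea"), (85, 20, 100, 30, "3.5"),
    (0, 40, 50, 50, "Coffee"), (80, 40, 100, 50, "12.0"),
]
print(sketch_borderless_table(boxes))
# [['Item', 'Price'], ['Tea', '3.5'], ['Coffee', '12.0']]
```

Because each box is placed by its centre relative to the gutters, a row with a missing cell simply leaves that column as an empty string instead of breaking detection.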

ocr/structure detects a table only when every row's cell-left-x matches, so it
fails on ragged / borderless / right-aligned columns; edge_lines.find_grid
needs ruling lines, which a whitespace-only table lacks. Find columns by the gaps:
project OCR boxes onto the x-axis, read the persistent empty vertical bands as
gutters, assign column indices, bucket rows by spacing, emit the table. Pure
difference-array projection, no numpy.
@codacy-production

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

🟢 Metrics 58 complexity · 0 duplication


@JE-Chen JE-Chen merged commit a878e12 into dev Jun 23, 2026
22 of 23 checks passed
@JE-Chen JE-Chen deleted the feat/column-layout-batch branch June 23, 2026 19:44