Feature/implement inspectorat so crawler #367

Merged
vbuch merged 3 commits into vbuch:main from IvelinaKostadinova:feature/implement-inspectorat-so-crawler
Apr 7, 2026
@IvelinaKostadinova
Contributor

No description provided.

@vercel

vercel bot commented Apr 6, 2026

@IvelinaKostadinova is attempting to deploy a commit to the Valery Buchinsky's projects Team on Vercel.

A member of the Team first needs to authorize it.

Contributor

Copilot AI left a comment

Pull request overview

Adds a new “Столичен инспекторат” source end-to-end (shared source catalog + ingest crawler + deployment config), so the ingest pipeline can crawl and persist inspectorat-so.org news and the web UI can display it as a supported source.

Changes:

  • Register new source metadata (inspectorat-so-org) and expose it via @oboapp/shared public exports.
  • Add an Inspectorat crawler (selectors, extractors, date parsing, crawl entrypoint) with unit tests.
  • Wire the crawler into ingest Cloud Run jobs via Terraform and set trust defaults; add source logo asset.
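The date parsing mentioned above involves inferring a year when the site shows only a day and month. The PR page does not show the actual parser, so the following is only a minimal sketch under stated assumptions: the `DD.MM` input format, the function names `inferYear` and `parseDayMonth`, and the "never in the future" rule are all hypothetical and may differ from what `ingest/crawlers/inspectorat-so-org/index.ts` does.

```typescript
// Hypothetical sketch: not the crawler's real implementation.
// Assumption: a news post cannot be dated after the crawl time, so a
// day/month that would land in the future belongs to the previous year.
function inferYear(day: number, month: number, now: Date): Date {
  const year = now.getUTCFullYear();
  const candidate = new Date(Date.UTC(year, month - 1, day));
  if (candidate.getTime() > now.getTime()) {
    return new Date(Date.UTC(year - 1, month - 1, day));
  }
  return candidate;
}

// Assumption: dates arrive as "DD.MM" with no year component.
function parseDayMonth(text: string, now: Date): Date | null {
  const match = /^\s*(\d{1,2})\.(\d{1,2})\s*$/.exec(text);
  if (!match) return null;
  const day = Number(match[1]);
  const month = Number(match[2]);
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  const result = inferYear(day, month, now);
  // Reject impossible dates such as 31.02, which Date would roll over.
  if (result.getUTCDate() !== day) return null;
  return result;
}
```

Passing `now` explicitly, rather than reading the clock inside the function, keeps the year inference deterministic in unit tests, for example when a crawl in early January sees a post dated in late December.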

Reviewed changes

Copilot reviewed 11 out of 14 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| web/public/sources/inspectorat-so-org.png | Adds logo asset for the new source. |
| shared/src/sources.ts | Registers inspectorat-so-org in the shared sources list. |
| shared/src/index.ts | Re-exports sources from the shared package entrypoint. |
| ingest/terraform/main.tf | Adds Cloud Run job config for the new crawler. |
| ingest/lib/source-trust.ts | Adds trust/geometry defaults for inspectorat-so-org. |
| ingest/crawlers/inspectorat-so-org/types.ts | Defines crawler document typing and sourceType. |
| ingest/crawlers/inspectorat-so-org/tsconfig.json | Crawler-local TS config consistent with other crawlers. |
| ingest/crawlers/inspectorat-so-org/selectors.ts | CSS selectors for index/post scraping. |
| ingest/crawlers/inspectorat-so-org/extractors.ts | Link + post detail extraction using shared utilities. |
| ingest/crawlers/inspectorat-so-org/index.ts | Crawler entrypoint, crawl wiring, and date parsing logic. |
| ingest/crawlers/inspectorat-so-org/index.test.ts | Tests crawl wiring and date parsing year inference. |
| ingest/crawlers/inspectorat-so-org/extractors.test.ts | Tests link filtering/deduping and post detail extraction wrapper. |
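
The link filtering and deduping that the extractor tests cover could look roughly like the sketch below. This is an illustration only: the real extractor relies on shared utilities that are not shown in this PR page, and the function name `filterPostLinks` and the `/news/` path prefix in the usage are hypothetical.

```typescript
// Hypothetical sketch: the real extractor uses shared utilities not shown here.
// Resolves hrefs against the index page URL, keeps same-host links under a
// path prefix, and drops duplicates while preserving first-seen order.
function filterPostLinks(
  hrefs: string[],
  baseUrl: string,
  pathPrefix: string,
): string[] {
  const base = new URL(baseUrl);
  const seen = new Set<string>();
  const result: string[] = [];
  for (const href of hrefs) {
    let url: URL;
    try {
      url = new URL(href, base); // resolves relative hrefs
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.host !== base.host) continue; // external or non-http link
    if (!url.pathname.startsWith(pathPrefix)) continue; // not a post page
    url.hash = ""; // "#top" variants point at the same post
    const normalized = url.toString();
    if (seen.has(normalized)) continue;
    seen.add(normalized);
    result.push(normalized);
  }
  return result;
}

// Usage with an assumed "/news/" prefix:
const links = filterPostLinks(
  ["/news/a", "https://inspectorat-so.org/news/a#top", "/about", "/news/c"],
  "https://inspectorat-so.org/",
  "/news/",
);
```

Normalizing each URL before the `Set` lookup is what lets a relative link and its absolute, fragment-bearing twin collapse into one entry.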

@vbuch vbuch linked an issue Apr 7, 2026 that may be closed by this pull request
@vbuch vbuch merged commit 4ba6236 into vbuch:main Apr 7, 2026
8 of 10 checks passed
Development

Successfully merging this pull request may close these issues.

Implement a crawler for inspectorat-so

3 participants