
Add sync-repos workflow to auto-discover repos with arch-docs#15

Merged
jonathanpopham merged 1 commit into main from claude/issue-8-20260303-2011
Mar 3, 2026

Conversation

@jonathanpopham (Contributor) commented Mar 3, 2026

Summary

  • Adds .github/workflows/sync-repos.yml that runs every 6 hours and on workflow_dispatch to auto-discover org repos that have an arch-docs.yml workflow
  • Scans all repos via gh api --paginate, checks for .github/workflows/arch-docs.yml, and appends missing entries to repos.yaml using yq
  • Forks go to Community category (categories[1]); org-native repos go to Supermodel Open Source (categories[0])
  • Maps GitHub language field to pill/pill_class values
  • Deduplication via grep on the existing repos.yaml
  • Commits and pushes via supermodel-bot only when repos were added
  • Uses BOT_TOKEN (with repo + workflow scopes) to ensure the push triggers build-index.yml; falls back to GITHUB_TOKEN with a comment explaining the limitation
  • Uses concurrency group add-repo with cancel-in-progress: false

Test plan

  • Run workflow manually via workflow_dispatch
  • Verify missing repos (e.g. next.js, django, bun) are added to the correct repos.yaml category
  • Verify fork repos get upstream field set
  • Verify no duplicate entries are created on a second run
  • Verify push triggers build-index.yml to rebuild the site
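
The "no duplicate entries" item in the test plan can be checked mechanically after a second run. The following is a minimal sketch, assuming every repo entry in repos.yaml is written as an unquoted `- name: <repo>` line and that no other list in the file uses a `- name:` key; `duplicate_repo_names` is a hypothetical helper, not part of the workflow.

```shell
#!/usr/bin/env bash
# Print each repo name that appears more than once in a repos.yaml-style file.
# Assumption: entries look like "      - name: <repo>" (unquoted scalar).
duplicate_repo_names() {
  local file="$1"
  # Extract the value after "- name:", trimming surrounding whitespace,
  # then let sort | uniq -d report names that occur at least twice.
  sed -nE 's/^[[:space:]]*-[[:space:]]+name:[[:space:]]*(.*[^[:space:]])[[:space:]]*$/\1/p' "$file" \
    | sort | uniq -d
}
```

Running `duplicate_repo_names repos.yaml` after two consecutive workflow runs should print nothing; any output names the entries that were appended twice.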

Closes #8

Generated with Claude Code

Summary by CodeRabbit

  • Chores
    • Added automated workflow to synchronize organization repositories into the repository catalog on a 6-hour schedule
    • Repositories are automatically categorized as community projects or official releases based on fork status
    • Repository metadata including descriptions, programming languages, and upstream sources are automatically tracked and updated

Runs every 6 hours and on workflow_dispatch. Scans all org repos,
checks for .github/workflows/arch-docs.yml, and appends missing
repos to repos.yaml using yq. Forks go to Community (index 1),
org-native repos go to Supermodel Open Source (index 0). Language
is mapped to pill/pill_class. Commits and pushes via supermodel-bot
if any repos were added.

Co-authored-by: Jonathan Popham <jonathanpopham@users.noreply.github.com>
Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai bot commented Mar 3, 2026

Walkthrough

A new GitHub Actions workflow is added that automatically discovers repositories with arch-docs workflows and syncs them into repos.yaml. The workflow runs every 6 hours and on manual trigger, fetching organization repositories, filtering for those with the required workflow file, and committing additions with appropriate metadata mappings.
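
The discovery step in this walkthrough might look roughly like the sketch below. The PR text only says the workflow "checks for" the file, so probing the repository contents endpoint is an assumption, and `has_arch_docs` and `filter_repos_with_arch_docs` are hypothetical helper names.

```shell
#!/usr/bin/env bash
ORG="${ORG:-supermodeltools}"   # org name as shown in the review output

# Succeed when the repo has .github/workflows/arch-docs.yml on its default branch.
# Assumption: presence is probed via the contents endpoint; `gh api` exits
# non-zero when the API answers with an error such as 404.
has_arch_docs() {
  local repo_name="$1"
  gh api "repos/${ORG}/${repo_name}/contents/.github/workflows/arch-docs.yml" \
    --silent </dev/null 2>/dev/null
}

# Read candidate repo names (one per line) on stdin and print those that
# carry the arch-docs workflow.
filter_repos_with_arch_docs() {
  local repo_name
  while IFS= read -r repo_name; do
    [ -n "$repo_name" ] || continue
    if has_arch_docs "$repo_name"; then
      printf '%s\n' "$repo_name"
    fi
  done
}
```

One possible invocation is `gh api --paginate "orgs/${ORG}/repos" --jq '.[].name' | filter_repos_with_arch_docs`. Keeping the network call inside one small function makes the loop easy to exercise with a stubbed `gh`.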

Changes

Cohort / File(s) | Summary
GitHub Actions Workflow: .github/workflows/sync-repos.yml | New workflow that scans org repos via gh cli, checks for arch-docs.yml presence, extracts metadata (name, fork status, description, language), maps language to visual pill styles, and auto-appends missing repos to repos.yaml in appropriate categories (Supermodel Open Source or Community). Includes language-to-pill mapping with special DevOps handling for Shell/HCL. Commits changes with bot identity when additions occur.
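
As an illustration of the language-to-pill mapping mentioned in the summary, a `case` statement is one plausible shape. Only the Shell/HCL to DevOps rule comes from the PR; every class name (`pill-devops`, `pill-default`, and so on), the `Other` fallback label, and the function name `language_to_pill` are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Map a GitHub "language" value to "<pill><TAB><pill_class>".
# All class names below are illustrative assumptions, not the site's real CSS classes.
language_to_pill() {
  case "$1" in
    Shell|HCL)   printf 'DevOps\tpill-devops\n' ;;          # special DevOps handling
    TypeScript)  printf 'TypeScript\tpill-typescript\n' ;;
    Python)      printf 'Python\tpill-python\n' ;;
    ""|null)     printf 'Other\tpill-default\n' ;;          # API returned no language
    *)           printf '%s\tpill-default\n' "$1" ;;        # keep the name, generic style
  esac
}
```

In bash, the two fields can be split with `IFS=$'\t' read -r PILL PILL_CLASS < <(language_to_pill "$LANGUAGE")`.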

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

🤖 Hidden repos scattered in the digital night,
A bot awakens every six hours—what a sight!
Filtering, mapping, sorting with care,
repos.yaml grows complete, updated fair! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title clearly summarizes the main change: adding a new workflow to automatically discover and sync repositories with arch-docs.
Linked Issues check | ✅ Passed | The workflow implementation addresses all core requirements: scheduled runs, GitHub CLI pagination, arch-docs detection, duplicate prevention via grep, language-to-pill mapping, fork/native categorization, upstream assignment, and commit conditions.
Out of Scope Changes check | ✅ Passed | The single file added (.github/workflows/sync-repos.yml) is entirely within scope and directly implements the requirements from issue #8.
Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
.github/workflows/sync-repos.yml (1)

89-90: Nice-to-have: replace magic category indexes with named constants.

Using raw 0/1 works, but named constants make future repos.yaml maintenance less brittle and easier to read.

Based on learnings: Maintain repos.yaml as the source of truth for all listed repos (categories, names, descriptions, pills).

Also applies to: 103-104

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/sync-repos.yml around lines 89 - 90, Replace raw numeric
category indexes with named workflow variables and use them where the workflow
sets repo env vars (e.g., in the REPO_NAME assignment). Define constants like
COMMUNITY_CATEGORY_INDEX and CORE_CATEGORY_INDEX (or CATEGORY_INDEX_<NAME>) near
the top of the job, initialize them to the appropriate numeric values (or derive
them from repos.yaml if you already parse it), then replace all occurrences of
literal 0/1 in the category-related env assignments with these variables (also
update the other occurrences mentioned in the review). Ensure you reference
REPO_NAME and any category-related env keys so callers use the named constants
instead of magic numbers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/sync-repos.yml:
- Around line 58-60: The grep-based duplicate check using the pattern "^\s+name:
${REPO_NAME}$" fails because entries are stored as list items like "- name:
..."; replace the grep step with a yq query that checks repos.yaml for any list
entry whose name equals the REPO_NAME variable and only continue/skip when such
an entry exists. Update the check that currently invokes grep (refer to the grep
command and the REPO_NAME variable operating against repos.yaml) to use yq's
YAML-aware selection so existing repos are detected reliably before attempting
to add them.
- Around line 51-52: The PARENT extraction from repo_json is empty because the
org list API doesn't include a parent field; update the logic that sets PARENT
(the PARENT=$(echo "$repo_json" | jq -r '.parent.full_name // ""') assignment)
to detect when .parent is missing and, for forked repos, perform an extra API
request to GET /repos/{owner}/{repo} using the repo's full_name from repo_json
to retrieve .parent.full_name, then assign that value to PARENT; ensure the same
corrected PARENT value is used when building the YAML entries for forks (the
block that adds fork repos to the YAML) so upstream is populated correctly.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f746348 and eaf05bc.

📒 Files selected for processing (1)
  • .github/workflows/sync-repos.yml

Comment on lines +51 to +52
PARENT=$(echo "$repo_json" | jq -r '.parent.full_name // ""')

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub REST API GET /orgs/{org}/repos parent field fork repositories

💡 Result:

GET /orgs/{org}/repos returns repository objects that include "fork": true/false, but it does not include the fork relationship details (parent / source) in that listing response (see the example response for “List organization repositories”). [1]

To get the parent of a fork, you must fetch the specific repo with GET /repos/{owner}/{repo}; that endpoint documents that parent and source objects are present when the repository is a fork. [2]

Notes:

  • You can filter the org listing to just forks via GET /orgs/{org}/repos?type=forks, but you still won’t get parent inline; you’ll need per-repo GET /repos/{owner}/{repo} calls. [1][2]

Sources
[1] GitHub Docs – “List organization repositories” (example response shows fork but no parent/source) (docs.github.com)
[2] GitHub Docs – “Get a repository” (parent/source present when fork) (docs.github.com)

Citations:


🏁 Script executed:

# First, let's look at the workflow file to understand the context
cat -n .github/workflows/sync-repos.yml | head -120

Repository: supermodeltools/supermodeltools.github.io

Length of output: 5229


Fork upstream will be empty with current API.

The code fetches repos from GET /orgs/supermodeltools/repos (lines 42–44), which doesn't include the parent object for forks; that field only exists when you call GET /repos/{owner}/{repo} for a specific repo. So when line 51 tries to extract .parent.full_name, it gets null, and the jq fallback // "" leaves PARENT empty. Then lines 88–101 add fork repos to the YAML with an empty upstream field, breaking your acceptance criteria.

Here's the fix:

Suggested fix
-            PARENT=$(echo "$repo_json" | jq -r '.parent.full_name // ""')
+            PARENT=""
+            if [ "$IS_FORK" = "true" ]; then
+              PARENT=$(gh api "repos/supermodeltools/${REPO_NAME}" --jq '.parent.full_name // ""')
+            fi

This way, for fork repos, you make an extra API call to get the real parent info before populating the YAML.

Also applies to: 88–101
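
The same fix can be factored into small functions so the fork branch is testable without network access. This is a sketch under the assumptions of the suggested fix (org `supermodeltools`, per-repo lookup via `gh api`); `fetch_parent_full_name` and `resolve_upstream` are hypothetical names.

```shell
#!/usr/bin/env bash
ORG="${ORG:-supermodeltools}"

# The only network-touching piece: ask the single-repo endpoint for the parent.
# The org listing endpoint does not carry .parent, hence the extra call.
fetch_parent_full_name() {
  gh api "repos/${ORG}/$1" --jq '.parent.full_name // ""'
}

# Print the upstream "owner/repo" for forks; print nothing for org-native repos.
resolve_upstream() {
  local repo_name="$1" is_fork="$2"
  if [ "$is_fork" = "true" ]; then
    fetch_parent_full_name "$repo_name"
  fi
  return 0
}
```

In the workflow loop this would be used as `PARENT=$(resolve_upstream "$REPO_NAME" "$IS_FORK")`, which costs one additional API request per fork and none for native repos.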

Comment on lines +58 to +60
# Skip if already listed in repos.yaml
if grep -qE "^\s+name: ${REPO_NAME}$" repos.yaml; then
continue

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find and list key files
fd -t f "repos\.yaml|sync-repos\.yml"

Repository: supermodeltools/supermodeltools.github.io

Length of output: 95


🏁 Script executed:

# Get the structure and content of repos.yaml
if [ -f "repos.yaml" ]; then
  echo "=== repos.yaml exists ==="
  wc -l repos.yaml
  echo "=== First 30 lines ==="
  head -30 repos.yaml
else
  echo "repos.yaml not found"
  find . -name "*.yaml" -o -name "*.yml" | head -20
fi

Repository: supermodeltools/supermodeltools.github.io

Length of output: 1364


🏁 Script executed:

# Examine the sync-repos.yml workflow file around line 59
if [ -f ".github/workflows/sync-repos.yml" ]; then
  echo "=== sync-repos.yml content (lines 45-70) ==="
  sed -n '45,70p' .github/workflows/sync-repos.yml
else
  echo "sync-repos.yml not found"
fi

Repository: supermodeltools/supermodeltools.github.io

Length of output: 1131


🏁 Script executed:

# Test grep -E with \s behavior
echo "Testing grep -E with \\s:"
cat > /tmp/test_repos.txt << 'EOF'
  - name: repo1
  - name: repo2
    name: repo3
      name: repo4
EOF

echo "=== Content ==="
cat -n /tmp/test_repos.txt

echo -e "\n=== grep -qE with \\s pattern ==="
if grep -qE "^\s+name: repo1$" /tmp/test_repos.txt; then
  echo "✓ Pattern matched repo1 (POSIX grep with \\s)"
else
  echo "✗ Pattern did NOT match repo1"
fi

echo -e "\n=== grep with [[:space:]] pattern ==="
if grep -qE "^[[:space:]]+name:[[:space:]]+repo1$" /tmp/test_repos.txt; then
  echo "✓ Pattern matched with [[:space:]]"
else
  echo "✗ Pattern did NOT match"
fi

echo -e "\n=== Test \s with actual whitespace ==="
echo "  name: repo1" | grep -qE "^\s+name: repo1$" && echo "✓ Works" || echo "✗ Fails"

Repository: supermodeltools/supermodeltools.github.io

Length of output: 394


🏁 Script executed:

# Check what grep version is available
grep --version | head -1
echo "Testing \\s support:"
echo "  test" | grep -E "^\s+test$" 2>&1 && echo "\\s supported" || echo "\\s not supported"

Repository: supermodeltools/supermodeltools.github.io

Length of output: 146


🏁 Script executed:

# Show exact format around one repo entry with line numbers and visible whitespace
sed -n '5,10p' repos.yaml | cat -A
echo "---"
sed -n '5,10p' repos.yaml | od -c | head -20

Repository: supermodeltools/supermodeltools.github.io

Length of output: 467


🏁 Script executed:

# Test the actual grep pattern against real repos.yaml
echo "=== Testing original pattern against actual repos.yaml ==="
REPO_NAME="mcp"
if grep -qE "^\s+name: ${REPO_NAME}$" repos.yaml; then
  echo "✓ Pattern MATCHED"
else
  echo "✗ Pattern FAILED (as expected)"
fi

echo -e "\n=== Testing corrected grep pattern ==="
if grep -qE "^\s+-\s+name: ${REPO_NAME}$" repos.yaml; then
  echo "✓ Corrected pattern MATCHED"
else
  echo "✗ Corrected pattern FAILED"
fi

echo -e "\n=== Testing yq alternative ==="
REPO_NAME="mcp"
if REPO_NAME="$REPO_NAME" yq -e '.categories[].repos[] | select(.name == env(REPO_NAME))' repos.yaml >/dev/null 2>&1; then
  echo "✓ yq approach WORKS"
else
  echo "✗ yq approach FAILED"
fi

Repository: supermodeltools/supermodeltools.github.io

Length of output: 295


🏁 Script executed:

# Verify yq is available and show what it returns
if command -v yq &> /dev/null; then
  echo "yq version:"
  yq --version
  echo -e "\n=== Sample output from yq query ==="
  yq '.categories[].repos[] | select(.name == "mcp")' repos.yaml
else
  echo "yq not found"
fi

Repository: supermodeltools/supermodeltools.github.io

Length of output: 97


🏁 Script executed:

# Check the workflow environment to see what tools are available
grep -A 5 "runs-on:" .github/workflows/sync-repos.yml
echo "---"
# Look for any tool installations in the workflow
grep -E "(apt-get|brew|install|yq)" .github/workflows/sync-repos.yml || echo "No explicit tool installations found"

Repository: supermodeltools/supermodeltools.github.io

Length of output: 632


🏁 Script executed:

# Show the full relevant section of the workflow to understand context
sed -n '1,100p' .github/workflows/sync-repos.yml | tail -60

Repository: supermodeltools/supermodeltools.github.io

Length of output: 2584


Fix duplicate detection: regex pattern won't match actual YAML structure.

Line 59 uses grep -qE "^\s+name: ${REPO_NAME}$" to check if a repo already exists in repos.yaml. But repos in that file are formatted as - name: ... (with the YAML list marker), so the pattern never matches. This means existing repos won't be skipped, and every workflow run will try to add duplicates.

The suggested fix using yq works well here since it's already installed and used throughout this workflow for YAML manipulation:

-            if grep -qE "^\s+name: ${REPO_NAME}$" repos.yaml; then
+            if REPO_NAME="$REPO_NAME" yq -e '.categories[].repos[] | select(.name == env(REPO_NAME))' repos.yaml >/dev/null 2>&1; then
               continue
             fi

This properly queries the YAML structure and will correctly detect existing repos across all categories.
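
For environments where yq is not available, a literal string comparison can serve as a fallback check. The sketch below is an alternative to the suggested yq query, not a replacement for it; it assumes unquoted names written as `- name: <repo>` or `name: <repo>`, and `repo_listed` is a hypothetical helper. Comparing strings instead of interpolating the name into a regex also avoids treating the dot in a name such as next.js as a wildcard.

```shell
#!/usr/bin/env bash
# Succeed (exit 0) when FILE already lists an entry whose name equals NAME.
# Assumption: names are unquoted scalars; quoted values would need extra handling.
repo_listed() {
  local name="$1" file="$2"
  awk -v want="$name" '
    {
      line = $0
      sub(/^[ \t]*(-[ \t]+)?/, "", line)      # drop indent and optional list marker
      if (sub(/^name:[ \t]*/, "", line)) {    # only consider name keys
        sub(/[ \t]+$/, "", line)              # trim trailing whitespace
        if (line == want) found = 1           # literal comparison, no regex
      }
    }
    END { exit(found ? 0 : 1) }
  ' "$file"
}
```

Inside the loop this would read `if repo_listed "$REPO_NAME" repos.yaml; then continue; fi`.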

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-            # Skip if already listed in repos.yaml
-            if grep -qE "^\s+name: ${REPO_NAME}$" repos.yaml; then
-              continue
+            # Skip if already listed in repos.yaml
+            if REPO_NAME="$REPO_NAME" yq -e '.categories[].repos[] | select(.name == env(REPO_NAME))' repos.yaml >/dev/null 2>&1; then
+              continue

@jonathanpopham jonathanpopham merged commit cdaeb6b into main Mar 3, 2026
1 of 2 checks passed