
fix: missing log in multitask training#5382

Open
iProzd wants to merge 2 commits into deepmodeling:master from iProzd:0407_fix_mul_log

Conversation

@iProzd
Member

@iProzd iProzd commented Apr 7, 2026

Fix the missing log during multitask training, introduced by #4850.

Summary by CodeRabbit

  • Refactor

    • Cleaner multi-task training logs: task-specific training metrics are always shown, and validation metrics are emitted only when valid results are present to avoid incomplete validation output.
  • Tests

    • Added post-training checks to ensure the training log contains expected model-probability columns and that data rows match the header column count.

@iProzd iProzd requested review from OutisLi, Copilot and njzjz April 7, 2026 08:26
@github-actions github-actions bot added the Python label Apr 7, 2026
@iProzd iProzd requested a review from anyangml April 7, 2026 08:26
@dosubot dosubot bot added the bug label Apr 7, 2026
@coderabbitai
Contributor

coderabbitai bot commented Apr 7, 2026

📝 Walkthrough

Walkthrough

Refactored multi-task logging in deepmd/pt/train/training.py to always emit per-task training metrics when rank == 0 and to emit per-task validation metrics only when valid_results[_key] is truthy; added post-training assertions in source/tests/pt/test_multitask.py to validate lcurve.out headers and row counts against configured model probabilities.

Changes

Multi-task training logging restructure — deepmd/pt/train/training.py
Restructured the rank == 0 logging block: training metrics for each task are logged unconditionally under the rank guard; validation metrics logging is moved into a nested if valid_results[_key]: block.

Multitask test additions — source/tests/pt/test_multitask.py
Added assertions after training to parse lcurve.out: extract header columns (skipping comments), ensure every model_prob key appears in at least one column, and verify the first data row has the same column count as the header.
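The lcurve.out checks described above can be sketched roughly as follows. This is a hedged, self-contained sketch: the helper name `check_lcurve`, the sample file layout, and the `model_prob` dict shape are assumptions for illustration; the real assertions live in source/tests/pt/test_multitask.py.

```python
def check_lcurve(text, model_prob):
    """Hypothetical sketch of the post-training lcurve.out checks."""
    lines = text.splitlines()
    # Header columns come from the last comment line, with the "#" stripped.
    header = [ln for ln in lines if ln.startswith("#")][-1]
    columns = header.lstrip("# ").split()
    # Every model key must appear in at least one column name.
    for key in model_prob:
        assert any(key in col for col in columns), f"no column for {key}"
    # The first data row must have the same column count as the header.
    data_lines = [ln for ln in lines if not ln.startswith("#")]
    assert len(data_lines[0].split()) == len(columns), "column count mismatch"
    return columns

sample = (
    "# step model_1_trn model_2_trn lr\n"
    "      0   1.23e-01   2.34e-01   1e-03\n"
)
print(check_lcurve(sample, {"model_1": 0.5, "model_2": 0.5}))
# → ['step', 'model_1_trn', 'model_2_trn', 'lr']
```

A missing per-task column or a truncated data row would trip one of the two assertions, which is exactly the failure mode the original bug produced.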

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning — Docstring coverage is 0.00%, below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)
  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed — The title 'fix: missing log in multitask training' directly and specifically describes the main change: addressing missing log output in multitask training, which aligns with the changeset that restructures the validation metrics logging in the training code.


Contributor

Copilot AI left a comment


Copilot wasn't able to review any files in this pull request.



Comment on lines 1285 to +1306
train_results[_key] = log_loss_train(
    loss, more_loss, _task_key=_key
)
valid_results[_key] = log_loss_valid(_task_key=_key)
if self.rank == 0:
    log.info(
        format_training_message_per_task(
            batch=display_step_id,
            task_name=_key + "_trn",
            rmse=train_results[_key],
            learning_rate=cur_lr,
        )
    )
    if valid_results[_key]:
        log.info(
            format_training_message_per_task(
                batch=display_step_id,
                task_name=_key + "_val",
                rmse=valid_results[_key],
                learning_rate=None,
            )
        )

Copilot AI Apr 7, 2026


The indentation change moves valid_results[_key] = log_loss_valid(...) (and both log messages) into the same inner block as train_results[_key] = .... Previously, validation collection/logging happened at a broader scope; if that outer condition isn’t always entered (e.g., loss not computed/filtered for a task), this will skip validation results and logs for that task, potentially reintroducing the ‘missing log’ behavior. Consider restoring valid_results assignment and the rank==0 logging block to the prior scope (keeping only the intended logging fix), so per-task validation/logging isn’t coupled to that inner conditional.
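The scoping concern above can be illustrated with a minimal runnable sketch. The stub helpers below stand in for deepmd's `log_loss_train`, `log_loss_valid`, and the logging call; their names mirror the diff, but the return shapes and the two-task setup are assumptions for illustration only.

```python
def log_loss_train(key):
    # Stub for the real deepmd helper: per-task training metrics.
    return {f"{key}_rmse": 0.1}

def log_loss_valid(key):
    # Stub: returns an empty dict when a task has no validation data.
    return {f"{key}_rmse": 0.2} if key != "model_2" else {}

rank = 0
train_results, valid_results, logged = {}, {}, []
for _key in ("model_1", "model_2"):
    train_results[_key] = log_loss_train(_key)
    valid_results[_key] = log_loss_valid(_key)  # collected for every task
    if rank == 0:
        logged.append(_key + "_trn")            # training metrics always logged
        if valid_results[_key]:                 # validation only when present
            logged.append(_key + "_val")

print(logged)  # → ['model_1_trn', 'model_1_val', 'model_2_trn']
```

With validation collection kept at the outer per-task scope, a task whose inner training branch is skipped or whose validation set is empty simply omits its `_val` line, rather than dropping its log output entirely.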

@codecov

codecov bot commented Apr 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 80.38%. Comparing base (adf5b83) to head (2a88470).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5382      +/-   ##
==========================================
- Coverage   80.40%   80.38%   -0.02%     
==========================================
  Files         812      812              
  Lines       83476    83477       +1     
  Branches     4050     4050              
==========================================
- Hits        67115    67106       -9     
- Misses      15139    15147       +8     
- Partials     1222     1224       +2     


@anyangml
Collaborator

anyangml commented Apr 7, 2026

maybe add a simple UT to count number of columns in lcurve.out?

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@source/tests/pt/test_multitask.py`:
- Line 76: Rename the ambiguous comprehension variable `l` in the list
comprehension inside test_multitask.py to a clearer name (e.g., `line`) so ruff
E741 is avoided; update the expression `data_lines = [l for l in lines if not
l.startswith("#")]` to use the new identifier (`line`) and ensure any adjacent
references in the same scope match the rename.

📥 Commits

Reviewing files that changed from the base of the PR and between 739bd1a and 2a88470.

📒 Files selected for processing (1)
  • source/tests/pt/test_multitask.py

len(cols_for_model), 0, f"No lcurve columns found for {mk}"
)
# data line column count should match header
data_lines = [l for l in lines if not l.startswith("#")]
Contributor


⚠️ Potential issue | 🟡 Minor

Rename ambiguous variable l to pass CI.

The variable name l is flagged by ruff (E741) as ambiguous since it can be confused with 1 or I. As per coding guidelines, ruff check . must pass before committing.

Proposed fix
-        data_lines = [l for l in lines if not l.startswith("#")]
+        data_lines = [line for line in lines if not line.startswith("#")]
🧰 Tools
🪛 Ruff (0.15.9)

[error] 76-76: Ambiguous variable name: l

(E741)



Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants