[Improvement-18039][Metrics] Add missing metrics for workflow and task state transitions#18038

Open
sanjana2505006 wants to merge 1 commit into apache:dev from sanjana2505006:Improvement-Metrics

@sanjana2505006 sanjana2505006 commented Mar 9, 2026

Contributor

Purpose

This PR adds missing metrics to the Master module to improve visibility into workflow and task execution states. It addresses the gap where several metrics classes were defined but not utilized in the core execution flow.

Proposed Changes

  • Added workflow instance state tracking in AbstractWorkflowStateAction.
  • Added granular task state tracking (dispatch, success, failure, kill, retry) in AbstractTaskStateAction.
  • Added task timeout metric tracking in TaskTimeoutLifecycleEventHandler.
  • Added workflow instance generation duration measurement in WorkflowExecutionRunnableFactory.
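
For readers unfamiliar with the pattern, the changes listed above amount to incrementing a per-state counter on each transition and timing the instance generation step. The following is only a minimal sketch under stated assumptions: `SketchTaskState` and `StateTransitionMetricsSketch` are hypothetical names, and `LongAdder` stands in for whatever counter and timer types the Master module's metrics classes actually use.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.function.Supplier;

// Hypothetical stand-in for the task states named in the PR description.
enum SketchTaskState { DISPATCH, SUCCESS, FAILURE, KILL, RETRY }

// Hypothetical sketch class; it is not part of DolphinScheduler.
final class StateTransitionMetricsSketch {

    // One counter per state, created up front so lookups never return null.
    private final Map<SketchTaskState, LongAdder> counters = new EnumMap<>(SketchTaskState.class);

    // Accumulated time spent generating instances, in nanoseconds.
    private final LongAdder generationNanos = new LongAdder();

    StateTransitionMetricsSketch() {
        for (SketchTaskState state : SketchTaskState.values()) {
            counters.put(state, new LongAdder());
        }
    }

    // Called wherever a state action applies a transition.
    void onTransition(SketchTaskState state) {
        counters.get(state).increment();
    }

    long count(SketchTaskState state) {
        return counters.get(state).sum();
    }

    // Wraps the generation step and records its duration even if it throws.
    <T> T timeGeneration(Supplier<T> generator) {
        long start = System.nanoTime();
        try {
            return generator.get();
        } finally {
            generationNanos.add(System.nanoTime() - start);
        }
    }

    long totalGenerationNanos() {
        return generationNanos.sum();
    }
}
```

A caller would invoke `onTransition(SketchTaskState.SUCCESS)` from the point where the state change is applied, and wrap the factory call in `timeGeneration(...)`.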

Verification

  • Verified code style with mvn spotless:apply.
  • Performed final code review to ensure metrics integration follows existing patterns.

This PR closes #18039

@github-actions github-actions Bot removed the test label Mar 9, 2026
@sanjana2505006 sanjana2505006 changed the title [Improvement-Metrics][master] Add missing metrics for workflow and task state transitions [Improvement][master] Add missing metrics for workflow and task state transitions Mar 9, 2026
@sanjana2505006 sanjana2505006 changed the title [Improvement][master] Add missing metrics for workflow and task state transitions [Improvement] Add missing metrics for workflow and task state transitions Mar 9, 2026
@sanjana2505006 sanjana2505006 changed the title [Improvement] Add missing metrics for workflow and task state transitions [Improvement][master]Add missing metrics for workflow and task state transitions Mar 9, 2026
@sanjana2505006 sanjana2505006 changed the title [Improvement][master]Add missing metrics for workflow and task state transitions [Improvement][master] Add missing metrics for workflow and task state transitions Mar 9, 2026

@SbloodyS SbloodyS left a comment

Member

Please follow the pull request notice and create an issue first. @sanjana2505006

@sanjana2505006 sanjana2505006 changed the title [Improvement][master] Add missing metrics for workflow and task state transitions [Improvement-18039][Metrics] Add missing metrics for workflow and task state transitions Mar 9, 2026
@sanjana2505006

Contributor Author

Thank you, @SbloodyS, for the guidance! I've created issue #18039 and updated the PR title and commit to reference it.

@sanjana2505006 sanjana2505006 requested a review from SbloodyS March 9, 2026 06:06
@SbloodyS SbloodyS added the improvement make more easy to user or prompt friendly label Mar 9, 2026
@SbloodyS SbloodyS added this to the 3.4.2 milestone Mar 9, 2026
@SbloodyS

Member

Please check the failed CI. @sanjana2505006

@sanjana2505006

sanjana2505006 commented Mar 10, 2026

Contributor Author

Sure, @SbloodyS, I have an exam today. I’ll take a look at the failed CI once I’m done 😃...

@sanjana2505006

Contributor Author

Hello @SbloodyS, I've updated the PR to address the CI failures. Please have a look when you have time and share your feedback. Thank you!

@github-actions github-actions Bot added the test label Mar 11, 2026
@SbloodyS

Member

UT still failed. @sanjana2505006

@ruanwenjun ruanwenjun left a comment

Member

Thanks for your PR.
It might be better to add event metrics first.

WorkflowEventBusFireWorker#doFireSingleWorkflowEventBus

We may need to provide a public method to handle changes in instance state, and then use listeners or similar mechanisms to update the metrics.
So it’s not suitable to add this directly at the moment, but I’ll be doing that soon.
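
The listener approach described in this comment could look roughly like the sketch below: a single public method owns every state change and notifies registered listeners, and the metrics update lives in one listener instead of being scattered across state actions. All names here (`StateChangeListenerSketch`, `InstanceStateSketch`, `CountingListenerSketch`) are hypothetical and chosen for illustration; this is not the design the maintainers plan to implement.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical listener contract, invoked after every state change.
interface StateChangeListenerSketch {
    void onStateChange(String from, String to);
}

// Hypothetical holder whose changeState method is the only way to mutate state.
final class InstanceStateSketch {

    private final List<StateChangeListenerSketch> listeners = new CopyOnWriteArrayList<>();
    private String state;

    InstanceStateSketch(String initialState) {
        this.state = initialState;
    }

    void addListener(StateChangeListenerSketch listener) {
        listeners.add(listener);
    }

    // Single entry point: update the state, then notify every listener.
    synchronized void changeState(String newState) {
        String previous = state;
        state = newState;
        for (StateChangeListenerSketch listener : listeners) {
            listener.onStateChange(previous, newState);
        }
    }

    synchronized String currentState() {
        return state;
    }
}

// Metrics listener: counts how often each target state is entered.
final class CountingListenerSketch implements StateChangeListenerSketch {

    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    @Override
    public void onStateChange(String from, String to) {
        counts.computeIfAbsent(to, key -> new LongAdder()).increment();
    }

    long count(String state) {
        LongAdder adder = counts.get(state);
        return adder == null ? 0 : adder.sum();
    }
}
```

With this shape, adding a new metric means registering another listener, and the state actions themselves stay free of metrics calls.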

sanjana2505006 added a commit to sanjana2505006/dolphinscheduler that referenced this pull request Apr 2, 2026
…sage

- Add back empty line in AbstractTaskStateAction.java line 199 as requested
- Add incTaskInstanceByLifecycleEvent method to TaskMetrics to use TaskLifecycleEventType directly
- Update WorkflowEventBusFireWorker to use new method instead of string comparisons
- Add comprehensive tests for new lifecycle event method

Addresses feedback from SbloodyS and ruanwenjun on PR apache#18038
@sonarqubecloud

sonarqubecloud Bot commented Apr 2, 2026


Quality Gate failed

Failed conditions
0.0% Coverage on New Code (required ≥ 60%)

See analysis details on SonarQube Cloud

@sanjana2505006 sanjana2505006 requested a review from SbloodyS April 2, 2026 16:32
@SbloodyS SbloodyS modified the milestones: 3.4.2, 3.5.0 Jun 3, 2026
@sanjana2505006

sanjana2505006 commented Jun 9, 2026

Contributor Author

@SbloodyS I've updated the PR according to your feedback. Please have a look and let me know what else to change.

…k state transitions

Record workflow instance state changes in AbstractWorkflowStateAction,
workflow generation duration in WorkflowExecutionFactory, and task lifecycle
metrics via TaskLifecycleEventType in WorkflowEventBusFireWorker. Add unit
tests for TaskMetrics and WorkflowInstanceMetrics.

Co-authored-by: Cursor <cursoragent@cursor.com>
Comment on lines +76 to +81
    public void incTaskInstanceByState(final String state) {
        if (taskInstanceCounters.get(state) == null) {
            return;
        }
        taskInstanceCounters.get(state).increment();
    }

Member

We should store TaskLifecycleEventType.name() directly. You can refer to org.apache.dolphinscheduler.server.master.metrics.WorkflowInstanceMetrics
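
One way to read this suggestion is to register the counters under the enum constant's `name()` and accept the enum in the increment method, so callers never pass free-form strings. The sketch below assumes this reading; `LifecycleEventSketch` and `TaskMetricsSketch` are hypothetical stand-ins, the enum constants are invented for illustration, and it does not claim to mirror how `WorkflowInstanceMetrics` is actually written.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in for TaskLifecycleEventType; the constants are illustrative.
enum LifecycleEventSketch { DISPATCH, SUCCEEDED, FAILED, KILLED, RETRY, TIMEOUT }

// Hypothetical sketch class; LongAdder stands in for the real counter type.
final class TaskMetricsSketch {

    // Keyed by the enum constant's name(), registered once for every constant.
    private final Map<String, LongAdder> countersByEventName = new ConcurrentHashMap<>();

    TaskMetricsSketch() {
        for (LifecycleEventSketch type : LifecycleEventSketch.values()) {
            countersByEventName.put(type.name(), new LongAdder());
        }
    }

    // Accepts the enum directly; a single lookup replaces the double get().
    void incByLifecycleEvent(LifecycleEventSketch type) {
        LongAdder counter = countersByEventName.get(type.name());
        if (counter != null) {
            counter.increment();
        }
    }

    long count(LifecycleEventSketch type) {
        LongAdder counter = countersByEventName.get(type.name());
        return counter == null ? 0 : counter.sum();
    }
}
```

Because every constant is registered in the constructor, the null check is only a safeguard, and a typo in a state string becomes a compile error rather than a silently dropped metric.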

Labels

backend improvement make more easy to user or prompt friendly test

Development

Successfully merging this pull request may close these issues.

[Improvement][Metrics] Add missing metrics to Master and Worker modules

3 participants