Skip to content

automatic time-skipping#740

Open
feiyang3cat wants to merge 5 commits intomasterfrom
api-time-skipping-v2
Open

automatic time-skipping#740
feiyang3cat wants to merge 5 commits intomasterfrom
api-time-skipping-v2

Conversation

@feiyang3cat
Copy link
Copy Markdown

What changed?
new feature: automatic time-skipping configuration and new event type added

Breaking changes
new event type added but right now no workflow history will have this new event type

Server PR
not expect to break server, no new rpc-handler and new fields are all optional

@feiyang3cat feiyang3cat requested review from a team as code owners March 23, 2026 22:25
@feiyang3cat feiyang3cat changed the title POC automatic timeskipping automatic time-skipping Mar 23, 2026
message TimeSkippingConfig {

// If set, enables automatic time-skipping for this workflow execution.
// It can also be disabled by setting this field to false.
Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sushisource Hey! I've cleaned this pr and ready for review.

And this config may need special attention since it is changed significantly from previous version Chad proposed.
(1)deleted most of the fine-grained control (like max timer-firing count, max duration to skip) from the previous version. The reason is it is ambiguous when complicated configuration is propagated to child-workflows.
(2) I have also changed the nested config to one single layer because this configuration is only used for automatic time-skipping, manual time-skipping will have a separate request and response payload
(3) we allow disabling time-skipping for in-flight workflows since it is not hard for the server to have this feature, and it will be a nice-to-have feature same as the SDK TestServer.

I think we can consider wrapping up this pr early, and put manual time-skipping in a separate pr maybe.
Pls feel free to disagree, happy to refine.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks totally reasonable - but I also don't see any reason to rush the PR while work is in progress unless we intend to ship a version without manual skipping first. If we don't, let's use the opportunity to develop it and decide if we want to make any changes to this config that relate to manual timeskipping.

message TimeSkippingConfig {

// If set, enables automatic time-skipping for this workflow execution.
// It can also be disabled by setting this field to false.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks totally reasonable - but I also don't see any reason to rush the PR while work is in progress unless we intend to ship a version without manual skipping first. If we don't, let's use the opportunity to develop it and decide if we want to make any changes to this config that relate to manual timeskipping.

@feiyang3cat feiyang3cat changed the title automatic time-skipping automatic time-skipping (don't merge) Mar 24, 2026
@feiyang3cat feiyang3cat force-pushed the api-time-skipping-v2 branch 2 times, most recently from 4aeceb6 to 094e2c5 Compare March 25, 2026 06:40
@feiyang3cat feiyang3cat force-pushed the api-time-skipping-v2 branch 6 times, most recently from 9721ff7 to c686992 Compare March 26, 2026 03:53
// An event that indicates that the previously paused workflow execution has been unpaused.
EVENT_TYPE_WORKFLOW_EXECUTION_UNPAUSED = 59;
// An event that indicates that some duration was skipped for this workflow execution.
EVENT_TYPE_WORKFLOW_EXECUTION_TIME_SKIPPED = 60;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You know this just occurred to me and I don't think we've talked about it - if time skipping is enabled on a workflow with an old SDK version processing it, it's going to break on these new event types.

That's fine I think, we will tell people they need to upgrade, but we need to make sure that is clear in the docs & GTM material we create.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get it. the docs & GTM material mainly refer to API comments (here) and https://docs.temporal.io/ ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we always set worker_may_ignore for TimeSkippingEvents, it won't break workers right? because in the demo, I set worker_may_ignore = true and used current SDK which has no time-skipping feature and didnt break it

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for sdk version old enough, it doesn't even know about the worker_may_ignore flag, hence the comment. but yeah this is a new feature so we can make it clear re. the sdk version required.

Comment on lines +977 to +979
// when true, it means the automatic time-skipping was enabled with a bound
// and now the bound is reached and the time-skipping is disabled automatically.
bool bound_reached_and_time_skipping_disabled = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// when true, it means the automatic time-skipping was enabled with a bound
// and now the bound is reached and the time-skipping is disabled automatically.
bool bound_reached_and_time_skipping_disabled = 2;
// when true, automatic time-skipping was enabled with a bound
// and now that bound has been reached and time-skipping is disabled automatically.
bool disabled_after_bound = 2;

This is my best idea for a better name. The comment makes things clear, so I don't think we need a super wordy field name.

@feiyang3cat feiyang3cat force-pushed the api-time-skipping-v2 branch from c686992 to 837c381 Compare March 27, 2026 19:51
@feiyang3cat feiyang3cat force-pushed the api-time-skipping-v2 branch from 837c381 to 012f767 Compare March 27, 2026 19:59
// i.e. the existing time-skipping conf will be preserved.
TimeSkippingConfig time_skipping_config = 3;
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Sushisource @yycptt updated this part especially the bound part according to discussions. With the new fine-grained bound, I think the 1h-signal can be easily implemented by setting max_elapsed_duration to 1hour or max_target_time to some specific time in 1h hour.

For server, a EVENT_TYPE_WORKFLOW_EXECUTION_TIME_SKIPPED will be created to indicate time-skipping is disabled by bound even if no duration skipped.

For SDK can poll this event, but I think there are two things to consider right now

  1. we may need to add a version/ID for the time-skipping config if the user resets this config multiple times so the poller can differentiate EVENT_TYPE_WORKFLOW_EXECUTION_TIME_SKIPPED easily
  2. right now there is one get-api to read history events, it may be easier if we build a dedicated polling api (but this one may be added later?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for me the confusing part is those two options didn't say it will, like, create a new timer at the specified bound. and I am under the impression that time skipping will only skip to existing timers within the bound.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it only skip timers, but we allow this to be turned off automatically without finding anything to skip
do you have recommendations how to change ?

@feiyang3cat feiyang3cat changed the title automatic time-skipping (don't merge) automatic time-skipping Mar 30, 2026
string request_id = 3;
}

// Attributes for an event indicating that virtual time was advanced for a workflow execution,
Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐ highlight the change after review meeting

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add to the comment here that worker_may_ignore should always be set true for this event

bool worker_may_ignore = 300;

Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added but I remember an early proposal was to break the workers and remind them to upgrade. So I guess our strategy changed?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I meant we should add this to the comment here, not on the field. Otherwise that field will have an ever-increasing list in the comment.

And I think that was before we realized workers don't need to handle this at all. Even old SDKs will be able to successfully replay histories with time-skipping, which could be useful if users want to try this without needing to upgrade SDKs.

Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I located here correctly this time?

// This bound is not propagated to child workflows.
// It is recommended to set disable_propagation to true
// and configure TimeSkippingConfig explicitly in child workflows.
oneof bound {
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⭐ highlight the change after review meeting

Copy link
Copy Markdown
Member

@yycptt yycptt Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

max_elapsed_duration may be a bit hard to use due to real time part of it.

For the timer + signal case we talked about, which option should caller use? max_elapsed_duration?
sorry I see you answered this question in the other comment.

string request_id = 3;
}

// Attributes for an event indicating that virtual time was advanced for a workflow execution,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add to the comment here that worker_may_ignore should always be set true for this event

bool worker_may_ignore = 300;

@feiyang3cat feiyang3cat enabled auto-merge (squash) April 1, 2026 18:04
// An event that indicates that the previously paused workflow execution has been unpaused.
EVENT_TYPE_WORKFLOW_EXECUTION_UNPAUSED = 59;
// An event that indicates that some duration was skipped for this workflow execution.
EVENT_TYPE_WORKFLOW_EXECUTION_TIME_SKIPPED = 60;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for sdk version old enough, it doesn't even know about the worker_may_ignore flag, hence the comment. but yeah this is a new feature so we can make it clear re. the sdk version required.

message WorkflowExecutionTimeSkippedEventAttributes {

// The virtual time after time skipping is applied.
google.protobuf.Timestamp target_time = 1;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp of this event will be the workflow (virtual) time before or after the skip? I guess it's before? and the difference between the event time and the target time is the duration skipped?

Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm this is after, right now at sometime for example 4:00 virtual time (which is 3:00 in real time) we are going to skip 1hour to 5:00 and the target_time is 5:00

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

similar to this one but the name is changed

Image

bool disabled_after_bound = 2;

// The wall-clock time when the time-skipping event happened.
google.protobuf.Timestamp wall_clock_time = 3;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to understand this field v.s. the event time of the event. I mean I understand that this is always wal clock and event time can be a virtual time, but want to see if you any use case where this field will help make it easier.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was discussed in the review-meeting only for better visibility for debugging purpose. for example, I am an user, and I am debugging something with my business system built on temporal, now I need to align temporal history (which is virtual time) with my own log (which is real time)

definitely, it will be better but heavier if we add all wallclock to all following events

Comment on lines +586 to +587
// If set, the enabled field is not propagated to child workflows.
bool disable_propagation = 2;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should clarify if this will propagate via continue as new or reset as well.

Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking we make it simple now: we don't propagate bound and propagate enable for all these three cases. And this feature is not supported for nexus operation so no propagation for now. make sense?


message DeleteActivityExecutionResponse {
}
} No newline at end of file
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add new line

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emm it seems not to be a new line >?

// Configuration for automatic time skipping during a workflow execution.
// When enabled, virtual time advances automatically whenever there are
// no in-flight activities, child workflows, or Nexus operations.
message TimeSkippingConfig {
Copy link
Copy Markdown
Member

@yycptt yycptt Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to add this in to child workflow command?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I don't imagine we will need to set childWorkflow time-skipping config thru command when users use SDK testing env. And in general, I think we may keep this as complex as the current version and only consider adding more when SDK really needs it.

// Configuration for automatic time skipping during a workflow execution.
// When enabled, virtual time advances automatically whenever there are
// no in-flight activities, child workflows, or Nexus operations.
message TimeSkippingConfig {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Wonder if we need to call this AutoTimeSkippingConfig in case we add manual skipping support in the future.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am kind of yes and no for different reasons. Guts-feeling is we will not get there that adding manual skipping.
what do you think @Sushisource ?

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, I’m leaning against renaming it. In my mind, 'TimeSkipping' implies an automatic process. Let’s keep it simple and only add a 'Manual' prefix when we need to distinguish the manual version?

// This bound is not propagated to child workflows.
// It is recommended to set disable_propagation to true
// and configure TimeSkippingConfig explicitly in child workflows.
oneof bound {
Copy link
Copy Markdown
Member

@yycptt yycptt Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great!

max_elapsed_duration may be a bit hard to use due to real time part of it.

For the timer + signal case we talked about, which option should caller use? max_elapsed_duration?
sorry I see you answered this question in the other comment.


// Configuration for automatic time skipping during a workflow execution.
// When enabled, virtual time advances automatically whenever there are
// no in-flight activities, child workflows, or Nexus operations.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need to block auto skipping when there's pending signal/cancel external workflow operations in the transfer queue as well.

Copy link
Copy Markdown
Author

@feiyang3cat feiyang3cat Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these are good scenarios for testing, I think I can record these in testing planning.

For desgin,
(1) last time I learned signal/update, they come with workflow tasks so covered by workflow tasks ?
(2) for cancel task in transfer queue, the ms state becomes completed no? and I think our poc already check this state together with isAutoSkippable
pls feel free to correct me if I misunderstand how the sys works right now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants