Only increment stats when the worker acknowledged the test by ChrisBr · Pull Request #373 · Shopify/ci-queue

ChrisBr · 2026-02-08T20:47:11Z

Only increment error stats when the worker acknowledged the test; otherwise we end up with an incorrect counter.

ChrisBr · 2026-02-08T20:47:51Z

ruby/lib/minitest/queue/runner.rb

          attributes.merge(type: type)
        end.compact
-        File.open(queue_config.warnings_file, 'w') do |f|
+        File.open(queue_config.warnings_file, 'a') do |f|


We call this multiple times so we need to use append.
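A minimal sketch of why the mode matters when the writer runs more than once. `write_warnings` and `lines_after_two_writes` are hypothetical helpers for illustration, not methods from the runner; only `File.open` and its `'w'` / `'a'` modes are standard Ruby.

```ruby
require 'tmpdir'

# Hypothetical stand-in for the method that writes the warnings file.
def write_warnings(path, mode, warnings)
  File.open(path, mode) do |f|
    warnings.each { |warning| f.puts(warning) }
  end
end

# Calls the writer twice with the same mode and returns the file's lines.
def lines_after_two_writes(mode)
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'warnings.log')
    write_warnings(path, mode, ['first batch'])
    write_warnings(path, mode, ['second batch'])
    File.readlines(path, chomp: true)
  end
end

# 'w' truncates on every open, so only the last call survives.
# 'a' appends, so both calls are kept.
```

With `'w'` the helper returns only the second batch; with `'a'` it returns both, in order.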

kangze-jia · 2026-02-09T21:20:00Z

ruby/lib/ci/queue/redis/build_record.rb

-          @queue.increment_test_failed if acknowledged == 1
+          if acknowledged
+            # if another worker already acknowledged the test, we don't need to update the global stats or increment the test failed count
+            record_stats(stats)


This may not be enough.

Considering this scenario:

Worker A runs test T1 (reclaim), fails, calls record_error. acknowledged = 0 → we don’t call record_stats.

BuildStatusRecorder has already done self.failures += 1 (or self.errors += 1) for T1 in record() before calling build.record_error.

Worker A then runs test T2, fails, calls record_error. acknowledged = 1 → we do call record_stats(stats).
stats is built from current self.failures / self.errors, which still include T1. So we send e.g. failures: 2 (T1 + T2) for worker A.

Redis ends up with worker A’s failures = 2, but only one of those (T2) was a first ack; T1 was a duplicate and was never supposed to be counted in stats.

So we over-count whenever a worker has both a duplicate ack and a later first ack: the duplicate stays in the in-memory counter and is included in the next record_stats.
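The scenario can be replayed with a toy model. Everything here (`FakeStatsStore`, `replay`, the `:snapshot` / `:delta` styles) is hypothetical and only mimics the shape of the problem: a snapshot write sends the worker's whole in-memory counter, while a delta write adds one per first ack, in the spirit of Redis HINCRBY.

```ruby
# Toy model of the over-count. Names are illustrative, not the ci-queue API.
class FakeStatsStore
  attr_reader :stats

  def initialize
    @stats = Hash.new(0)
  end

  # Snapshot style: overwrite the stored value with the in-memory counter.
  def set(key, value)
    @stats[key] = value
  end

  # Delta style: add to the stored value, analogous to HINCRBY.
  def increment(key, by)
    @stats[key] += by
  end
end

# acks is a list of booleans, one per failing test, in execution order:
# true means this worker was the first to ack, false means a duplicate ack.
def replay(acks, style)
  store = FakeStatsStore.new
  local_failures = 0
  acks.each do |acknowledged|
    local_failures += 1        # the recorder bumps its counter before the ack
    next unless acknowledged   # duplicate ack: nothing is sent
    if style == :snapshot
      store.set('failures', local_failures)
    else
      store.increment('failures', 1)
    end
  end
  store.stats['failures']
end
```

For T1 as a duplicate ack followed by T2 as a first ack, `replay([false, true], :snapshot)` stores 2, while `replay([false, true], :delta)` stores 1.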

Yes, this could be a problem. In that case you need to flip the logging around, which makes the diff / refactor a lot more complicated.

Something like this https://github.com/Shopify/ci-queue/compare/cbruckmayer/fix-logging-of-tests?expand=1

Sure, I can continue the work on your branch.

I would probably keep it simple and remove the ignored state and introduce it in a later dedicated PR.

kangze-jia · 2026-02-09T21:22:15Z

ruby/lib/ci/queue/redis/build_record.rb

        end

        def record_error(id, payload, stats: nil)
-          acknowledged, _ = redis.pipelined do |pipeline|


The current code uses one pipeline: acknowledge and record_stats together. The new code adds a round trip, which may cause a performance regression.

Yes, if you want to keep it pipelined you need to inline it into the lua script which is a lot more complicated as you cannot rely on the result of the previous command in a pipeline. I don't think it's a significant performance regression so I prefer simplicity here.
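To illustrate the constraint, here is a toy pipeline in plain Ruby. `FakePipeline`, `pipelined` and `ack_result_seen_inside_block` are made up for this sketch and the real Redis client differs in detail, but the idea is the same: commands are only queued inside the block, so the ack result is not available to decide whether the stats command should be queued too.

```ruby
require 'set'

# Toy pipeline: commands are queued and only executed when the block ends.
class FakePipeline
  Future = Struct.new(:value)

  def initialize(set)
    @set = set
    @queued = []
  end

  # Queue an SADD-like command; its result is unknown until #flush runs.
  def sadd(member)
    future = Future.new(nil)
    @queued << -> { future.value = @set.add?(member) ? true : false }
    future
  end

  def flush
    @queued.each(&:call)
  end
end

def pipelined(set)
  pipeline = FakePipeline.new(set)
  yield pipeline
  pipeline.flush
end

# Returns what the caller can see of the ack result while still in the block.
def ack_result_seen_inside_block(set, member)
  seen = :unset
  pipelined(set) do |pipeline|
    future = pipeline.sadd(member)
    seen = future.value # still nil here: nothing has been executed yet
  end
  seen
end
```

Branching on the ack therefore needs either a second round trip after the block, or server-side logic such as a Lua script.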

Makes sense. We can start simple first.

ChrisBr · 2026-02-10T17:30:09Z

ruby/lib/ci/queue/redis/build_record.rb

        def record_error(id, payload, stats: nil)
-          acknowledged, _ = redis.pipelined do |pipeline|
+          # Run acknowledge first so we know whether we're the first to ack
+          acknowledged = redis.pipelined do |pipeline|


If there is only one command, you don't need a pipeline.

Yeah, you are right. This lua script is a single round-trip even though it does several Redis calls inside.

I will remove the pipeline here.

ChrisBr · 2026-02-10T17:30:16Z

ruby/lib/ci/queue/redis/build_record.rb

            @queue.acknowledge(id, error: payload, pipeline: pipeline)
-            record_stats(stats, pipeline: pipeline)
-          end
+          end.first


Yeah, it is not necessary. Removing it along with pipeline.

ChrisBr · 2026-02-10T17:30:31Z

ruby/lib/ci/queue/redis/build_record.rb

-          nil
+          if acknowledged
+            # We were the first to ack; another worker already ack'd would get falsy from SADD
+            redis.pipelined do |pipeline|


If there is only one command, we don't need a pipeline.

Inside the record_stats method (which is not a Lua script), multiple commands are issued (hset and expire), so we do benefit from a pipeline. I prefer to keep it here.

ChrisBr · 2026-02-10T17:32:03Z

ruby/lib/ci/queue/redis/build_record.rb

+            @queue.increment_test_failed
+          end
+          # Return so caller can roll back local counter when not acknowledged
+          !!acknowledged


Do we need to update the other implementations of record_error too?

Don't think so.

lib/ci/queue/build_record.rb (base BuildRecord): Static/local queue, single process, no Redis, no distributed workers.

lib/ci/queue/redis/grind_record.rb (GrindRecord): Grind mode (run tests many times). Different model from the main queue. It just lpushes onto an error list and updates stats. No “processed” set or SADD-based first-ack semantics.

What I mean is: are they used with the reporter? They all return nil now, which means we would always decrement the error count.
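A sketch of the concern, with hypothetical classes (`AckAwareRecord`, `NilReturningRecord`, `ToyRecorder`) standing in for the real ones: if the caller rolls back its local counter whenever `record_error` returns a falsy value, an implementation that still returns `nil` looks like "never acknowledged".

```ruby
# Stand-in for a record whose record_error returns the ack result.
class AckAwareRecord
  def initialize(acknowledged)
    @acknowledged = acknowledged
  end

  def record_error(_id, _payload)
    @acknowledged
  end
end

# Stand-in for an implementation that was not updated and returns nil.
class NilReturningRecord
  def record_error(_id, _payload)
    nil
  end
end

# Toy recorder: bump the local counter first, roll back if not acknowledged.
class ToyRecorder
  attr_reader :failures

  def initialize(build)
    @build = build
    @failures = 0
  end

  def record_failure(id)
    @failures += 1
    acknowledged = @build.record_error(id, 'payload')
    @failures -= 1 unless acknowledged # nil is falsy, so nil also rolls back
    acknowledged
  end
end

def failures_after_one_failure(build)
  recorder = ToyRecorder.new(build)
  recorder.record_failure('T1')
  recorder.failures
end
```

In this model a first ack keeps the failure, a duplicate ack rolls it back, and the `nil`-returning record also rolls it back even though nothing else counted the test.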

ChrisBr · 2026-02-10T20:52:30Z

ruby/lib/ci/queue/redis/build_record.rb

+            @queue.increment_test_failed
+          end
+          # Return so caller can roll back local counter when not acknowledged
+          !!acknowledged


!! is this needed?

!! does not change whether the value is “success” or “failure”. It only turns that into a real boolean.

acknowledge.lua already returns a boolean, so the !! is not strictly required. However, it keeps this method safe against future changes to acknowledge.lua.
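For reference, a small sketch of what `!!` does in Ruby. `to_bool` is a hypothetical helper used only for this illustration. One caveat: `0` is truthy in Ruby, so `!!` only protects against non-boolean truthy or `nil` returns; it would not turn an integer `0` into `false`.

```ruby
# Hypothetical helper: normalize any value to a strict boolean.
def to_bool(value)
  !!value
end

# Only nil and false are falsy in Ruby; everything else, including 0, is truthy.
def truthiness_table
  {
    true => to_bool(true),
    false => to_bool(false),
    nil => to_bool(nil),
    1 => to_bool(1),
    0 => to_bool(0)
  }
end
```

If the script ever went back to returning integers, the earlier `acknowledged == 1` style comparison would be needed again.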

ChrisBr · 2026-02-11T09:21:35Z

ruby/lib/minitest/queue/build_status_recorder.rb


      private

+      def stat_delta(counter, test)


Why is this needed?

…place
- Record stats only when worker acknowledges; duplicate acks do not increment
- Redis: record_stats_delta (HINCRBY); record_success returns true when ack'd or replaced
- Stat correction when success replaces failure; real assertion count (test.assertions) in delta
- Test helper: Requeue before Skip when both set; test_aggregation and integration expectations updated
- Remove [stats] debug logging from Redis BuildRecord; test_redis_reporter assertions = 8

Only increment counts when we acknowledge

928f0b2

ChrisBr commented Feb 8, 2026

ChrisBr mentioned this pull request Feb 8, 2026

Only increment counts when we acknowledge #372

Closed

ChrisBr requested a review from thadcraft-shopify February 8, 2026 21:03

Fix appending to warning file

14cf9e8

ChrisBr force-pushed the cbruckmayer/only-increment-on-ack-v2 branch from a9c8024 to 14cf9e8 Compare February 9, 2026 10:37

thadcraft-shopify approved these changes Feb 9, 2026

kangze-jia reviewed Feb 9, 2026

ChrisBr commented Feb 10, 2026

ChrisBr commented Feb 11, 2026

kangze-jia force-pushed the cbruckmayer/only-increment-on-ack-v2 branch from 0fff968 to 93edc70 Compare February 14, 2026 02:54

kangze-jia force-pushed the cbruckmayer/only-increment-on-ack-v2 branch from b5df285 to b1ea42b Compare February 14, 2026 03:31

3 participants