docs: add Python data handling section #4378
lennessyy wants to merge 3 commits into `large-payload-prerelease` from
Conversation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
> | | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) |
> | --- | --- | --- | --- |
> | **Purpose** | Serialize application data to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store |
> | **Must be deterministic** | Yes | No | No |
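For readers skimming the table, the three layers can be sketched as a plain-Python pipeline. All names here are hypothetical stand-ins (`json`/`zlib` for the converter and codec, a `store` object for external storage), not the temporalio API:

```python
import json
import zlib

def handle(value, threshold=256 * 1024, store=None):
    # Sketch of the three layers in the order they run
    # (hypothetical names, not the temporalio API):
    data = json.dumps(value).encode()    # PayloadConverter: value -> bytes
    data = zlib.compress(data)           # PayloadCodec: transform the bytes
    if store is not None and len(data) > threshold:
        return {"ref": store.put(data)}  # ExternalStorage: offload, keep a claim
    return {"inline": data}              # small payloads stay inline
```

Only the first step is performed by the required layer; the other two are optional transforms applied afterward.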
Not sure about this line. What are we trying to help the user with here? And @jmaeagle99 could you take a look?
- For codec, I think we say that due to content hashing, codec should be deterministic for cases when the workflow task fails.
Ah, so this came from the TypeScript page: https://docs.temporal.io/develop/typescript/converters-and-encryption
When I was creating the table, I used the TS page, which had specific instructions on whether or not these components can access external services or employ non-deterministic modules. I think the main thing was to tell users they cannot do that in the payload converter, and thus cannot do any encryption there either.
If you think that line about codec is worth adding, we can change it. Otherwise, I'm okay with removing this row.
I find it a bit abstract as well. Not sure it's doing much good in such a concise form so prominently in the doc. But when I'm actually building a custom payload converter, I'd like to know that it should be deterministic/not access network.
I don't think having a "Must be deterministic? Yes/No" explains much and might just create more questions. I think that this information is more for the authors of converters, codecs, and storage drivers rather than the authors of workflows. Even if workflow authors have to think about determinism, just a different kind of determinism.
I think there are two aspects to think about when talking about determinism of these things (converters, codecs, and external storage):
- For a given input, the output should be reproducible when the operation is successful.
- Whether the operation is allowed to fail.
For example, a payload converter cannot raise/throw/return errors. That is because converters run within the workflow code execution. The workflow code can handle those errors and compensate with another workflow command, which will cause workflow non-determinism on replay.
In Python, codecs can raise/throw/return errors. That is because they are executed before the workflow code executes and after the workflow code has yielded. In either case, the workflow code has no ability to handle the error. Raising/throwing/returning errors here will cause the WFT to be retried and has no impact on workflow determinism. The same is allowed for external storage.
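The reproducibility aspect can be shown without any Temporal APIs. A minimal sketch of a deterministic converter-style helper, where stable key ordering makes the bytes, and therefore any content hash, identical across retries:

```python
import hashlib
import json

def to_bytes(value) -> bytes:
    # sort_keys + fixed separators: identical input always yields
    # identical bytes, so a content hash is stable across replays.
    return json.dumps(value, sort_keys=True, separators=(",", ":")).encode()

a = to_bytes({"b": 2, "a": 1})
b = to_bytes({"a": 1, "b": 2})
assert a == b
assert hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest()
```

Without `sort_keys`, two logically equal dicts could serialize to different bytes, and a content hash computed on replay would not match the original.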
> Of these three layers, only the PayloadConverter is required. Temporal uses a default PayloadConverter that handles JSON serialization. The PayloadCodec and ExternalStorage layers are optional. You only need to customize these layers when
Should we link to encyclopedia for external storage somewhere?
```python
data_converter = dataclasses.replace(
    temporalio.converter.default(),
    external_storage=ExternalStorage(
        drivers=[MyStorageDriver()],
    ),
)
```
Use `LocalDiskStorageDriver` here? (Is there a snipsync?)
Yes, I will snipsync all the code blocks once all the content is approved.
> ## Configure payload size threshold
>
> You can configure the payload size threshold that triggers external storage. By default, payloads larger than 256 KiB are offloaded to external storage. You can adjust this with the `payload_size_threshold` parameter, or set it to 1 to
```diff
- are offloaded to external storage. You can adjust this with the `payload_size_threshold` parameter, or set it to 1 to
+ are offloaded to external storage. You can adjust this with the `payload_size_threshold` parameter, even setting it to 0 to
```
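The offload decision itself is easy to illustrate in plain Python. This is a sketch, not the SDK's implementation; `256 * 1024` mirrors the documented 256 KiB default, and the parameter name is borrowed for illustration only:

```python
DEFAULT_THRESHOLD = 256 * 1024  # 256 KiB, the documented default

def should_offload(payload: bytes,
                   payload_size_threshold: int = DEFAULT_THRESHOLD) -> bool:
    # With a threshold of 0, every non-empty payload is offloaded.
    return len(payload) > payload_size_threshold

assert not should_offload(b"x" * 1024)                    # small: stays inline
assert should_offload(b"x" * (256 * 1024 + 1))            # over 256 KiB: offloaded
assert should_offload(b"x", payload_size_threshold=0)     # 0: offload everything
```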
> | | [PayloadConverter](/develop/python/data-handling/data-conversion) | [PayloadCodec](/develop/python/data-handling/data-encryption) | [ExternalStorage](/develop/python/data-handling/large-payload-storage) |
> | --- | --- | --- | --- |
> | **Purpose** | Serialize application data to bytes | Transform encoded payloads (encrypt, compress) | Offload large payloads to external store |
> | **Must be deterministic** | Yes | No | No |
> | **Default** | JSON serialization | None (passthrough) | None (passthrough) |
```diff
- | **Default** | JSON serialization | None (passthrough) | None (passthrough) |
+ | **Default** | JSON serialization | None (passthrough) | None (all payloads will be stored in Workflow History) |
```
I feel that "passthrough" is correct. Whatever comes out of this "pipeline" of data handling is what is stored in Workflow History, and that shouldn't be tied to the external storage step.
> @@ -0,0 +1,252 @@
> ---
> id: large-payload-storage
"external storage" and "large payload storage" are being used inconsistently throughout these docs. I think we should stick with one, namely external storage.
> ### Prerequisites
>
> - An Amazon S3 bucket that you have write access to. Refer to [lifecycle management](/external-storage#lifecycle) to
```diff
- - An Amazon S3 bucket that you have write access to. Refer to [lifecycle management](/external-storage#lifecycle) to
+ - An Amazon S3 bucket that you have read and write access to. Refer to [lifecycle management](/external-storage#lifecycle) to
```
If you want to be even more prescriptive, the identity needs at least `s3:PutObject` and `s3:GetObject`. It would be unlikely that you can get away with just `s3:PutObject`.
> - An Amazon S3 bucket that you have write access to. Refer to [lifecycle management](/external-storage#lifecycle) to ensure that your payloads remain available for the entire lifetime of the Workflow.
> - The `aioboto3` library is installed and available.
The Python SDK has an extra that installs this library (and the types for it):

```shell
python -m pip install "temporalio[aioboto3]"
```
```python
os.makedirs(self._store_dir, exist_ok=True)

prefix = self._store_dir
sc = context.serialization_context
```
FYI, this is changing in this PR. Haven't been able to merge it yet due to failures impacting the repository.
> Store payloads durably so that they survive process crashes and remain available for debugging and auditing after the Workflow completes. Refer to [lifecycle management](/external-storage#lifecycle) for retention requirements.
>
> The following example shows a complete custom driver implementation that uses local disk as the backing store:
Should we caveat that this example should not be used in production? It works for local development and demoing on one machine, but would not work for multi-worker environments.
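To make that caveat concrete, here is a minimal local-disk store sketch (a hypothetical class, not the temporalio driver interface). The returned key is only meaningful to a worker with access to the same filesystem, which is exactly why this pattern breaks down with multiple workers:

```python
import os
import tempfile
import uuid

class LocalDiskStore:
    """Demo-only sketch: fine for local development on one machine,
    but other workers cannot read this machine's disk."""

    def __init__(self, root: str):
        self._root = root
        os.makedirs(root, exist_ok=True)

    def put(self, data: bytes) -> str:
        key = uuid.uuid4().hex
        with open(os.path.join(self._root, key), "wb") as f:
            f.write(data)
        return key  # the claim needed to retrieve the payload later

    def get(self, key: str) -> bytes:
        with open(os.path.join(self._root, key), "rb") as f:
            return f.read()

store = LocalDiskStore(tempfile.mkdtemp())
key = store.put(b"large payload")
assert store.get(key) == b"large payload"
```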
> and [Payload Codec](/develop/python/data-handling/data-encryption) before it reaches the driver. See the [Components of a Data Converter](/dataconversion#data-converter-components) for more details.
>
> Return a `StorageDriverClaim` for each payload with enough information to retrieve it later. Structure your storage keys
I think how driver authors want to structure their keys is up to them. They could just use CAS (content-addressed storage) and not use any prefixing if they don't need sophisticated lifecycle management. So I think these are recommendations rather than requirements.
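The CAS option can be sketched in a few lines (plain Python, hypothetical helper): the key is derived from the payload's content, so no prefix scheme is required and identical payloads dedupe for free.

```python
import hashlib

def cas_key(data: bytes, prefix: str = "") -> str:
    # Content-addressed key: the same bytes always map to the same
    # object, so writes are idempotent and duplicates collapse.
    return prefix + hashlib.sha256(data).hexdigest()

assert cas_key(b"payload") == cas_key(b"payload")
assert cas_key(b"payload", prefix="wf-123/").startswith("wf-123/")
```

The trade-off is that content-only keys carry no workflow identity, which makes per-workflow lifecycle policies (like the lifecycle management linked above) harder to express.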
> ## Use multiple storage drivers
>
> When you have multiple drivers, such as for hot and cold storage tiers, pass a `driver_selector` function that chooses
I'm thinking we might be able to use a better example of why you'd have multiple drivers:
- Your worker needs to support receiving workflow starts that were created by far clients that don't use the same driver that you prefer for your worker. Register that far-client driver and your preferred driver, and use the selector to always pick your driver.
- Maybe some of your workflows could be optimized with local caching (like Redis) instead of going to a far storage service; you'd be trading durability for lower latency, but maybe that workflow type is allowed to be less durable. Register your Redis driver and S3 driver, and use the selector to pick based on workflow type (coming in this PR).
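A selector along the lines of the second bullet could look like this. It is a plain-Python sketch; the real `driver_selector` signature isn't shown in this thread, so the argument names and driver values here are illustrative:

```python
def driver_selector(workflow_type: str, drivers: dict):
    # Hypothetical policy: a low-latency cache driver for workflow
    # types that tolerate lower durability, durable storage otherwise.
    if workflow_type in {"RealtimeScoring"}:
        return drivers["redis"]
    return drivers["s3"]

drivers = {"redis": "RedisDriver", "s3": "S3Driver"}
assert driver_selector("RealtimeScoring", drivers) == "RedisDriver"
assert driver_selector("Billing", drivers) == "S3Driver"
```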
🤖 Generated with Claude Code
Attachments: EDU-6148 docs: add Python data handling section