[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side) by MaxGekk · Pull Request #56199 · apache/spark

MaxGekk · 2026-05-29T09:36:07Z

What changes were proposed in this pull request?

This PR registers TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) in the Spark SQL Types Framework (SPARK-53504) for server-side (catalyst) operations, following the TimeTypeOps / TimeTypeApiOps reference implementation.

Concretely:

Adds TypeOps (catalyst) and TypeApiOps (sql/api) implementations for both nanos types, sharing a common base.
Adds a dedicated MutableTimestampNanos holder so nanos columns in SpecificInternalRow avoid the MutableAny fallback.
Registers both types at the single registration points (TypeOps.apply() and TypeApiOps.apply()); existing call sites already delegate there, so no per-call-site edits are needed.
Keeps encoders and java.time conversion out of scope (SPARK-57033): getEncoder reports UNSUPPORTED_DATA_TYPE_FOR_ENCODER, matching today's behavior.

Class hierarchy (mirrors TimeTypeOps extends TimeTypeApiOps with TypeOps):

TimestampNTZNanosTypeOps extends TimestampNTZNanosTypeApiOps with TimestampNanosTypeOps (-> TypeOps)
TimestampLTZNanosTypeOps extends TimestampLTZNanosTypeApiOps with TimestampNanosTypeOps (-> TypeOps)

All registration is gated by spark.sql.types.framework.enabled. When the flag is false, behavior is identical to the existing legacy paths.

Why are the changes needed?

Part of SPARK-56822 (Timestamps with nanosecond precision). The logical types and physical row layer already exist (SPARK-56876, SPARK-56981), but the nanos types were wired only through legacy dispatch in PhysicalDataType, Literal, InternalRow, and codegen. This change centralizes that wiring behind the Types Framework, consistent with how TimeType is handled, reducing scattered pattern matching.

Does this PR introduce any user-facing change?

No. The types are internal/unstable and the framework path is gated by an internal feature flag; with the flag off the behavior is unchanged.

How was this patch tested?

Added TimestampNanosTypeOpsSuite covering, for p in {7, 8, 9} and both NTZ and LTZ: framework registration, physical type, default literal, codegen Java class, GenericInternalRow / SpecificInternalRow roundtrips, the dedicated MutableTimestampNanos holder, getEncoder parity, SQL-literal prefixes, and framework-off equivalence.
Ran related catalyst suites (all passing): TimestampNanosTypeOpsSuite, TimestampNanosRowSuite, TimestampNanosRowValuesSuite, LiteralExpressionSuite, CatalystTypeConvertersSuite, GenerateUnsafeProjectionSuite, DataTypeSuite, TypeUtilsSuite, DataTypeParserSuite, RowEncoderSuite, ExpressionEncoderSuite, RowJsonSuite, ToPrettyStringSuite.
dev/scalastyle passes.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…ramework (server-side) Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) in the Types Framework by adding TypeOps (catalyst) and TypeApiOps (sql/api) implementations following the TimeTypeOps reference, plus a dedicated MutableTimestampNanos holder. All integration points (PhysicalDataType, Literal default, InternalRow writer/accessor, codegen Java class, SpecificInternalRow, CatalystTypeConverters) pick this up via the single registration points when spark.sql.types.framework.enabled is true; legacy paths are unchanged when the flag is off. Encoders and java.time conversion remain out of scope (SPARK-57033), so getEncoder reports UNSUPPORTED_DATA_TYPE_FOR_ENCODER to preserve parity.

…al for inline codegen Route Literal.doGenCode through TypeOps.getJavaLiteral before falling back to addReferenceObj, so that TypeOps-registered types (e.g. TimestampNTZNanosType, TimestampLTZNanosType) emit a self-contained inline expression in generated code rather than a heap-allocated reference object. Add a codegen test to TimestampNanosTypeOpsSuite that asserts the fromParts(...) call is inlined and no TimestampNanosVal reference is added to the CodegenContext. Co-authored-by: Isaac

…sSuite Replace startsWith prefix checks with full-string assertions so the interim debug format (TimestampNanosVal.toString) is explicit in the test. When a real fractional-second formatter lands (SPARK-57033), the test will fail visibly rather than silently passing a prefix-only check. Co-authored-by: Isaac

Verify that copy() correctly propagates isNull in all three states (null-initialized, value-set, null-after-value) and that mutating the original after copying does not affect the copy. Co-authored-by: Isaac

…Ops classes Co-authored-by: Isaac

…ramework (server-side) Wire TimestampNTZNanosType and TimestampLTZNanosType through the Types Framework for five additional integration points that were previously hardcoded: Gap 1 - InternalRow.getAccessor: add getScalaAccessor to TypeOps and route the read-side accessor through it, symmetric with the already-wired getRowWriter. Gap 2 - CodeGenerator.javaType: derive the Java type name from getJavaClass.getSimpleName, removing the PhysicalDataType hardcoded cases and making javaType consistent with the already-wired javaClass. Gap 3 - CodeGenerator.getValue: add getCodegenGetter(input, ordinal) to TypeOps and route the codegen row-read expression through it. Gap 4 - CodeGenerator.setColumn: add getCodegenSetter(row, ordinal, value) to TypeOps and route the codegen row-write expression through it, following the same primary/default split used by InternalRow.getWriter/getWriterDefault. Gap 5 - GenerateUnsafeProjection null-writes: add getCodegenNullWrite returning Option[String] (None = use caller's context-specific default; Some = use this typed-null write) to TypeOps and route both the row-field and array-element null paths through it. All five changes keep the legacy hardcoded cases as the getOrElse fallback so behavior is identical when spark.sql.types.framework.enabled is false. TimeTypeOps implements getScalaAccessor, getCodegenGetter, and getCodegenSetter (primitive Long paths); TimestampNanosTypeOps (trait) provides getCodegenNullWrite; the NTZ/LTZ case classes provide the remaining three methods. Co-authored-by: Isaac

Co-authored-by: Isaac

…Ops classes Use fully qualified names in Scaladoc [[...]] links in TimestampNanosTypeApiOps. The Scala compiler converts [[ShortName]] to {@link ShortName} in generated Java stubs; Javadoc then fails to resolve short names that are not in the same package. Switching to fully qualified paths fixes the three fatal Javadoc errors reported by CI. Co-authored-by: Isaac

Use fully qualified names in Scaladoc [[...]] links. The Scala compiler converts [[ShortName]] to {@link ShortName} in generated Java stubs and Javadoc cannot resolve short names outside the declaring package. Reformat with scalafmt after the line-length change. Co-authored-by: Isaac

MaxGekk · 2026-05-29T12:00:57Z

+  // Encoders are handled in a follow-up issue (SPARK-57033). Until then, report the type as
+  // unsupported with the same error as the legacy RowEncoder fallback to preserve parity.
+  override def getEncoder: AgnosticEncoder[_] =
+    throw new AnalysisException(


We will replace this after the PR #56158 is merged.

…eral in doGenCode Remove _: TimeType from the explicit Long-literal case in Literal.doGenCode so that TimeType falls through to the TypeOps branch and calls TimeTypeOps.getJavaLiteral, which returns the same "${value}L" string. This makes getJavaLiteral live on TimeTypeOps, consistent with all other TypeOps integration points. Co-authored-by: Isaac

…PrimitiveType Move the TypeOps check before the isPrimitiveType guard in both CodeGenerator.getValue and CodeGenerator.setColumn. TypeOps-registered types (e.g. TimeType) now reach getCodegenGetter/getCodegenSetter first; unregistered primitive types (LongType, IntegerType, etc.) fall into getOrElse and hit the isPrimitiveType branch as before. This makes TimeTypeOps.getCodegenGetter and getCodegenSetter live rather than dead code, completing the TypeOps coverage for TimeType codegen. Co-authored-by: Isaac

…to pass scalastyle Co-authored-by: Max Gekk <max.gekk@gmail.com>

MaxGekk · 2026-05-30T05:27:05Z

@davidm-db @dejankrak-db @stevomitric Could you review this PR, please.

MaxGekk changed the title ~~[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)~~ [WIP][SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side) May 29, 2026

MaxGekk added 8 commits May 29, 2026 11:57

[SPARK-57101][SQL] Add direct test for MutableTimestampNanos.copy()

891736c

Verify that copy() correctly propagates isNull in all three states (null-initialized, value-set, null-after-value) and that mutating the original after copying does not affect the copy. Co-authored-by: Isaac

[SPARK-57101][SQL] Bump @SInCE tags from 4.2.0 to 4.3.0 in nanos Type…

8b2ac7b

…Ops classes Co-authored-by: Isaac

[SPARK-57101][SQL] Reformat TimestampNanosTypeApiOps.scala with scalafmt

52850ae

Co-authored-by: Isaac

MaxGekk commented May 29, 2026

View reviewed changes

MaxGekk added 3 commits May 29, 2026 14:34

[SPARK-57101][SQL] Fix import ordering in TimestampNanosTypeOpsSuite …

e932e50

…to pass scalastyle Co-authored-by: Max Gekk <max.gekk@gmail.com>

MaxGekk changed the title ~~[WIP][SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)~~ [SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side) May 29, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)#56199

[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)#56199
MaxGekk wants to merge 12 commits into
apache:masterfrom
MaxGekk:nanos-type-framework

MaxGekk commented May 29, 2026 •

edited

Loading

Uh oh!

MaxGekk May 29, 2026

Uh oh!

MaxGekk commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MaxGekk commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

MaxGekk May 29, 2026

Choose a reason for hiding this comment

Uh oh!

MaxGekk commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MaxGekk commented May 29, 2026 •

edited

Loading