[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)#56199

Open
MaxGekk wants to merge 12 commits into apache:master from MaxGekk:nanos-type-framework

MaxGekk commented May 29, 2026

What changes were proposed in this pull request?

This PR registers TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) in the Spark SQL Types Framework (SPARK-53504) for server-side (catalyst) operations, following the TimeTypeOps / TimeTypeApiOps reference implementation.

Concretely:

  • Adds TypeOps (catalyst) and TypeApiOps (sql/api) implementations for both nanos types, sharing a common base.
  • Adds a dedicated MutableTimestampNanos holder so nanos columns in SpecificInternalRow avoid the MutableAny fallback.
  • Registers both types at the single registration points (TypeOps.apply() and TypeApiOps.apply()); existing call sites already delegate there, so no per-call-site edits are needed.
  • Keeps encoders and java.time conversion out of scope (SPARK-57033): getEncoder reports UNSUPPORTED_DATA_TYPE_FOR_ENCODER, matching today's behavior.

Class hierarchy (mirrors TimeTypeOps extends TimeTypeApiOps with TypeOps):

TimestampNTZNanosTypeOps extends TimestampNTZNanosTypeApiOps with TimestampNanosTypeOps (-> TypeOps)
TimestampLTZNanosTypeOps extends TimestampLTZNanosTypeApiOps with TimestampNanosTypeOps (-> TypeOps)

All registration is gated by spark.sql.types.framework.enabled. When the flag is false, behavior is identical to the existing legacy paths.
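
The gating described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the PR's code: the `DataType` hierarchy, the one-method `TypeOps` trait, `NanosOps`, the `frameworkEnabled` parameter (standing in for `spark.sql.types.framework.enabled`), and the returned class name are all simplified assumptions.

```scala
object RegistrationSketch {
  // Simplified stand-ins for Spark's data types; these are not the real classes.
  sealed trait DataType
  case object LongType extends DataType
  final case class TimestampNTZNanosType(precision: Int) extends DataType {
    require(precision >= 7 && precision <= 9, "precision must be in [7, 9]")
  }
  final case class TimestampLTZNanosType(precision: Int) extends DataType {
    require(precision >= 7 && precision <= 9, "precision must be in [7, 9]")
  }

  // Hypothetical, minimal ops interface with a single capability.
  trait TypeOps {
    def javaClassName: String
  }

  // One implementation shared by both nanos types, loosely mirroring the common base.
  final case class NanosOps(dataType: DataType) extends TypeOps {
    // Placeholder name, assumed for illustration only.
    override def javaClassName: String = "TimestampNanosVal"
  }

  object TypeOps {
    // Single registration point: Some(ops) only if the flag is on and the type is registered.
    def apply(dt: DataType, frameworkEnabled: Boolean): Option[TypeOps] =
      if (!frameworkEnabled) None
      else dt match {
        case t: TimestampNTZNanosType => Some(NanosOps(t))
        case t: TimestampLTZNanosType => Some(NanosOps(t))
        case _ => None
      }
  }

  // Legacy dispatch, kept as the fallback so flag-off behavior does not change.
  def legacyJavaClassName(dt: DataType): String = dt match {
    case LongType => "long"
    case _: TimestampNTZNanosType | _: TimestampLTZNanosType => "TimestampNanosVal"
  }

  // A call site delegates to the registration point and falls back to the legacy path.
  def javaClassName(dt: DataType, frameworkEnabled: Boolean): String =
    TypeOps(dt, frameworkEnabled).map(_.javaClassName).getOrElse(legacyJavaClassName(dt))
}
```

With this shape, a call site needs no per-type edits: a newly registered type is picked up through `TypeOps.apply`, and unregistered types keep flowing through the legacy match.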

Why are the changes needed?

Part of SPARK-56822 (Timestamps with nanosecond precision). The logical types and physical row layer already exist (SPARK-56876, SPARK-56981), but the nanos types were wired only through legacy dispatch in PhysicalDataType, Literal, InternalRow, and codegen. This change centralizes that wiring behind the Types Framework, consistent with how TimeType is handled, reducing scattered pattern matching.

Does this PR introduce any user-facing change?

No. The types are internal/unstable and the framework path is gated by an internal feature flag; with the flag off the behavior is unchanged.

How was this patch tested?

  • Added TimestampNanosTypeOpsSuite covering, for p in {7, 8, 9} and both NTZ and LTZ: framework registration, physical type, default literal, codegen Java class, GenericInternalRow / SpecificInternalRow roundtrips, the dedicated MutableTimestampNanos holder, getEncoder parity, SQL-literal prefixes, and framework-off equivalence.
  • Ran related catalyst suites (all passing): TimestampNanosTypeOpsSuite, TimestampNanosRowSuite, TimestampNanosRowValuesSuite, LiteralExpressionSuite, CatalystTypeConvertersSuite, GenerateUnsafeProjectionSuite, DataTypeSuite, TypeUtilsSuite, DataTypeParserSuite, RowEncoderSuite, ExpressionEncoderSuite, RowJsonSuite, ToPrettyStringSuite.
  • dev/scalastyle passes.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…ramework (server-side)

Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) in
the Types Framework by adding TypeOps (catalyst) and TypeApiOps (sql/api)
implementations following the TimeTypeOps reference, plus a dedicated
MutableTimestampNanos holder. All integration points (PhysicalDataType, Literal
default, InternalRow writer/accessor, codegen Java class, SpecificInternalRow,
CatalystTypeConverters) pick this up via the single registration points when
spark.sql.types.framework.enabled is true; legacy paths are unchanged when the
flag is off. Encoders and java.time conversion remain out of scope (SPARK-57033),
so getEncoder reports UNSUPPORTED_DATA_TYPE_FOR_ENCODER to preserve parity.
MaxGekk changed the title from "[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)" to "[WIP][SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)" May 29, 2026
MaxGekk added 8 commits May 29, 2026 11:57
…al for inline codegen

Route Literal.doGenCode through TypeOps.getJavaLiteral before falling back to
addReferenceObj, so that TypeOps-registered types (e.g. TimestampNTZNanosType,
TimestampLTZNanosType) emit a self-contained inline expression in generated code
rather than a heap-allocated reference object. Add a codegen test to
TimestampNanosTypeOpsSuite that asserts the fromParts(...) call is inlined and no
TimestampNanosVal reference is added to the CodegenContext.
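
As a rough illustration of the routing this commit describes, the sketch below tries an inline Java literal first and registers a reference object only as a fallback. `NanosVal`, `Ctx`, and the argument shape passed to `fromParts` are assumptions made for this example; they are not Spark's actual `TimestampNanosVal`, `CodegenContext`, or `fromParts` signature.

```scala
import scala.collection.mutable.ArrayBuffer

object InlineLiteralSketch {
  // Hypothetical stand-in for a nanosecond timestamp value.
  final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short)

  // Hypothetical stand-in for a codegen context that tracks reference objects.
  final class Ctx {
    val references: ArrayBuffer[Any] = ArrayBuffer.empty[Any]

    // Stores the object and returns code that reads it back by index.
    def addReferenceObj(className: String, obj: Any): String = {
      references += obj
      s"(($className) references[${references.length - 1}])"
    }
  }

  // Returns a self-contained Java expression for values that support inlining.
  def getJavaLiteral(value: Any): Option[String] = value match {
    case NanosVal(micros, nanos) =>
      Some(s"NanosVal.fromParts(${micros}L, (short) $nanos)")
    case _ => None
  }

  // Inline literal first; the reference object is added only when no literal exists,
  // because getOrElse evaluates its argument lazily.
  def genLiteral(ctx: Ctx, value: Any): String =
    getJavaLiteral(value).getOrElse(ctx.addReferenceObj("Object", value))
}
```

The property the commit's test checks maps onto this sketch directly: after generating code for a nanos value, the emitted string contains the `fromParts(...)` call and the context holds no references.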

Co-authored-by: Isaac
…sSuite

Replace startsWith prefix checks with full-string assertions so the interim
debug format (TimestampNanosVal.toString) is explicit in the test. When a real
fractional-second formatter lands (SPARK-57033), the test will fail visibly
rather than silently passing a prefix-only check.

Co-authored-by: Isaac
Verify that copy() correctly propagates isNull in all three states
(null-initialized, value-set, null-after-value) and that mutating the
original after copying does not affect the copy.
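
The three states named in this commit message can be pictured with a small holder like the one below. It is a sketch under assumptions: `MutableNanosHolder` and `NanosVal` are hypothetical simplifications, not the PR's `MutableTimestampNanos` or Spark's value class.

```scala
object HolderCopySketch {
  // Hypothetical immutable value carried by the holder.
  final case class NanosVal(epochMicros: Long, nanosWithinMicro: Short)

  // Hypothetical typed holder: a null flag plus the last value written.
  final class MutableNanosHolder {
    var isNull: Boolean = true
    var value: NanosVal = null

    def set(v: NanosVal): Unit = {
      isNull = false
      value = v
    }

    // Marks the slot as null but leaves the stale value in place.
    def setNull(): Unit = {
      isNull = true
    }

    // Copies both the flag and the value into an independent holder.
    def copy(): MutableNanosHolder = {
      val c = new MutableNanosHolder
      c.isNull = isNull
      c.value = value
      c
    }
  }
}
```

Because the value is immutable in this sketch, copying the reference is enough for the copy to stay unaffected by later writes to the original.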

Co-authored-by: Isaac
…ramework (server-side)

Wire TimestampNTZNanosType and TimestampLTZNanosType through the Types Framework for
five additional integration points that were previously hardcoded:

  Gap 1 - InternalRow.getAccessor: add getScalaAccessor to TypeOps and route the
  read-side accessor through it, symmetric with the already-wired getRowWriter.

  Gap 2 - CodeGenerator.javaType: derive the Java type name from
  getJavaClass.getSimpleName, removing the PhysicalDataType hardcoded cases and
  making javaType consistent with the already-wired javaClass.

  Gap 3 - CodeGenerator.getValue: add getCodegenGetter(input, ordinal) to TypeOps
  and route the codegen row-read expression through it.

  Gap 4 - CodeGenerator.setColumn: add getCodegenSetter(row, ordinal, value) to
  TypeOps and route the codegen row-write expression through it, following the same
  primary/default split used by InternalRow.getWriter/getWriterDefault.

  Gap 5 - GenerateUnsafeProjection null-writes: add getCodegenNullWrite returning
  Option[String] (None = use caller's context-specific default; Some = use this
  typed-null write) to TypeOps and route both the row-field and array-element null
  paths through it.

All five changes keep the legacy hardcoded cases as the getOrElse fallback so
behavior is identical when spark.sql.types.framework.enabled is false.
TimeTypeOps implements getScalaAccessor, getCodegenGetter, and getCodegenSetter
(primitive Long paths); TimestampNanosTypeOps (trait) provides getCodegenNullWrite;
the NTZ/LTZ case classes provide the remaining three methods.
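
The `Option[String]` contract described under Gap 5 might look roughly like this. The trait, the parameter list of `getCodegenNullWrite`, and the generated `setNullTyped` call are illustrative assumptions; only the method name and its `None`/`Some` meaning come from the commit message.

```scala
object NullWriteSketch {
  // Hypothetical slice of an ops interface, reduced to the null-write hook.
  trait CodegenOps {
    // None = use the caller's context-specific default; Some = use this typed-null write.
    def getCodegenNullWrite(writer: String, index: String): Option[String] = None
  }

  // A type that is happy with whatever default the caller already emits.
  object DefaultNullOps extends CodegenOps

  // A type that needs its own null write; the emitted method name is a placeholder.
  object TypedNullOps extends CodegenOps {
    override def getCodegenNullWrite(writer: String, index: String): Option[String] =
      Some(s"$writer.setNullTyped($index);")
  }

  // Shared by the row-field and array-element paths; each passes its own default.
  def nullWriteCode(
      ops: Option[CodegenOps],
      writer: String,
      index: String,
      callerDefault: String): String =
    ops.flatMap(_.getCodegenNullWrite(writer, index)).getOrElse(callerDefault)
}
```

Returning `Option[String]` rather than a plain `String` lets the row-field and array-element call sites keep different defaults without the ops needing to know which context it is called from.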

Co-authored-by: Isaac
…Ops classes

Use fully qualified names in Scaladoc [[...]] links in TimestampNanosTypeApiOps.
The Scala compiler converts [[ShortName]] to {@link ShortName} in generated Java
stubs; Javadoc then fails to resolve short names that are not in the same package.
Switching to fully qualified paths fixes the three fatal Javadoc errors reported
by CI.

Co-authored-by: Isaac
Use fully qualified names in Scaladoc [[...]] links. The Scala compiler
converts [[ShortName]] to {@link ShortName} in generated Java stubs and
Javadoc cannot resolve short names outside the declaring package. Reformat
with scalafmt after the line-length change.

Co-authored-by: Isaac
// Encoders are handled in a follow-up issue (SPARK-57033). Until then, report the type as
// unsupported with the same error as the legacy RowEncoder fallback to preserve parity.
override def getEncoder: AgnosticEncoder[_] =
  throw new AnalysisException(

We will replace this after PR #56158 is merged.

MaxGekk added 3 commits May 29, 2026 14:34
…eral in doGenCode

Remove _: TimeType from the explicit Long-literal case in Literal.doGenCode so
that TimeType falls through to the TypeOps branch and calls
TimeTypeOps.getJavaLiteral, which returns the same "${value}L" string. This
makes getJavaLiteral live on TimeTypeOps, consistent with all other TypeOps
integration points.

Co-authored-by: Isaac
…PrimitiveType

Move the TypeOps check before the isPrimitiveType guard in both
CodeGenerator.getValue and CodeGenerator.setColumn. TypeOps-registered types
(e.g. TimeType) now reach getCodegenGetter/getCodegenSetter first; unregistered
primitive types (LongType, IntegerType, etc.) fall into getOrElse and hit the
isPrimitiveType branch as before. This makes TimeTypeOps.getCodegenGetter and
getCodegenSetter live rather than dead code, completing the TypeOps coverage for
TimeType codegen.
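
Why the order of the two checks matters can be shown with a toy dispatcher. Everything here is a simplified assumption: the three-type `DataType` hierarchy, the emitted code strings, and the `/* via TypeOps */` marker exist only to make the chosen path visible.

```scala
object DispatchOrderSketch {
  sealed trait DataType
  case object LongType extends DataType   // primitive, not registered with ops
  case object TimeType extends DataType   // primitive-backed and registered with ops
  case object StringType extends DataType // neither primitive nor registered

  def isPrimitiveType(dt: DataType): Boolean = dt == LongType || dt == TimeType

  // Stand-in for a getCodegenGetter lookup; only TimeType is registered here.
  def opsGetter(dt: DataType, input: String, ordinal: Int): Option[String] = dt match {
    case TimeType => Some(s"$input.getLong($ordinal) /* via TypeOps */")
    case _ => None
  }

  private def legacyGetter(dt: DataType, input: String, ordinal: Int): String =
    if (isPrimitiveType(dt)) s"$input.getLong($ordinal)"
    else s"$input.get($ordinal, null)"

  // Before: the primitive guard runs first, so the ops getter for TimeType is dead code.
  def getValueBefore(dt: DataType, input: String, ordinal: Int): String =
    if (isPrimitiveType(dt)) s"$input.getLong($ordinal)"
    else opsGetter(dt, input, ordinal).getOrElse(s"$input.get($ordinal, null)")

  // After: ops are consulted first; unregistered types fall through to the legacy logic.
  def getValueAfter(dt: DataType, input: String, ordinal: Int): String =
    opsGetter(dt, input, ordinal).getOrElse(legacyGetter(dt, input, ordinal))
}
```

In this sketch `getValueAfter` changes the path taken for `TimeType` only, while `LongType` and `StringType` produce the same code as before.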

Co-authored-by: Isaac
…to pass scalastyle

Co-authored-by: Max Gekk <max.gekk@gmail.com>
MaxGekk changed the title from "[WIP][SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)" to "[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)" May 29, 2026
MaxGekk commented May 30, 2026

@davidm-db @dejankrak-db @stevomitric Could you review this PR, please?
