[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)#56199
Open
MaxGekk wants to merge 12 commits into
Open
[SPARK-57101][SQL] Register nanosecond timestamp types in the Types Framework (server-side)#56199MaxGekk wants to merge 12 commits into
MaxGekk wants to merge 12 commits into
Conversation
…ramework (server-side) Register TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) in the Types Framework by adding TypeOps (catalyst) and TypeApiOps (sql/api) implementations following the TimeTypeOps reference, plus a dedicated MutableTimestampNanos holder. All integration points (PhysicalDataType, Literal default, InternalRow writer/accessor, codegen Java class, SpecificInternalRow, CatalystTypeConverters) pick this up via the single registration points when spark.sql.types.framework.enabled is true; legacy paths are unchanged when the flag is off. Encoders and java.time conversion remain out of scope (SPARK-57033), so getEncoder reports UNSUPPORTED_DATA_TYPE_FOR_ENCODER to preserve parity.
…al for inline codegen Route Literal.doGenCode through TypeOps.getJavaLiteral before falling back to addReferenceObj, so that TypeOps-registered types (e.g. TimestampNTZNanosType, TimestampLTZNanosType) emit a self-contained inline expression in generated code rather than a heap-allocated reference object. Add a codegen test to TimestampNanosTypeOpsSuite that asserts the fromParts(...) call is inlined and no TimestampNanosVal reference is added to the CodegenContext. Co-authored-by: Isaac
…sSuite Replace startsWith prefix checks with full-string assertions so the interim debug format (TimestampNanosVal.toString) is explicit in the test. When a real fractional-second formatter lands (SPARK-57033), the test will fail visibly rather than silently passing a prefix-only check. Co-authored-by: Isaac
Verify that copy() correctly propagates isNull in all three states (null-initialized, value-set, null-after-value) and that mutating the original after copying does not affect the copy. Co-authored-by: Isaac
…Ops classes Co-authored-by: Isaac
…ramework (server-side) Wire TimestampNTZNanosType and TimestampLTZNanosType through the Types Framework for five additional integration points that were previously hardcoded: Gap 1 - InternalRow.getAccessor: add getScalaAccessor to TypeOps and route the read-side accessor through it, symmetric with the already-wired getRowWriter. Gap 2 - CodeGenerator.javaType: derive the Java type name from getJavaClass.getSimpleName, removing the PhysicalDataType hardcoded cases and making javaType consistent with the already-wired javaClass. Gap 3 - CodeGenerator.getValue: add getCodegenGetter(input, ordinal) to TypeOps and route the codegen row-read expression through it. Gap 4 - CodeGenerator.setColumn: add getCodegenSetter(row, ordinal, value) to TypeOps and route the codegen row-write expression through it, following the same primary/default split used by InternalRow.getWriter/getWriterDefault. Gap 5 - GenerateUnsafeProjection null-writes: add getCodegenNullWrite returning Option[String] (None = use caller's context-specific default; Some = use this typed-null write) to TypeOps and route both the row-field and array-element null paths through it. All five changes keep the legacy hardcoded cases as the getOrElse fallback so behavior is identical when spark.sql.types.framework.enabled is false. TimeTypeOps implements getScalaAccessor, getCodegenGetter, and getCodegenSetter (primitive Long paths); TimestampNanosTypeOps (trait) provides getCodegenNullWrite; the NTZ/LTZ case classes provide the remaining three methods. Co-authored-by: Isaac
Co-authored-by: Isaac
…Ops classes
Use fully qualified names in Scaladoc [[...]] links in TimestampNanosTypeApiOps.
The Scala compiler converts [[ShortName]] to {@link ShortName} in generated Java
stubs; Javadoc then fails to resolve short names that are not in the same package.
Switching to fully qualified paths fixes the three fatal Javadoc errors reported
by CI.
Co-authored-by: Isaac
Use fully qualified names in Scaladoc [[...]] links. The Scala compiler
converts [[ShortName]] to {@link ShortName} in generated Java stubs and
Javadoc cannot resolve short names outside the declaring package. Reformat
with scalafmt after the line-length change.
Co-authored-by: Isaac
MaxGekk
commented
May 29, 2026
| // Encoders are handled in a follow-up issue (SPARK-57033). Until then, report the type as | ||
| // unsupported with the same error as the legacy RowEncoder fallback to preserve parity. | ||
| override def getEncoder: AgnosticEncoder[_] = | ||
| throw new AnalysisException( |
Member
Author
There was a problem hiding this comment.
We will replace this after the PR #56158 is merged.
…eral in doGenCode
Remove _: TimeType from the explicit Long-literal case in Literal.doGenCode so
that TimeType falls through to the TypeOps branch and calls
TimeTypeOps.getJavaLiteral, which returns the same "${value}L" string. This
makes getJavaLiteral live on TimeTypeOps, consistent with all other TypeOps
integration points.
Co-authored-by: Isaac
…PrimitiveType Move the TypeOps check before the isPrimitiveType guard in both CodeGenerator.getValue and CodeGenerator.setColumn. TypeOps-registered types (e.g. TimeType) now reach getCodegenGetter/getCodegenSetter first; unregistered primitive types (LongType, IntegerType, etc.) fall into getOrElse and hit the isPrimitiveType branch as before. This makes TimeTypeOps.getCodegenGetter and getCodegenSetter live rather than dead code, completing the TypeOps coverage for TimeType codegen. Co-authored-by: Isaac
…to pass scalastyle Co-authored-by: Max Gekk <max.gekk@gmail.com>
Member
Author
|
@davidm-db @dejankrak-db @stevomitric Could you review this PR, please. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
This PR registers
TimestampNTZNanosType(p)andTimestampLTZNanosType(p)(p in [7, 9]) in the Spark SQL Types Framework (SPARK-53504) for server-side (catalyst) operations, following theTimeTypeOps/TimeTypeApiOpsreference implementation.Concretely:
TypeOps(catalyst) andTypeApiOps(sql/api) implementations for both nanos types, sharing a common base.MutableTimestampNanosholder so nanos columns inSpecificInternalRowavoid theMutableAnyfallback.TypeOps.apply()andTypeApiOps.apply()); existing call sites already delegate there, so no per-call-site edits are needed.getEncoderreportsUNSUPPORTED_DATA_TYPE_FOR_ENCODER, matching today's behavior.Class hierarchy (mirrors
TimeTypeOps extends TimeTypeApiOps with TypeOps):All registration is gated by
spark.sql.types.framework.enabled. When the flag is false, behavior is identical to the existing legacy paths.Why are the changes needed?
Part of SPARK-56822 (Timestamps with nanosecond precision). The logical types and physical row layer already exist (SPARK-56876, SPARK-56981), but the nanos types were wired only through legacy dispatch in
PhysicalDataType,Literal,InternalRow, and codegen. This change centralizes that wiring behind the Types Framework, consistent with howTimeTypeis handled, reducing scattered pattern matching.Does this PR introduce any user-facing change?
No. The types are internal/unstable and the framework path is gated by an internal feature flag; with the flag off the behavior is unchanged.
How was this patch tested?
TimestampNanosTypeOpsSuitecovering, for p in {7, 8, 9} and both NTZ and LTZ: framework registration, physical type, default literal, codegen Java class,GenericInternalRow/SpecificInternalRowroundtrips, the dedicatedMutableTimestampNanosholder,getEncoderparity, SQL-literal prefixes, and framework-off equivalence.TimestampNanosTypeOpsSuite,TimestampNanosRowSuite,TimestampNanosRowValuesSuite,LiteralExpressionSuite,CatalystTypeConvertersSuite,GenerateUnsafeProjectionSuite,DataTypeSuite,TypeUtilsSuite,DataTypeParserSuite,RowEncoderSuite,ExpressionEncoderSuite,RowJsonSuite,ToPrettyStringSuite.dev/scalastylepasses.Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor (Claude Opus 4.8)