
[SPARK-56255][PYTHON][CONNECT] Make spark.read.csv accept DataFrame input #55274

Closed
Yicong-Huang wants to merge 10 commits into apache:master from Yicong-Huang:SPARK-56255

Conversation

Contributor

@Yicong-Huang commented Apr 9, 2026

What changes were proposed in this pull request?

This PR adds support for passing a DataFrame containing CSV strings directly to spark.read.csv(), following the same pattern established by #55097 (SPARK-56253) for spark.read.json().

Why are the changes needed?

Adding DataFrame support to csv() makes the API consistent with json() and enables Connect-compatible CSV parsing without sc.parallelize().

Does this PR introduce any user-facing change?

Yes. spark.read.csv() now accepts a DataFrame with a single string column as input, in addition to the existing str, list, and RDD inputs.

csv_df = spark.createDataFrame([("Alice,25",), ("Bob,30",)], schema="value STRING")
spark.read.csv(csv_df, schema="name STRING, age INT").show()
# +-----+---+
# | name|age|
# +-----+---+
# |Alice| 25|
# |  Bob| 30|
# +-----+---+

How was this patch tested?

Added 10 new test cases.

Was this patch authored or co-authored using generative AI tooling?

No.


from pyspark.sql.connect.dataframe import DataFrame

if isinstance(path, DataFrame):
Member
I think it's safer to add this DataFrame check first, with a return, to avoid any behaviour changes; e.g., we might support something other than list.

Contributor Author

Makes sense. Moved the check up.
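The suggestion above amounts to dispatching on DataFrame before the existing type checks, so the established str/list handling is never reached for DataFrame input and cannot change behaviour. A minimal sketch of the pattern, using a stand-in class instead of the real pyspark.sql.connect.dataframe.DataFrame (the return values are placeholders, not the actual Spark code):

```python
class DataFrame:
    """Stand-in for pyspark.sql.connect.dataframe.DataFrame."""
    pass

def csv(path, schema=None):
    # Check for DataFrame first and return early, so the existing
    # str/list branches below are untouched for DataFrame input.
    if isinstance(path, DataFrame):
        return "parsed-from-dataframe"  # would invoke the Connect CSV parse path
    if isinstance(path, str):
        path = [path]
    if isinstance(path, list):
        return "parsed-from-paths"  # existing file-path behaviour
    raise TypeError("path can be only string, list or DataFrame")
```

Placing the new branch first, with an early return, is what keeps any future extension (e.g. accepting something other than list) from colliding with the DataFrame case.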

def csvFromDataFrame(
    reader: DataFrameReader,
    df: DataFrame): DataFrame = {
  val classicReader = reader.asInstanceOf[ClassicDataFrameReader]
  val fields = df.schema.fields
  if (fields.isEmpty) {
Member

Shall we apply the same fix as 638ca41?

Contributor Author

Done. Applied the same pattern to reject multi-column input.
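The validation agreed on here, accepting only a DataFrame with exactly one string column, can be sketched in plain Python; the schema representation (a list of name/type pairs) and the error messages are illustrative, not the actual Spark implementation:

```python
def validate_single_string_column(fields):
    # fields: list of (name, dataType) pairs standing in for df.schema.fields
    if len(fields) == 0:
        raise ValueError(
            "Input DataFrame has no columns; expected exactly one string column")
    if len(fields) > 1:
        raise ValueError(
            f"Input DataFrame has {len(fields)} columns; "
            "expected exactly one string column")
    name, dtype = fields[0]
    if dtype != "string":
        raise ValueError(f"Column '{name}' has type {dtype}; expected string")
    return name
```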

def csvFromDataFrame(
    reader: DataFrameReader,
    df: DataFrame): DataFrame = {
  val classicReader = reader.asInstanceOf[ClassicDataFrameReader]
Contributor

This is the same code as above. Create a util for this?

Contributor Author

Makes sense. Extracted a helper method.
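The extraction discussed here pulls the duplicated prefix (validating the input and dispatching to the format-specific parser) into one helper that both the JSON and CSV paths call. A hypothetical sketch of that shape, using a plain dict as a stand-in DataFrame and names that are not the actual Spark helpers:

```python
import json

def _datasource_from_dataframe(df, parse_fn):
    # Shared plumbing for json()/csv() DataFrame input: validate the
    # single-column shape once, then hand each value to the
    # format-specific parser.
    fields = df["schema"]
    if len(fields) != 1:
        raise ValueError("expected exactly one column")
    return [parse_fn(row) for row in df["rows"]]

def csv_from_dataframe(df):
    # Illustrative only: real CSV parsing handles quoting, options, etc.
    return _datasource_from_dataframe(df, lambda s: s.split(","))

def json_from_dataframe(df):
    return _datasource_from_dataframe(df, json.loads)
```

Keeping the validation in one place means a future fix (like the multi-column rejection above) lands in both readers at once.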

@HyukjinKwon
Member

Merged to master.
