feat: Spec multi-result-set API#3871
Conversation
|
After discussion and agreement on the proposed function, I'll update the relevant implementations here of the other driver managers to leverage and expose this (python, rust, Go, etc.) |
|
Should we make a branch for spec changes? |
That makes sense, gather the spec updates in the branch and then eventually decide to push it as one thing. |
|
I've changed the base for this PR to point to the spec-1.2.0 branch. Now we just need people to weigh in on the proposal 😄 |
paleolimbot
left a comment
There was a problem hiding this comment.
Nice!
The previous PR also had:
ADBC_EXPORT
AdbcStatusCode AdbcStatementNextResultSetSchema(struct AdbcStatement* statement,
struct ArrowSchema* schema,
struct AdbcError* error);Is that no longer planned or no longer needed?
|
I've updated the PR based on the comments and discussion. Everyone please take a look and let me know what you think! |
c/include/arrow-adbc/adbc.h
Outdated
| /// \brief opaque implementation-defined state | ||
| void* private_data; | ||
|
|
||
| /// \brief Release the result set and any associated resources. |
There was a problem hiding this comment.
We can say this also cancels the query if not completed?
There was a problem hiding this comment.
docs updated to say this.
|
Updated based on the most recent comments, converted the callback members of |
|
@jadewang-db and @eric-wang-1990 are interested in this. |
c/include/arrow-adbc/adbc.h
Outdated
| /// ADBC_VERSION_1_0_0. The driver must not access the new fields, | ||
| /// which will carry undefined values. |
There was a problem hiding this comment.
Shouldn't the manager populate them with functions that fail immediately?
There was a problem hiding this comment.
I'm not actually sure what this is trying to say, now that you point it out.
There was a problem hiding this comment.
I believe this is in line with the current behavior. No attempt is made to pre-populate with functions that fail immediately (likely to avoid having to create separate versions for every callback going forward). We currently leave these members undefined in the struct. If we want to change this behavior, I'd prefer we do it in a different PR, not this one.
There was a problem hiding this comment.
Probably we should describe the user perspective and driver perspective separately.
From the user perspective: the manager will return a fully populated struct, but some of the functions may fail with NOT_IMPLEMENTED.
From the driver perspective: they should not access members outside of the version specified by the driver manager. (Though this is already required by the contract on the init func, and also you have no other way of knowing the size of the struct, nor do you know whether it's the driver manager or something else calling you.) I don't think we need to specify what exactly the driver manager does here.
There was a problem hiding this comment.
Ping @zeroshade, I think we should at least avoid specifying exactly what the driver manager does here, just that the driver manager expects that if it gives the driver a particular version, it should not try to access any functions defined after that version.
There was a problem hiding this comment.
How's the updated version?
paleolimbot
left a comment
There was a problem hiding this comment.
Just optional nits on the names from me!
Now that the new functions are not embedded callbacks on the AdbcMultiResultSet it's easy to add schema-only execution if we need it.
abonander
left a comment
There was a problem hiding this comment.
👍 nice and feasible to implement for the ClickHouse driver!
|
@lidavidm @paleolimbot I've updated based on the feedback including adding an |
c/include/arrow-adbc/adbc.h
Outdated
| /// | ||
| /// The AdbcMultiResultSet must outlive the returned partitions. |
There was a problem hiding this comment.
I'm not sure about this (isn't the point of the partitions to be independent of the API objects?)
c/include/arrow-adbc/adbc.h
Outdated
| /// \param[out] error An optional location to return an error message if necessary. | ||
| /// | ||
| /// \return ADBC_STATUS_NOT_IMPLEMENTED if the driver only supports fetching results | ||
| /// as partitions, ADBC_STATUS_INVALID_STATE if called at an inappropriate time, |
There was a problem hiding this comment.
When is "an inappropriate time"?
There was a problem hiding this comment.
artifact of the previous design. it no longer applies here give that we've decoupled everything by having these standalone functions on the separate MultiResultSet object
| /// then call `MultiResultSetNext` repeatedly until it returns ADBC_STATUS_OK and | ||
| /// sets the release callback to NULL, indicating that there are no more result sets. |
There was a problem hiding this comment.
I think a state we haven't addressed is whether you can call Next while still reading the previous result set or not. I hazard many databases will not support concurrency, especially because the queries may depend on each other...
There was a problem hiding this comment.
Should we leave it up to a driver to determine whether it's allowed? (i.e. "A driver MAY invalidate the current result set when MultiResultSetNext is called") or should we explicitly state that it is invalidated when calling MultiResultSetNext and thus require that behavior?
I personally lean towards the latter rather than the former for consistency, but it would prevent a driver from technically allowing it if it supports it. Thoughts?
There was a problem hiding this comment.
I think we can state MAY, and most code will then have to assume it will be invalidated; if you're coding to a specific driver release, then maybe you can make a stronger assumption.
c/include/arrow-adbc/adbc.h
Outdated
| /// ADBC_VERSION_1_0_0. The driver must not access the new fields, | ||
| /// which will carry undefined values. |
There was a problem hiding this comment.
Ping @zeroshade, I think we should at least avoid specifying exactly what the driver manager does here, just that the driver manager expects that if it gives the driver a particular version, it should not try to access any functions defined after that version.
Extracted from #3607 with influence by the comments there and main...CurtHagenlocher:arrow-adbc:MoreResults, this contains a proposal for handling multi-result set query execution via ADBC by adding a new function for drivers,
AdbcStatementNextResultSet.This also includes the necessary changes for an ADBC API Revision 1.2.0 (macro defines and so on). The comment above the function includes all the semantic definitions of the behavior.