Migrate org.apache.solr.cli tools from V1 to V2 APIs#4154
Migrate org.apache.solr.cli tools from V1 to V2 APIs#4154epugh wants to merge 10 commits intoapache:mainfrom
Conversation
Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
…r/nodes + /collections APIs Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
… POJOs Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
Co-authored-by: epugh <22395+epugh@users.noreply.github.com>
dsmiley
left a comment
There was a problem hiding this comment.
looks good; just some minor feedback
|
Right now, Solr allows users to disable the v2 APIs with a boolean sysprop: I'm 100% in favor of using the v2 APIs in more places - we desperately need that dog-fooding on them and this is a great way to do that. But if we're going to do this then we need to rip out that sysprop. Ideally that'd happen before this PR gets merged, just to make sure we don't forget about it and accidentally release something where the |
Good find! Yeah, I had forgotten about that setting. We could just commit these to For various v2 api's that we want to have flexiblity on, can we keep them as "experimental" for a little longer, so we don't have to worry about back compat, with the idea that internal to solr consumers, like the CLI (or maybe internal APIS) wouldn't care that we are changing them over time... |
FYI that these links are "private" AFAIK - I get a 404 when trying to click through.
I started out writing a reply to this that said essentially: "Sure, why not. Solr offers plenty of feature flags, and there's no clear correlation between those and 'experimental'-ness. Lots of non-experimental features have flags. And there are plenty of other 'experimental' features that don't have a feature flag. No reason to tie these things together." But in writing that and reading it over I find I've unconvinced myself 😦 The problem here is security. I don't know of any security issues specific to the v2 API, but it's a whole new attack surface. While we're signaling that it's not ready for prime time then I think a subset of users are going to want to be able to disable it. Telling them: "here's this thing that's not quite ready yet, pray it doesn't expose you in some way" is not a great message. This feels like a much larger discussion and I don't want it to hold up this PR. Would you be OK making this "main"-only for now, and I'll file a JIRA ticket or dev@ thread to figure out a path forward here? The dust may settle on that discussion in a few weeks and we decide we're OK with this hitting |
I love a suggestion that keeps me unblocked ;-). I think Interesting, I think of these as experimental not because of security but because we aren't sure we have the right flavour of API. Somethign I've noticed lately is that when I build APIs, that then, later, when I use them "for real" I find a lot of issues with them not being "quite right" yet. Not because of security or performacnce, but maybe because of too much paylard returned or not enough filters or what not. So yeah, please do move forward, and I'll merge this to |
gerlowskija
left a comment
There was a problem hiding this comment.
Found one bug in your code I think; see comments inline.
Also, this PR adds a whole new API (ListClusterNodes), so it probably deserves a changelog entry and some tests that use the generated v2 request.
| @SuppressWarnings("unchecked") | ||
| List<String> collections = (List<String>) existsCheckResult.get("collections"); | ||
| exists = collections != null && collections.contains(collection); | ||
| var response = new CollectionsApi.ListCollections().process(solrClient); |
There was a problem hiding this comment.
[0] I may be out of date, but as of one point at least these generated v2 request/response objects didn't throw exception on errors the way everything else in SolrJ does.
Assuming this is still true, users are responsible for inspecting the .responseHeader.status field to make sure that the operation succeeded. Hopefully I'm wrong, but if not you'll need to add that checking here and elsewhere.
| /** | ||
| * V2 API for listing the live nodes in the SolrCloud cluster. | ||
| * | ||
| * <p>This API (GET /v2/cluster/nodes) is equivalent to the v1 CLUSTERSTATUS action filtered to |
There was a problem hiding this comment.
[0] IMO the API comparison is a bit clearer if you put the verbatim v1 request here that you're comparing the v2 API to, rather than trying to describe it.
i.e. /admin/collections?action=CLUSTERSTATUS&liveNodes=true&includeAll=false
There was a problem hiding this comment.
@gerlowskija should this API return filtered data? I am now thinking that if this is the new /cluster/nodes api, then it should take in params like liveNodes or includeAll? Should this be /cluster/nodes/live and we also support /cluster/nodes which would return all?
Ugh, I dug a bit more, and so yeah, in V1 CLUSTERSTATUS command returns collection and cluster data mixed up.
What I have implemented only returns cluster data. But don't quite understand how what I have written overlaps with the existing V2 code that I found in org.apache.solr.handler.ClusterAPI. --> I think ClusterAPI is the old way of doing V2 apis, so if I want nice pretty SolrJ objects, I have to use the new way, or accept a Generic response... I think what I've written may be not even be used or is overlapping with the existing ClusterAPI's implementation of /cluster/nodes...
There was a problem hiding this comment.
don't quite understand how what I have written overlaps with the existing V2 code that I found in org.apache.solr.handler.ClusterAPI
My understanding is:
- Long ago Noble or someone added an old-style "/api/nodes/cluster" implementation in ClusterAPI.
- Both implementations, old and new, seem to be returning roughly the same information and be roughly equivalent.
- Your new implementation uses the desired framework and auto-generates a SolrJ binding, so it's definitely worth keeping.
- That means this PR should probably delete the "old" API definition (i.e.
ClusterAPI.getNodes())
should this API return filtered data? I am now thinking that if this is the new /cluster/nodes api, then it should take in params like liveNodes or includeAll? Should this be /cluster/nodes/live and we also support /cluster/nodes which would return all?
I'm not really sure what you mean by this. In SolrCloud, there's no real notion of "dead" nodes. Either you've registered yourself in ZK and you're a "live" part of the cluster, or you're not in the cluster at all. Right? Am I missing something?
There was a problem hiding this comment.
I am going to be tackling ClusterAPI migration soon, so I may leave the old definition for that big effort...
As far as live/dead, I notice that if a node is no longer avialable, it still shows up in our ZK nodes page as red "down"...
I am hoping if you are onboard with my plan to delete apis in ClusterAPI that we can go ahead and merge this one now?
… back to old style V2 api creation
| import org.apache.solr.client.api.model.ListClusterNodesResponse; | ||
|
|
||
| /** V2 API definition for listing the live nodes in the SolrCloud cluster. */ | ||
| @Path("/cluster/nodes") |
There was a problem hiding this comment.
I am still very fuzzy on how all this magic works! @gerlowskija I would really appreciate your reivewing the new endpoint. One thing I want to confirm is that we already HAVE a v2 /cluster/nodes. However, it was not (I believe) using the annotaitons os it meant the V2 wasn't in solrJ due to no openapi specs. Hence I had a GenericSolrRequest until I created this. (I being copilot).
| /** Response for the v2 "list cluster nodes" API */ | ||
| public class ListClusterNodesResponse extends SolrJerseyResponse { | ||
|
|
||
| @Schema(description = "The live nodes in the cluster.") |
There was a problem hiding this comment.
I also want to confirm that this api is for "live nodes" and not "all nodes" including dead ones?
| // TODO get this as a metric from the metrics API instead, or something else. | ||
| var collections = (Map<String, Object>) json._get(List.of("cluster", "collections"), null); | ||
| var collectionsResponse = new CollectionsApi.ListCollections().process(solrClient); | ||
| var collections = |
There was a problem hiding this comment.
I don't love all these != null checks...
|
|
||
| @Override | ||
| @PermissionName(COLL_READ_PERM) | ||
| public ListClusterNodesResponse listClusterNodes() { |
There was a problem hiding this comment.
@gerlowskija would love confirmation here on how we frame up the logic...
| DeleteNode.class, | ||
| ListAliases.class, | ||
| AliasProperty.class, | ||
| ListClusterNodes.class, |
There was a problem hiding this comment.
@gerlowskija and this is how we add v2 apis?
| /** | ||
| * V2 API for listing the live nodes in the SolrCloud cluster. | ||
| * | ||
| * <p>This API (GET /v2/cluster/nodes) is equivalent to the v1 CLUSTERSTATUS action filtered to |
| /** | ||
| * V2 API for listing the live nodes in the SolrCloud cluster. | ||
| * | ||
| * <p>This API (GET /v2/cluster/nodes) is equivalent to the v1 CLUSTERSTATUS action filtered to |
There was a problem hiding this comment.
@gerlowskija should this API return filtered data? I am now thinking that if this is the new /cluster/nodes api, then it should take in params like liveNodes or includeAll? Should this be /cluster/nodes/live and we also support /cluster/nodes which would return all?
Ugh, I dug a bit more, and so yeah, in V1 CLUSTERSTATUS command returns collection and cluster data mixed up.
What I have implemented only returns cluster data. But don't quite understand how what I have written overlaps with the existing V2 code that I found in org.apache.solr.handler.ClusterAPI. --> I think ClusterAPI is the old way of doing V2 apis, so if I want nice pretty SolrJ objects, I have to use the new way, or accept a Generic response... I think what I've written may be not even be used or is overlapping with the existing ClusterAPI's implementation of /cluster/nodes...
|
|
||
| @EndPoint(method = GET, path = "/cluster/nodes", permission = COLL_READ_PERM) | ||
| public void getNodes(SolrQueryRequest req, SolrQueryResponse rsp) { | ||
| System.out.println("\n\nHI THERE I AM IN CLUSTERAPI\n\n"); |
There was a problem hiding this comment.
@gerlowskija this is what I used to prove to myself that ClusterAPI is being used instead of the code that I wrote... Does this change anything on the "Lets write more tests or changelog?"

Replaces legacy V1 admin API calls (
CollectionAdminRequest,CoreAdminRequest) in thebin/solrCLI tools with generated V2 OpenAPI SolrJ classes. Also adds a new proper SolrJ V2 endpoint forGET /cluster/nodesthat was previously missing, eliminating the need for aGenericV2SolrRequestworkaround.I used Copilot for this, and here is the agent chat: https://github.com/epugh/solr/tasks/3c2f15c6-c0c7-4e39-b6b6-1cd51f8e7aa4
Migrated calls
CLIUtilssafeCheckCollectionExistsCollectionAdminRequest.List→CollectionsApi.ListCollectionsCreateToolcreateCoreCoreAdminRequest.createCore→CoresApi.CreateCoreCreateToolcreateCollectionCollectionAdminRequest.createCollection→CollectionsApi.CreateCollectionDeleteTooldeleteCollectionCollectionAdminRequest.deleteCollection→CollectionsApi.DeleteCollectionDeleteTooldeleteCoreCoreAdminRequest.Unload→CoresApi.UnloadCoreStatusToolgetCloudStatusCollectionAdminRequest.ClusterStatus→ClusterApi.ListClusterNodes+CollectionsApi.ListCollectionsVerbose output restored via Jackson
The old V1 code serialized
NamedListresponses with noggit'sJSONWriter. V2 response POJOs are Jackson-annotated, so verbose output now usesJacksonContentWriter.DEFAULT_MAPPER.writerWithDefaultPrettyPrinter()— actually cleaner output since the responses are already proper JSON objects:New
GET /cluster/nodesV2 API (SolrJ support)StatusToolpreviously fell back toGenericV2SolrRequestbecause SolrJ had no typed class for this endpoint. The full API pipeline was wired up:ListClusterNodesResponse— new model withSet<String> nodesListClusterNodesApi— new endpoint interface at@Path("/cluster/nodes")ListClusterNodes— new Jersey implementation registered inCollectionsHandlerClusterApi.ListClusterNodes— generated SolrJ request class (replacesGenericV2SolrRequest)Not migrated
Snapshot tools (
SnapshotCreateTool,SnapshotDeleteTool, etc.) are intentionally excluded as candidates for removal.In fact, we may just remove them...?
Tests
Existing tests + bats tests all pass.