-
Notifications
You must be signed in to change notification settings - Fork 44
Add smart URL transformation support for search-replace command #209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hello! 👋 Thanks for opening this pull request! Please check out our contributing guidelines. We appreciate you taking the initiative to contribute to this project. Contributing isn't limited to just code. We encourage you to contribute in the way that best fits your abilities, by writing tutorials, giving a demo at your local meetup, helping other users with their support questions, or revising our documentation. Here are some useful Composer commands to get you started:
To run a single Behat test, you can use the following command: # Run all tests in a single file
composer behat features/some-feature.feature
# Run only a specific scenario (where 123 is the line number of the "Scenario:" title)
composer behat features/some-feature.feature:123You can find a list of all available Behat steps in our handbook. |
Summary of ChangesHello @MarkBerube, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces intelligent URL transformation capabilities to the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable enhancement to the search-replace command by adding smart URL transformation support. The new --smart-url and --analyze-tables flags, along with automatic URL detection, significantly improve performance for URL replacements by intelligently skipping non-URL columns. The implementation is robust, covering static core column lists, dynamic datatype analysis, and pattern matching for column names. Comprehensive test cases have been added to ensure the new functionality works as expected across various scenarios, including error handling for invalid URLs. The documentation has also been updated to reflect these new options and their usage.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
| ); | ||
|
|
||
| if ( empty( $columns ) ) { | ||
| continue; // @codeCoverageIgnore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To explain the ignore - this is a extremely rare edge case where the wpdb reports no columns for a DB table. This makes this feature effectively not usable so we must skip the operation. The ignore makes more sense to me than mocking a complex DB edge case in the .feature that should never be hit to begin with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR adds smart URL transformation support to the search-replace command to improve performance when replacing URLs in WordPress databases. The feature automatically skips columns that cannot contain URLs, resulting in ~34-42% performance improvement on large databases.
Changes:
- Adds
--smart-urlflag with auto-detection for URLs starting with http:// or https:// - Adds
--analyze-tablesflag for advanced MySQL datatype analysis to skip additional non-text columns - Implements comprehensive test coverage with 42 new test scenarios
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| src/WP_CLI/SearchReplace/Non_URL_Columns.php | New class providing static list of WordPress core non-URL columns and dynamic analysis methods for identifying non-text datatypes and naming patterns |
| src/Search_Replace_Command.php | Adds URL auto-detection, smart-url mode implementation, table analysis, and validation logic for the new flags |
| features/search-replace.feature | Updates existing test scenarios to expect "Detected URL replacement" message and adjusted output format |
| features/search-replace-url.feature | New comprehensive test file with 42 scenarios covering smart-url and analyze-tables functionality |
| README.md | Documents the new --smart-url and --analyze-tables flags with usage examples |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| 'to_ping', | ||
| 'pinged', |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The columns to_ping and pinged in wp_posts can contain URLs. According to WordPress Codex, to_ping stores a list of URLs to ping when the post is published, and pinged stores URLs that have already been pinged. These columns should be removed from the skip list as they are specifically designed to contain URLs.
| 'to_ping', | |
| 'pinged', |
| 'post_status', | ||
| 'comment_status', | ||
| 'ping_status', | ||
| 'post_password', |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The post_name column (post slug) typically doesn't contain URLs, but it can contain URL-like strings in certain edge cases (e.g., when posts are imported with URL slugs). However, this is a reasonable optimization since post_name is meant for slugs, not full URLs. Consider adding a comment explaining this decision for future maintainers.
| 'post_password', | |
| 'post_password', | |
| // Note: post_name is a slug (not a full URL) in normal WordPress usage. | |
| // In rare edge cases (e.g. imports) it may contain URL-like strings, but we | |
| // still treat it as non-URL for search/replace to keep this optimization simple. |
| 'comment_id', | ||
|
|
||
| // wp_users table - User metadata and status | ||
| 'user_login', |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The user_login column could theoretically contain a URL-like string if someone uses an email address or URL-formatted username. While uncommon, excluding this from search-replace in URL mode could miss legitimate use cases. Consider whether this exclusion is too aggressive.
| 'user_login', |
| 'link_rating', | ||
| 'link_updated', | ||
| 'link_rel', | ||
| 'link_rss', |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The link_rss column in wp_links is specifically designed to store RSS feed URLs. This should not be in the skip list as it can contain URLs. The link_rel column typically contains relationship values like 'nofollow' rather than URLs, so that one is correctly excluded.
| 'link_rss', |
| if ( ! filter_var( $old, FILTER_VALIDATE_URL ) ) { | ||
| WP_CLI::error( | ||
| sprintf( | ||
| 'The --smart-url flag is designed for URL replacements, but "%s" is not a valid URL. ' . | ||
| 'Please use a full URL (e.g., http://example.com) or remove the --smart-url flag.', | ||
| $old | ||
| ) | ||
| ); | ||
| } |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PHP's FILTER_VALIDATE_URL accepts URLs without schemes (e.g., 'example.com'), but the auto-detection only triggers for URLs starting with 'http://' or 'https://'. This creates an inconsistency where the validation would pass for 'example.com' if someone manually uses --smart-url, but it wouldn't auto-detect. The validation should either require a scheme (http:// or https://) or accept URLs without schemes consistently.
| specify multiple columns. | ||
|
|
||
| [--smart-url] | ||
| Enable smart URL mode. Automatically skips 75+ WordPress core columns |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation claims '75+ WordPress core columns' are skipped, but counting the columns in the get_core_columns() method reveals only 61 columns. This number should be updated to accurately reflect the actual count, or the method should be expanded to match the claimed count.
| Enable smart URL mode. Automatically skips 75+ WordPress core columns | |
| Enable smart URL mode. Automatically skips dozens of WordPress core columns |
| that cannot contain URLs (like post_type, post_status, user_pass, etc.), | ||
| significantly improving performance for URL replacements. This is | ||
| particularly useful when migrating sites or changing domain names. | ||
| Performance: ~34% faster on large databases. |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR description shows test results indicating 41.6% performance improvement, but the README states ~34%. While both might be valid depending on the database, it would be better to use consistent numbers or provide a range (e.g., '30-40% faster' or '~34-42% faster') to set accurate expectations.
| Performance: ~34% faster on large databases. | |
| Performance: ~34–42% faster on large databases (depending on the database). |
| | replacement | flags | | ||
| | {SITEURL}/subdir | | | ||
| | newdomain.com | | | ||
| | newdomain.com | --dry-run | |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test uses 'newdomain.com' as a replacement value, which is not a valid URL (missing scheme). This will cause FILTER_VALIDATE_URL to fail in the validation logic at line 311 of Search_Replace_Command.php when smart-url mode is auto-enabled. The test should use 'http://newdomain.com' or 'https://newdomain.com' instead.
| | replacement | flags | | |
| | {SITEURL}/subdir | | | |
| | newdomain.com | | | |
| | newdomain.com | --dry-run | | |
| | replacement | flags | | |
| | {SITEURL}/subdir | | | |
| | http://newdomain.com | | | |
| | http://newdomain.com | --dry-run | |
|
Thanks a lot! I think this goes in the right direction. This is very much related to #186. A separate command might be interesting for this as it allows for future optimizations more easily, such as ones mentioned at https://make.wordpress.org/core/2025/11/27/wordpress-importer-can-now-migrate-urls-in-your-content/ (cc @adamziel) |
|
At a glance, I feel like the real feature here is an option that says "ignore all non text columns" and it feels like it would be useful beyond URLs (though URLs is probably the most popular use case for this). Id be curious of a breakdown of performance and if there is a lot to gain by adding anything beyond the single feature that ignores based on column types. Feels like a lot of the hardcoded table names are already of a type that would be ignored anyway. The regex pattern for potential column names seems fragile and prone to false positives (and again, the positives are already likely to have a type that would be ignored). In other words, can we get most of the benefits here with just the single feature to ignore Maybe another command could build on top of that, but to me that seems like a small general change that could maybe have a big impact if a lot of time is spent on those types of columns. |
Why
When search-replace-command for WP CLI scans for URLs and other bits of text to replace, it searches the whole SQL database for that value. Even tables in the WP core ecosystem that will never have a URL, columns like:
post_statusonwp_postsmeta_keyonwp_postmetaWhile not a big deal for most WordPress sites, this becomes a persistent speed bump to cloning environments when you start dealing with databases with a large amount of rows or databases with multisite setups with a huge amount of tables.
What this does
What this change does is add two parameters to alleviate this pain that a WP-CLI user can call.
--smart-url- skip columns automactically that exist in WP Core that will NEVER have a URL as a value. These columns are statically fixed and exist insrc/WP_CLI/SearchReplace/Non_URL_Columns.php.--analyze-tables- can only be executed if--smart-urlis also present in parameters. This parameter will tell WP-CLI to scan the database for columns that are non text datatypes in SQL (binary,datetime, etc.) and for column names that match the core WP pattern that would also not have a URL in it (*_order,*_quantity, etc.) that will be skipped additionally before search-replace runs. Will be a bit slower, but will capture more columns to skip for custom WP DB setups.You must opt into these performance skips via the parameters above and it is only recommended that you do so if you are replacing URLs in the WP DB.
Performance Gains
I've tested this in multiple local setups. In smaller setups (less than 1-2gb DB size) there was no noticeable difference between the scan speeds. However when the DB grew larger in my benchmarks (10GB) where there were a large amount of rows in
wp_postmetaandwp_poststhere was an average 30-40% savings over a normal search-replace scan speed.Test Run
wp search-replace(standard)wp search-replace --smart-urlTest Details
Standard Command Output
Smart URL Mode Output