@MarkBerube
Contributor

Why

When WP-CLI's search-replace command scans for URLs and other text to replace, it searches the whole SQL database for that value. That includes WordPress core columns that will never contain a URL, such as:

  • post_status on wp_posts
  • meta_key on wp_postmeta

While not a big deal for most WordPress sites, this becomes a persistent speed bump when cloning environments once you are dealing with databases that have a large number of rows, or multisite setups with a huge number of tables.

What this does

This change adds two parameters to the command that a WP-CLI user can pass to alleviate this pain.

--smart-url - automatically skip WP Core columns that will NEVER have a URL as a value. The list of these columns is static and lives in src/WP_CLI/SearchReplace/Non_URL_Columns.php.

--analyze-tables - can only be used together with --smart-url. It tells WP-CLI to scan the database for columns with non-text SQL datatypes (binary, datetime, etc.) and for column names matching core WP patterns that would not hold a URL (*_order, *_quantity, etc.), and to skip those as well before search-replace runs. This is a bit slower, but it catches more skippable columns in custom WP DB setups.

You must opt into these performance skips via the parameters above, and doing so is recommended only if you are replacing URLs in the WP DB.
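The two flags can be pictured as successive filters over a table's columns. The sketch below is illustrative only: the function name, the two-entry sample skip list, the datatype list, and the name pattern are assumptions built from the examples in this description, not the contents of Non_URL_Columns.php.

```php
<?php
// Hypothetical sample of the static core list (unprefixed table => columns).
const CORE_NON_URL_COLUMNS = [
    'posts'    => ['post_status'],
    'postmeta' => ['meta_key'],
];

// Assumed non-text datatypes and name suffixes, taken from the examples above.
const NON_TEXT_TYPES   = ['binary', 'datetime'];
const NON_URL_SUFFIXES = '/_(order|quantity)$/';

/**
 * Decide whether a column can be skipped before search-replace runs.
 *
 * @param string $table         Table name without the DB prefix, e.g. "posts".
 * @param string $column        Column name.
 * @param string $sqlType       Column type as reported by MySQL, e.g. "varchar(20)".
 * @param bool   $analyzeTables Whether the extra datatype/name analysis is enabled.
 */
function can_skip_column(string $table, string $column, string $sqlType, bool $analyzeTables): bool
{
    // Step 1 (--smart-url): static lookup in the core list.
    if (in_array($column, CORE_NON_URL_COLUMNS[$table] ?? [], true)) {
        return true;
    }
    if (!$analyzeTables) {
        return false;
    }
    // Step 2 (--analyze-tables): non-text datatype, e.g. "datetime" or "binary(16)".
    $baseType = strtolower((string) strtok($sqlType, '( '));
    if (in_array($baseType, NON_TEXT_TYPES, true)) {
        return true;
    }
    // Step 3 (--analyze-tables): column name matches a non-URL naming pattern.
    return preg_match(NON_URL_SUFFIXES, $column) === 1;
}
```

With these sample lists, can_skip_column('posts', 'post_status', 'varchar(20)', false) is true via the static list alone, while a datetime column in a custom table is only skipped once the analysis flag is on.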

Performance Gains

I've tested this in multiple local setups. In smaller setups (DB size under 1-2 GB) there was no noticeable difference in scan speed. However, in my benchmarks with a larger DB (10 GB) and a large number of rows in wp_postmeta and wp_posts, scan time dropped by 30-40% on average compared with a normal search-replace.

Test Run

Command	Time	Improvement
wp search-replace (standard)	144s	baseline
wp search-replace --smart-url	84s	41.6% faster
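For reference, the improvement figure is the share of wall-clock time saved: (144 - 84) / 144 ≈ 41.67%, which the table shows truncated to 41.6%. A minimal sketch of that arithmetic follows (function names are illustrative); it also prints the equivalent speed-up factor, since "faster" can be read either way.

```php
<?php
/** Percentage of wall-clock time saved relative to the baseline run. */
function percent_time_saved(float $baselineSeconds, float $optimizedSeconds): float
{
    return ($baselineSeconds - $optimizedSeconds) / $baselineSeconds * 100.0;
}

/** Speed-up factor: how many times faster the optimized run is. */
function speedup_factor(float $baselineSeconds, float $optimizedSeconds): float
{
    return $baselineSeconds / $optimizedSeconds;
}

printf("%.1f%% less time\n", percent_time_saved(144.0, 84.0)); // 41.7% less time
printf("%.2fx speed-up\n", speedup_factor(144.0, 84.0));        // 1.71x speed-up
```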

Test Details

Standard Command Output

Table	Column	Replacements	Type
wp_commentmeta	meta_key	0	SQL
wp_commentmeta	meta_value	0	SQL
wp_comments	comment_author	0	SQL
wp_comments	comment_author_email	0	SQL
wp_comments	comment_author_url	200000	SQL
wp_comments	comment_author_IP	0	SQL
wp_comments	comment_content	200000	SQL
wp_comments	comment_approved	0	SQL
wp_comments	comment_agent	0	SQL
wp_comments	comment_type	0	SQL
wp_links	link_url	0	SQL
wp_links	link_name	0	SQL
wp_links	link_image	0	SQL
wp_links	link_target	0	SQL
wp_links	link_description	0	SQL
wp_links	link_visible	0	SQL
wp_links	link_rel	0	SQL
wp_links	link_notes	0	SQL
wp_links	link_rss	0	SQL
wp_options	option_name	0	SQL
wp_options	option_value	50000	PHP
wp_options	autoload	0	SQL
wp_postmeta	meta_key	0	SQL
wp_postmeta	meta_value	30000000	SQL
wp_posts	post_content	950000	SQL
wp_posts	post_title	0	SQL
wp_posts	post_excerpt	950000	SQL
wp_posts	post_status	0	SQL
wp_posts	comment_status	0	SQL
wp_posts	ping_status	0	SQL
wp_posts	post_password	0	SQL
wp_posts	post_name	0	SQL
wp_posts	to_ping	0	SQL
wp_posts	pinged	0	SQL
wp_posts	post_content_filtered	0	SQL
wp_posts	guid	950000	SQL
wp_posts	post_type	0	SQL
wp_posts	post_mime_type	0	SQL
wp_term_taxonomy	taxonomy	0	SQL
wp_term_taxonomy	description	0	SQL
wp_termmeta	meta_key	0	SQL
wp_termmeta	meta_value	0	SQL
wp_terms	name	0	SQL
wp_terms	slug	0	SQL
wp_usermeta	meta_key	0	SQL
wp_usermeta	meta_value	0	PHP
wp_users	user_login	0	SQL
wp_users	user_nicename	0	SQL
wp_users	user_email	0	SQL
wp_users	user_url	0	SQL
wp_users	user_activation_key	0	SQL
wp_users	display_name	0	SQL
Success: 33300000 replacements to be made.

real	2m23.679s
user	0m0.820s
sys	0m0.661s

Smart URL Mode Output

Table	Column	Replacements	Type
wp_commentmeta	meta_value	0	SQL
wp_comments	comment_author	0	SQL
wp_comments	comment_author_email	0	SQL
wp_comments	comment_author_url	200000	SQL
wp_comments	comment_author_IP	0	SQL
wp_comments	comment_content	200000	SQL
wp_comments	comment_agent	0	SQL
wp_links	link_url	0	SQL
wp_links	link_name	0	SQL
wp_links	link_image	0	SQL
wp_links	link_target	0	SQL
wp_links	link_description	0	SQL
wp_links	link_notes	0	SQL
wp_options	option_value	50000	PHP
wp_postmeta	meta_value	30000000	SQL
wp_posts	post_content	950000	SQL
wp_posts	post_title	0	SQL
wp_posts	post_excerpt	950000	SQL
wp_posts	post_content_filtered	0	SQL
wp_posts	guid	950000	SQL
wp_term_taxonomy	description	0	SQL
wp_termmeta	meta_value	0	SQL
wp_terms	name	0	SQL
wp_usermeta	meta_value	0	PHP
wp_users	user_url	0	SQL
wp_users	user_activation_key	0	SQL
Success: 33300000 replacements to be made.

real	1m23.315s
user	0m0.775s
sys	0m0.698s

@MarkBerube MarkBerube requested a review from a team as a code owner January 15, 2026 17:47
@github-actions

Hello! 👋

Thanks for opening this pull request! Please check out our contributing guidelines. We appreciate you taking the initiative to contribute to this project.

Contributing isn't limited to just code. We encourage you to contribute in the way that best fits your abilities, by writing tutorials, giving a demo at your local meetup, helping other users with their support questions, or revising our documentation.

Here are some useful Composer commands to get you started:

  • composer install: Install dependencies.
  • composer test: Run the full test suite.
  • composer phpcs: Check for code style violations.
  • composer phpcbf: Automatically fix code style violations.
  • composer phpunit: Run unit tests.
  • composer behat: Run behavior-driven tests.

To run a single Behat test, you can use the following command:

# Run all tests in a single file
composer behat features/some-feature.feature

# Run only a specific scenario (where 123 is the line number of the "Scenario:" title)
composer behat features/some-feature.feature:123

You can find a list of all available Behat steps in our handbook.

@gemini-code-assist

Summary of Changes

Hello @MarkBerube, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces intelligent URL transformation capabilities to the wp search-replace command, aiming to significantly boost performance for URL replacements in WordPress databases. By smartly identifying and skipping columns that are guaranteed not to contain URLs, either through a static list of core WordPress columns or dynamic database analysis, this enhancement makes environment cloning and domain migration processes faster and more efficient, especially for large installations.

Highlights

  • New --smart-url Flag: Introduces a --smart-url flag to the wp search-replace command, which automatically skips over 75 WordPress core columns known not to contain URLs. This significantly improves performance for URL replacement operations, with reported gains of 30-40% on large databases.
  • New --analyze-tables Flag: Adds an --analyze-tables flag, which can be used in conjunction with --smart-url. This flag enables advanced table analysis, dynamically identifying and skipping non-text columns (like integers, dates, enums) and columns matching common non-URL naming patterns (e.g., *_id, *_status) in custom plugin tables, further optimizing the search-replace process.
  • Automatic URL Replacement Detection: The wp search-replace command now automatically detects if the search string is a URL (starting with http:// or https://) and enables --smart-url mode by default, streamlining the user experience for common URL migration tasks.
  • Validation for --smart-url Usage: Includes validation to ensure that the --smart-url flag is only used when the search string is a valid URL, preventing incorrect usage and providing helpful error messages.
  • Comprehensive Test Coverage and Documentation: Extensive new feature tests have been added to cover various scenarios for --smart-url and --analyze-tables, including different data types, serialized data, multisite setups, and error conditions. The README.md has also been updated with detailed explanations and examples for the new flags.
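
The auto-detection rule in the highlights can be sketched as a simple prefix check. The helper name is hypothetical and case-insensitive matching is an assumption; the PR's actual detection code is not quoted in this thread.

```php
<?php
/**
 * Hypothetical helper: treat the search string as a URL replacement when it
 * starts with http:// or https://. Case-insensitive matching is an assumption.
 */
function looks_like_url_search(string $search): bool
{
    return preg_match('#^https?://#i', $search) === 1;
}

foreach (['https://old.example.com', 'old.example.com', 'foo'] as $search) {
    // var_export() renders the boolean as the literal true/false.
    printf("%-26s %s\n", $search, var_export(looks_like_url_search($search), true));
}
```

Under this rule only the first of the three sample inputs would switch smart URL mode on automatically.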

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable enhancement to the search-replace command by adding smart URL transformation support. The new --smart-url and --analyze-tables flags, along with automatic URL detection, significantly improve performance for URL replacements by intelligently skipping non-URL columns. The implementation is robust, covering static core column lists, dynamic datatype analysis, and pattern matching for column names. Comprehensive test cases have been added to ensure the new functionality works as expected across various scenarios, including error handling for invalid URLs. The documentation has also been updated to reflect these new options and their usage.

@codecov

codecov bot commented Jan 15, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


);

if ( empty( $columns ) ) {
continue; // @codeCoverageIgnore
Contributor Author

To explain the ignore: this is an extremely rare edge case where wpdb reports no columns for a DB table. That makes this feature effectively unusable, so we must skip the operation. The ignore makes more sense to me than mocking a complex DB edge case in the .feature that should never be hit to begin with.

Copilot AI left a comment

Pull request overview

This PR adds smart URL transformation support to the search-replace command to improve performance when replacing URLs in WordPress databases. The feature automatically skips columns that cannot contain URLs, resulting in ~34-42% performance improvement on large databases.

Changes:

  • Adds --smart-url flag with auto-detection for URLs starting with http:// or https://
  • Adds --analyze-tables flag for advanced MySQL datatype analysis to skip additional non-text columns
  • Implements comprehensive test coverage with 42 new test scenarios

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 8 comments.

File	Description
src/WP_CLI/SearchReplace/Non_URL_Columns.php	New class providing static list of WordPress core non-URL columns and dynamic analysis methods for identifying non-text datatypes and naming patterns
src/Search_Replace_Command.php	Adds URL auto-detection, smart-url mode implementation, table analysis, and validation logic for the new flags
features/search-replace.feature	Updates existing test scenarios to expect "Detected URL replacement" message and adjusted output format
features/search-replace-url.feature	New comprehensive test file with 42 scenarios covering smart-url and analyze-tables functionality
README.md	Documents the new --smart-url and --analyze-tables flags with usage examples


Comment on lines +104 to +105
'to_ping',
'pinged',
Copilot AI Jan 15, 2026

The columns to_ping and pinged in wp_posts can contain URLs. According to WordPress Codex, to_ping stores a list of URLs to ping when the post is published, and pinged stores URLs that have already been pinged. These columns should be removed from the skip list as they are specifically designed to contain URLs.

Suggested change
- 'to_ping',
- 'pinged',

Copilot uses AI. Check for mistakes.
'post_status',
'comment_status',
'ping_status',
'post_password',
Copilot AI Jan 15, 2026

The post_name column (post slug) typically doesn't contain URLs, but it can contain URL-like strings in certain edge cases (e.g., when posts are imported with URL slugs). However, this is a reasonable optimization since post_name is meant for slugs, not full URLs. Consider adding a comment explaining this decision for future maintainers.

Suggested change
- 'post_password',
+ 'post_password',
+ // Note: post_name is a slug (not a full URL) in normal WordPress usage.
+ // In rare edge cases (e.g. imports) it may contain URL-like strings, but we
+ // still treat it as non-URL for search/replace to keep this optimization simple.

'comment_id',

// wp_users table - User metadata and status
'user_login',
Copilot AI Jan 15, 2026

The user_login column could theoretically contain a URL-like string if someone uses an email address or URL-formatted username. While uncommon, excluding this from search-replace in URL mode could miss legitimate use cases. Consider whether this exclusion is too aggressive.

Suggested change
- 'user_login',

'link_rating',
'link_updated',
'link_rel',
'link_rss',
Copilot AI Jan 15, 2026

The link_rss column in wp_links is specifically designed to store RSS feed URLs. This should not be in the skip list as it can contain URLs. The link_rel column typically contains relationship values like 'nofollow' rather than URLs, so that one is correctly excluded.

Suggested change
- 'link_rss',

Comment on lines +311 to +319
if ( ! filter_var( $old, FILTER_VALIDATE_URL ) ) {
WP_CLI::error(
sprintf(
'The --smart-url flag is designed for URL replacements, but "%s" is not a valid URL. ' .
'Please use a full URL (e.g., http://example.com) or remove the --smart-url flag.',
$old
)
);
}
Copilot AI Jan 15, 2026

PHP's FILTER_VALIDATE_URL accepts URLs without schemes (e.g., 'example.com'), but the auto-detection only triggers for URLs starting with 'http://' or 'https://'. This creates an inconsistency where the validation would pass for 'example.com' if someone manually uses --smart-url, but it wouldn't auto-detect. The validation should either require a scheme (http:// or https://) or accept URLs without schemes consistently.
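
How the two checks interact can be probed with a few inputs. In the sketch below the auto-detection regex is an assumption based on the description in this thread, and both helper names are hypothetical. As far as the PHP filter documentation goes, FILTER_VALIDATE_URL requires a scheme, so a bare example.com is expected to be rejected, while a non-HTTP URL such as ftp://example.com passes validation without triggering auto-detection. Running the snippet on the target PHP version is the way to confirm.

```php
<?php
/** Mirrors the validation quoted above: is the value accepted by FILTER_VALIDATE_URL? */
function passes_url_validation(string $value): bool
{
    return filter_var($value, FILTER_VALIDATE_URL) !== false;
}

/** Assumed auto-detection rule: the search string starts with http:// or https://. */
function triggers_auto_detection(string $value): bool
{
    return preg_match('#^https?://#', $value) === 1;
}

foreach (['http://example.com', 'example.com', 'ftp://example.com'] as $value) {
    printf(
        "%-20s validation=%s auto-detect=%s\n",
        $value,
        var_export(passes_url_validation($value), true),
        var_export(triggers_auto_detection($value), true)
    );
}
```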

specify multiple columns.

[--smart-url]
Enable smart URL mode. Automatically skips 75+ WordPress core columns
Copilot AI Jan 15, 2026

The documentation claims '75+ WordPress core columns' are skipped, but counting the columns in the get_core_columns() method reveals only 61 columns. This number should be updated to accurately reflect the actual count, or the method should be expanded to match the claimed count.

Suggested change
- Enable smart URL mode. Automatically skips 75+ WordPress core columns
+ Enable smart URL mode. Automatically skips dozens of WordPress core columns

that cannot contain URLs (like post_type, post_status, user_pass, etc.),
significantly improving performance for URL replacements. This is
particularly useful when migrating sites or changing domain names.
Performance: ~34% faster on large databases.
Copilot AI Jan 15, 2026

The PR description shows test results indicating 41.6% performance improvement, but the README states ~34%. While both might be valid depending on the database, it would be better to use consistent numbers or provide a range (e.g., '30-40% faster' or '~34-42% faster') to set accurate expectations.

Suggested change
- Performance: ~34% faster on large databases.
+ Performance: ~34–42% faster on large databases (depending on the database).

Comment on lines +320 to +323
| replacement | flags |
| {SITEURL}/subdir | |
| newdomain.com | |
| newdomain.com | --dry-run |
Copilot AI Jan 15, 2026

The test uses 'newdomain.com' as a replacement value, which is not a valid URL (missing scheme). This will cause FILTER_VALIDATE_URL to fail in the validation logic at line 311 of Search_Replace_Command.php when smart-url mode is auto-enabled. The test should use 'http://newdomain.com' or 'https://newdomain.com' instead.

Suggested change
- | replacement | flags |
- | {SITEURL}/subdir | |
- | newdomain.com | |
- | newdomain.com | --dry-run |
+ | replacement | flags |
+ | {SITEURL}/subdir | |
+ | http://newdomain.com | |
+ | http://newdomain.com | --dry-run |

@swissspidy
Member

Thanks a lot! I think this goes in the right direction.

This is very much related to #186. A separate command might be interesting for this as it allows for future optimizations more easily, such as ones mentioned at https://make.wordpress.org/core/2025/11/27/wordpress-importer-can-now-migrate-urls-in-your-content/ (cc @adamziel)

@mrsdizzie
Member

At a glance, I feel like the real feature here is an option that says "ignore all non-text columns", and it feels like it would be useful beyond URLs (though URLs are probably the most popular use case for this). I'd be curious to see a breakdown of performance, and whether there is a lot to gain by adding anything beyond the single feature that ignores columns based on their types.

Feels like a lot of the hardcoded table names are already of a type that would be ignored anyway.

The regex pattern for potential column names seems fragile and prone to false positives (and again, the positives are already likely to have a type that would be ignored).

In other words, can we get most of the benefits here with just the single feature to ignore NON_TEXT_DATA_TYPES (which benefits all search-replace, not just when it's a URL)? And have that as a --text-only flag or something like that?

Maybe another command could build on top of that, but to me that seems like a small general change that could maybe have a big impact if a lot of time is spent on those types of columns.
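
A type-only filter along these lines could look like the sketch below. The --text-only flag is only a proposal in this thread, so the function name and the allowlist of text datatypes are assumptions. The sample rows mimic DESCRIBE output for a subset of wp_posts as it is typically defined.

```php
<?php
// Assumed allowlist of MySQL text datatypes; a real flag might choose differently.
const TEXT_TYPES = ['char', 'varchar', 'tinytext', 'text', 'mediumtext', 'longtext'];

/**
 * Keep only text columns from DESCRIBE-style rows (each with Field and Type).
 *
 * @param array<int, array{Field: string, Type: string}> $rows
 * @return string[] Names of the columns worth scanning.
 */
function text_only_columns(array $rows): array
{
    $keep = [];
    foreach ($rows as $row) {
        // "varchar(20)" -> "varchar", "bigint(20) unsigned" -> "bigint"
        $baseType = strtolower((string) strtok($row['Type'], '( '));
        if (in_array($baseType, TEXT_TYPES, true)) {
            $keep[] = $row['Field'];
        }
    }
    return $keep;
}

$wpPosts = [
    ['Field' => 'ID',           'Type' => 'bigint(20) unsigned'],
    ['Field' => 'post_date',    'Type' => 'datetime'],
    ['Field' => 'post_content', 'Type' => 'longtext'],
    ['Field' => 'post_status',  'Type' => 'varchar(20)'],
    ['Field' => 'menu_order',   'Type' => 'int(11)'],
];
print_r(text_only_columns($wpPosts)); // keeps post_content and post_status
```

Note that post_status, a varchar column, survives a type-only filter. In the benchmark output in the PR description it appears in the standard run and is dropped only in smart URL mode, which is one data point for the requested breakdown.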

Development

Successfully merging this pull request may close these issues.

Introduce a dedicated search-replace url command
