⚡️ Speed up function `validate_flows_custom_components` by 18% in PR #11893 (`flow-json-edited-flag`) by codeflash-ai[bot] · Pull Request #11900 · langflow-ai/langflow

codeflash-ai · 2026-02-25T05:25:08Z

⚡️ This pull request contains optimizations for PR #11893

If you approve this dependent PR, these changes will be merged into the original PR branch flow-json-edited-flag.

This PR will be automatically closed if the original PR is merged.

📄 18% (0.18x) speedup for `validate_flows_custom_components` in `src/backend/base/langflow/api/utils/flow_validation.py`

⏱️ Runtime: 449 microseconds → 380 microseconds (best of 263 runs)
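The headline figures can be checked directly. The "18%" label matches a speedup ratio, (old / new) - 1. Measured as the share of runtime removed, the same change is about 15%. A quick check:

```python
# Runtimes reported above, in microseconds.
old_us = 449
new_us = 380

# Speedup expressed as (old / new) - 1, which matches the "18% (0.18x)" label.
speedup = old_us / new_us - 1

# Fraction of the original runtime that was eliminated.
time_saved = (old_us - new_us) / old_us

print(f"speedup: {speedup:.1%}")        # speedup: 18.2%
print(f"time saved: {time_saved:.1%}")  # time saved: 15.4%
```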

📝 Explanation and details

Brief: The optimized version removes unnecessary work and object allocation in two small, focused ways. First, it no longer builds a default empty list every time nodes are read from a flow. Second, it skips flows with missing or empty data before calling the single-flow validator. These changes reduce the number of function calls and temporary allocations in the per-flow loop. The result is the measured ~18% speedup (449 µs → 380 µs), with no change in output on the tested inputs.

What changed

validate_flow_custom_components:
- Replaced flow_data.get("nodes", []) with flow_data.get("nodes") followed by an immediate falsy check. Python evaluates the [] default before .get runs. The old form therefore built a throwaway empty list on every call, even when "nodes" was present. The new form does not.
validate_flows_custom_components:
- Added early guards to skip flows where data is falsy or where data.get("nodes") is falsy, instead of blindly calling validate_flow_custom_components for every flow.
- Still delegates to validate_flow_custom_components for flows that actually have nodes (keeps logic centralized and unchanged for those cases).

Why these changes speed things up

Avoiding allocations: flow_data.get("nodes", []) builds a new empty list every time it is evaluated, whether or not the key is present, because the default argument is created before .get is called. In a hot loop over many flows, those throwaway allocations add up. Using .get("nodes") with a falsy check avoids building the default.
Fewer function calls: validate_flows_custom_components used to call validate_flow_custom_components for every flow, even those with no data or empty nodes. Each avoided call saves Python call overhead and the downstream work in _get_blocked_by_edited_flag. The profiler shows most time is spent inside _get_blocked_by_edited_flag; reducing the number of times we enter that path yields direct savings.
Better short-circuiting: Skipping work early for common/no-op inputs reduces CPU and memory churn while preserving semantics.
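The allocation argument rests on Python evaluating call arguments before the call, so the default passed to `dict.get` is built even when the key exists. The small demonstration below uses a hypothetical `make_default` helper that counts how often a default list is built:

```python
calls = 0


def make_default() -> list:
    """Hypothetical helper that records each time a default list is built."""
    global calls
    calls += 1
    return []


flow_data = {"nodes": [{"id": "n1"}]}

# The argument expression runs before dict.get is called, so the default is
# built even though "nodes" is present and the default is never returned.
nodes = flow_data.get("nodes", make_default())
print(calls)  # 1

# Without a default, nothing extra is built; a missing key just yields None.
nodes = flow_data.get("nodes")
print(calls)  # 1
print({}.get("nodes"))  # None
```

In CPython, an empty list is cheap to create and is released by reference counting as soon as nothing refers to it. The saving per call is therefore small and only shows up in aggregate.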

Profiler evidence

The heavy work remains in _get_blocked_by_edited_flag (expected), but optimized code reduces the total time spent in the validation call paths (see lower totals for both functions).
The line-by-line profiles show less time spent in the per-flow call path and in allocating defaults, matching the 18% end-to-end speedup.
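The numbers above come from codeflash's own harness. To get a rough local sense of the early-guard effect, a `timeit` comparison over a workload where most flows are empty might look like the sketch below.

`validate_one` and the two loops are simplified hypothetical stand-ins, not the langflow functions. Each node here carries a flat `edited` flag for brevity. Absolute timings will vary by machine and Python version.

```python
import timeit
from typing import Any


def validate_one(flow_data: dict[str, Any]) -> list[str]:
    """Simplified stand-in for the single-flow validator (flat 'edited' flag)."""
    nodes = flow_data.get("nodes", [])
    return [n["id"] for n in nodes if n.get("edited")]


def validate_all_no_guard(flows: list[dict[str, Any]]) -> dict[str, list[str]]:
    result = {}
    for flow in flows:
        blocked = validate_one(flow.get("data") or {})
        if blocked:
            result[flow["name"]] = blocked
    return result


def validate_all_with_guard(flows: list[dict[str, Any]]) -> dict[str, list[str]]:
    result = {}
    for flow in flows:
        data = flow.get("data")
        if not data or not data.get("nodes"):
            continue  # skip the call entirely for empty flows
        blocked = validate_one(data)
        if blocked:
            result[flow["name"]] = blocked
    return result


# 1000 flows; every 100th has one edited node and the rest have no nodes,
# similar in shape to test_large_scale_many_flows_with_sparse_blocked_items.
flows = [
    {"name": f"flow-{i}", "data": {"nodes": [{"id": f"n-{i}", "edited": True}] if i % 100 == 0 else []}}
    for i in range(1000)
]

# Both loops must agree before their timings are worth comparing.
assert validate_all_no_guard(flows) == validate_all_with_guard(flows)

for fn in (validate_all_no_guard, validate_all_with_guard):
    seconds = timeit.timeit(lambda: fn(flows), number=200)
    print(f"{fn.__name__}: {seconds / 200 * 1e6:.1f} µs per call")
```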

Impact on workloads and tests

The largest gains come when many flows have no data or no nodes. The sparse large-scale test exercises exactly this scenario: it builds 1000 flows, and only every 100th has a node while the rest have an empty node list. That is where the savings (fewer allocations and calls per flow) show up.
Small or single-flow tests are unaffected functionally and still pass: behavior is preserved because skipping a call for empty/missing nodes produces the same empty result validate_flow_custom_components would have returned.
Memory behavior improves slightly (fewer transient empty lists), which helps at scale.

Behavioral considerations

The logic preserves behavior for normal cases. An empty nodes list or missing nodes still results in no blocked components.
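If "nodes" holds a truthy non-list object, both versions hand it to the validator, so behavior is unchanged. The falsy check only matters for falsy values:
- A missing key or an empty list was already a no-op.
- A key that is present but holds a falsy value such as None is now skipped early. The old .get("nodes", []) returned that value as-is, because the default only applies when the key is absent.

In practice flows use lists for "nodes", so this is safe.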
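One subtlety behind the falsy check: `dict.get`'s default only applies when the key is absent. A flow whose `"nodes"` key is present but `None` therefore got `None` back from the old `get("nodes", [])`.

The sketch below uses `iterate_nodes` as a hypothetical stand-in for code that loops over the result. Whether the pre-change langflow code would actually hit this depends on how `_get_blocked_by_edited_flag` handles its argument, which this page does not show.

```python
from typing import Any


def iterate_nodes(nodes: Any) -> int:
    """Hypothetical stand-in for a helper that loops over the node list."""
    return sum(1 for _ in nodes)


flow_data = {"nodes": None}

# The default is ignored because the key exists; the old pattern yields None.
old_nodes = flow_data.get("nodes", [])
print(old_nodes)  # None

raised = False
try:
    iterate_nodes(old_nodes)
except TypeError as exc:
    raised = True
    print("old pattern:", exc)  # old pattern: 'NoneType' object is not iterable

# The new pattern treats None, [], and a missing key the same way and skips.
new_nodes = flow_data.get("nodes")
print("new pattern skips:", not new_nodes)  # new pattern skips: True
```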

Summary
Small, low-risk changes: eliminate an unnecessary default-list allocation and avoid needless validator calls for empty/no-op flows. That reduces CPU and allocation overhead in the hot path, giving the measured ~18% speedup on the provided benchmarks while keeping behavior centralized and intact.

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	✅ 57 Passed
🌀 Generated Regression Tests	✅ 19 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%


🌀 Generated Regression Tests

from typing import Any, Dict, List

# imports
import pytest  # used for our unit tests
from langflow.api.utils.flow_validation import validate_flows_custom_components


def test_empty_flows_list_returns_empty_dict():
    # Basic test: an empty list of flows should yield an empty dict (nothing to validate).
    flows: List[Dict[str, Any]] = []
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_flow_with_no_data_or_nodes_is_ignored():
    # If a flow has no data (None) it should not appear in the result.
    flows = [{"name": "Flow A", "data": None}, {"name": "Flow B"}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_flow_with_no_edited_nodes_returns_empty():
    # A flow with nodes but none marked as edited should produce no blocked entries.
    node = {
        "id": "node-1",
        "data": {
            "type": "SomeType",
            "node": {
                # edited not present (defaults to False in logic)
                "display_name": "Component A"
            }
        }
    }
    flows = [{"name": "Clean Flow", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_single_flow_with_edited_node_blocks_component():
    # When a node has node.node.edited = True, it should be listed in the blocked mapping.
    node = {
        "id": "node-123",
        "data": {
            "id": "node-123",  # explicit id in data should be used
            "type": "CustomType",
            "node": {
                "edited": True,
                "display_name": "FancyComponent"
            }
        }
    }
    flows = [{"name": "FlowWithEdited", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_missing_display_name_uses_type_and_missing_id_uses_node_top_level_id():
    # If display_name is missing/empty, fallback to node_data.type
    # If node_data.id is missing, fallback to top-level node['id']
    nested_node = {
        # top-level id used because data.id will be omitted
        "id": "top-level-id",
        "data": {
            "type": "TypeFallback",
            # omit 'id' here to force fallback to top-level node id
            "node": {
                "edited": True,
                # omit 'display_name' to force fallback to 'type'
            }
        }
    }
    flows = [{"name": "FallbacksFlow", "data": {"nodes": [nested_node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_recurses_into_nested_flow_nodes_and_collects_all_edited_entries():
    # Create a group/sub-flow node that contains nested nodes, some of which are edited.
    inner1 = {
        "id": "inner-1",
        "data": {
            "id": "inner-1",
            "type": "InnerType",
            "node": {"edited": True, "display_name": "InnerComp1"}
        }
    }
    inner2 = {
        "id": "inner-2",
        "data": {
            "id": "inner-2",
            "type": "InnerType",
            "node": {"edited": False, "display_name": "InnerComp2"}
        }
    }
    # Outer node contains a 'flow' dict with nested data.nodes
    outer = {
        "id": "outer-1",
        "data": {
            "id": "outer-1",
            "type": "Group",
            "node": {
                # outer node itself is edited
                "edited": True,
                "display_name": "OuterGroup",
                # contains a nested flow with nodes
                "flow": {
                    "data": {
                        "nodes": [inner1, inner2]
                    }
                }
            }
        }
    }
    flows = [{"name": "NestedFlow", "data": {"nodes": [outer]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output
    blocked = result["NestedFlow"]


def test_multiple_flows_only_include_blocked_ones():
    # Mix flows where some have edited nodes and others do not.
    edited_node = {
        "id": "e1",
        "data": {
            "id": "e1",
            "type": "T",
            "node": {"edited": True, "display_name": "EditedOne"}
        }
    }
    clean_node = {
        "id": "c1",
        "data": {
            "id": "c1",
            "type": "T",
            "node": {"edited": False, "display_name": "CleanOne"}
        }
    }
    flows = [
        {"name": "CleanFlow", "data": {"nodes": [clean_node]}},
        {"name": "BlockedFlow", "data": {"nodes": [edited_node]}},
        {"name": "EmptyDataFlow", "data": {}},
    ]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_empty_display_name_string_falls_back_to_type():
    # If display_name is an empty string, it's falsy and should fall back to type.
    node = {
        "id": "x1",
        "data": {
            "id": "x1",
            "type": "TypeX",
            "node": {"edited": True, "display_name": ""}  # empty string -> fallback to type
        }
    }
    flows = [{"name": "EmptyDisp", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output



def test_large_scale_many_flows_and_nodes_performance_and_correctness():
    # Large-scale test: construct 100 flows each with 10 nodes = 1000 nodes total.
    # Mark exactly one node per flow as edited to ensure we have many blocked flows.
    flows = []
    num_flows = 100
    nodes_per_flow = 10
    for i in range(num_flows):
        nodes = []
        for j in range(nodes_per_flow):
            node_id = f"f{i}-n{j}"
            # Mark the first node in each flow as edited; others not edited.
            edited_flag = (j == 0)
            nodes.append({
                "id": node_id,
                "data": {
                    "id": node_id,
                    "type": "BulkType",
                    "node": {
                        "edited": edited_flag,
                        "display_name": f"BulkComp{i}-{j}"
                    }
                }
            })
        flows.append({"name": f"Flow_{i}", "data": {"nodes": nodes}})

    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output
    # Spot-check a few flows at fixed indices
    for idx in (0, 10, 50, 99):
        flow_name = f"Flow_{idx}"
        expected_entry = [f"BulkComp{idx}-0 (f{idx}-n0)"]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Dict, List

# imports
import pytest  # used for our unit tests
# Import the functions under test from their actual module location.
from langflow.api.utils.flow_validation import (
    _get_blocked_by_edited_flag, validate_flow_custom_components,
    validate_flows_custom_components)


def test_empty_flows_list_returns_empty_dict():
    # If no flows are provided, we expect an empty mapping of blocked flows.
    codeflash_output = validate_flows_custom_components([]); result = codeflash_output


def test_flows_with_no_data_or_empty_nodes_are_ignored():
    # Flows without 'data', with None data, or with empty nodes shouldn't be reported.
    flows = [
        {"name": "no_data_flow"},  # missing 'data' key
        {"name": "none_data_flow", "data": None},  # explicit None
        {"name": "empty_nodes_flow", "data": {"nodes": []}},  # empty nodes list
    ]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_single_flow_with_one_edited_node_detected():
    # Create a flow with one node that has node['data']['node']['edited'] == True.
    node = {
        "id": "node1",
        "data": {
            "id": "node1",
            "type": "CustomType",
            "node": {"edited": True, "display_name": "MyComponent"},
        },
    }
    flows = [{"name": "FlowA", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_validate_flow_custom_components_direct_returns_list():
    # Directly test the single-flow validator; should return a list of blocked descriptions.
    node = {
        "id": "n-2",
        "data": {
            "type": "TypeX",
            "node": {"edited": True},  # no display_name provided -> fallback to type
            # note: node_data has no 'id', so fallback to node.get('id')
        },
    }
    flow_data = {"nodes": [node]}
    blocked = validate_flow_custom_components(flow_data)


def test_missing_display_name_uses_type_fallback_and_missing_ids_use_unknown():
    # When display_name isn't present, use node_data['type'].
    # When both node_data['id'] and node['id'] are missing, use "unknown".
    node = {
        # intentionally omit node-level 'id'
        "data": {
            # type present for display fallback
            "type": "FallbackType",
            "node": {"edited": True},  # triggers blocking
        },
    }
    flows = [{"name": "FlowFallback", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_nested_flow_nodes_are_recursively_checked():
    # Build a node that contains a nested flow with its own nodes list.
    nested_node = {
        "id": "nested1",
        "data": {
            "id": "nested1",
            "type": "NestedType",
            "node": {"edited": True, "display_name": "NestedComp"},
        },
    }
    group_node = {
        "id": "group-1",
        "data": {
            "id": "group-1",
            "type": "Group",
            "node": {
                # no edited flag at the group level but includes a nested flow
                "flow": {"data": {"nodes": [nested_node]}},
            },
        },
    }
    flows = [{"name": "FlowWithGroup", "data": {"nodes": [group_node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_deeply_nested_flow_detects_deep_edited_node():
    # Create a 3-level nested flow where the deepest node has edited=True.
    deep_node = {
        "id": "deep",
        "data": {"id": "deep", "type": "DeepType", "node": {"edited": True, "display_name": "DeepComp"}},
    }
    level2 = {"id": "lvl2", "data": {"id": "lvl2", "type": "L2", "node": {"flow": {"data": {"nodes": [deep_node]}}}}}
    level1 = {"id": "lvl1", "data": {"id": "lvl1", "type": "L1", "node": {"flow": {"data": {"nodes": [level2]}}}}}
    flows = [{"name": "DeepFlow", "data": {"nodes": [level1]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_special_characters_in_display_name_and_name_fallback():
    # Display names may contain special/unicode characters and should be preserved.
    node = {
        "id": "sp1",
        "data": {
            "id": "sp1",
            "type": "TypeSpecial",
            "node": {"edited": True, "display_name": "Comp-©-测试-ß"},
        },
    }
    # Flow without a 'name' key should be labeled "Unknown Flow"
    flows = [{"data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test__get_blocked_by_edited_flag_direct_on_various_node_structures():
    # Direct unit test of the helper to ensure it handles mixed nodes robustly.
    nodes = [
        # non-edited node
        {"id": "a", "data": {"id": "a", "type": "A", "node": {"edited": False}}},
        # edited node with display_name
        {"id": "b", "data": {"id": "b", "type": "B", "node": {"edited": True, "display_name": "Bcomp"}}},
        # edited node without display_name
        {"id": "c", "data": {"id": "c", "type": "C", "node": {"edited": True}}},
    ]
    blocked = _get_blocked_by_edited_flag(nodes)


def test_large_scale_many_flows_with_sparse_blocked_items():
    # Construct 1000 flows where every 100th flow contains one edited node.
    large_flows: List[Dict] = []
    expected_keys = []
    total = 1000
    for i in range(total):
        name = f"flow-{i}"
        if i % 100 == 0:
            # Edited node present in this flow
            node = {
                "id": f"n-{i}",
                "data": {
                    "id": f"n-{i}",
                    "type": "BulkType",
                    "node": {"edited": True, "display_name": f"BulkComp-{i}"},
                },
            }
            large_flows.append({"name": name, "data": {"nodes": [node]}})
            expected_keys.append(name)
        else:
            # Flow with an empty node list (ignored by validation)
            large_flows.append({"name": name, "data": {"nodes": []}})
    codeflash_output = validate_flows_custom_components(large_flows); result = codeflash_output
    # Confirm that for a sample of blocked flows, descriptions are correct
    for k in expected_keys[:5]:
        # Ensure the description string structure is present for each expected flow
        descriptions = result[k]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
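The generated tests above keep only `codeflash_output` because codeflash compares the original and optimized return values itself. Outside that harness, the same idea is a differential test: run both versions on identical inputs and assert equal outputs.

In the minimal sketch below, `original_impl` and `optimized_impl` are hypothetical stand-ins that differ only in the early guard for empty flows:

```python
from typing import Any, Callable

Validator = Callable[[list[dict[str, Any]]], dict[str, list[str]]]


def assert_same_output(original: Validator, optimized: Validator, inputs: list[list[dict[str, Any]]]) -> None:
    """Differential check: every input must produce identical results from both versions."""
    for flows in inputs:
        expected = original(flows)
        actual = optimized(flows)
        assert actual == expected, f"mismatch for {flows!r}: {actual!r} != {expected!r}"


def original_impl(flows):
    out = {}
    for flow in flows:
        nodes = (flow.get("data") or {}).get("nodes", [])
        blocked = [n["id"] for n in nodes if n.get("edited")]
        if blocked:
            out[flow.get("name", "Unknown Flow")] = blocked
    return out


def optimized_impl(flows):
    out = {}
    for flow in flows:
        data = flow.get("data")
        if not data or not data.get("nodes"):
            continue  # early guard under test
        blocked = [n["id"] for n in data["nodes"] if n.get("edited")]
        if blocked:
            out[flow.get("name", "Unknown Flow")] = blocked
    return out


cases = [
    [],
    [{"name": "no_data"}, {"name": "none_data", "data": None}, {"name": "empty", "data": {"nodes": []}}],
    [{"data": {"nodes": [{"id": "a", "edited": True}, {"id": "b"}]}}],
]
assert_same_output(original_impl, optimized_impl, cases)
print("all cases match")
```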

To edit these changes, run `git checkout codeflash/optimize-pr11893-2026-02-25T05.25.02` and push.


codecov · 2026-02-25T05:29:52Z

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.54%. Comparing base (640e486) to head (05b8e98).
⚠️ Report is 109 commits behind head on flow-json-edited-flag.

Files with missing lines	Patch %	Lines
...backend/base/langflow/api/utils/flow_validation.py	66.66%	2 Missing ⚠️

❌ Your project status has failed because the head coverage (42.03%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@                    Coverage Diff                    @@
##           flow-json-edited-flag   #11900      +/-   ##
=========================================================
+ Coverage                  35.49%   35.54%   +0.04%     
=========================================================
  Files                       1528     1528              
  Lines                      73715    73657      -58     
  Branches                   11031    11015      -16     
=========================================================
+ Hits                       26168    26182      +14     
+ Misses                     46135    46061      -74     
- Partials                    1412     1414       +2
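The diff table is internally consistent. Hits, misses, and partials sum to the line count, and coverage is hits divided by lines.

The reported percentages match truncation to two decimals rather than ordinary rounding, which is consistent with Codecov rounding coverage down. For example, 26168 / 73715 is 35.4988...%, shown as 35.49%. A check in exact arithmetic:

```python
from fractions import Fraction

# (hits, misses, partials, lines) from the Coverage Diff table.
base = (26168, 46135, 1412, 73715)
head = (26182, 46061, 1414, 73657)


def coverage_floor(hits: int, lines: int) -> Fraction:
    """Coverage percentage truncated (rounded down) to two decimals."""
    return Fraction(hits * 10_000 // lines, 100)


for hits, misses, partials, lines in (base, head):
    # Every tracked line is exactly one of hit, missed, or partial.
    assert hits + misses + partials == lines

base_cov = coverage_floor(base[0], base[3])
head_cov = coverage_floor(head[0], head[3])
print(float(base_cov), float(head_cov))  # 35.49 35.54

# The +0.04% delta is the exact difference truncated, not 35.54 - 35.49.
exact_delta = Fraction(head[0], head[3]) - Fraction(base[0], base[3])
print(float(Fraction(int(exact_delta * 10_000), 100)))  # 0.04
```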

Flag	Coverage Δ
backend	`56.22% <66.66%> (+0.07%)`	⬆️
lfx	`42.03% <ø> (+0.09%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
...backend/base/langflow/api/utils/flow_validation.py	`95.23% <66.66%> (-1.25%)`	⬇️

... and 7 files with indirect coverage changes


ogabrielluiz · 2026-03-03T18:09:40Z

Closing automated codeflash PR.

codeflash-ai[bot] added the "⚡️ codeflash" label (Optimization PR opened by Codeflash AI) on Feb 25, 2026.

github-actions[bot] added the "community" label (Pull Request from an external contributor) on Feb 25, 2026.

ogabrielluiz closed this on Mar 3, 2026.

codeflash-ai[bot] deleted the codeflash/optimize-pr11893-2026-02-25T05.25.02 branch on March 3, 2026 at 18:30.

codeflash-ai[bot] wants to merge 1 commit into flow-json-edited-flag from codeflash/optimize-pr11893-2026-02-25T05.25.02
