Skip to content

⚡️ Speed up function validate_flows_custom_components by 18% in PR #11893 (flow-json-edited-flag)#11900

Closed
codeflash-ai[bot] wants to merge 1 commit intoflow-json-edited-flagfrom
codeflash/optimize-pr11893-2026-02-25T05.25.02
Closed

⚡️ Speed up function validate_flows_custom_components by 18% in PR #11893 (flow-json-edited-flag)#11900
codeflash-ai[bot] wants to merge 1 commit intoflow-json-edited-flagfrom
codeflash/optimize-pr11893-2026-02-25T05.25.02

Conversation

@codeflash-ai
Copy link
Copy Markdown
Contributor

@codeflash-ai codeflash-ai Bot commented Feb 25, 2026

⚡️ This pull request contains optimizations for PR #11893

If you approve this dependent PR, these changes will be merged into the original PR branch flow-json-edited-flag.

This PR will be automatically closed if the original PR is merged.


📄 18% (0.18x) speedup for validate_flows_custom_components in src/backend/base/langflow/api/utils/flow_validation.py

⏱️ Runtime : 449 microseconds 380 microseconds (best of 263 runs)

📝 Explanation and details

Brief: The optimized version reduces unnecessary work and object allocation in two small, focused ways: (1) avoid allocating a default empty list when reading nodes from a flow, and (2) short-circuit flows with missing/empty data before calling the single-flow validator. Those small changes cut down the number of function calls and temporary allocations in hot loops, producing the measured ~18% runtime improvement (449µs -> 380µs) without changing observable behavior.

What changed

  • validate_flow_custom_components:
    • Replaced flow_data.get("nodes", []) with flow_data.get("nodes") and an immediate falsy check. This avoids allocating a new empty list on every call when the "nodes" key is missing.
  • validate_flows_custom_components:
    • Added early guards to skip flows where data is falsy or where data.get("nodes") is falsy, instead of blindly calling validate_flow_custom_components for every flow.
    • Still delegates to validate_flow_custom_components for flows that actually have nodes (keeps logic centralized and unchanged for those cases).

Why these changes speed things up

  • Avoiding allocations: flow_data.get("nodes", []) creates a new empty list object every time the key is missing. In hot loops or when many flows are empty/missing nodes, that allocation (and subsequent GC pressure) adds measurable overhead. Using .get("nodes") and a falsy check avoids that allocation.
  • Fewer function calls: validate_flows_custom_components used to call validate_flow_custom_components for every flow, even those with no data or empty nodes. Each avoided call saves Python call overhead and the downstream work in _get_blocked_by_edited_flag. The profiler shows most time is spent inside _get_blocked_by_edited_flag; reducing the number of times we enter that path yields direct savings.
  • Better short-circuiting: Skipping work early for common/no-op inputs reduces CPU and memory churn while preserving semantics.

Profiler evidence

  • The heavy work remains in _get_blocked_by_edited_flag (expected), but optimized code reduces the total time spent in the validation call paths (see lower totals for both functions).
  • The line-by-line profiles show less time spent in the per-flow call path and in allocating defaults, matching the 18% end-to-end speedup.

Impact on workloads and tests

  • Big wins when many flows are empty or have no nodes (large_scale / many_flows tests). The annotated large-scale tests exercise exactly this scenario and are where the savings manifest (fewer allocations and calls per flow).
  • Small or single-flow tests are unaffected functionally and still pass: behavior is preserved because skipping a call for empty/missing nodes produces the same empty result validate_flow_custom_components would have returned.
  • Memory behavior improves slightly (fewer transient empty lists), which helps at scale.

Behavioral considerations

  • The logic preserves behavior for normal cases. An empty nodes list or missing nodes still results in no blocked components.
  • If someone stores a non-list but truthy object in "nodes", behavior is unchanged vs. before except that we rely on a falsy check to skip — this matches the previous semantics where an explicit empty default list led to the same skip. In practice flows use lists for "nodes", so this is safe.

Summary
Small, low-risk changes: eliminate an unnecessary default-list allocation and avoid needless validator calls for empty/no-op flows. That reduces CPU and allocation overhead in the hot path, giving the measured ~18% speedup on the provided benchmarks while keeping behavior centralized and intact.

Correctness verification report:

Test Status
⚙️ Existing Unit Tests 57 Passed
🌀 Generated Regression Tests 19 Passed
⏪ Replay Tests 🔘 None Found
🔎 Concolic Coverage Tests 🔘 None Found
📊 Tests Coverage 100.0%
⚙️ Click to see Existing Unit Tests
🌀 Click to see Generated Regression Tests
from typing import Any, Dict, List

# imports
import pytest  # used for our unit tests
from langflow.api.utils.flow_validation import validate_flows_custom_components


def test_empty_flows_list_returns_empty_dict():
    # Basic test: an empty list of flows should yield an empty dict (nothing to validate).
    flows: List[Dict[str, Any]] = []
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_flow_with_no_data_or_nodes_is_ignored():
    # If a flow has no data (None) it should not appear in the result.
    flows = [{"name": "Flow A", "data": None}, {"name": "Flow B"}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_flow_with_no_edited_nodes_returns_empty():
    # A flow with nodes but none marked as edited should produce no blocked entries.
    node = {
        "id": "node-1",
        "data": {
            "type": "SomeType",
            "node": {
                # edited not present (defaults to False in logic)
                "display_name": "Component A"
            }
        }
    }
    flows = [{"name": "Clean Flow", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_single_flow_with_edited_node_blocks_component():
    # When a node has node.node.edited = True, it should be listed in the blocked mapping.
    node = {
        "id": "node-123",
        "data": {
            "id": "node-123",  # explicit id in data should be used
            "type": "CustomType",
            "node": {
                "edited": True,
                "display_name": "FancyComponent"
            }
        }
    }
    flows = [{"name": "FlowWithEdited", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_missing_display_name_uses_type_and_missing_id_uses_node_top_level_id():
    # If display_name is missing/empty, fallback to node_data.type
    # If node_data.id is missing, fallback to top-level node['id']
    nested_node = {
        # top-level id used because data.id will be omitted
        "id": "top-level-id",
        "data": {
            "type": "TypeFallback",
            # omit 'id' here to force fallback to top-level node id
            "node": {
                "edited": True,
                # omit 'display_name' to force fallback to 'type'
            }
        }
    }
    flows = [{"name": "FallbacksFlow", "data": {"nodes": [nested_node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_recurses_into_nested_flow_nodes_and_collects_all_edited_entries():
    # Create a group/sub-flow node that contains nested nodes, some of which are edited.
    inner1 = {
        "id": "inner-1",
        "data": {
            "id": "inner-1",
            "type": "InnerType",
            "node": {"edited": True, "display_name": "InnerComp1"}
        }
    }
    inner2 = {
        "id": "inner-2",
        "data": {
            "id": "inner-2",
            "type": "InnerType",
            "node": {"edited": False, "display_name": "InnerComp2"}
        }
    }
    # Outer node contains a 'flow' dict with nested data.nodes
    outer = {
        "id": "outer-1",
        "data": {
            "id": "outer-1",
            "type": "Group",
            "node": {
                # outer node itself is edited
                "edited": True,
                "display_name": "OuterGroup",
                # contains a nested flow with nodes
                "flow": {
                    "data": {
                        "nodes": [inner1, inner2]
                    }
                }
            }
        }
    }
    flows = [{"name": "NestedFlow", "data": {"nodes": [outer]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output
    blocked = result["NestedFlow"]


def test_multiple_flows_only_include_blocked_ones():
    # Mix flows where some have edited nodes and others do not.
    edited_node = {
        "id": "e1",
        "data": {
            "id": "e1",
            "type": "T",
            "node": {"edited": True, "display_name": "EditedOne"}
        }
    }
    clean_node = {
        "id": "c1",
        "data": {
            "id": "c1",
            "type": "T",
            "node": {"edited": False, "display_name": "CleanOne"}
        }
    }
    flows = [
        {"name": "CleanFlow", "data": {"nodes": [clean_node]}},
        {"name": "BlockedFlow", "data": {"nodes": [edited_node]}},
        {"name": "EmptyDataFlow", "data": {}},
    ]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_empty_display_name_string_falls_back_to_type():
    # If display_name is an empty string, it's falsy and should fall back to type.
    node = {
        "id": "x1",
        "data": {
            "id": "x1",
            "type": "TypeX",
            "node": {"edited": True, "display_name": ""}  # empty string -> fallback to type
        }
    }
    flows = [{"name": "EmptyDisp", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output



def test_large_scale_many_flows_and_nodes_performance_and_correctness():
    # Large-scale test: construct 100 flows each with 10 nodes = 1000 nodes total.
    # Mark exactly one node per flow as edited to ensure we have many blocked flows.
    flows = []
    num_flows = 100
    nodes_per_flow = 10
    for i in range(num_flows):
        nodes = []
        for j in range(nodes_per_flow):
            node_id = f"f{i}-n{j}"
            # Mark the first node in each flow as edited; others not edited.
            edited_flag = (j == 0)
            nodes.append({
                "id": node_id,
                "data": {
                    "id": node_id,
                    "type": "BulkType",
                    "node": {
                        "edited": edited_flag,
                        "display_name": f"BulkComp{i}-{j}" if edited_flag else f"BulkComp{i}-{j}"
                    }
                }
            })
        flows.append({"name": f"Flow_{i}", "data": {"nodes": nodes}})

    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output
    # Validate a few random samples deterministic by index
    for idx in (0, 10, 50, 99):
        flow_name = f"Flow_{idx}"
        expected_entry = [f"BulkComp{idx}-0 (f{idx}-n0)"]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
from typing import Dict, List

# imports
import pytest  # used for our unit tests
# Import the functions under test from their actual module location.
from langflow.api.utils.flow_validation import (
    _get_blocked_by_edited_flag, validate_flow_custom_components,
    validate_flows_custom_components)


def test_empty_flows_list_returns_empty_dict():
    # If no flows are provided, we expect an empty mapping of blocked flows.
    codeflash_output = validate_flows_custom_components([]); result = codeflash_output


def test_flows_with_no_data_or_empty_nodes_are_ignored():
    # Flows without 'data', with None data, or with empty nodes shouldn't be reported.
    flows = [
        {"name": "no_data_flow"},  # missing 'data' key
        {"name": "none_data_flow", "data": None},  # explicit None
        {"name": "empty_nodes_flow", "data": {"nodes": []}},  # empty nodes list
    ]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_single_flow_with_one_edited_node_detected():
    # Create a flow with one node that has node['data']['node']['edited'] == True.
    node = {
        "id": "node1",
        "data": {
            "id": "node1",
            "type": "CustomType",
            "node": {"edited": True, "display_name": "MyComponent"},
        },
    }
    flows = [{"name": "FlowA", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_validate_flow_custom_components_direct_returns_list():
    # Directly test the single-flow validator; should return a list of blocked descriptions.
    node = {
        "id": "n-2",
        "data": {
            "type": "TypeX",
            "node": {"edited": True},  # no display_name provided -> fallback to type
            # note: node_data has no 'id', so fallback to node.get('id')
        },
    }
    flow_data = {"nodes": [node]}
    blocked = validate_flow_custom_components(flow_data)


def test_missing_display_name_uses_type_fallback_and_missing_ids_use_unknown():
    # When display_name isn't present, use node_data['type'].
    # When both node_data['id'] and node['id'] are missing, use "unknown".
    node = {
        # intentionally omit node-level 'id'
        "data": {
            # type present for display fallback
            "type": "FallbackType",
            "node": {"edited": True},  # triggers blocking
        },
    }
    flows = [{"name": "FlowFallback", "data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_nested_flow_nodes_are_recursively_checked():
    # Build a node that contains a nested flow with its own nodes list.
    nested_node = {
        "id": "nested1",
        "data": {
            "id": "nested1",
            "type": "NestedType",
            "node": {"edited": True, "display_name": "NestedComp"},
        },
    }
    group_node = {
        "id": "group-1",
        "data": {
            "id": "group-1",
            "type": "Group",
            "node": {
                # no edited flag at the group level but includes a nested flow
                "flow": {"data": {"nodes": [nested_node]}},
            },
        },
    }
    flows = [{"name": "FlowWithGroup", "data": {"nodes": [group_node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_deeply_nested_flow_detects_deep_edited_node():
    # Create a 3-level nested flow where the deepest node has edited=True.
    deep_node = {
        "id": "deep",
        "data": {"id": "deep", "type": "DeepType", "node": {"edited": True, "display_name": "DeepComp"}},
    }
    level2 = {"id": "lvl2", "data": {"id": "lvl2", "type": "L2", "node": {"flow": {"data": {"nodes": [deep_node]}}}}}
    level1 = {"id": "lvl1", "data": {"id": "lvl1", "type": "L1", "node": {"flow": {"data": {"nodes": [level2]}}}}}
    flows = [{"name": "DeepFlow", "data": {"nodes": [level1]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test_special_characters_in_display_name_and_name_fallback():
    # Display names may contain special/unicode characters and should be preserved.
    node = {
        "id": "sp1",
        "data": {
            "id": "sp1",
            "type": "TypeSpecial",
            "node": {"edited": True, "display_name": "Comp-©-测试-ß"},
        },
    }
    # Flow without a 'name' key should be labeled "Unknown Flow"
    flows = [{"data": {"nodes": [node]}}]
    codeflash_output = validate_flows_custom_components(flows); result = codeflash_output


def test__get_blocked_by_edited_flag_direct_on_various_node_structures():
    # Direct unit test of the helper to ensure it handles mixed nodes robustly.
    nodes = [
        # non-edited node
        {"id": "a", "data": {"id": "a", "type": "A", "node": {"edited": False}}},
        # edited node with display_name
        {"id": "b", "data": {"id": "b", "type": "B", "node": {"edited": True, "display_name": "Bcomp"}}},
        # edited node without display_name
        {"id": "c", "data": {"id": "c", "type": "C", "node": {"edited": True}}},
    ]
    blocked = _get_blocked_by_edited_flag(nodes)


def test_large_scale_many_flows_with_sparse_blocked_items():
    # Construct 1000 flows where every 100th flow contains one edited node.
    large_flows: List[Dict] = []
    expected_keys = []
    total = 1000
    for i in range(total):
        name = f"flow-{i}"
        if i % 100 == 0:
            # Edited node present in this flow
            node = {
                "id": f"n-{i}",
                "data": {
                    "id": f"n-{i}",
                    "type": "BulkType",
                    "node": {"edited": True, "display_name": f"BulkComp-{i}"},
                },
            }
            large_flows.append({"name": name, "data": {"nodes": [node]}})
            expected_keys.append(name)
        else:
            # Flow with no nodes or non-edited node (ignored by validation)
            large_flows.append({"name": name, "data": {"nodes": []}})
    codeflash_output = validate_flows_custom_components(large_flows); result = codeflash_output
    # Confirm that for a sample of blocked flows, descriptions are correct
    for k in expected_keys[:5]:
        # Ensure the description string structure is present for each expected flow
        descriptions = result[k]
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.

To edit these changes git checkout codeflash/optimize-pr11893-2026-02-25T05.25.02 and push.

Codeflash

Brief: The optimized version reduces unnecessary work and object allocation in two small, focused ways: (1) avoid allocating a default empty list when reading nodes from a flow, and (2) short-circuit flows with missing/empty data before calling the single-flow validator. Those small changes cut down the number of function calls and temporary allocations in hot loops, producing the measured ~18% runtime improvement (449µs -> 380µs) without changing observable behavior.

What changed
- validate_flow_custom_components:
  - Replaced flow_data.get("nodes", []) with flow_data.get("nodes") and an immediate falsy check. This avoids allocating a new empty list on every call when the "nodes" key is missing.
- validate_flows_custom_components:
  - Added early guards to skip flows where data is falsy or where data.get("nodes") is falsy, instead of blindly calling validate_flow_custom_components for every flow.
  - Still delegates to validate_flow_custom_components for flows that actually have nodes (keeps logic centralized and unchanged for those cases).

Why these changes speed things up
- Avoiding allocations: flow_data.get("nodes", []) creates a new empty list object every time the key is missing. In hot loops or when many flows are empty/missing nodes, that allocation (and subsequent GC pressure) adds measurable overhead. Using .get("nodes") and a falsy check avoids that allocation.
- Fewer function calls: validate_flows_custom_components used to call validate_flow_custom_components for every flow, even those with no data or empty nodes. Each avoided call saves Python call overhead and the downstream work in _get_blocked_by_edited_flag. The profiler shows most time is spent inside _get_blocked_by_edited_flag; reducing the number of times we enter that path yields direct savings.
- Better short-circuiting: Skipping work early for common/no-op inputs reduces CPU and memory churn while preserving semantics.

Profiler evidence
- The heavy work remains in _get_blocked_by_edited_flag (expected), but optimized code reduces the total time spent in the validation call paths (see lower totals for both functions).
- The line-by-line profiles show less time spent in the per-flow call path and in allocating defaults, matching the 18% end-to-end speedup.

Impact on workloads and tests
- Big wins when many flows are empty or have no nodes (large_scale / many_flows tests). The annotated large-scale tests exercise exactly this scenario and are where the savings manifest (fewer allocations and calls per flow).
- Small or single-flow tests are unaffected functionally and still pass: behavior is preserved because skipping a call for empty/missing nodes produces the same empty result validate_flow_custom_components would have returned.
- Memory behavior improves slightly (fewer transient empty lists), which helps at scale.

Behavioral considerations
- The logic preserves behavior for normal cases. An empty nodes list or missing nodes still results in no blocked components.
- If someone stores a non-list but truthy object in "nodes", behavior is unchanged vs. before except that we rely on a falsy check to skip — this matches the previous semantics where an explicit empty default list led to the same skip. In practice flows use lists for "nodes", so this is safe.

Summary
Small, low-risk changes: eliminate an unnecessary default-list allocation and avoid needless validator calls for empty/no-op flows. That reduces CPU and allocation overhead in the hot path, giving the measured ~18% speedup on the provided benchmarks while keeping behavior centralized and intact.
@codeflash-ai codeflash-ai Bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Feb 25, 2026
@github-actions github-actions Bot added the community Pull Request from an external contributor label Feb 25, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 66.66667% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 35.54%. Comparing base (640e486) to head (05b8e98).
⚠️ Report is 109 commits behind head on flow-json-edited-flag.

Files with missing lines Patch % Lines
...backend/base/langflow/api/utils/flow_validation.py 66.66% 2 Missing ⚠️

❌ Your project status has failed because the head coverage (42.03%) is below the target coverage (60.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@                    Coverage Diff                    @@
##           flow-json-edited-flag   #11900      +/-   ##
=========================================================
+ Coverage                  35.49%   35.54%   +0.04%     
=========================================================
  Files                       1528     1528              
  Lines                      73715    73657      -58     
  Branches                   11031    11015      -16     
=========================================================
+ Hits                       26168    26182      +14     
+ Misses                     46135    46061      -74     
- Partials                    1412     1414       +2     
Flag Coverage Δ
backend 56.22% <66.66%> (+0.07%) ⬆️
lfx 42.03% <ø> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...backend/base/langflow/api/utils/flow_validation.py 95.23% <66.66%> (-1.25%) ⬇️

... and 7 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@ogabrielluiz
Copy link
Copy Markdown
Contributor

Closing automated codeflash PR.

@codeflash-ai codeflash-ai Bot deleted the codeflash/optimize-pr11893-2026-02-25T05.25.02 branch March 3, 2026 18:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⚡️ codeflash Optimization PR opened by Codeflash AI community Pull Request from an external contributor

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant