Skip to content

Remove cdo/python-cdo from fre-cli, add xarray time averager#830

Open
Copilot wants to merge 8 commits intomainfrom
copilot/deprecate-cdo-python-cdo-use
Open

Remove cdo/python-cdo from fre-cli, add xarray time averager#830
Copilot wants to merge 8 commits intomainfrom
copilot/deprecate-cdo-python-cdo-use

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 1, 2026

Describe your changes

CDO/python-cdo has always been an awkward fit — it doesn't handle calendar='julian' well and adds a heavy external dependency. This PR removes CDO entirely and adds an xarray-based time averager as a replacement:

  • Removed CDO completely: cdoTimeAverager.py is now a thin stub that emits a loud FutureWarning ("CDO/python-cdo has been REMOVED") and delegates all work to the new xarrayTimeAverager
  • New xarrayTimeAverager class (xarrayTimeAverager.py): xarray-based time averaging and climatology generation supporting avg_type='all', 'seas', and 'month' (both weighted and unweighted)
  • Renamed frepytoolsTimeAveragerNumpyTimeAverager: the class in frepytoolsTimeAverager.py is now NumpyTimeAverager with the old name kept as a backward-compatible alias
  • pkg='cdo' entry point preserved but warns loudly via FutureWarning and routes to xarray under the hood — users are directed to switch to pkg='xarray' or pkg='fre-python-tools'
  • Added pkg='xarray' as a new CLI choice in both gen-time-averages and gen-time-averages-wrapper
  • Replaced CDO operations with xarray in shared code paths:
    • generate_time_averages.py: Cdo().mergetime()xr.open_mfdataset() (this was used by all packages, not just CDO)
    • frenctoolsTimeAverager.py: Cdo().select()xr.open_dataset().sel(time=ds['time'].dt.month == idx)
  • Removed cdo/python-cdo from required dependencies in environment.yml, meta.yaml, pyproject.toml
  • Updated pylintrc: configured class-rgx regex to allow mixedCase class names (e.g. xarrayTimeAverager, cdoTimeAverager) alongside PascalCase
  • Fixed weighted time-average dtype failures in xarrayTimeAverager:
    • _compute_time_weights now robustly handles all three cases xarray produces for time_bnds differences: timedelta64 (converted via astype('timedelta64[s]') / 86400), cftime object timedeltas (using .days + .seconds/86400), and numeric float/int (direct cast) — using numpy .values instead of the unreliable xarray .dt accessor
    • Added _is_numeric() guard in all weighted helper functions to skip non-numeric time-dependent variables (e.g. average_T1/average_T2 decoded as datetime64 by xarray for JULIAN-calendar files) that cannot be multiplied by float weights
  • Comprehensive unit tests added in test_xarrayTimeAverager.py (53 tests across 6 test classes):
    • TestIsNumeric — dtype classification helper for float, int, uint, datetime64, timedelta64, object
    • TestComputeTimeWeights — all three time_bnds encoding paths (float, timedelta64, no-bnds fallback) plus known-value correctness checks
    • TestWeightedTimeMean — numerical correctness, dim elimination, attribute preservation, non-numeric variable handling
    • TestWeightedSeasonalMean — season grouping, MAM/JJA/SON weighted values verified against known formulas
    • TestWeightedMonthlyMean — month dim, 12-month output, 2-year weighted mean correctness
    • TestXarrayTimeAveragerGenerateTimavg — full NetCDF round-trip tests for all/seas/month × weighted/unweighted, per-month file creation, ValueError path, input file not clobbered
  • Parametrized test cases expanded: added missing weighted seas and month cases (single-file and multi-file) to test_generate_time_averages.py
  • Other test updates:
    • New xarray wrapper tests (annual from monthly, annual from annual, monthly from monthly)
    • CDO test skip marks removed — CDO entry point now redirects to xarray, so tests run without python-cdo installed
    • test_cdoTimeAverager.py updated to verify FutureWarning is emitted

Issue ticket number and link (if applicable)

Checklist before requesting a review

  • I ran my code
  • I tried to make my code readable
  • I tried to comment my code
  • I wrote a new test, if applicable
  • I wrote new instructions/documentation, if applicable
  • I ran pytest and inspected it's output
  • I ran pylint and attempted to implement some of it's feedback
  • No print statements; all user-facing info uses logging module

Copilot AI linked an issue Apr 1, 2026 that may be closed by this pull request
Copilot AI and others added 3 commits April 1, 2026 23:51
…ion warnings, remove from required deps

Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/83cd5af3-5071-4657-ba27-a32d87e1ded5

Co-authored-by: ilaflott <6273252+ilaflott@users.noreply.github.com>
Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/83cd5af3-5071-4657-ba27-a32d87e1ded5

Co-authored-by: ilaflott <6273252+ilaflott@users.noreply.github.com>
Copilot AI changed the title [WIP] Deprecate cdo/python-cdo use in fre-cli Deprecate cdo/python-cdo use in fre-cli Apr 1, 2026
Copilot AI requested a review from ilaflott April 2, 2026 00:00
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 2, 2026

Codecov Report

❌ Patch coverage is 85.33333% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.29%. Comparing base (7e80492) to head (8250249).

Files with missing lines Patch % Lines
...p/generate_time_averages/frenctoolsTimeAverager.py 4.54% 21 Missing ⚠️
...e/app/generate_time_averages/xarrayTimeAverager.py 98.97% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #830      +/-   ##
==========================================
+ Coverage   84.12%   84.29%   +0.16%     
==========================================
  Files          71       72       +1     
  Lines        4944     5016      +72     
==========================================
+ Hits         4159     4228      +69     
- Misses        785      788       +3     
Flag Coverage Δ
unittests 84.29% <85.33%> (+0.16%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
fre/app/freapp.py 97.14% <ø> (ø)
fre/app/generate_time_averages/__init__.py 100.00% <ø> (ø)
fre/app/generate_time_averages/cdoTimeAverager.py 100.00% <100.00%> (ø)
...p/generate_time_averages/frepytoolsTimeAverager.py 80.43% <100.00%> (+0.14%) ⬆️
...p/generate_time_averages/generate_time_averages.py 92.75% <100.00%> (+0.95%) ⬆️
...e/app/generate_time_averages/xarrayTimeAverager.py 98.97% <98.97%> (ø)
...p/generate_time_averages/frenctoolsTimeAverager.py 25.97% <4.54%> (-0.70%) ⬇️

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7e80492...8250249. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

@copilot let's change this from "deprecate" to remove, i slightly misunderstood the meaning in the context of software. also add another mode of time averaging and climatology generation using xarray. tests please. and keep the pkg=cdo entry point to functionality around please, but give a loud warning and say you're going to use xarray instead. and also rename the frepythontools timeaverager the numpytimeaverager. you can follow pylint's recommendation on naming cases for classes etc., but please understand i miss mixed case from C++ and i wish i could configure pylint to allow it (example mixedCase) so if you could figure that out that would be super cool but it's ok if you can't. we are going to need a test for this in the test_cloud_runner CI workflow in NOAA-GFDL/fre-workflows so study those reqs too. hop to it, yee haw!

…er, update tests and pylintrc

Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/1c5f6605-f239-4dd0-94b7-53b5458889fd

Co-authored-by: ilaflott <6273252+ilaflott@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 3, 2026

@copilot let's change this from "deprecate" to remove, i slightly misunderstood the meaning in the context of software. also add another mode of time averaging and climatology generation using xarray....

Pushed in facdfba:

  • CDO fully removed — cdoTimeAverager is now a stub that warns loudly and redirects to xarrayTimeAverager
  • New xarrayTimeAverager class added with all/seas/month support (weighted + unweighted)
  • frepytoolsTimeAverager renamed to NumpyTimeAverager (old name kept as alias)
  • pkg='cdo' entry point preserved with loud FutureWarning, routes to xarray
  • pkg='xarray' added as CLI choice
  • pylintrc updated with class-rgx to allow mixedCase class names

Known issue: the weighted time-average path (unwgt=False) in xarrayTimeAverager has a dtype conflict when time_bnds are decoded as datetime64 — needs further work. The unweighted path and all other avg_types work correctly. I did not get to studying the test_cloud_runner CI workflow in fre-workflows due to time.

Copilot AI changed the title Deprecate cdo/python-cdo use in fre-cli Remove cdo/python-cdo from fre-cli, add xarray time averager Apr 3, 2026
@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

@copilot fix unit test failure RE dtype

…er weighted average

Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/1fb856bc-30a4-4be7-bb77-b2f42b051266

Co-authored-by: ilaflott <6273252+ilaflott@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 3, 2026

@copilot fix unit test failure RE dtype

Fixed in a77657e. There were two root causes:

  1. _compute_time_weights: The test data uses calendar_type = "JULIAN", so xarray uses cftime objects for time_bnds (dtype object). The previous code used hasattr(weights.dt, 'days') which didn't handle cftime timedeltas. Fixed by calling .values to get a plain numpy array and dispatching on dtype.kind == 'm' (timedelta64), dtype == object (cftime), or numeric fallback.

  2. Non-numeric data variables: The atmos file contains average_T1 and average_T2 variables with units = "days since ..." that xarray decodes as datetime64. When the weighted loop tried ds['average_T1'] * float_weights, numpy raised ufunc 'multiply' cannot use operands with types dtype('<M8[ns]') and dtype('float64'). Fixed by adding an _is_numeric() guard in all three weighted helper functions — non-numeric time-dependent variables (datetime64, timedelta64) are retained as their first timestep value rather than being averaged.

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

@copilot we need coverage on the xarray time averager. more unit tests!

Copilot AI and others added 2 commits April 3, 2026 06:19
…ghted seas/month parametrized cases

Agent-Logs-Url: https://github.com/NOAA-GFDL/fre-cli/sessions/ac9c29b8-e26b-4463-804e-8f22154798d3

Co-authored-by: ilaflott <6273252+ilaflott@users.noreply.github.com>
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 3, 2026

@copilot we need coverage on the xarray time averager. more unit tests!

Added in 8250249. New test_xarrayTimeAverager.py with 53 tests organized into 6 classes:

  • TestIsNumeric (8 tests) — float32, float64, int32, int64, uint8 → numeric; datetime64, timedelta64, object → not numeric
  • TestComputeTimeWeights (5 tests) — float bounds, timedelta64 bounds, no-bnds uniform fallback, dim name check, 2-timestep known values (31/28 days)
  • TestWeightedTimeMean (7 tests) — known weighted mean (1×31 + 2×28)/(31+28), uniform weights == arithmetic mean, time dim eliminated, static vars preserved, time_bnds excluded, non-numeric (datetime64) vars get first value, attrs preserved
  • TestWeightedSeasonalMean (7 tests) — season dim present, 4 seasons, time_bnds excluded, MAM/JJA/SON weighted values verified against known formulas, non-numeric vars excluded
  • TestWeightedMonthlyMean (6 tests) — month dim present, 12 months, single-year weighted == unweighted, 2-year Jan weighted mean, time_bnds excluded, non-numeric vars excluded
  • TestXarrayTimeAveragerGenerateTimavg (20 tests) — ValueError path, all/seas/month × wgt/unwgt output existence/shape/value checks, per-month file creation (.01.nc.12.nc), MAM seasonal value from file, infile not modified

Also added the missing weighted seas and month parametrized cases to test_generate_time_averages.py.

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

ready for a test_cloud_runner run; the idea is we have a smooth time-averaging transition with minimal effort required on fre-workflows' end, if (ideally) any

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

a test_cloud_runner workflow has been launched

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

@copilot

  • NOAA-GFDL/fre-workflows' test_cloud_runner CI failed using this branch, the link is here for you to examine fully
  • my take on the error, something to do with gfdl_msd_schemas and tags across it, fre-workflows and fre-cli.
  • for your convenience, i copy/pasted a large fraction of the error snippet below:
Successfully built fre-cli
Installing collected packages: fre-cli
  Attempting uninstall: fre-cli
    Found existing installation: fre-cli 2026.1a2
    Uninstalling fre-cli-2026.1a2:
      Successfully uninstalled fre-cli-2026.1a2
  WARNING: The script fre is installed in '/mnt/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed fre-cli-2026.1a2
/contrib-efs/fre-workflows-runners/actions-runner-1/_work/fre-workflows/fre-workflows
Please Enter Experiment Name:
Experiment name: test_pp
Please Enter Platform:
Platform: ptest
Please Enter Target:
Target: ttest
Please Enter Path to model yaml file:
Model yaml: ./for_gh_runner/yaml_workflow/model.yaml
Creating necessary paths used in workflow
Path /mnt/pp previously created. Removing...
   Creating new /mnt/pp

Path /mnt/ptmp previously created. Removing...
   Creating new /mnt/ptmp

Path /mnt/temp previously created. Removing...
   Creating new /mnt/temp

+ echo 'experiment cleaning, if it was previously installed'
experiment cleaning, if it was previously installed
+ '[' -d /mnt/cylc-run/test_pp__ptest__ttest ']'
+ echo -e '\ntest_pp__ptest__ttest previously installed'

+ echo '   Removing test_pp__ptest__ttest...'
+ cylc clean test_pp__ptest__ttest
test_pp__ptest__ttest previously installed
   Removing test_pp__ptest__ttest...
INFO - Removing directory: /mnt/cylc-run/test_pp__ptest__ttest
+ echo -e '\nCreating test_pp__ptest__ttest directory in /mnt/cylc-src/test_pp__ptest__ttest ...'

Creating test_pp__ptest__ttest directory in /mnt/cylc-src/test_pp__ptest__ttest ...
+ rm -rf /mnt/cylc-src/test_pp__ptest__ttest
+ mkdir -p /mnt/cylc-src/test_pp__ptest__ttest
+ echo -e '\nCopying fre-workflows directory in /mnt/cylc-src/test_pp__ptest__ttest ...'

Copying fre-workflows directory in /mnt/cylc-src/test_pp__ptest__ttest ...
+ cp -r ./Jinja2Filters ./LICENSE ./README.md ./app ./bin ./environment.yml ./etc ./flow.cylc ./for-developers.md ./for_gh_runner ./fre-cli ./lib ./portable-post-processing.md ./pylintrc ./pytest.ini ./rose-suite.conf ./site /mnt/cylc-src/test_pp__ptest__ttest
+ check_exit_status 'MOCK CHECKOUT (cp)'

Running fre pp configure-yaml, combining separate yaml configs into one, then writing rose-suite/app config files ...
+ '[' 0 -ne 0 ']'
+ echo -e '\nRunning fre pp configure-yaml, combining separate yaml configs into one, then writing rose-suite/app config files ...'
+ fre -vv pp configure-yaml -e test_pp -p ptest -t ttest -y ./for_gh_runner/yaml_workflow/model.yaml
[DEBUG:                  fre.py:                     fre] root fre_logger level set
[DEBUG:                  fre.py:                     fre] click entry-point function call done.
[ INFO:configure_script_yaml.py:               yaml_info] Starting
[ INFO: combine_yamls_script.py:       consolidate_yamls] Initializing a post-processing and analysis yaml instance...
[ INFO:       pp_info_parser.py:                 combine] Combining yaml files into one dictionary: 
[ INFO:       pp_info_parser.py:           combine_model]    model yaml: ./for_gh_runner/yaml_workflow/model.yaml
[ INFO:       pp_info_parser.py:        combine_settings]    settings yaml: settings.yaml
[ INFO:       pp_info_parser.py:    merge_multiple_yamls]    experiment yaml: pp1.yaml
[ INFO: analysis_info_parser.py:                 combine] Combining yaml files into one dictionary: 
[ INFO: analysis_info_parser.py:           combine_model]    model yaml: ./for_gh_runner/yaml_workflow/model.yaml
[ INFO: analysis_info_parser.py:        combine_settings]    settings yaml: settings.yaml
[ INFO: combine_yamls_script.py:       consolidate_yamls] No analysis yamls were combined
[ INFO:configure_script_yaml.py:           validate_yaml] Using yaml schema '/mnt/.local/lib/python3.11/site-packages/fre/gfdl_msd_schemas/FRE/fre_pp.json'
Traceback (most recent call last):
  File "/mnt/.local/lib/python3.11/site-packages/fre/pp/configure_script_yaml.py", line 48, in validate_yaml
    validate(instance = yamlfile,schema=schema)
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/jsonschema/validators.py", line 1332, in validate
    raise error
jsonschema.exceptions.ValidationError: Additional properties are not allowed ('do_preanalysis', 'do_refinediag' were unexpected)

Failed validating 'additionalProperties' in schema['properties']['postprocess']['properties']['switches']:
    {'type': 'object',
     'properties': {'clean_work': {'description': 'Switch to remove '
                                                  'intermediate data files '
                                                  'when they are no longer '
                                                  'needed.',
                                   'type': 'boolean'},
                    'do_atmos_plevel_masking': {'description': 'Switch to '
                                                               'mask atmos '
                                                               'pressure-level '
                                                               'output '
                                                               'above/below '
                                                               'surface '
                                                               'pressure/atmos '
                                                               'top.',
                                                'type': 'boolean'},
                    'do_analysis': {'description': 'Switch to launch '
                                                   'analysis scripts',
                                    'type': 'boolean'},
                    'do_analysis_only': {'description': 'Switch to only '
                                                        'launch analysis '
                                                        'scripts',
                                         'type': 'boolean'}},
     'required': ['clean_work',
                  'do_atmos_plevel_masking',
                  'do_analysis',
                  'do_analysis_only'],
     'additionalProperties': False}

On instance['postprocess']['switches']:
    {'clean_work': True,
     'do_refinediag': False,
     'do_atmos_plevel_masking': True,
     'do_preanalysis': False,
     'do_analysis': False,
     'do_analysis_only': False}

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/mnt/.local/bin/fre", line 6, in <module>
    sys.exit(fre())
             ^^^^^
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/cylc-flow-tools/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/.local/lib/python3.11/site-packages/fre/pp/frepp.py", line 122, in configure_yaml
    configure_script_yaml.yaml_info(yamlfile,experiment,platform,target)
  File "/mnt/.local/lib/python3.11/site-packages/fre/pp/configure_script_yaml.py", line 242, in yaml_info
    validate_yaml(full_yamldict)
  File "/mnt/.local/lib/python3.11/site-packages/fre/pp/configure_script_yaml.py", line 53, in validate_yaml
    raise ValueError("Combined yaml is not valid. Please fix the errors and try again.") from exc
ValueError: Combined yaml is not valid. Please fix the errors and try again.
+ check_exit_status CONFIGURE-YAML
CONFIGURE-YAML failed
+ '[' 1 -ne 0 ']'
+ echo 'CONFIGURE-YAML failed'
+ exit 1

@ilaflott ilaflott added new functioning New feature or request clean up priority: HIGH performance related to or enhances performance (run time or memory) labels Apr 3, 2026
Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 3, 2026

@copilot

  • NOAA-GFDL/fre-workflows' test_cloud_runner CI failed using this branch, the link is here for you to examine fully
  • my take on the error, something to do with `gf...

This CI failure is not caused by this PR. The gfdl_msd_schemas submodule ref is identical on both main and this branch (commit d5a42a3).

The root cause: gfdl_msd_schemas commit d98a598 removed do_refinediag and do_preanalysis from the fre_pp.json switches schema (replaced with per-script switches), and the switches section has additionalProperties: false. The fre-workflows test YAML at for_gh_runner/yaml_workflow/model.yaml still includes those old properties:

'do_refinediag': False,
'do_preanalysis': False,

Fix needed on the fre-workflows side: remove do_preanalysis and do_refinediag from the test YAML's switches section (or update to the new per-script switch format from the schema).

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

a full, successful test with fre-workflows requires a test_cloud_runner test-yaml update

@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

long winded- i mean, this PR requires NOAA-GFDL/fre-workflows#222

@ilaflott ilaflott marked this pull request as ready for review April 3, 2026 20:03
@ilaflott
Copy link
Copy Markdown
Member

ilaflott commented Apr 3, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clean up new functioning New feature or request performance related to or enhances performance (run time or memory) priority: HIGH

Projects

None yet

Development

Successfully merging this pull request may close these issues.

deprecate cdo/python-cdo use in fre-cli

2 participants