use filters.splitter instead of np.digitize and for loop for more efficient pixel separation #29

@MarkusZehner

Description

Hi, and thanks for the nicely structured lidar processing module!

It was a great inspiration to revisit some of my old approaches.
While trying out the CHM calculation I noticed that the way the point cloud is split into pixels can be sped up considerably by using the filters.splitter stage shipped with PDAL, reducing the processing time for a .las file with more than 20 million points (one like this, for example) from ~3 min to just 17 seconds.

There are two drawbacks: as far as I know, the splitter has to be added directly to the read pipeline, and it returns the pixels unsorted.
It is therefore not a direct drop-in for the np.digitize + for-loop solution, and writing to GeoTIFF also uses the PDAL writers to handle the unsorted filter output.
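For context, the np.digitize + for-loop pattern that the splitter replaces looks roughly like this (a minimal, self-contained sketch with synthetic points; the variable names and bin edges are placeholders, not the module's actual code):

```python
import numpy as np

# synthetic points: 1000 random points in a 10 x 10 m extent
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)
y = rng.uniform(0, 10, 1000)
z = rng.uniform(0, 30, 1000)

pixel_size = 1.0
edges = np.arange(0, 10 + pixel_size, pixel_size)

# np.digitize assigns each point to a column/row bin ...
col = np.digitize(x, edges) - 1
row = np.digitize(y, edges) - 1

# ... and a Python loop over every pixel picks the max height per cell
chm = np.full((len(edges) - 1, len(edges) - 1), np.nan)
for r in range(chm.shape[0]):
    for c in range(chm.shape[1]):
        mask = (row == r) & (col == c)
        if mask.any():
            chm[r, c] = z[mask].max()
```

The Python-level double loop with a boolean mask per pixel is what dominates the runtime on large clouds; the splitter does the same spatial partitioning inside PDAL's C++ core.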

I've included an example below based on your read_lidar function.
Results should be identical to the chm given in the examples:

arrays = read_lidar(file_path, srs)
chm, extent = calculate_chm(arrays[0], (1,1,1))
create_geotiff(chm, "....tif", extent)

import numpy as np
import pdal
import json


def process_max_height_lidar(input_file: str, srs: str,
                             split_length: float | int | None = None,
                             save_as_tif: str | None = None,
                             minx=None, miny=None, maxx=None, maxy=None):
    """
    Reads and processes a LiDAR point cloud file to a max height model using PDAL based on specified options.

    :param input_file: str, The path to the input LiDAR file. Supported formats are .las, .laz, .copc, .copc.laz, and ept.json.
    :param srs: str, The Spatial Reference System (SRS) of the point cloud.
    :param split_length: float|int, optional, The pixel size in SRS units used to split the point cloud.
    :param save_as_tif: str, optional, The path to save the GeoTIFF; if omitted, the aggregated array is returned instead.
    :param minx: int|float, optional, Western bound of the extent; read from the file metadata if omitted.
    :param miny: int|float, optional, Southern bound of the extent; read from the file metadata if omitted.
    :param maxx: int|float, optional, Eastern bound of the extent; read from the file metadata if omitted.
    :param maxy: int|float, optional, Northern bound of the extent; read from the file metadata if omitted.
    """

    las_extensions = ('.las', '.laz')
    copc_extensions = ('.copc', '.copc.laz')
    ept_file = 'ept.json'

    file_lower = input_file.lower()
    if file_lower.endswith(las_extensions):
        reader = 'readers.las'
    elif file_lower.endswith(copc_extensions):
        reader = 'readers.copc'
    elif file_lower.endswith(ept_file):
        reader = 'readers.ept'
    else:
        raise ValueError(
            "The input file must be a .las, .laz, .copc, .copc.laz file, or an ept.json file."
        )

    base_pipeline = [{
        "type": reader,
        "spatialreference": srs,
        "filename": input_file
    }]

    # get the extent from the file if not present in the arguments
    if any(v is None for v in (minx, miny, maxx, maxy)):
        metadata_pipeline = pdal.Pipeline(json.dumps(base_pipeline))
        metadata_pipeline.execute()
        # the metadata key matches the reader stage type, e.g. 'readers.las' or 'readers.copc'
        bounds = metadata_pipeline.metadata['metadata'][reader]
        minx = bounds['minx']
        miny = bounds['miny']
        maxx = bounds['maxx']
        maxy = bounds['maxy']

    if split_length is not None:
        base_pipeline.append({
            "type": "filters.splitter",
            "length": f"{split_length}",
            "origin_x": f"{minx}",
            "origin_y": f"{miny}"
        })
    pipeline = pdal.Pipeline(json.dumps(base_pipeline))
    pipeline.execute()
    base_arrays = pipeline.arrays

    # collapse each splitter tile to a single point carrying the pixel's max height
    out_arr = []
    for tile in base_arrays:
        mod_point = tile[0]
        mod_point['Z'] = np.max(tile['Z'])
        # shift x/y to the middle of the pixel
        mod_point['X'] = np.min(tile['X']) + split_length / 2
        mod_point['Y'] = np.min(tile['Y']) + split_length / 2
        out_arr.append(mod_point)

    out_arr = np.asarray(out_arr)

    if save_as_tif is not None:
        writer = pdal.Writer.gdal(filename=str(save_as_tif),
                                  dimension='Z',
                                  output_type="min",
                                  data_type="float32",
                                  gdaldriver='GTiff',
                                  nodata=0.,
                                  window_size="0",
                                  origin_x=f"{minx}",
                                  origin_y=f"{miny}",
                                  width=f"{int((maxx - minx) / split_length)}",
                                  height=f"{int((maxy - miny) / split_length)}",
                                  resolution=f"{split_length}",
                                  radius=f"{split_length / 2}",
                                  default_srs=srs)
        writer.pipeline(out_arr).execute()
    else:
        return out_arr
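The aggregation loop above can be exercised without PDAL by feeding it structured arrays shaped like the splitter's output (a minimal sketch; the two synthetic tiles and their dtype are stand-ins for base_pipeline.arrays, not real PDAL output):

```python
import numpy as np

split_length = 1.0
dtype = [('X', 'f8'), ('Y', 'f8'), ('Z', 'f8')]

# two fake splitter tiles, each holding the points of one 1 m pixel
tile_a = np.array([(0.25, 0.25, 5.0), (0.75, 0.75, 12.0)], dtype=dtype)
tile_b = np.array([(1.25, 0.5, 3.0), (1.75, 0.5, 8.0)], dtype=dtype)

out_arr = []
for tile in (tile_a, tile_b):
    mod_point = tile[0]
    mod_point['Z'] = np.max(tile['Z'])  # max height within the pixel
    # shift x/y to the middle of the pixel, as in the function above
    mod_point['X'] = np.min(tile['X']) + split_length / 2
    mod_point['Y'] = np.min(tile['Y']) + split_length / 2
    out_arr.append(mod_point)

out_arr = np.asarray(out_arr)
print(out_arr['Z'])  # per-pixel maxima: 12.0 and 8.0
```

Each tile collapses to one record whose Z is the pixel's maximum height, so the result has exactly one point per occupied pixel.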

If you'd like to adopt this, I'm happy to help with the refactoring!

Best,
Markus
