use filters.splitter instead of np.digitize and for loop for more efficient pixel separation #29

@MarkusZehner

Description

Hi, and thanks for the nicely structured lidar processing module!

It was a great inspiration to revisit some of my old approaches.
While trying out the CHM calculation I noticed that the way the point cloud is split into pixels can be sped up considerably by using the filters.splitter stage shipped with PDAL, reducing the processing time for a .las file with more than 20 million points (one like this, for example) from ~3 min to just 17 seconds.

There are two drawbacks: as far as I know, the splitter has to be added directly to the read pipeline, and it returns the pixels unsorted.
It is therefore not a direct drop-in for the np.digitize + for-loop solution, and writing to GeoTIFF also uses the PDAL writers to handle the unsorted filter output.
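For context, the np.digitize + for-loop pattern that the splitter replaces looks roughly like this (a minimal, self-contained sketch with synthetic points; the variable names and bin edges are placeholders, not the module's actual code):

```python
import numpy as np

# synthetic points: 1000 random points in a 10 x 10 m extent
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 1000)
y = rng.uniform(0, 10, 1000)
z = rng.uniform(0, 30, 1000)

pixel_size = 1.0
edges = np.arange(0, 10 + pixel_size, pixel_size)

# np.digitize assigns each point to a column/row bin ...
col = np.digitize(x, edges) - 1
row = np.digitize(y, edges) - 1

# ... and a Python loop over every pixel picks the max height per cell
chm = np.full((len(edges) - 1, len(edges) - 1), np.nan)
for r in range(chm.shape[0]):
    for c in range(chm.shape[1]):
        mask = (row == r) & (col == c)
        if mask.any():
            chm[r, c] = z[mask].max()
```

The Python-level double loop with a boolean mask per pixel is what dominates the runtime on large clouds; the splitter does the same spatial partitioning inside PDAL's C++ core.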

I've included an example below based on your read_lidar function.
Results should be identical to the chm given in the examples:

arrays = read_lidar(file_path, srs)
chm, extent = calculate_chm(arrays[0], (1,1,1))
create_geotiff(chm, "....tif", extent)

import numpy as np
import pdal
import json


def process_max_height_lidar(input_file: str, srs: str,
                             split_length: float | int | None = None,
                             save_as_tif: str | None = None,
                             minx=None, miny=None, maxx=None, maxy=None):
    """
    Reads and processes a LiDAR point cloud file to a max height model using PDAL based on specified options.

    :param input_file: str, The path to the input LiDAR file. Supported formats are .las, .laz, .copc, .copc.laz, and ept.json.
    :param srs: str, The Spatial Reference System (SRS) of the point cloud.
    :param split_length: float|int, optional, The pixel size in SRS units used to split the point cloud.
    :param save_as_tif: str, optional, The path to save the GeoTIFF; if omitted, the aggregated array is returned instead.
    :param minx: int|float, optional, Western bound of the extent; read from the file metadata if omitted.
    :param miny: int|float, optional, Southern bound of the extent; read from the file metadata if omitted.
    :param maxx: int|float, optional, Eastern bound of the extent; read from the file metadata if omitted.
    :param maxy: int|float, optional, Northern bound of the extent; read from the file metadata if omitted.
    """

    las_extensions = ('.las', '.laz')
    copc_extensions = ('.copc', '.copc.laz')
    ept_file = 'ept.json'

    file_lower = input_file.lower()
    if file_lower.endswith(las_extensions):
        reader = 'readers.las'
    elif file_lower.endswith(copc_extensions):
        reader = 'readers.copc'
    elif file_lower.endswith(ept_file):
        reader = 'readers.ept'
    else:
        raise ValueError(
            "The input file must be a .las, .laz, .copc, .copc.laz file, or an ept.json file."
        )

    base_pipeline = [{
        "type": reader,
        "spatialreference": srs,
        "filename": input_file
    }]

    # get the extent from the file if not present in the arguments
    if any(v is None for v in (minx, miny, maxx, maxy)):
        metadata_pipeline = pdal.Pipeline(json.dumps(base_pipeline))
        metadata_pipeline.execute()
        # the metadata key matches the reader stage type, e.g. 'readers.las' or 'readers.copc'
        bounds = metadata_pipeline.metadata['metadata'][reader]
        minx = bounds['minx']
        miny = bounds['miny']
        maxx = bounds['maxx']
        maxy = bounds['maxy']

    if split_length is not None:
        base_pipeline.append({
            "type": "filters.splitter",
            "length": f"{split_length}",
            "origin_x": f"{minx}",
            "origin_y": f"{miny}"
        })
    pipeline = pdal.Pipeline(json.dumps(base_pipeline))
    pipeline.execute()
    base_arrays = pipeline.arrays

    # collapse each splitter tile to a single point carrying the pixel's max height
    out_arr = []
    for tile in base_arrays:
        mod_point = tile[0]
        mod_point['Z'] = np.max(tile['Z'])
        # shift x/y to the middle of the pixel
        mod_point['X'] = np.min(tile['X']) + split_length / 2
        mod_point['Y'] = np.min(tile['Y']) + split_length / 2
        out_arr.append(mod_point)

    out_arr = np.asarray(out_arr)

    if save_as_tif is not None:
        writer = pdal.Writer.gdal(filename=str(save_as_tif),
                                  dimension='Z',
                                  output_type="min",
                                  data_type="float32",
                                  gdaldriver='GTiff',
                                  nodata=0.,
                                  window_size="0",
                                  origin_x=f"{minx}",
                                  origin_y=f"{miny}",
                                  width=f"{int((maxx - minx) / split_length)}",
                                  height=f"{int((maxy - miny) / split_length)}",
                                  resolution=f"{split_length}",
                                  radius=f"{split_length / 2}",
                                  default_srs=srs)
        writer.pipeline(out_arr).execute()
    else:
        return out_arr
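The aggregation loop above can be exercised without PDAL by feeding it structured arrays shaped like the splitter's output (a minimal sketch; the two synthetic tiles and their dtype are stand-ins for base_pipeline.arrays, not real PDAL output):

```python
import numpy as np

split_length = 1.0
dtype = [('X', 'f8'), ('Y', 'f8'), ('Z', 'f8')]

# two fake splitter tiles, each holding the points of one 1 m pixel
tile_a = np.array([(0.25, 0.25, 5.0), (0.75, 0.75, 12.0)], dtype=dtype)
tile_b = np.array([(1.25, 0.5, 3.0), (1.75, 0.5, 8.0)], dtype=dtype)

out_arr = []
for tile in (tile_a, tile_b):
    mod_point = tile[0]
    mod_point['Z'] = np.max(tile['Z'])  # max height within the pixel
    # shift x/y to the middle of the pixel, as in the function above
    mod_point['X'] = np.min(tile['X']) + split_length / 2
    mod_point['Y'] = np.min(tile['Y']) + split_length / 2
    out_arr.append(mod_point)

out_arr = np.asarray(out_arr)
print(out_arr['Z'])  # per-pixel maxima: 12.0 and 8.0
```

Each tile collapses to one record whose Z is the pixel's maximum height, so the result has exactly one point per occupied pixel.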

If you'd like to adopt this, I'm happy to help with the refactoring!

Best,
Markus
