
Extracting an attribute with a diacritical character at the end cuts it off #39

@andreasbaumann


The rule is:

WORD ^1            : /\b([\p{L}\d]+)\b/;
Citizen = any( ... );
CitizenWord = any( WORD "Staatsangehöriger", WORD "Staatsangehörige" );
Person             = sequence_imm( last = WORD, COMMA, first = WORD, COMMA, Citizen, CitizenWord, COMMA, WORD "in", wohnort = WORD, COMMA );

Extracting the word 'René' yields a truncated match:

first [133..134, 0|732 .. 0|736] 'Ren'

On the other hand, if the diacritical character is in the middle or at the beginning of the word, extraction works:

wohnort [107..108, 0|565 .. 0|572] 'Zürich'
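A minimal Python sketch of one plausible cause for the end-of-word cut-off, under the assumption (not confirmed in this issue) that the matcher evaluates `\b` with ASCII-only word characters while `[\p{L}\d]` accepts Unicode letters. Python's `re` has no `\p{L}`, so the hypothetical `TOKEN` class approximates it with an ASCII plus Latin-1/Latin Extended letter range; `extract` and `ascii_boundary` are illustrative names. With an ASCII `\b`, the greedy match 'René' fails the trailing boundary between 'é' and ',' (neither counts as a word character), so the engine backtracks to 'Ren', whose boundary between 'n' and 'é' does hold. 'Zürich' ends in an ASCII letter and is unaffected.

```python
import re

# Hypothetical stand-in for [\p{L}\d]: Python's re module lacks \p{L}, so this
# approximates it with ASCII letters, U+00C0..U+024F (Latin-1 Supplement and
# Latin Extended-A/B) and digits.
TOKEN = r"[A-Za-z\u00C0-\u024F\d]+"
PATTERN = r"\b(" + TOKEN + r")\b"


def extract(text: str, ascii_boundary: bool) -> list[str]:
    """Return every token matched by PATTERN in text.

    ascii_boundary=True applies re.ASCII, so word boundaries consider only
    [A-Za-z0-9_] as word characters (an assumed model of the matcher);
    otherwise word boundaries are Unicode-aware.
    """
    flags = re.ASCII if ascii_boundary else 0
    return re.findall(PATTERN, text, flags)


sample = "René, Zürich,"
print(extract(sample, ascii_boundary=True))   # → ['Ren', 'Zürich']
print(extract(sample, ascii_boundary=False))  # → ['René', 'Zürich']
```

The Unicode-aware run shows that the same pattern returns the full 'René' once the boundary check treats 'é' as a word character.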
