41 changes: 40 additions & 1 deletion crowdsec-docs/docs/expr/file_helpers.md
@@ -20,4 +20,43 @@ Returns the content of `FileName` as an array of string, while providing cache m

Returns `true` if the `StringToMatch` is matched by one of the expressions contained in `FileName` (uses RE2 regexp engine).

> `RegexpInFile( evt.Enriched.reverse_dns, 'my_legit_seo_whitelists.txt')`
> `RegexpInFile( evt.Enriched.reverse_dns, 'my_legit_seo_whitelists.txt')`

## Map file helpers

Map file helpers work with JSON-lines files loaded with `type: map` in the [data property](/log_processor/scenarios/format.md#data). Each line in the file is a JSON object with three required fields:

- `pattern`: the value to match against
- `tag`: the label returned on match
- `type`: one of `equals` (exact match), `contains` (substring match), or `regex` (RE2 regular expression)

Example map file:

```json
{"pattern": "/wp-admin/", "tag": "WordPress", "type": "contains"}
{"pattern": "/specific/endpoint.php", "tag": "SpecificApp", "type": "equals"}
{"pattern": "/wp-content/plugins/[^/]+/readme\\.txt", "tag": "WordPress-Plugin", "type": "regex"}
```

Comments (lines starting with `#`) and blank lines are ignored.
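
To make the format concrete, here is a minimal Python sketch of how such a file could be read. It is an illustration only, not CrowdSec's loader: the `parse_map_file` name is hypothetical, and trimming whitespace before the comment check and rejecting entries with missing fields or an unknown `type` are assumptions about edge-case handling.

```python
import json

REQUIRED_FIELDS = ("pattern", "tag", "type")
VALID_TYPES = {"equals", "contains", "regex"}


def parse_map_file(text):
    """Parse JSON-lines map content into a list of dicts (hypothetical helper)."""
    entries = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Blank lines and comment lines are skipped.
        if not line or line.startswith("#"):
            continue
        entry = json.loads(line)
        missing = [f for f in REQUIRED_FIELDS if f not in entry]
        if missing:
            # Assumption: an incomplete entry is treated as an error.
            raise ValueError(f"line {line_no}: missing fields {missing}")
        if entry["type"] not in VALID_TYPES:
            # Assumption: an unknown match type is treated as an error.
            raise ValueError(f"line {line_no}: unknown type {entry['type']!r}")
        entries.append(entry)
    return entries


# Raw string so the JSON escape sequence \\. reaches the JSON parser unchanged.
SAMPLE = r"""
# application signatures
{"pattern": "/wp-admin/", "tag": "WordPress", "type": "contains"}

{"pattern": "/specific/endpoint.php", "tag": "SpecificApp", "type": "equals"}
{"pattern": "/wp-content/plugins/[^/]+/readme\\.txt", "tag": "WordPress-Plugin", "type": "regex"}
"""

entries = parse_map_file(SAMPLE)
print(len(entries))           # 3
print(entries[2]["pattern"])  # /wp-content/plugins/[^/]+/readme\.txt
```

Note that the doubled backslash in the file is JSON escaping: after decoding, the regular expression contains a single `\.`.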

### `FileMap(FileName) []map[string]string`

Returns the content of `FileName` as an array of maps. Each element is a map with the keys from the JSON object (`pattern`, `tag`, `type`).

> `FileMap('app_signatures.json')`

> `any(FileMap('app_signatures.json'), { #.tag == 'WordPress' })`
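
For readers less familiar with the expression syntax, the `any(...)` example corresponds roughly to the Python below. Here `file_map` is a hypothetical stand-in for the array that `FileMap('app_signatures.json')` returns, and `#` in the expression plays the role of `entry`.

```python
# Hypothetical stand-in for the value returned by FileMap('app_signatures.json').
file_map = [
    {"pattern": "/wp-admin/", "tag": "WordPress", "type": "contains"},
    {"pattern": "/specific/endpoint.php", "tag": "SpecificApp", "type": "equals"},
]

# any(FileMap(...), { #.tag == 'WordPress' }) is true if at least one entry has that tag.
has_wordpress = any(entry["tag"] == "WordPress" for entry in file_map)
has_drupal = any(entry["tag"] == "Drupal" for entry in file_map)

print(has_wordpress, has_drupal)  # True False
```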

### `LookupFile(StringToMatch, FileName) string`

Searches the map file `FileName` for a match against `StringToMatch`. Returns the `tag` of the first matching entry, or an empty string if no match is found.

Matching is performed in priority order:
1. **Exact match** (`equals` entries) — O(1) hash map lookup
2. **Substring match** (`contains` entries) — using [Aho-Corasick](https://en.wikipedia.org/wiki/Aho%E2%80%93Corasick_algorithm) automaton
3. **Regex match** (`regex` entries) — RE2 regular expressions

> `LookupFile(evt.Parsed.request, 'app_signatures.json')`

> `LookupFile(evt.Parsed.request, 'app_signatures.json') != ''`
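
The priority order can be modelled in a few lines of Python. This is a sketch of the documented behaviour, not CrowdSec's implementation: `build_index` and `lookup` are hypothetical names, the substring step uses a plain scan in place of an Aho-Corasick automaton, Python's `re` module stands in for RE2, and both unanchored regex matching and file-order tie-breaking within a tier are assumptions.

```python
import re


def build_index(entries):
    """Split map entries by type so each tier can be searched separately."""
    exact = {}
    substrings = []
    regexes = []
    for e in entries:
        if e["type"] == "equals":
            # Assumption: the first tag seen for a given pattern wins.
            exact.setdefault(e["pattern"], e["tag"])
        elif e["type"] == "contains":
            substrings.append((e["pattern"], e["tag"]))
        elif e["type"] == "regex":
            regexes.append((re.compile(e["pattern"]), e["tag"]))
    return exact, substrings, regexes


def lookup(value, index):
    """Return the tag of the first match by tier, or '' if nothing matches."""
    exact, substrings, regexes = index
    # 1. Exact match: dictionary lookup.
    if value in exact:
        return exact[value]
    # 2. Substring match: linear scan (stand-in for Aho-Corasick).
    for pattern, tag in substrings:
        if pattern in value:
            return tag
    # 3. Regex match: assumed to be unanchored.
    for rx, tag in regexes:
        if rx.search(value):
            return tag
    return ""


ENTRIES = [
    {"pattern": "/wp-admin/", "tag": "WordPress", "type": "contains"},
    {"pattern": "/specific/endpoint.php", "tag": "SpecificApp", "type": "equals"},
    {"pattern": "/wp-content/plugins/[^/]+/readme\\.txt", "tag": "WordPress-Plugin", "type": "regex"},
]

index = build_index(ENTRIES)
print(lookup("/specific/endpoint.php", index))                  # SpecificApp
print(lookup("/blog/wp-admin/index.php", index))                # WordPress
print(lookup("/wp-content/plugins/akismet/readme.txt", index))  # WordPress-Plugin
print(repr(lookup("/index.html", index)))                       # ''
```

Because the tiers are checked in order, an `equals` entry wins over a `contains` entry that also matches, regardless of where each appears in the file.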
7 changes: 4 additions & 3 deletions crowdsec-docs/docs/log_processor/parsers/format.md
@@ -393,17 +393,18 @@ statics:
data:
- source_url: https://URL/TO/FILE
dest_file: LOCAL_FILENAME
type: (regexp|string)
type: (regexp|string|map)
```

`data` allows user to specify an external source of data.
This section is only relevant when `cscli` is used to install parser from hub, as it will download the `source_url` and store it to `dest_file`. When the parser is not installed from the hub, CrowdSec won't download the URL, but the file must exist for the parser to be loaded correctly.

The `type` is mandatory if you want to evaluate the data in the file, and should be `regex` for valid (re2) regular expression per line or `string` for string per line.
The `type` is mandatory if you want to evaluate the data in the file. Use `regex` for a file with one valid RE2 regular expression per line, `string` for one string per line, or `map` for JSON-lines map files (see [map file helpers](/expr/file_helpers.md#map-file-helpers)).

The regexps will be compiled, the strings will be loaded into a list and both will be kept in memory.
Without specifying a `type`, the file will be downloaded and stored as file and not in memory.

You can refer to the content of the downloaded file(s) by using either the `File()` or `RegexpInFile()` function in an expression:
You can refer to the content of the downloaded file(s) by using `File()`, `RegexpInFile()`, `FileMap()`, or `LookupFile()` in an expression:

```yaml
filter: 'evt.Meta.log_type in ["http_access-log", "http_error-log"] and any(File("backdoors.txt"), { evt.Parsed.request contains #})'
6 changes: 3 additions & 3 deletions crowdsec-docs/docs/log_processor/scenarios/format.md
@@ -612,7 +612,7 @@ If the `cancel_on` expression returns true, the bucket is immediately destroyed
data:
- source_url: https://URL/TO/FILE
dest_file: LOCAL_FILENAME
[type: (regexp|string)]
[type: (regexp|string|map)]
```

:::info
@@ -628,12 +628,12 @@ The `source_url` section is only relevant when `cscli` is used to install scenar

When the scenario is not installed from the hub, CrowdSec will not download the `source_url`, however, if the file exists at `dest_file` within the data directory it will still be loaded into memory.

The `type` is mandatory if you want to evaluate the data in the file, and should be `regex` for valid (re2) regular expression per line or `string` for string per line.
The `type` is mandatory if you want to evaluate the data in the file. Use `regex` for a file with one valid RE2 regular expression per line, `string` for one string per line, or `map` for JSON-lines map files (see [map file helpers](/expr/file_helpers.md#map-file-helpers)).

The regexps will be compiled, the strings will be loaded into a list and both will be kept in memory.
Without specifying a `type`, the file will be downloaded and stored as file and not in memory.

You can refer to the content of the downloaded file(s) by using either the `File()` or `RegexpInFile()` function in an expression:
You can refer to the content of the downloaded file(s) by using `File()`, `RegexpInFile()`, `FileMap()`, or `LookupFile()` in an expression:

```yaml
filter: 'evt.Meta.log_type in ["http_access-log", "http_error-log"] and any(File("backdoors.txt"), { evt.Parsed.request contains #})'