
role/php: per-application PHP-FPM pools for nextcloud, wordpress, moodle, grav, icingaweb2 #228

@markuslf

Description


Summary

All lfops application roles that require PHP (nextcloud, wordpress, moodle, grav, icingaweb2, and the icingaweb2 modules) currently share the default www PHP-FPM pool from the php role instead of defining their own dedicated pool.

The librenms role already shows the right pattern (see roles/librenms/defaults/main.yml: librenms__php__fpm_pools__dependent_var). We should extend this pattern to all other PHP applications, including hosts that only run a single application, so that PHP-FPM pool naming, ownership, and resource boundaries always match the app that actually uses them. The preferred naming is <app>.conf (e.g. nextcloud.conf, wordpress.conf), never www.conf, even on a dedicated host.

Why (Linux System Engineer perspective)

A dedicated per-application pool gives us four things that the default shared www pool does not:

  1. Unix-level isolation. Each pool runs as its own system user (nextcloud, wordpress, moodle, ...). A code-execution vulnerability in one app cannot read or modify the files of another app, because the worker process lacks the Unix permissions to do so.
  2. Resource budgets that do not steal from each other. pm.max_children, pm.max_requests, CPU nice, rlimits are per pool. A WordPress spam attack cannot starve Nextcloud workers, and vice versa.
  3. Per-app PHP ini overrides. Nextcloud typically needs memory_limit = 512M..1G, opcache.memory_consumption = 256, opcache.interned_strings_buffer = 16; WordPress is happy with memory_limit = 256M. Mixing them on one pool wastes RAM per worker, or under-sizes the heavy app.
  4. Per-app hardening. open_basedir, disable_functions, and session.save_path can be tightened per pool without breaking another app that legitimately needs one of those functions.
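To illustrate point 4, a pool-level hardening block could look like the following sketch. The values are illustrative examples for a hypothetical WordPress pool, not vetted recommendations for any specific app:

```ini
; hypothetical per-pool hardening directives (example values)
php_admin_value[open_basedir] = /var/www/wordpress:/tmp
php_admin_value[disable_functions] = exec,passthru,shell_exec,system,proc_open,popen
php_admin_value[session.save_path] = /var/lib/php/session-wordpress
```

Because these are `php_admin_value` directives, they cannot be overridden by the application via ini_set(), and they affect only this one pool.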

Even on a host that is dedicated to a single app, we want the pool to be named after the app (nextcloud.conf, not www.conf) because:

  • The PHP-FPM status page (pm.status_path) is named after the pool. Monitoring output, Grafana panels, and alert messages become self-documenting (Pool nextcloud: 12/40 workers active instead of Pool www: 12/40 workers active on a host where there is no "www" in sight).
  • A later second app on the same host can be added without having to migrate the existing pool, and without a confusing www pool that belongs to nothing.
  • The socket path (/run/php-fpm/nextcloud.sock) and slowlog (/var/log/php-fpm/nextcloud-slow.log) also reflect the actual application, which makes incident triage faster.

Scope

Application roles to update:

  • grav
  • icingaweb2 (plus the three module roles that inherit its pool: icingaweb2_module_fileshipper, icingaweb2_module_vspheredb, icingaweb2_module_x509)
  • moodle
  • nextcloud
  • wordpress

librenms already does this correctly and serves as the reference implementation.

Out of scope for this ticket: introducing a multi-PHP-version setup (Remi SCL, parallel php74-php-fpm + php82-php-fpm). That is an orthogonal concern and should be its own ticket.

Proposed implementation

For every affected app role, add a <role>__php__fpm_pools__dependent_var to defaults/main.yml in the shape librenms already uses. Minimal template:

# roles/nextcloud/defaults/main.yml
nextcloud__php__fpm_pools__dependent_var:
  - name: 'nextcloud'
    by_role: 'nextcloud'
    user: '{{ nextcloud__user | default("nextcloud") }}'
    group: '{{ nextcloud__group | default("nextcloud") }}'
    raw: |-
      ; Nextcloud needs generous memory for previews, PDF generation, OCR.
      php_admin_value[memory_limit] = 1G
      php_admin_value[opcache.memory_consumption] = 256
      php_admin_value[opcache.interned_strings_buffer] = 16
      php_admin_value[upload_max_filesize] = 16G
      php_admin_value[post_max_size] = 16G
      php_admin_value[session.save_path] = /var/lib/php/session-nextcloud
      env[PATH] = /usr/local/bin:/usr/bin:/bin

Each role brings its own app-appropriate raw block with the PHP ini overrides the upstream documentation recommends. The web server role (apache_httpd, nginx) routes the vhost to the app-specific socket at /run/php-fpm/<app>.sock instead of /run/php-fpm/www.sock.
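For nginx, the equivalent routing is a `fastcgi_pass` to the app-specific socket. A minimal sketch (server/vhost context and the nextcloud paths are assumptions; the real nginx role may structure this differently):

```nginx
# hypothetical nginx vhost fragment routing PHP to the app-specific pool socket
location ~ \.php$ {
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
    fastcgi_pass unix:/run/php-fpm/nextcloud.sock;
}
```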

Dropping the default www pool on app hosts

Today, even when an application adds its own pool, the www pool from roles/php/defaults/main.yml (php__fpm_pools__role_var) is still deployed alongside it, because the combine_lod logic merges both. On a dedicated Nextcloud host you end up with two pools, one of which is unused.

Preferred path (Option B): change the default of php__fpm_pools__role_var in the php role to [] and document that every consumer of the php role must define at least one pool. The existing librenms setup keeps working. A host that genuinely wants a generic www pool adds it explicitly via php__fpm_pools__host_var or php__fpm_pools__group_var. This is the clean long-term design and matches the "application name as <app>.conf" convention we want.
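Under Option B, a host that still wants a generic pool would declare it explicitly. A sketch (hostname and user/group values are hypothetical):

```yaml
# host_vars/legacy-host.example.com.yml -- hypothetical host opting back in to a generic pool
php__fpm_pools__host_var:
  - name: 'www'
    user: 'apache'
    group: 'apache'
```

This keeps the opt-in visible in the inventory instead of hiding a dead default in the role.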

Fallback (Option A), only if B is too disruptive to roll out in one go: leave php__fpm_pools__role_var as is (with www) and have each app role override it to [] in its own defaults/main.yml alongside the dependent pool. This is less invasive but keeps the dead-code www default around and will need a second cleanup ticket later.

Additional hygiene fix in the pool template

Today the pool template (roles/php/templates/etc/php-fpm.d/RedHat-pool.conf.j2 and Debian-pool.conf.j2) hardcodes:

php_value[session.save_path] = /var/lib/php/session

This path is shared across all pools on the host, which silently undermines pool isolation: a worker in pool A can read session files belonging to pool B if the session file names are guessable. The template should default to /var/lib/php/session-{{ item['name'] }} and create that directory as owned by the pool's user/group. The tasks in roles/php/tasks/main.yml need a file task to create the per-pool session directory alongside the pool config deployment.
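The directory-creation task could look like the following sketch. The loop variable name `php__fpm_pools__combined_var` and the `apache` fallback are assumptions about the role's internals, not its actual variable names:

```yaml
# hypothetical task for roles/php/tasks/main.yml: one session dir per pool
- name: 'Create per-pool session directory'
  ansible.builtin.file:
    path: '/var/lib/php/session-{{ item["name"] }}'
    state: 'directory'
    owner: '{{ item["user"] | default("apache") }}'
    group: '{{ item["group"] | default("apache") }}'
    mode: '0770'
  loop: '{{ php__fpm_pools__combined_var }}'
```

Mode 0770 keeps the session files readable only by the pool user and its group.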

How it should look on RHEL

RHEL packages PHP-FPM such that all pool configs live under /etc/php-fpm.d/*.conf, all sockets under /run/php-fpm/*.sock, all slowlogs under /var/log/php-fpm/*-slow.log. The main FPM config at /etc/php-fpm.conf includes /etc/php-fpm.d/*.conf at the end.

Pool config (/etc/php-fpm.d/nextcloud.conf), minimal:

[nextcloud]
user = nextcloud
group = nextcloud
listen = /run/php-fpm/nextcloud.sock
listen.acl_users = apache,nginx
listen.mode = 0660

pm = dynamic
pm.max_children = 80
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20
pm.max_requests = 500

pm.status_path = /nextcloud-fpm-status
ping.path = /nextcloud-fpm-ping

slowlog = /var/log/php-fpm/nextcloud-slow.log
request_slowlog_timeout = 10s
request_terminate_timeout = 300s

php_admin_value[error_log]  = /var/log/php-fpm/nextcloud-error.log
php_admin_flag[log_errors]  = on
php_admin_value[memory_limit] = 1G
php_admin_value[opcache.memory_consumption] = 256
php_admin_value[opcache.interned_strings_buffer] = 16
php_admin_value[open_basedir] = /var/www/nextcloud:/tmp:/var/lib/nextcloud
php_value[session.save_path] = /var/lib/php/session-nextcloud
env[PATH] = /usr/local/bin:/usr/bin:/bin

A single php-fpm.service systemd unit manages all pools. Reload with systemctl reload php-fpm after changes.

Apache vhost routes the app and its status/ping endpoints to the app-specific socket:

<VirtualHost *:443>
    ServerName nextcloud.example.com
    DocumentRoot /var/www/nextcloud

    <FilesMatch \.php$>
        SetHandler "proxy:unix:/run/php-fpm/nextcloud.sock|fcgi://localhost/"
    </FilesMatch>

    # status page, local only
    <Location /nextcloud-fpm-status>
        Require local
        ProxyPass unix:/run/php-fpm/nextcloud.sock|fcgi://localhost/nextcloud-fpm-status
    </Location>
    <Location /nextcloud-fpm-ping>
        Require local
        ProxyPass unix:/run/php-fpm/nextcloud.sock|fcgi://localhost/nextcloud-fpm-ping
    </Location>
</VirtualHost>

How it should look on Debian

Debian/Ubuntu embed the PHP version in the filesystem paths: pool configs live at /etc/php/<ver>/fpm/pool.d/*.conf, sockets at /run/php/php<ver>-fpm-<pool>.sock by convention (the path is freely configurable), slowlogs at /var/log/php<ver>-fpm-<pool>-slow.log. The systemd unit is versioned, e.g. php8.2-fpm.service.

Same pool config content as on RHEL, but with Debian paths:

[nextcloud]
user = www-data
group = www-data
listen = /run/php/php8.2-fpm-nextcloud.sock
listen.owner = www-data
listen.group = www-data
listen.mode = 0660

pm = dynamic
pm.max_children = 80
...

slowlog = /var/log/php8.2-fpm-nextcloud-slow.log
...

php_admin_value[error_log] = /var/log/php8.2-fpm-nextcloud-error.log

Key differences between RHEL and Debian to keep in mind in the Ansible templates (the role template already splits these into RedHat-pool.conf.j2 and Debian-pool.conf.j2):

  • Default pool user/group: apache/apache on RHEL, www-data/www-data on Debian. On RHEL we can still run the pool as the app user (nextcloud:nextcloud), but the webserver user (apache) needs to be in listen.acl_users so it can connect(2) to the socket. On Debian the convention is to have the pool user and the webserver user be the same (www-data), or set listen.owner = www-data explicitly.
  • Per-SAPI php.ini: RHEL has a single /etc/php.ini for all SAPIs; Debian has separate /etc/php/<ver>/fpm/php.ini, /etc/php/<ver>/cli/php.ini, /etc/php/<ver>/apache2/php.ini. For per-app tuning we should prefer pool-level php_admin_value[...] anyway, which works identically on both.
  • systemd unit name: php-fpm.service on RHEL, php<ver>-fpm.service on Debian.
  • Socket path convention: /run/php-fpm/ on RHEL, /run/php/ on Debian.
  • Log path convention: /var/log/php-fpm/ on RHEL, /var/log/php<ver>-fpm-* on Debian.
  • `pm.status_path` routing: identical on both; the pool-level directive plus a matching webserver Location routes the request to the correct socket on both platforms.
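The socket-path difference could be handled in the shared template logic with a conditional like this sketch. The `php_version` variable and the exact fact lookup are assumptions; the real role may resolve the OS split differently (e.g. via the separate RedHat/Debian templates):

```jinja
{# hypothetical Jinja2 fragment choosing the per-OS socket convention #}
{% if ansible_facts['os_family'] == 'RedHat' %}
listen = /run/php-fpm/{{ item['name'] }}.sock
{% else %}
listen = /run/php/php{{ php_version }}-fpm-{{ item['name'] }}.sock
{% endif %}
```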

Acceptance criteria

  • Every PHP-using app role in lfops ships a dedicated pool via `__php__fpm_pools__dependent_var` in `defaults/main.yml`, following the librenms reference.
  • The pool is named after the app, not `www`, also on single-app hosts.
  • Per-app PHP ini overrides that the upstream documentation recommends (memory_limit, opcache, open_basedir, session path, upload/post max size where relevant) are in the pool config, not in the host-wide `php.ini`.
  • The default `www` pool in `roles/php/defaults/main.yml` is removed (preferred: Option B). If Option A is chosen instead, every app role explicitly overrides `php__fpm_pools__role_var: []` and a follow-up ticket tracks the eventual removal of the role-level default.
  • The RHEL and Debian pool templates default `session.save_path` to a per-pool directory, and the php role creates that directory as the pool user.
  • The app role READMEs document which pool name, socket path, and status URL the webserver (`apache_httpd` / `nginx`) should route to.
  • The monitoring-plugins `php-fpm-status` check, once multi-pool support lands, is configurable with one `--url` per pool, so that a single host can monitor `/nextcloud-fpm-status` and `/wordpress-fpm-status` in one service.
  • Existing deployments are migrated with a clear changelog note (pool name change means the PHP-FPM status URL path and socket path change too, and any Icinga2 service definition that references the old URL must be updated).

References

  • Monitoring-Plugins discussion that triggered this ticket: the rework of `check-plugins/php-fpm-status` for multi-pool support exposed that essentially every lfops app currently runs on an unnamed `www` pool.
  • Existing reference: `roles/librenms/defaults/main.yml` for the `librenms__php__fpm_pools__dependent_var` pattern.
  • `roles/php/templates/etc/php-fpm.d/RedHat-pool.conf.j2` and `Debian-pool.conf.j2` for the current pool template that can already handle per-app pools.

Metadata

Labels: enhancement (New feature or request)