[FEAT]: Psychosocial Scenario #1266

jbolor21 · 2025-12-19T20:07:09Z

Description

This PR adds a new scenario for evaluating psychosocial harms. The scenario uses a prompt-softening converter and role play as single-turn attacks, and a Crescendo attack as the multi-turn attack.

Tailored the current strategy to mental health crisis (self-harm-related) objectives. Other objectives may require a new attack strategy YAML file and scoring definition.

Added a new Likert scoring file for evaluating crisis situations
Modified the Crescendo attack strategy for mental health crisis objectives
Added a sample prompt file with example objectives

Tests and Documentation

Added new unit tests and ran the notebooks locally to verify that the strategy works.

bashirpartovi · 2026-01-15T19:15:39Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+        scenario_result_id: Optional[str] = None,
+        crescendo_system_prompt_path: Optional[str] = None,
+        crescendo_system_prompt_paths_by_harm: Optional[Dict[str, str]] = None,
+        scoring_rubric_paths_by_harm: Optional[Dict[str, str]] = None,


Would the harm category keys in crescendo_system_prompt_paths_by_harm always exist in scoring_rubric_paths_by_harm? If so, is there a check for that?

This is a bit unclear to me. I believe these two dicts are expected to have the same keys, but the current implementation allows callers to pass mismatched keys and only fails later at runtime when a specific harm category is processed/accessed. This can lead to confusing errors that don't trace back to the constructor.
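If the two dicts stay as separate parameters, a constructor-time check would at least make the mismatch fail fast. A minimal sketch, assuming both dicts map harm category names to file paths; validate_matching_harm_keys is a hypothetical helper, not part of PyRIT:

```python
from typing import Dict, Optional


def validate_matching_harm_keys(
    prompt_paths_by_harm: Optional[Dict[str, str]],
    rubric_paths_by_harm: Optional[Dict[str, str]],
) -> None:
    """Hypothetical helper: raise at construction if the two per-harm dicts disagree on keys."""
    prompt_keys = set(prompt_paths_by_harm or {})
    rubric_keys = set(rubric_paths_by_harm or {})
    # Sorted so the error message is deterministic and easy to read.
    missing_rubrics = sorted(prompt_keys - rubric_keys)
    missing_prompts = sorted(rubric_keys - prompt_keys)
    if missing_rubrics or missing_prompts:
        raise ValueError(
            "Harm category keys must match: "
            f"no scoring rubric for {missing_rubrics}, "
            f"no Crescendo system prompt for {missing_prompts}"
        )
```

Called at the top of __init__, this reports the offending keys when the scenario is built rather than when a specific harm category is first processed.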
I think a better approach would be to make it a typed structure that encapsulates the system prompt path and scoring rubric path per harm category:

```python
from dataclasses import dataclass


@dataclass
class HarmCategoryConfig:
    crescendo_system_prompt_path: str
    scoring_rubric_path: str
```

Then your constructor signature would look like this:

```python
def __init__(
    self,
    *,
    ...
    harm_configs: Optional[Dict[str, HarmCategoryConfig]] = None,
    ...,
):
```

and internally in the constructor:

```python
default_configs = {
    "psychosocial_imminent_crisis": HarmCategoryConfig(
        crescendo_system_prompt_path=str(
            pathlib.Path(DATASETS_PATH) / "executors" / "crescendo" / "escalation_crisis.yaml"
        ),
        scoring_rubric_path=str(
            pathlib.Path(DATASETS_PATH) / "score" / "likert" / "crisis_management.yaml"
        ),
    ),
}
self._harm_configs = {**default_configs, **(harm_configs or {})}
```

Internally, you could still translate this into separate dicts if that is easier for the existing logic. The main benefit is a cleaner public API that enforces the invariant at the point of construction.

If the fields could have different defaults or be optional, you could still use the same structure like this:

```python
import pathlib
from dataclasses import dataclass


@dataclass
class HarmCategoryConfig:
    crescendo_system_prompt_path: str = str(
        pathlib.Path(DATASETS_PATH) / "executors" / "crescendo" / "escalation_crisis.yaml"
    )
    scoring_rubric_path: str = str(
        pathlib.Path(DATASETS_PATH) / "score" / "likert" / "crisis_management.yaml"
    )
```

This way, you eliminate a lot of if/else checks for whether a harm category exists, falling back to the default path, etc.

Thanks, I tried to address this. Let me know if these changes address your feedback and make it clearer!

bashirpartovi · 2026-01-15T19:52:13Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+        for strategy in strategies:
+            # If strategy is a dataset-specific strategy (not single_turn/multi_turn),
+            # expand it to attacks for each of its tags
+            if strategy not in ["single_turn", "multi_turn"]:
+                # Find the enum member for this strategy
+                strategy_enum = next((s for s in PsychosocialHarmsStrategy if s.value == strategy), None)
+                if strategy_enum and strategy_enum.tags:
+                    # Create an attack for each tag (single_turn, multi_turn)
+                    for tag in strategy_enum.tags:
+                        if tag in ["single_turn", "multi_turn"]:
+                            atomic_attacks.append(self._get_atomic_attack_from_strategy(tag))
+                else:
+                    # Fallback: create single attack for unknown strategy
+                    atomic_attacks.append(self._get_atomic_attack_from_strategy(strategy))
+            else:
+                # For single_turn/multi_turn, create one attack
+                atomic_attacks.append(self._get_atomic_attack_from_strategy(strategy))
+        return atomic_attacks


A few things here:
For the enum lookup, instead of using next with a generator expression, you can use Python's built-in enum value lookup:

```python
try:
    strategy_enum = PsychosocialHarmsStrategy(strategy)
except ValueError:
    strategy_enum = None
```
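A self-contained sketch of that lookup, assuming the enum's value is the plain strategy string (in the PR each member is a tuple with tags, so this depends on how the base strategy class sets the member value). Strategy and find_strategy are hypothetical stand-ins:

```python
from enum import Enum
from typing import Optional


class Strategy(Enum):
    # Simplified stand-in for PsychosocialHarmsStrategy: string values only, no tags.
    SINGLE_TURN = "single_turn"
    MULTI_TURN = "multi_turn"


def find_strategy(value: str) -> Optional[Strategy]:
    """Look up a member by value; Strategy(value) raises ValueError for unknown values."""
    try:
        return Strategy(value)
    except ValueError:
        return None
```

The call Strategy(value) resolves through the enum's value-to-member mapping, so it returns the same member as the next(...) scan over all members without iterating.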

Also, the branching logic is a bit hard to follow. You are checking if a strategy is not single_turn/multi_turn, then expanding its tags, then checking if those tags are single_turn/multi_turn.

I think this could be simplified by normalizing everything to base attack types upfront:

```python
base_strategies: set[str] = set()
for strategy in strategies:
    try:
        strategy_enum = PsychosocialHarmsStrategy(strategy)
        base_strategies.update(strategy_enum.tags or [strategy])
    except ValueError:
        base_strategies.add(strategy)
return [self._get_atomic_attack_from_strategy(s) for s in base_strategies]
```
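A runnable sketch of the normalization step, assuming each enum member carries a string value plus a set of tags as in the PR's SINGLE_TURN / MULTI_TURN / IMMINENT_CRISIS members. Strategy and normalize_strategies are hypothetical names; the member layout is an assumption about how the real enum stores tags:

```python
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for PsychosocialHarmsStrategy: value string plus a set of tags."""

    def __new__(cls, value: str, tags: set[str]) -> "Strategy":
        member = object.__new__(cls)
        member._value_ = value  # lookup by value uses only the string
        member.tags = tags
        return member

    SINGLE_TURN = ("single_turn", {"single_turn"})
    MULTI_TURN = ("multi_turn", {"multi_turn"})
    IMMINENT_CRISIS = ("imminent_crisis", {"single_turn", "multi_turn"})


def normalize_strategies(strategies: list[str]) -> list[str]:
    """Expand composite strategies into their base attack types, de-duplicated."""
    base_strategies: set[str] = set()
    for strategy in strategies:
        try:
            base_strategies.update(Strategy(strategy).tags or [strategy])
        except ValueError:
            # Unknown names pass through unchanged.
            base_strategies.add(strategy)
    # Sorted for a stable attack order: string hashing, and thus set iteration order, varies between runs.
    return sorted(base_strategies)
```

One behavioral difference from the original loop: it only kept tags equal to single_turn or multi_turn, whereas this version passes every tag through. If strategies can carry other tags, that filter would need to be added back.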

Ah yes, okay, I think that helps. Let me know if these changes fully address your idea!

bashirpartovi · 2026-01-15T20:13:14Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+            if harm_category in self._crescendo_system_prompt_paths_by_harm:
+                crescendo_prompt_path = pathlib.Path(self._crescendo_system_prompt_paths_by_harm[harm_category])
+            elif self._crescendo_system_prompt_path:
+                crescendo_prompt_path = pathlib.Path(self._crescendo_system_prompt_path)
+            else:
+                # Default: use crisis escalation
+                crescendo_prompt_path = (
+                    pathlib.Path(DATASETS_PATH) / "executors" / "crescendo" / "escalation_crisis.yaml"
+                )


If you follow my suggestion above, a lot of these if/else checks could be eliminated.
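A sketch of how the three-branch precedence above (per-harm path, then the global crescendo_system_prompt_path, then the built-in default) could be resolved once instead of per category, assuming the PR keeps the global override next to per-harm configs. build_config_resolver and DEFAULT_CONFIG are hypothetical names, and the default paths are shortened stand-ins for the DATASETS_PATH-based ones:

```python
import dataclasses
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class HarmCategoryConfig:
    crescendo_system_prompt_path: str
    scoring_rubric_path: str


# Stand-in defaults; the real paths are built from DATASETS_PATH.
DEFAULT_CONFIG = HarmCategoryConfig(
    crescendo_system_prompt_path="executors/crescendo/escalation_crisis.yaml",
    scoring_rubric_path="score/likert/crisis_management.yaml",
)


def build_config_resolver(
    crescendo_system_prompt_path: Optional[str] = None,
    harm_configs: Optional[Dict[str, HarmCategoryConfig]] = None,
) -> Callable[[str], HarmCategoryConfig]:
    """Resolve precedence once: per-harm config > global prompt override > built-in default."""
    fallback = DEFAULT_CONFIG
    if crescendo_system_prompt_path:
        # Fold the global override into the fallback instead of checking it per category.
        fallback = dataclasses.replace(fallback, crescendo_system_prompt_path=crescendo_system_prompt_path)
    configs = dict(harm_configs or {})

    def resolve(harm_category: str) -> HarmCategoryConfig:
        # A single lookup replaces the if/elif/else chain.
        return configs.get(harm_category, fallback)

    return resolve
```

Freezing the dataclass is a design choice here: dataclasses.replace returns a new object, so the shared DEFAULT_CONFIG is never mutated by one scenario instance.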

hannahwestra25 · 2026-01-15T20:30:13Z

pyrit/datasets/seed_datasets/local/airt/psychosocical_imminent_crisis.prompt

@@ -0,0 +1,9 @@
+dataset_name: airt_psychosocial_imminent_crisis
+harm_categories: psychosocial_imminent_crisis


nit: IMO the harm_category should just be psychosocial because it's broader.

hannahwestra25 · 2026-01-15T20:33:27Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    users in mental health crisis, psychological vulnerability, inappropriate dependency,
+    and self-harm situations.
+
+    Each tag represents a different psychosocial harm category that the model can be tested for.


Same comment as above: I was considering psychosocial to be the harm category, with the strategies being methods of testing that harm category. So each tag doesn't represent a different harm category but a different method (i.e., strategy) within psychosocial.

hannahwestra25 · 2026-01-15T20:56:56Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    SINGLE_TURN = ("single_turn", {"single_turn"})
+    MULTI_TURN = ("multi_turn", {"multi_turn"})
+
+    IMMINENT_CRISIS = ("psychosocial_imminent_crisis", {"single_turn", "multi_turn"})


I think this should just be imminent_crisis, with psychosocial as the harm category,

similar to this:

PyRIT/pyrit/scenario/scenarios/airt/scam.py

Line 66 in 4a40aa3

ALL = ("all", {"all"})

hannahwestra25 · 2026-01-20T19:22:19Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    """
+    Configuration for a specific harm category.
+
+    Encapsulates the Crescendo system prompt path and scoring rubric path for a harm category.


Same comment as above: I was considering psychosocial to be the harm category, with the strategies being methods of testing that harm category. So each tag doesn't represent a different harm category but a different method (i.e., strategy) within psychosocial.

hannahwestra25 · 2026-01-20T19:22:21Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    TrueFalseScorer,
+    create_conversation_scorer,
+)
+


Suggested change

logger = logging.getLogger(__name__)

hannahwestra25 · 2026-01-21T15:08:38Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+        scoring_rubric_path: Path to the scoring rubric YAML file.
+    """
+
+    crescendo_system_prompt_path: str


Why is this specific to Crescendo?

hannahwestra25 · 2026-01-21T15:19:53Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+    - Avoid facilitating self-harm or harmful behaviors
+    - Provide crisis resources and encourage professional help


I'm wondering if we can make the scoring YAMLs and system prompts broad enough to cover all these tests. The fact that you're adding the ability for users to pass in a custom system prompt and scorer suggests the scenario is too broad or underdefined (which maybe it is, and we could distill it a bit more).

hannahwestra25 · 2026-01-21T19:51:50Z

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

+        crescendo_system_prompt_path: Optional[str] = None,
+        harm_configs: Optional[Dict[str, HarmCategoryConfig]] = None,


We shouldn't need both, right? My vote would be to just have a broad system_prompt_path.

adding draft psych scenario file

d6a2140

jbolor21 marked this pull request as draft December 19, 2025 20:07

Bolor and others added 17 commits December 22, 2025 16:45

working changes and additions

b03e219

precommit

954a52e

Merge remote-tracking branch 'origin/main' into users/bjagdagdorj/psy…

53613d4

…ch_scenario

work in progress

c97e27a

Merge remote-tracking branch 'origin/main' into users/bjagdagdorj/psy…

45546ea

…ch_scenario

Merge remote-tracking branch 'origin' into users/bjagdagdorj/psych_sc…

75a4158

…enario

adding therapist files

b8fa4f0

draft

52428fc

Merge remote-tracking branch 'origin' into users/bjagdagdorj/psych_sc…

ccf1a28

…enario

before pulling in changes

cfd179a

cleaned up files, ready for review

8583df6

Merge remote-tracking branch 'origin' into users/bjagdagdorj/psych_sc…

6d985e3

…enario

Delete pyrit/datasets/seed_datasets/local/airt/psychosocial_vulnerabi…

af98a11

…lity.prompt delete unused file

Delete pyrit/datasets/score/likert/dependency_management.yaml

19d0e07

delete unused file

precommit

13a7c51

slight refactor and fixed tests

ec55882

slight refactor and fixed tests

1609465

jbolor21 marked this pull request as ready for review January 15, 2026 18:40

jbolor21 changed the title ~~DRAFT: [FEAT]: Psychosocial Scenario~~ [FEAT]: Psychosocial Scenario Jan 15, 2026

bashirpartovi reviewed Jan 15, 2026

hannahwestra25 reviewed Jan 15, 2026

pyrit/scenario/scenarios/airt/psychosocial_harms_scenario.py

addressing feedback

c7f5628

hannahwestra25 reviewed Jan 20, 2026

hannahwestra25 reviewed Jan 21, 2026
