Add ontology term name and cell state to obs #52

parashardhapola · 2025-11-20T02:11:20Z

PR Type

Enhancement

Description

Add ontology term IDs and cell states to obs columns
Separate ontology term names from IDs in API response
Update API domain to production endpoint
Improve documentation with visualization examples

Diagram Walkthrough

flowchart LR
  API["API Response"] -->|Extract cellOntologyTermName| OntologyName["ontologyTerm"]
  API -->|Extract cellOntologyTerm| OntologyID["ontologyTermID"]
  API -->|Extract cellState| CellState["cellState"]
  OntologyName --> OBS["adata.obs columns"]
  OntologyID --> OBS
  CellState --> OBS

File Walkthrough

Relevant files

Configuration changes

__init__.py `Version bump to 0.11.0` cytetype/init.py Version bump from 0.10.0 to 0.11.0	+1/-1
config.py `Update API domain to production endpoint` cytetype/config.py Update DEFAULT_API_URL from Modal endpoint to production domain Change from `https://nygen-labs-prod--cytetype-api.modal.run` to `https://prod.cytetype.nygen.io`	+1/-1

Enhancement

api.py `Separate ontology term names and IDs` cytetype/api.py Split ontology term extraction into two fields: `ontologyTerm` (name) and `ontologyTermID` (ID) Extract `cellOntologyTermName` for term names Extract `cellOntologyTerm` for term IDs	+3/-0
main.py `Store ontology IDs and cell states in obs` cytetype/main.py Add ontology term ID mapping to `adata.obs` with key `{results_prefix}_cellOntologyTermID_{group_key}` Add cell state mapping to `adata.obs` with key `{results_prefix}_cellState_{group_key}` Update success log message to include new columns	+27/-0

Documentation

README.md `Update docs with improved examples` README.md Update example API domain URL from Modal to production Improve code example with variable `group_key` for reusability Add UMAP visualization step to pipeline Update visualization examples to show all four annotation columns	+13/-7
examples.md `Update all report links to production domain` docs/examples.md Replace all Modal API URLs with production domain across multiple dataset links Update 16 report links to use `https://prod.cytetype.nygen.io`	+20/-20

qodo-code-review · 2025-11-20T02:11:50Z

PR Compliance Guide 🔍

(Compliance updated until commit `763984d`)

Below is a summary of compliance checks for this PR:

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Missing audit logs: New data handling for ontology IDs and cell states lacks explicit audit logging of critical actions (e.g., transformations, updates to obs), which may be handled elsewhere but is not evident in the added code. Referred Code # Update ontology term IDs ontology_id_map = { item["clusterId"]: item["ontologyTermID"] for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellOntologyTermID_{self.group_key}"] = ( pd.Series( [ ontology_id_map.get(cluster_id, "Unknown") for cluster_id in self.clusters ], index=self.adata.obs.index, ).astype("category") ) # Update cell states cell_state_map = { item["clusterId"]: item.get("cellState", "") for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellState_{self.group_key}"] = pd.Series( ... (clipped 4 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Edge cases unhandled: Mapping for new fields (`ontologyTermID`, `cellState`) uses defaults like "Unknown" or empty strings without explicit validation or error handling for malformed inputs, which may be acceptable but is not clearly justified in the diff. Referred Code # Update ontology term IDs ontology_id_map = { item["clusterId"]: item["ontologyTermID"] for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellOntologyTermID_{self.group_key}"] = ( pd.Series( [ ontology_id_map.get(cluster_id, "Unknown") for cluster_id in self.clusters ], index=self.adata.obs.index, ).astype("category") ) # Update cell states cell_state_map = { item["clusterId"]: item.get("cellState", "") for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellState_{self.group_key}"] = pd.Series( ... (clipped 4 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Log content review: Success message logs mention results keys but do not appear to include sensitive data; however, without visibility into the content of `result_data`, it's unclear if downstream logging could inadvertently expose sensitive fields. Referred Code logger.success( f"Annotations successfully added to `adata.obs['{results_prefix}_annotation_{self.group_key}']`\n" f"Ontology terms added to `adata.obs['{results_prefix}_cellOntologyTerm_{self.group_key}']`\n" f"Ontology term IDs added to `adata.obs['{results_prefix}_ontologyTermID_{self.group_key}']`\n" f"Cell states added to `adata.obs['{results_prefix}_cellState_{self.group_key}']`\n" f"Full results added to `adata.uns['{results_prefix}_results']`." ) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Input validation: Transformation trusts external API fields (`cellOntologyTermName`, `cellOntologyTerm`, `cellState`) without explicit validation/sanitization in the added code, which may be performed elsewhere but is not evident here. Referred Code transformed_annotation = { "clusterId": annotation_data.get("clusterId", cluster_id), "annotation": annotation_data.get("annotation", "Unknown"), "ontologyTerm": annotation_data.get( "cellOntologyTermName", "Unknown" ), "ontologyTermID": annotation_data.get( "cellOntologyTerm", "Unknown" ), # Include additional fields from new format Learn more about managing compliance generic rules or creating your own custom rules

Compliance status legend

🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

Previous compliance checks

Compliance check up to commit 6369c5a

Security Compliance
🟢	No security concerns identified No security vulnerabilities detected by AI analysis. Human verification advised for critical code.
Ticket Compliance
⚪	🎫 No ticket provided Create ticket/issue
Codebase Duplication Compliance
⚪	Codebase context is not defined Follow the guide to enable codebase context checks.
Custom Compliance
🟢	Generic: Meaningful Naming and Self-Documenting Code Objective: Ensure all identifiers clearly express their purpose and intent, making code self-documenting Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Error Handling Objective: To prevent the leakage of sensitive system information through error messages while providing sufficient detail for internal debugging. Status: Passed Learn more about managing compliance generic rules or creating your own custom rules
⚪	Generic: Comprehensive Audit Trails Objective: To create a detailed and reliable record of critical system actions for security analysis and compliance. Status: Missing audit logs: New functionality updates `adata.obs` with ontology IDs and cell states without any corresponding audit logging, which may be acceptable for a client library but cannot be fully verified from the diff. Referred Code # Update ontology term IDs ontology_id_map = { item["clusterId"]: item["ontologyTermID"] for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellOntologyTermID_{self.group_key}"] = ( pd.Series( [ ontology_id_map.get(cluster_id, "Unknown") for cluster_id in self.clusters ], index=self.adata.obs.index, ).astype("category") ) # Update cell states cell_state_map = { item["clusterId"]: item.get("cellState", "") for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellState_{self.group_key}"] = pd.Series( ... (clipped 4 lines) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Robust Error Handling and Edge Case Management Objective: Ensure comprehensive error handling that provides meaningful context and graceful degradation Status: Edge case defaults: The transform uses default "Unknown" or empty strings for missing fields but no explicit error handling or validation of external API inputs is shown in the new lines. Referred Code transformed_annotation = { "clusterId": annotation_data.get("clusterId", cluster_id), "annotation": annotation_data.get("annotation", "Unknown"), "ontologyTerm": annotation_data.get( "cellOntologyTermName", "Unknown" ), "ontologyTermID": annotation_data.get( "cellOntologyTerm", "Unknown" ), # Include additional fields from new format Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Secure Logging Practices Objective: To ensure logs are useful for debugging and auditing without exposing sensitive information like PII, PHI, or cardholder data. Status: Log content check: The success log message references column names only; while no sensitive data is logged in the new lines, full context of logging elsewhere is unknown. Referred Code logger.success( f"Annotations successfully added to `adata.obs['{results_prefix}_annotation_{self.group_key}']`\n" f"Ontology terms added to `adata.obs['{results_prefix}_cellOntologyTerm_{self.group_key}']`\n" f"Ontology term IDs added to `adata.obs['{results_prefix}_ontologyTermID_{self.group_key}']`\n" f"Cell states added to `adata.obs['{results_prefix}_cellState_{self.group_key}']`\n" f"Full results added to `adata.uns['{results_prefix}_results']`." ) Learn more about managing compliance generic rules or creating your own custom rules
	Generic: Security-First Input Validation and Data Handling Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent vulnerabilities Status: Input validation: New parsing of external API fields (`cellOntologyTermName`, `cellOntologyTerm`, `cellState`) lacks explicit validation or sanitization beyond defaulting values. Referred Code "ontologyTerm": annotation_data.get( "cellOntologyTermName", "Unknown" ), "ontologyTermID": annotation_data.get( "cellOntologyTerm", "Unknown" ), # Include additional fields from new format Learn more about managing compliance generic rules or creating your own custom rules

qodo-code-review · 2025-11-20T02:12:53Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
High-level	Consolidate annotation processing logic Refactor the repetitive logic in `cytetype/main.py` for processing and storing annotation fields. Use a configuration map and a single loop to handle all fields, improving code scalability and maintainability. Examples: cytetype/main.py [197-244] annotation_map = { item["clusterId"]: item["annotation"] for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_annotation_{self.group_key}"] = pd.Series( [annotation_map.get(cluster_id, "Unknown") for cluster_id in self.clusters], index=self.adata.obs.index, ).astype("category") # Update ontology terms ... (clipped 38 lines) Solution Walkthrough: Before: def _store_results_and_annotations(...): annotations = result_data.get("annotations", []) # Process 'annotation' annotation_map = {item["clusterId"]: item["annotation"] for item in annotations} self.adata.obs[f"{results_prefix}_annotation_{self.group_key}"] = pd.Series( [annotation_map.get(c, "Unknown") for c in self.clusters], ... ).astype("category") # Process 'cellOntologyTerm' ontology_term_map = {item["clusterId"]: item["ontologyTerm"] for item in annotations} self.adata.obs[f"{results_prefix}_cellOntologyTerm_{self.group_key}"] = pd.Series( [ontology_term_map.get(c, "Unknown") for c in self.clusters], ... ).astype("category") # Process 'ontologyTermID' (new) ontology_id_map = {item["clusterId"]: item["ontologyTermID"] for item in annotations} self.adata.obs[f"{results_prefix}_cellOntologyTermID_{self.group_key}"] = pd.Series( [ontology_id_map.get(c, "Unknown") for c in self.clusters], ... ).astype("category") # ... and so on for 'cellState' After: def _store_results_and_annotations(...): annotations = result_data.get("annotations", []) annotation_fields = { "annotation": ("annotation", "Unknown"), "cellOntologyTerm": ("ontologyTerm", "Unknown"), "cellOntologyTermID": ("ontologyTermID", "Unknown"), "cellState": ("cellState", ""), } for obs_suffix, (api_key, default) in annotation_fields.items(): field_map = {item["clusterId"]: item.get(api_key, default) for item in annotations} obs_key = f"{results_prefix}_{obs_suffix}_{self.group_key}" self.adata.obs[obs_key] = pd.Series( [field_map.get(cluster, default) for cluster in self.clusters], index=self.adata.obs.index, ).astype("category") Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies significant code duplication in `cytetype/main.py` for processing annotation fields, which was expanded in this PR, and proposes a valid refactoring that improves maintainability and scalability.	Medium
Possible issue	Handle potential None from API Improve robustness by handling potential `None` values from the API for `cellOntologyTermName` and `cellOntologyTerm`. Coalesce `None` to the "Unknown" default string to ensure consistent data types. cytetype/api.py [159-164] "ontologyTerm": annotation_data.get( - "cellOntologyTermName", "Unknown" -), + "cellOntologyTermName" +) or "Unknown", "ontologyTermID": annotation_data.get( - "cellOntologyTerm", "Unknown" -), + "cellOntologyTerm" +) or "Unknown", Apply / Chat Suggestion importance[1-10]: 7 __ Why: The suggestion correctly identifies that `get()` with a default value does not handle an explicit `None` return from the API. The proposed `or "Unknown"` idiom is a robust and pythonic way to ensure a default string value in all cases (missing key or `None` value).	Medium
General	Ensure consistent types before conversion To prevent potential type errors when converting to a pandas categorical type, ensure all values from the `cellState` field are strings. This can be done by explicitly casting values to strings, which also handles potential `None` values from the API. cytetype/main.py [236-244] # Update cell states cell_state_map = { - item["clusterId"]: item.get("cellState", "") + item["clusterId"]: item.get("cellState") for item in result_data.get("annotations", []) } self.adata.obs[f"{results_prefix}_cellState_{self.group_key}"] = pd.Series( - [cell_state_map.get(cluster_id, "") for cluster_id in self.clusters], + [str(cell_state_map.get(cluster_id, "")) for cluster_id in self.clusters], index=self.adata.obs.index, ).astype("category") Apply / Chat Suggestion importance[1-10]: 3 __ Why: The suggestion correctly identifies a potential type inconsistency with `None` values, but the proposed fix of using `str()` would convert `None` to the string `'None'`, which is likely not the desired behavior.	Low
Update

parashardhapola added 5 commits November 20, 2025 02:48

add ontology term and cell state to obs

5b8e6a3

change domain

80ce46a

update docs domain

8b285ec

version bump

b114487

name fix and docs update

6369c5a

qodo-code-review bot added the Review effort 2/5 label Nov 20, 2025

Update test_main.py

763984d

parashardhapola merged commit 6c40ed4 into master Nov 20, 2025
1 check passed

parashardhapola deleted the more_obs branch November 20, 2025 02:20

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add ontology term name and cell state to obs #52

Add ontology term name and cell state to obs #52

Uh oh!

parashardhapola commented Nov 20, 2025 •

edited by qodo-code-review bot

Loading

Uh oh!

qodo-code-review bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

qodo-code-review bot commented Nov 20, 2025 •

edited

Loading

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Add ontology term name and cell state to obs #52

Add ontology term name and cell state to obs #52

Uh oh!

Conversation

parashardhapola commented Nov 20, 2025 • edited by qodo-code-review bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Type

Description

Diagram Walkthrough

File Walkthrough

Uh oh!

qodo-code-review bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Compliance Guide 🔍

(Compliance updated until commit 763984d)

Previous compliance checks

Uh oh!

qodo-code-review bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PR Code Suggestions ✨

Examples:

Solution Walkthrough:

Before:

After:

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

parashardhapola commented Nov 20, 2025 •

edited by qodo-code-review bot

Loading

qodo-code-review bot commented Nov 20, 2025 •

edited

Loading

(Compliance updated until commit `763984d`)

qodo-code-review bot commented Nov 20, 2025 •

edited

Loading