[DCP Ingestion] Add scaling factor by gmechali · Pull Request #580 · datacommonsorg/import

gmechali · 2026-06-23T20:50:08Z

No description provided.

codacy-production · 2026-06-23T20:51:10Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 0 complexity

Metric Results

Complexity 0

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

gemini-code-assist

Code Review

This pull request adds the mapping for dcid:scalingFactor to constants.COLUMN_SCALING_FACTOR in variable_per_row_importer.py. However, a critical bug was identified in _apply_property_defaults where it checks for the original property name instead of the renamed column name, which prevents this and other per-row mappings from working correctly. A fix has been suggested to check for the internal column name instead.

gemini-code-assist · 2026-06-23T20:51:34Z

+    "dcid:scalingFactor":
+        constants.COLUMN_SCALING_FACTOR,  # "scaling_factor"


There is an existing bug in _apply_property_defaults that will prevent this new dcid:scalingFactor mapping (as well as the existing dcid:measurementMethod and dcid:observationPeriod mappings) from working when specified per-row in the CSV.\n\n### The Issue\nIn _write_observations, _apply_column_mappings is called before _apply_property_defaults:\npython\nobservations_df = (self._apply_column_mappings(self.df).pipe(\n self._track_entity_dcids).pipe(\n _apply_property_defaults, obs_props)...\n\n_apply_column_mappings renames the physical columns to their internal names (e.g., constants.COLUMN_SCALING_FACTOR which is "scaling_factor").\n\nHowever, _apply_property_defaults checks for the predicate name (prop, e.g., "scalingFactor") instead of the internal column name (col_name, e.g., "scaling_factor"):\npython\nfor prop, col_name in property_mapping.items():\n default_value = getattr(obs_props, col_name, \"\")\n if prop in df.columns: # <--- This will be False because the column was renamed to col_name\n ...\n else:\n # If the column doesn't exist, use default for all rows\n df[col_name] = default_value # <--- This overwrites the per-row values with the default!\n\n\n### Recommended Fix\nIn _apply_property_defaults (lines 85-119), update the loop to check for col_name instead of prop:\npython\ndef _apply_property_defaults(df: pd.DataFrame,\n obs_props: ObservationProperties) -> pd.DataFrame:\n \"\"\"Apply property defaults, using per-row values where available.\"\"\"\n property_mapping = {\n sc.PREDICATE_UNIT: constants.COLUMN_UNIT,\n sc.PREDICATE_SCALING_FACTOR: constants.COLUMN_SCALING_FACTOR,\n sc.PREDICATE_MEASUREMENT_METHOD: constants.COLUMN_MEASUREMENT_METHOD,\n sc.PREDICATE_OBSERVATION_PERIOD: constants.COLUMN_OBSERVATION_PERIOD,\n }\n\n for prop, col_name in property_mapping.items():\n default_value = getattr(obs_props, col_name, \"\")\n if col_name in df.columns:\n # Replace empty strings with NaN for consistent handling\n source_col = df[col_name].replace(\"\", pd.NA)\n\n # Check if source is numeric before filling (to preserve int format for numeric columns)\n is_source_numeric = pd.api.types.is_numeric_dtype(source_col)\n\n if is_source_numeric:\n df[col_name] = _convert_numeric_to_string(source_col,\n default_for_na=default_value)\n else:\n df[col_name] = source_col.fillna(default_value).astype(str)\n else:\n # If the column doesn't exist, use default for all rows\n df[col_name] = default_value\n\n # Custom properties column (always empty for variable_per_row_importer)\n df[constants.COLUMN_PROPERTIES] = \"\"\n return df\n

References

Ensure JSON fields such as scalingFactor are not cast to numeric types (like FLOAT64) if they are defined as strings in the schema.

Add scaling factor

9f93827

gmechali requested a review from dwnoble June 23, 2026 20:50

gemini-code-assist Bot reviewed Jun 23, 2026

View reviewed changes

gmechali added 2 commits June 23, 2026 22:52

Lint

d7a7af7

More fix on scalingf

6ecdbfd

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[DCP Ingestion] Add scaling factor#580

[DCP Ingestion] Add scaling factor#580
gmechali wants to merge 3 commits into
datacommonsorg:masterfrom
gmechali:scalingf

gmechali commented Jun 23, 2026

Uh oh!

codacy-production Bot commented Jun 23, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		"dcid:scalingFactor":
		constants.COLUMN_SCALING_FACTOR, # "scaling_factor"

Conversation

gmechali commented Jun 23, 2026

Uh oh!

codacy-production Bot commented Jun 23, 2026

Up to standards ✅

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 23, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant