fabricatio_plot.toolboxes.synthesize

Synthetic Data Generation Toolbox Module (Minimalist).

This module provides simple, flat functions for generating synthetic data columns. Each function returns a pandas Series and is designed to be invoked directly by an LLM.

Attributes

data_syn_toolbox

Functions

numeric_column(→ pandas.Series)

Generate a uniform numeric column.

normal_column(→ pandas.Series)

Generate a normally distributed numeric column.

categorical_column(→ pandas.Series)

Generate a categorical column.

datetime_column(→ pandas.Series)

Generate a random datetime column.

text_column(→ pandas.Series)

Generate a simple text column with random suffixes.

correlated_column(→ pandas.Series)

Generate a new column correlated with a base series using closure and vectorized ops.

inject_missing(→ pandas.Series)

Inject missing (NaN) values into a series.

Module Contents

fabricatio_plot.toolboxes.synthesize.data_syn_toolbox
fabricatio_plot.toolboxes.synthesize.numeric_column(n_rows: int, low: float = 0.0, high: float = 1.0) → pandas.Series

Generate a uniform numeric column.

Parameters:
  • n_rows – Number of values to generate.

  • low – Minimum value.

  • high – Maximum value.

Returns:

Generated numeric series.

Return type:

pd.Series
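The behavior described above can be approximated with the following minimal sketch. It is not the module's implementation: `numeric_column_sketch` is a hypothetical stand-in, and the fixed seed and the half-open interval `[low, high)` are assumptions made here for illustration.

```python
import numpy as np
import pandas as pd


def numeric_column_sketch(n_rows: int, low: float = 0.0, high: float = 1.0) -> pd.Series:
    """Hypothetical stand-in: draw n_rows values uniformly from [low, high)."""
    rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
    return pd.Series(rng.uniform(low, high, size=n_rows))


# Usage: 1000 synthetic prices between 5 and 20.
prices = numeric_column_sketch(1000, low=5.0, high=20.0)
print(prices.describe())
```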

fabricatio_plot.toolboxes.synthesize.normal_column(n_rows: int, mean: float = 0.0, std: float = 1.0) → pandas.Series

Generate a normally distributed numeric column.

Parameters:
  • n_rows – Number of values to generate.

  • mean – Mean of the distribution.

  • std – Standard deviation.

Returns:

Generated numeric series.

Return type:

pd.Series
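As a rough illustration of what such a column looks like, the sketch below draws from a normal distribution with NumPy. `normal_column_sketch` is a hypothetical name and the seeded generator is an assumption; the actual function may sample differently.

```python
import numpy as np
import pandas as pd


def normal_column_sketch(n_rows: int, mean: float = 0.0, std: float = 1.0) -> pd.Series:
    """Hypothetical stand-in: draw n_rows values from N(mean, std**2)."""
    rng = np.random.default_rng(42)  # assumption: fixed seed, only for reproducibility
    return pd.Series(rng.normal(loc=mean, scale=std, size=n_rows))


# Usage: the sample mean and standard deviation should land near the requested values.
heights = normal_column_sketch(10_000, mean=170.0, std=10.0)
print(heights.mean(), heights.std())
```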

fabricatio_plot.toolboxes.synthesize.categorical_column(n_rows: int, categories: List[str]) → pandas.Series

Generate a categorical column.

Parameters:
  • n_rows – Number of values to generate.

  • categories – List of possible categories.

Returns:

Generated categorical series.

Return type:

pd.Series
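The documentation does not say how categories are weighted or which dtype is returned. The sketch below assumes equal probability for every category and a plain string column; `categorical_column_sketch` is a hypothetical stand-in.

```python
from typing import List

import numpy as np
import pandas as pd


def categorical_column_sketch(n_rows: int, categories: List[str]) -> pd.Series:
    """Hypothetical stand-in: sample n_rows labels uniformly from categories."""
    rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
    return pd.Series(rng.choice(categories, size=n_rows))


# Usage: every generated value is one of the supplied labels.
colors = categorical_column_sketch(500, ["red", "green", "blue"])
print(colors.value_counts())
```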

fabricatio_plot.toolboxes.synthesize.datetime_column(n_rows: int, start: str = '2020-01-01', end: str = '2023-12-31') → pandas.Series

Generate a random datetime column.

Parameters:
  • n_rows – Number of values to generate.

  • start – Start date (ISO format).

  • end – End date (ISO format).

Returns:

Generated datetime series.

Return type:

pd.Series
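One way to produce random datetimes in a range is to add a random number of seconds to the start date. The sketch below takes that approach; the second-level resolution, the inclusive end bound, and the name `datetime_column_sketch` are assumptions, not details confirmed by the module.

```python
import numpy as np
import pandas as pd


def datetime_column_sketch(
    n_rows: int, start: str = "2020-01-01", end: str = "2023-12-31"
) -> pd.Series:
    """Hypothetical stand-in: random timestamps between start and end."""
    rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end)
    span_seconds = int((end_ts - start_ts).total_seconds())
    # Random offsets in whole seconds, from 0 up to and including the full span.
    offsets = rng.integers(0, span_seconds + 1, size=n_rows)
    return pd.Series(start_ts + pd.to_timedelta(offsets, unit="s"))


# Usage: 200 order timestamps within calendar year 2022.
order_times = datetime_column_sketch(200, start="2022-01-01", end="2022-12-31")
print(order_times.min(), order_times.max())
```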

fabricatio_plot.toolboxes.synthesize.text_column(n_rows: int, prefix: str = 'item_') → pandas.Series

Generate a simple text column with random suffixes.

Parameters:
  • n_rows – Number of values to generate.

  • prefix – Prefix for each text item.

Returns:

Generated text series.

Return type:

pd.Series
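The form of the random suffix is not specified here. The sketch below assumes an integer suffix below 10000; both that choice and the name `text_column_sketch` are illustrative only.

```python
import numpy as np
import pandas as pd


def text_column_sketch(n_rows: int, prefix: str = "item_") -> pd.Series:
    """Hypothetical stand-in: prefix followed by a random integer suffix."""
    rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
    suffixes = rng.integers(0, 10_000, size=n_rows)  # assumption: numeric suffixes
    return pd.Series([f"{prefix}{s}" for s in suffixes])


# Usage: product codes that all share the "sku_" prefix.
skus = text_column_sketch(50, prefix="sku_")
print(skus.head())
```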

fabricatio_plot.toolboxes.synthesize.correlated_column(base_series: pandas.Series, correlation: float = 0.8) → pandas.Series

Generate a new column correlated with a base series using closure and vectorized ops.

This function creates a new series whose correlation with the input base_series is approximately the specified value. It uses NumPy's random functions internally.

Parameters:
  • base_series – The existing pandas Series to correlate with.

  • correlation – Target correlation coefficient (between 0 and 1).

Returns:

New series with desired correlation.

Return type:

pd.Series
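The exact formula is not given in this reference. A common technique, assumed here rather than taken from the module, is to standardize the base series and mix it with independent standard normal noise: `correlation * z + sqrt(1 - correlation**2) * noise`. The name `correlated_column_sketch` is hypothetical.

```python
import numpy as np
import pandas as pd


def correlated_column_sketch(base_series: pd.Series, correlation: float = 0.8) -> pd.Series:
    """Hypothetical stand-in: mix the standardized base with independent noise."""
    rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
    # Standardize the base so it has zero mean and unit variance.
    z = ((base_series - base_series.mean()) / base_series.std()).to_numpy()
    noise = rng.standard_normal(len(base_series))
    # The weights are chosen so the expected correlation with z equals `correlation`.
    values = correlation * z + np.sqrt(1.0 - correlation**2) * noise
    return pd.Series(values, index=base_series.index)


# Usage: the sample correlation lands near the target but is not exact.
base = pd.Series(np.random.default_rng(1).normal(size=5000))
derived = correlated_column_sketch(base, correlation=0.8)
print(base.corr(derived))
```

Because the noise is random, the realized correlation only approximates the target, and the approximation tightens as the series gets longer.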

fabricatio_plot.toolboxes.synthesize.inject_missing(series: pandas.Series, rate: float = 0.05) → pandas.Series

Inject missing (NaN) values into a series.

Parameters:
  • series – Input pandas Series.

  • rate – Proportion of values to set as NaN (0.0 to 1.0).

Returns:

Series with injected missing values.

Return type:

pd.Series
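A sketch of the idea follows. It assumes that exactly `round(rate * len(series))` positions are blanked and that the input is left unmodified; the real function may instead mask each value independently with probability `rate`. `inject_missing_sketch` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def inject_missing_sketch(series: pd.Series, rate: float = 0.05) -> pd.Series:
    """Hypothetical stand-in: set a fixed share of positions to NaN."""
    rng = np.random.default_rng(0)  # assumption: fixed seed, only for reproducibility
    n_missing = int(round(rate * len(series)))
    # Pick distinct positions so the number of NaN values is exact.
    positions = rng.choice(len(series), size=n_missing, replace=False)
    result = series.copy()  # work on a copy; the caller's series is not changed
    result.iloc[positions] = np.nan
    return result


# Usage: blank out 5% of a float column.
clean = pd.Series(np.linspace(0.0, 1.0, 1000))
dirty = inject_missing_sketch(clean, rate=0.05)
print(dirty.isna().sum())
```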