fabricatio_plot.toolboxes.synthesize
Synthetic Data Generation Toolbox Module (Minimalist).
This module provides simple, flat functions for generating synthetic data columns. Each function returns a pandas Series. Designed for direct LLM invocation.
Attributes
- data_syn_toolbox
Functions
- numeric_column: Generate a uniform numeric column.
- normal_column: Generate a normally distributed numeric column.
- categorical_column: Generate a categorical column.
- datetime_column: Generate a random datetime column.
- text_column: Generate a simple text column with random suffixes.
- Generate a new column correlated with a base series using closure and vectorized ops.
- inject_missing: Inject missing (NaN) values into a series.
Module Contents
- fabricatio_plot.toolboxes.synthesize.data_syn_toolbox
- fabricatio_plot.toolboxes.synthesize.numeric_column(n_rows: int, low: float = 0.0, high: float = 1.0) -> pandas.Series
Generate a uniform numeric column.
- Parameters:
n_rows – Number of values to generate.
low – Minimum value.
high – Maximum value.
- Returns:
Generated numeric series.
- Return type:
pd.Series
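The library's implementation is not shown on this page. As a minimal sketch of what a uniform column generator could look like, the following uses NumPy's Generator API; the name uniform_sketch and the seed parameter are hypothetical and are not part of fabricatio_plot.

```python
from typing import Optional

import numpy as np
import pandas as pd


def uniform_sketch(
    n_rows: int, low: float = 0.0, high: float = 1.0, seed: Optional[int] = None
) -> pd.Series:
    """Hypothetical stand-in: draw n_rows values uniformly from [low, high)."""
    rng = np.random.default_rng(seed)  # seed is an assumption, added for reproducibility
    return pd.Series(rng.uniform(low, high, size=n_rows))


uniform_col = uniform_sketch(5, low=10.0, high=20.0, seed=0)
```

Note that NumPy's uniform sampler excludes the upper bound; whether numeric_column treats high as inclusive is not stated above.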
- fabricatio_plot.toolboxes.synthesize.normal_column(n_rows: int, mean: float = 0.0, std: float = 1.0) -> pandas.Series
Generate a normally distributed numeric column.
- Parameters:
n_rows – Number of values to generate.
mean – Mean of the distribution.
std – Standard deviation.
- Returns:
Generated numeric series.
- Return type:
pd.Series
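To illustrate how mean and std shape the output, here is a hedged sketch built on NumPy's normal sampler. The function name normal_sketch and its seed parameter are assumptions for illustration only.

```python
from typing import Optional

import numpy as np
import pandas as pd


def normal_sketch(
    n_rows: int, mean: float = 0.0, std: float = 1.0, seed: Optional[int] = None
) -> pd.Series:
    """Hypothetical stand-in: draw n_rows values from N(mean, std**2)."""
    rng = np.random.default_rng(seed)
    return pd.Series(rng.normal(loc=mean, scale=std, size=n_rows))


normal_col = normal_sketch(10_000, mean=50.0, std=5.0, seed=0)
# With this many rows the sample statistics should sit close to the targets.
sample_mean = normal_col.mean()
sample_std = normal_col.std()
```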
- fabricatio_plot.toolboxes.synthesize.categorical_column(n_rows: int, categories: List[str]) -> pandas.Series
Generate a categorical column.
- Parameters:
n_rows – Number of values to generate.
categories – List of possible categories.
- Returns:
Generated categorical series.
- Return type:
pd.Series
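The description does not say how categories are weighted or whether the result uses a pandas categorical dtype. The sketch below assumes uniform sampling with replacement and a plain string column; categorical_sketch and seed are hypothetical names.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def categorical_sketch(
    n_rows: int, categories: List[str], seed: Optional[int] = None
) -> pd.Series:
    """Hypothetical stand-in: sample labels uniformly, with replacement."""
    rng = np.random.default_rng(seed)
    return pd.Series(rng.choice(categories, size=n_rows))


labels = ["red", "green", "blue"]
cat_col = categorical_sketch(100, labels, seed=0)
```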
- fabricatio_plot.toolboxes.synthesize.datetime_column(n_rows: int, start: str = '2020-01-01', end: str = '2023-12-31') -> pandas.Series
Generate a random datetime column.
- Parameters:
n_rows – Number of values to generate.
start – Start date (ISO format).
end – End date (ISO format).
- Returns:
Generated datetime series.
- Return type:
pd.Series
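One common way to produce random datetimes in a range is to sample integer offsets from the start and add them as timedeltas. The sketch below takes that approach at one-second resolution; the resolution, the name datetime_sketch, and the seed parameter are all assumptions rather than documented behavior.

```python
from typing import Optional

import numpy as np
import pandas as pd


def datetime_sketch(
    n_rows: int,
    start: str = "2020-01-01",
    end: str = "2023-12-31",
    seed: Optional[int] = None,
) -> pd.Series:
    """Hypothetical stand-in: uniform random timestamps between start and end."""
    rng = np.random.default_rng(seed)
    start_ts = pd.Timestamp(start)
    end_ts = pd.Timestamp(end)
    span_seconds = int((end_ts - start_ts).total_seconds())
    # The upper bound of integers() is exclusive, so add 1 to make end reachable.
    offsets = rng.integers(0, span_seconds + 1, size=n_rows)
    return pd.Series(start_ts + pd.to_timedelta(offsets, unit="s"))


date_col = datetime_sketch(50, start="2021-01-01", end="2021-12-31", seed=0)
```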
- fabricatio_plot.toolboxes.synthesize.text_column(n_rows: int, prefix: str = 'item_') -> pandas.Series
Generate a simple text column with random suffixes.
- Parameters:
n_rows – Number of values to generate.
prefix – Prefix for each text item.
- Returns:
Generated text series.
- Return type:
pd.Series
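The form of the random suffix is not specified above. Purely as an illustration, the following sketch assumes a zero-padded four-digit number; text_sketch and seed are hypothetical names.

```python
from typing import Optional

import numpy as np
import pandas as pd


def text_sketch(
    n_rows: int, prefix: str = "item_", seed: Optional[int] = None
) -> pd.Series:
    """Hypothetical stand-in: prefix plus a random 4-digit suffix (assumed format)."""
    rng = np.random.default_rng(seed)
    # tolist() converts NumPy integers to plain Python ints before formatting.
    suffixes = rng.integers(0, 10_000, size=n_rows).tolist()
    return pd.Series([f"{prefix}{s:04d}" for s in suffixes])


text_col = text_sketch(5, prefix="row_", seed=0)
```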
Generate a new column correlated with a base series using closure and vectorized ops.
This function creates a new series whose correlation with the input base_series is approximately the specified value. It uses NumPy's random functions internally.
- Parameters:
base_series – The existing pandas Series to correlate with.
correlation – Target correlation coefficient (between 0 and 1).
- Returns:
New series with desired correlation.
- Return type:
pd.Series
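The signature of this function is not shown on this page, and neither is its algorithm. A standard way to reach a target correlation r is to standardize the base series and mix it with independent standard-normal noise as r * z + sqrt(1 - r^2) * noise. The sketch below uses that technique; correlated_sketch and seed are hypothetical names, and the library may work differently.

```python
from typing import Optional

import numpy as np
import pandas as pd


def correlated_sketch(
    base_series: pd.Series, correlation: float, seed: Optional[int] = None
) -> pd.Series:
    """Hypothetical stand-in: mix the standardized base with independent noise."""
    rng = np.random.default_rng(seed)
    z = (base_series - base_series.mean()) / base_series.std()
    noise = rng.standard_normal(len(base_series))
    # Weights chosen so the result has unit variance and correlation ~= target.
    mixed = correlation * z + np.sqrt(1.0 - correlation**2) * noise
    return pd.Series(mixed, index=base_series.index)


base = pd.Series(np.random.default_rng(0).normal(size=5_000))
derived = correlated_sketch(base, correlation=0.7, seed=1)
observed = base.corr(derived)  # Pearson correlation of the two columns
```

The observed correlation only approximates the target because the noise is random; the gap shrinks as the number of rows grows.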
- fabricatio_plot.toolboxes.synthesize.inject_missing(series: pandas.Series, rate: float = 0.05) -> pandas.Series
Inject missing (NaN) values into a series.
- Parameters:
series – Input pandas Series.
rate – Proportion of values to set as NaN (0.0 to 1.0).
- Returns:
Series with injected missing values.
- Return type:
pd.Series
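As a sketch of how a missing-value rate can be applied, the following picks round(rate * n) positions without replacement and masks them. Whether the library selects an exact count or drops each value independently is not stated above, so the exact-count choice, the name inject_missing_sketch, and the seed parameter are assumptions.

```python
from typing import Optional

import numpy as np
import pandas as pd


def inject_missing_sketch(
    series: pd.Series, rate: float = 0.05, seed: Optional[int] = None
) -> pd.Series:
    """Hypothetical stand-in: set a fixed share of positions to NaN."""
    rng = np.random.default_rng(seed)
    n_missing = int(round(rate * len(series)))
    positions = rng.choice(len(series), size=n_missing, replace=False)
    mask = np.zeros(len(series), dtype=bool)
    mask[positions] = True
    # Series.mask returns a new Series; the input is left unchanged.
    return series.mask(mask)


clean = pd.Series(np.arange(100.0))
with_gaps = inject_missing_sketch(clean, rate=0.1, seed=0)
```

Using Series.mask rather than in-place assignment lets pandas upcast integer columns to a dtype that can hold NaN.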