fabricatio_plot.capabilities.synthesize_data

Module for synthesizing data with LLM capabilities, concurrently and in batches.

Classes

SynthesizeData

Abstract base class for synthesizing structured data based on natural language requirements.

Module Contents

class fabricatio_plot.capabilities.synthesize_data.SynthesizeData(/, **data: Any)

Bases: fabricatio_core.capabilities.usages.UseLLM, abc.ABC

Abstract base class for synthesizing structured data based on natural language requirements.

Builds on UseLLM for LLM access, enabling LLM-driven data generation workflows. Provides methods to generate column headers, CSV-formatted data, and larger datasets assembled from batches.

async generate_header(requirement: str | List[str], **kwargs: Unpack[fabricatio_core.models.kwargs_types.ListStringKwargs]) → None | List[str] | List[List[str] | None]

Generate appropriate column headers based on the given requirement(s).

Parameters:
  • requirement – A single natural language description of the required data, or a list of such descriptions.

  • **kwargs – Additional keyword arguments passed to the underlying LLM processing.

Returns:

For a single requirement, a list of column headers, or None if generation fails. For a list of requirements, a list with one entry per requirement, where each entry is a list of headers or None.
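Because the return shape depends on the input shape, callers may find it convenient to normalize both cases into one entry per requirement. The following is a minimal sketch of that idea; `normalize_headers` is a hypothetical helper introduced here for illustration and is not part of fabricatio_plot.

```python
from typing import List, Optional, Union

Header = List[str]
HeaderResult = Union[None, Header, List[Optional[Header]]]


def normalize_headers(
    requirement: Union[str, List[str]], result: HeaderResult
) -> List[Optional[Header]]:
    """Return one entry per requirement, with None marking a failed generation."""
    if isinstance(requirement, str):
        # Single requirement: the result is either one header list or None.
        return [result]
    if result is None:
        # The whole call failed: report a failure for every requirement.
        return [None] * len(requirement)
    # List of requirements: entries already line up one-to-one.
    return list(result)
```

With this in place, the result of `generate_header` can be zipped against the list of requirements regardless of whether one description or several were passed.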

async generate_csv_data(requirement: str, header: List[str] | None, rows: int = 100, **kwargs: Unpack[fabricatio_core.models.kwargs_types.ValidateKwargs[str]]) → pandas.DataFrame | None

Generate CSV-formatted synthetic data matching the specified requirement and header.

Parameters:
  • requirement – Natural language description of the required dataset characteristics.

  • header – List of column names, or None to have the header auto-generated. The parameter has no default, so it must be passed explicitly.

  • rows – Number of data rows to generate (default: 100).

  • **kwargs – Additional validation-aware keyword arguments for LLM processing.

Returns:

A pandas DataFrame containing the synthesized data if successful, or None if parsing or generation fails.
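The "None if parsing fails" contract can be illustrated with a small, library-independent sketch. It uses the standard `csv` module and plain lists instead of a pandas DataFrame, and the checks it applies (header match, consistent row width) are assumptions about what a reasonable validator might do, not a description of how fabricatio_plot validates LLM output. `parse_csv_rows` is a hypothetical name.

```python
import csv
import io
from typing import List, Optional


def parse_csv_rows(text: str, header: List[str]) -> Optional[List[List[str]]]:
    """Parse CSV text; return the data rows, or None if it does not fit the header."""
    # Skip blank lines so stray newlines in the LLM output are tolerated.
    rows = [row for row in csv.reader(io.StringIO(text.strip())) if row]
    if not rows or rows[0] != header:
        return None  # missing or unexpected header line
    data = rows[1:]
    if not data or any(len(row) != len(header) for row in data):
        return None  # empty body or rows of the wrong width
    return data
```

Returning None rather than raising keeps a single malformed response from aborting a larger run, which matches the documented return type above.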

async synthesize_data(requirement: str, header: List[str] | None = None, rows: int = 1000, batch_size: int = 100, **kwargs: Unpack[fabricatio_core.models.kwargs_types.ValidateKwargs[str]]) → pandas.DataFrame | None

Synthesize large datasets efficiently by parallel batch generation and concatenation.

Parameters:
  • requirement – Natural language specification of the desired dataset.

  • header – Optional explicit list of column headers; auto-generated if omitted.

  • rows – Total number of rows to generate (default: 1000).

  • batch_size – Number of rows per parallel batch (default: 100).

  • **kwargs – Validation-aware keyword arguments passed to LLM processing.

Returns:

A single DataFrame concatenating every successfully generated batch, or None if no batch succeeds.
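The batching behaviour described here can be sketched without the library. The example below is an illustration under stated assumptions: the `generate` callable stands in for a per-batch generator such as `generate_csv_data`, rows are plain lists rather than a DataFrame, and putting the remainder into one smaller final batch is an assumption, since the source does not say how a non-divisible row count is handled. `batch_sizes` and `synthesize` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

Rows = List[List[str]]


def batch_sizes(rows: int, batch_size: int) -> List[int]:
    """Split `rows` into full batches plus one smaller remainder batch."""
    full, rest = divmod(rows, batch_size)
    return [batch_size] * full + ([rest] if rest else [])


async def synthesize(
    generate: Callable[[int], Awaitable[Optional[Rows]]],
    rows: int = 1000,
    batch_size: int = 100,
) -> Optional[Rows]:
    """Run all batches concurrently and merge the ones that succeeded."""
    results = await asyncio.gather(
        *(generate(n) for n in batch_sizes(rows, batch_size))
    )
    # Failed batches come back as None and are dropped before merging.
    merged = [row for batch in results if batch is not None for row in batch]
    # An empty merge means no batch succeeded.
    return merged or None
```

One consequence of dropping failed batches is that the result may hold fewer rows than requested, so callers that need an exact count should check the length of what comes back.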