fabricatio_plot.capabilities.synthesize_data
Module for synthesizing data using LLM capabilities in a concurrent and batched manner.
Classes
SynthesizeData – Abstract base class for synthesizing structured data based on natural language requirements.
Module Contents
- class fabricatio_plot.capabilities.synthesize_data.SynthesizeData(/, **data: Any)
Bases: fabricatio_core.capabilities.usages.UseLLM, abc.ABC
Abstract base class for synthesizing structured data based on natural language requirements.
Inherits LLM access from UseLLM; ABC marks the class as abstract. Provides methods to generate column headers, CSV-formatted data, and large datasets assembled from batches generated in parallel.
- async generate_header(requirement: str | List[str], **kwargs: Unpack[fabricatio_core.models.kwargs_types.ListStringKwargs]) → None | List[str] | List[List[str] | None]
Generate appropriate column headers based on the given requirement(s).
- Parameters:
requirement – A natural language description of the required data, or a list of such descriptions.
**kwargs – Additional keyword arguments passed to the underlying LLM processing.
- Returns:
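For a single requirement, a list of column headers, or None if generation fails. For a list of requirements, a list with one entry per requirement, each either a list of headers or None if that generation failed.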
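The return annotation implies that the output mirrors the shape of the input: one header list (or None) for a string, and one entry per requirement for a list. The sketch below illustrates that contract only. fake_header_call and generate_header_sketch are hypothetical stand-ins; the real method's prompting and LLM calls go through fabricatio_core and are not reproduced here, and whether it fans out list inputs concurrently is an assumption.

```python
import asyncio
from typing import List, Optional, Union


# Hypothetical stand-in for one LLM call; it just takes the first three words.
async def fake_header_call(requirement: str) -> Optional[List[str]]:
    if not requirement.strip():
        return None  # simulate a failed generation
    return [word.lower() for word in requirement.split()[:3]]


async def generate_header_sketch(
    requirement: Union[str, List[str]],
) -> Union[None, List[str], List[Optional[List[str]]]]:
    # A single string yields one header list, or None on failure.
    if isinstance(requirement, str):
        return await fake_header_call(requirement)
    # A list yields one entry per requirement; gather keeps input order.
    return list(await asyncio.gather(*(fake_header_call(r) for r in requirement)))


print(asyncio.run(generate_header_sketch("Name Age City Country")))  # ['name', 'age', 'city']
print(asyncio.run(generate_header_sketch(["Name Age", "  "])))  # [['name', 'age'], None]
```

Callers passing a list should therefore check each entry for None individually rather than treating the whole result as failed.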
- async generate_csv_data(requirement: str, header: List[str] | None, rows: int = 100, **kwargs: Unpack[fabricatio_core.models.kwargs_types.ValidateKwargs[str]]) → pandas.DataFrame | None
Generate CSV-formatted synthetic data matching the specified requirement and header.
- Parameters:
requirement – Natural language description of the required dataset characteristics.
header – List of column names to use; if None, the headers are auto-generated.
rows – Number of data rows to generate (default: 100).
**kwargs – Additional validation-aware keyword arguments for LLM processing.
- Returns:
A pandas DataFrame containing the synthesized data if successful, or None if parsing or generation fails.
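Because the method returns None when parsing fails, some step must turn the model's CSV text into a DataFrame and reject malformed output. The following is a minimal sketch of one plausible parse step using pandas.read_csv; parse_csv_sketch is a hypothetical helper, and it assumes the reply contains only data rows without a header line. The library's actual validation logic is not documented here.

```python
import io
from typing import List, Optional

import pandas as pd


def parse_csv_sketch(text: str, header: List[str]) -> Optional[pd.DataFrame]:
    """Hypothetical helper: parse raw CSV text into a DataFrame, or None on failure.

    Assumes the text holds data rows only (no header line).
    """
    try:
        df = pd.read_csv(io.StringIO(text.strip()), header=None, names=header)
    except (pd.errors.ParserError, pd.errors.EmptyDataError):
        return None  # unparseable or empty text
    # Treat an empty frame or rows with missing cells as a failed generation.
    if df.empty or df.isna().any().any():
        return None
    return df


good = parse_csv_sketch("Alice,30\nBob,25", ["name", "age"])
print(good.shape)  # (2, 2)
print(parse_csv_sketch("Alice,30\nBob", ["name", "age"]))  # None: Bob's row lacks an age
```

Rejecting rows with missing cells is a deliberately strict choice for this sketch; a real implementation might instead drop or repair such rows.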
- async synthesize_data(requirement: str, header: List[str] | None = None, rows: int = 1000, batch_size: int = 100, **kwargs: Unpack[fabricatio_core.models.kwargs_types.ValidateKwargs[str]]) → pandas.DataFrame | None
Synthesize a large dataset by generating batches in parallel and concatenating the results.
- Parameters:
requirement – Natural language specification of the desired dataset.
header – Optional explicit list of column names; if omitted, headers are auto-generated.
rows – Total number of rows to generate (default: 1000).
batch_size – Number of rows per parallel batch (default: 100).
**kwargs – Validation-aware keyword arguments passed to LLM processing.
- Returns:
A unified DataFrame containing all successfully generated data, or None if no batches succeed.
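The batching described above can be sketched with asyncio.gather and pandas.concat. This is an illustration under stated assumptions, not the library's implementation: fake_batch is a hypothetical stand-in for one generate_csv_data call, and the remainder handling (a final smaller batch) and the absence of any concurrency limit are assumptions, since the docstring does not specify them.

```python
import asyncio
from typing import List, Optional

import pandas as pd


# Hypothetical stand-in for one generate_csv_data call; batch 2 "fails"
# to show that failed batches are skipped instead of aborting the run.
async def fake_batch(index: int, n_rows: int, header: List[str]) -> Optional[pd.DataFrame]:
    await asyncio.sleep(0)  # yield control, as a real network call would
    if index == 2:
        return None
    return pd.DataFrame({col: range(n_rows) for col in header})


async def synthesize_sketch(rows: int, batch_size: int, header: List[str]) -> Optional[pd.DataFrame]:
    # Split the total into full batches plus one smaller remainder batch.
    sizes = [batch_size] * (rows // batch_size)
    if rows % batch_size:
        sizes.append(rows % batch_size)
    # Run all batches concurrently; gather returns results in submission order.
    results = await asyncio.gather(*(fake_batch(i, n, header) for i, n in enumerate(sizes)))
    frames = [df for df in results if df is not None]
    if not frames:
        return None  # no batch succeeded
    return pd.concat(frames, ignore_index=True)


df = asyncio.run(synthesize_sketch(rows=250, batch_size=100, header=["id", "value"]))
print(len(df))  # 200: batches of 100, 100 and 50; the 50-row batch (index 2) failed
```

Note that when some batches fail, the returned DataFrame has fewer rows than requested, which matches the documented return of "all successfully generated data".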