Sunday, February 8, 2026

How to Design Production-Grade Mock Data Pipelines Using Polyfactory with Dataclasses, Pydantic, Attrs, and Nested Models

In this tutorial, we walk through an advanced, end-to-end exploration of Polyfactory, focusing on how to generate rich, realistic mock data directly from Python type hints. We start by setting up the environment and progressively build factories for dataclasses, Pydantic models, and attrs-based classes, while demonstrating customization, overrides, calculated fields, and the generation of nested objects. As we move through each snippet, we show how to control randomness, enforce constraints, and model real-world structures, making this tutorial directly applicable to testing, prototyping, and data-driven development workflows.

import subprocess
import sys


def install_package(package):
    # Install quietly with the same interpreter that runs this script.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", package])


packages = [
    "polyfactory",
    "pydantic",
    "email-validator",
    "faker",
    "msgspec",
    "attrs"
]


for package in packages:
    try:
        install_package(package)
        print(f"✓ Installed {package}")
    except Exception as e:
        print(f"✗ Failed to install {package}: {e}")


print("\n")


print("=" * 80)
print("SECTION 2: Basic Dataclass Factories")
print("=" * 80)


from dataclasses import dataclass
from typing import List, Optional
from datetime import datetime, date
from uuid import UUID
from polyfactory.factories import DataclassFactory


@dataclass
class Address:
    street: str
    city: str
    country: str
    zip_code: str


@dataclass
class Person:
    id: UUID
    name: str
    email: str
    age: int
    birth_date: date
    is_active: bool
    address: Address
    phone_numbers: List[str]
    bio: Optional[str] = None


# No custom logic: every field is generated from its type hint.
class PersonFactory(DataclassFactory[Person]):
    pass


person = PersonFactory.build()
print("Generated Person:")
print(f"  ID: {person.id}")
print(f"  Name: {person.name}")
print(f"  Email: {person.email}")
print(f"  Age: {person.age}")
print(f"  Address: {person.address.city}, {person.address.country}")
print(f"  Phone Numbers: {person.phone_numbers[:2]}")
print()


people = PersonFactory.batch(5)
print(f"Generated {len(people)} people:")
for i, p in enumerate(people, 1):
    print(f"  {i}. {p.name} - {p.email}")
print("\n")

We set up the environment and ensure all required dependencies are installed. We also introduce the core idea of using Polyfactory to generate mock data from type hints. By initializing the basic dataclass factories, we establish the foundation for all subsequent examples.

print("=" * 80)
print("SECTION 3: Customizing Factory Behavior")
print("=" * 80)


from faker import Faker
from polyfactory.fields import Use, Ignore


@dataclass
class Employee:
    employee_id: str
    full_name: str
    department: str
    salary: float
    hire_date: date
    is_manager: bool
    email: str
    internal_notes: Optional[str] = None


class EmployeeFactory(DataclassFactory[Employee]):
    __faker__ = Faker(locale="en_US")
    __random_seed__ = 42  # fixed seed so repeated runs produce the same data

    # A classmethod named after a field supplies that field's value.
    @classmethod
    def employee_id(cls) -> str:
        return f"EMP-{cls.__random__.randint(10000, 99999)}"

    @classmethod
    def full_name(cls) -> str:
        return cls.__faker__.name()

    @classmethod
    def department(cls) -> str:
        departments = ["Engineering", "Marketing", "Sales", "HR", "Finance"]
        return cls.__random__.choice(departments)

    @classmethod
    def salary(cls) -> float:
        return round(cls.__random__.uniform(50000, 150000), 2)

    @classmethod
    def email(cls) -> str:
        return cls.__faker__.company_email()


employees = EmployeeFactory.batch(3)
print("Generated Employees:")
for emp in employees:
    print(f"  {emp.employee_id}: {emp.full_name}")
    print(f"    Department: {emp.department}")
    print(f"    Salary: ${emp.salary:,.2f}")
    print(f"    Email: {emp.email}")
    print()
print()


print("=" * 80)
print("SECTION 4: Field Constraints and Calculated Fields")
print("=" * 80)


@dataclass
class Product:
    product_id: str
    name: str
    description: str
    price: float
    discount_percentage: float
    stock_quantity: int
    final_price: Optional[float] = None
    sku: Optional[str] = None


class ProductFactory(DataclassFactory[Product]):
    @classmethod
    def product_id(cls) -> str:
        return f"PROD-{cls.__random__.randint(1000, 9999)}"

    @classmethod
    def name(cls) -> str:
        adjectives = ["Premium", "Deluxe", "Classic", "Modern", "Eco"]
        nouns = ["Widget", "Gadget", "Device", "Tool", "Appliance"]
        return f"{cls.__random__.choice(adjectives)} {cls.__random__.choice(nouns)}"

    @classmethod
    def price(cls) -> float:
        return round(cls.__random__.uniform(10.0, 1000.0), 2)

    @classmethod
    def discount_percentage(cls) -> float:
        return round(cls.__random__.uniform(0, 30), 2)

    @classmethod
    def stock_quantity(cls) -> int:
        return cls.__random__.randint(0, 500)

    @classmethod
    def build(cls, **kwargs):
        # Fill in the derived fields once the base fields exist.
        instance = super().build(**kwargs)
        if instance.final_price is None:
            instance.final_price = round(
                instance.price * (1 - instance.discount_percentage / 100), 2
            )
        if instance.sku is None:
            name_part = instance.name.replace(" ", "-").upper()[:10]
            instance.sku = f"{instance.product_id}-{name_part}"
        return instance


products = ProductFactory.batch(3)
print("Generated Products:")
for prod in products:
    print(f"  {prod.sku}")
    print(f"    Name: {prod.name}")
    print(f"    Price: ${prod.price:.2f}")
    print(f"    Discount: {prod.discount_percentage}%")
    print(f"    Final Price: ${prod.final_price:.2f}")
    print(f"    Stock: {prod.stock_quantity} units")
    print()
print()

We focus on generating simple but realistic mock data using dataclasses and default Polyfactory behavior. We show how to quickly create single instances and batches without writing any custom logic. This helps us validate how Polyfactory automatically interprets type hints to populate nested structures.

print("=" * 80)
print("SECTION 6: Complex Nested Structures")
print("=" * 80)


from enum import Enum


class OrderStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


@dataclass
class OrderItem:
    product_name: str
    quantity: int
    unit_price: float
    total_price: Optional[float] = None


@dataclass
class ShippingInfo:
    carrier: str
    tracking_number: str
    estimated_delivery: date


@dataclass
class Order:
    order_id: str
    customer_name: str
    customer_email: str
    status: OrderStatus
    items: List[OrderItem]
    order_date: datetime
    shipping_info: Optional[ShippingInfo] = None
    total_amount: Optional[float] = None
    notes: Optional[str] = None


class OrderItemFactory(DataclassFactory[OrderItem]):
    @classmethod
    def product_name(cls) -> str:
        products = ["Laptop", "Mouse", "Keyboard", "Monitor", "Headphones",
                    "Webcam", "USB Cable", "Phone Case", "Charger", "Tablet"]
        return cls.__random__.choice(products)

    @classmethod
    def quantity(cls) -> int:
        return cls.__random__.randint(1, 5)

    @classmethod
    def unit_price(cls) -> float:
        return round(cls.__random__.uniform(5.0, 500.0), 2)

    @classmethod
    def build(cls, **kwargs):
        # Derive the line total from quantity and unit price.
        instance = super().build(**kwargs)
        if instance.total_price is None:
            instance.total_price = round(instance.quantity * instance.unit_price, 2)
        return instance


class ShippingInfoFactory(DataclassFactory[ShippingInfo]):
    @classmethod
    def carrier(cls) -> str:
        carriers = ["FedEx", "UPS", "DHL", "USPS"]
        return cls.__random__.choice(carriers)

    @classmethod
    def tracking_number(cls) -> str:
        return ''.join(cls.__random__.choices('0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ', k=12))


class OrderFactory(DataclassFactory[Order]):
    @classmethod
    def order_id(cls) -> str:
        return f"ORD-{datetime.now().year}-{cls.__random__.randint(100000, 999999)}"

    @classmethod
    def items(cls) -> List[OrderItem]:
        return OrderItemFactory.batch(cls.__random__.randint(1, 5))

    @classmethod
    def build(cls, **kwargs):
        instance = super().build(**kwargs)
        # The order total is the sum of its line totals.
        if instance.total_amount is None:
            instance.total_amount = round(sum(item.total_price for item in instance.items), 2)
        # Shipped or delivered orders must carry shipping details.
        if instance.shipping_info is None and instance.status in [OrderStatus.SHIPPED, OrderStatus.DELIVERED]:
            instance.shipping_info = ShippingInfoFactory.build()
        return instance


orders = OrderFactory.batch(2)
print("Generated Orders:")
for order in orders:
    print(f"\n  Order {order.order_id}")
    print(f"    Customer: {order.customer_name} ({order.customer_email})")
    print(f"    Status: {order.status.value}")
    print(f"    Items ({len(order.items)}):")
    for item in order.items:
        print(f"      - {item.quantity}x {item.product_name} @ ${item.unit_price:.2f} = ${item.total_price:.2f}")
    print(f"    Total: ${order.total_amount:.2f}")
    if order.shipping_info:
        print(f"    Shipping: {order.shipping_info.carrier} - {order.shipping_info.tracking_number}")
print("\n")

We build more complex domain logic by introducing calculated and dependent fields inside factories. We show how to derive values such as final prices, totals, and shipping details after object creation. This lets us model realistic business rules directly inside our test data generators.

print("=" * 80)
print("SECTION 7: Attrs Integration")
print("=" * 80)


import attrs
from polyfactory.factories.attrs_factory import AttrsFactory


@attrs.define
class BlogPost:
    title: str
    author: str
    content: str
    views: int = 0
    likes: int = 0
    published: bool = False
    published_at: Optional[datetime] = None
    tags: List[str] = attrs.field(factory=list)


class BlogPostFactory(AttrsFactory[BlogPost]):
    @classmethod
    def title(cls) -> str:
        templates = [
            "10 Tips for {}",
            "Understanding {}",
            "The Complete Guide to {}",
            "Why {} Matters",
            "Getting Started with {}"
        ]
        topics = ["Python", "Data Science", "Machine Learning", "Web Development", "DevOps"]
        template = cls.__random__.choice(templates)
        topic = cls.__random__.choice(topics)
        return template.format(topic)

    @classmethod
    def content(cls) -> str:
        return " ".join(Faker().sentences(nb=cls.__random__.randint(3, 8)))

    @classmethod
    def views(cls) -> int:
        return cls.__random__.randint(0, 10000)

    @classmethod
    def likes(cls) -> int:
        return cls.__random__.randint(0, 1000)

    @classmethod
    def tags(cls) -> List[str]:
        all_tags = ["python", "tutorial", "beginner", "advanced", "guide",
                    "tips", "best-practices", "2024"]
        # sample() picks distinct tags, so a post never repeats one.
        return cls.__random__.sample(all_tags, k=cls.__random__.randint(2, 5))


posts = BlogPostFactory.batch(3)
print("Generated Blog Posts:")
for post in posts:
    print(f"\n  '{post.title}'")
    print(f"    Author: {post.author}")
    print(f"    Views: {post.views:,} | Likes: {post.likes:,}")
    print(f"    Published: {post.published}")
    print(f"    Tags: {', '.join(post.tags)}")
    print(f"    Preview: {post.content[:100]}...")
print("\n")


print("=" * 80)
print("SECTION 8: Building with Specific Overrides")
print("=" * 80)


# Keyword arguments pin specific fields; the rest stay generated.
custom_person = PersonFactory.build(
    name="Alice Johnson",
    age=30,
    email="[email protected]"
)
print("Custom Person:")
print(f"  Name: {custom_person.name}")
print(f"  Age: {custom_person.age}")
print(f"  Email: {custom_person.email}")
print(f"  ID (auto-generated): {custom_person.id}")
print()


vip_customers = PersonFactory.batch(
    3,
    bio="VIP Customer"
)
print("VIP Customers:")
for customer in vip_customers:
    print(f"  {customer.name}: {customer.bio}")
print("\n")

We extend Polyfactory usage to validated Pydantic models and attrs-based classes. We demonstrate how to respect field constraints, validators, and default behaviors while still producing valid data at scale. This ensures our mock data remains compatible with real application schemas.

print("=" * 80)
print("SECTION 9: Field-Level Control with Use and Ignore")
print("=" * 80)


from polyfactory.fields import Use, Ignore


@dataclass
class Configuration:
    app_name: str
    version: str
    debug: bool
    created_at: datetime
    api_key: str
    secret_key: str


class ConfigFactory(DataclassFactory[Configuration]):
    # Use() wraps a callable that is invoked for every generated instance.
    app_name = Use(lambda: "MyAwesomeApp")
    version = Use(lambda: "1.0.0")
    debug = Use(lambda: False)

    @classmethod
    def api_key(cls) -> str:
        return f"api_key_{''.join(cls.__random__.choices('0123456789abcdef', k=32))}"

    @classmethod
    def secret_key(cls) -> str:
        return f"secret_{''.join(cls.__random__.choices('0123456789abcdef', k=64))}"


configs = ConfigFactory.batch(2)
print("Generated Configurations:")
for config in configs:
    print(f"  App: {config.app_name} v{config.version}")
    print(f"    Debug: {config.debug}")
    print(f"    API Key: {config.api_key[:20]}...")
    print(f"    Created: {config.created_at}")
    print()
print()


print("=" * 80)
print("SECTION 10: Model Coverage Testing")
print("=" * 80)


from pydantic import BaseModel, ConfigDict
from typing import Union
from polyfactory.factories.pydantic_factory import ModelFactory


class PaymentMethod(BaseModel):
    model_config = ConfigDict(use_enum_values=True)
    type: str
    card_number: Optional[str] = None
    bank_name: Optional[str] = None
    verified: bool = False


class PaymentMethodFactory(ModelFactory[PaymentMethod]):
    __model__ = PaymentMethod


# Hand-picked variants: one per payment type, plus a verified one.
payment_methods = [
    PaymentMethodFactory.build(type="card", card_number="4111111111111111"),
    PaymentMethodFactory.build(type="bank", bank_name="Chase Bank"),
    PaymentMethodFactory.build(verified=True),
]


print("Payment Method Coverage:")
for i, pm in enumerate(payment_methods, 1):
    print(f"  {i}. Type: {pm.type}")
    if pm.card_number:
        print(f"     Card: {pm.card_number}")
    if pm.bank_name:
        print(f"     Bank: {pm.bank_name}")
    print(f"     Verified: {pm.verified}")
print("\n")


print("=" * 80)
print("TUTORIAL SUMMARY")
print("=" * 80)
print("""
This tutorial covered:


1. ✓ Basic Dataclass Factories - Simple mock data generation
2. ✓ Custom Field Generators - Controlling individual field values
3. ✓ Field Constraints - Using PostGenerated for calculated fields
4. ✓ Pydantic Integration - Working with validated models
5. ✓ Complex Nested Structures - Building related objects
6. ✓ Attrs Support - Alternative to dataclasses
7. ✓ Build Overrides - Customizing specific instances
8. ✓ Use and Ignore - Explicit field control
9. ✓ Coverage Testing - Ensuring comprehensive test data


Key Takeaways:
- Polyfactory automatically generates mock data from type hints
- Customize generation with classmethods and decorators
- Supports multiple libraries: dataclasses, Pydantic, attrs, msgspec
- Use PostGenerated for calculated/dependent fields
- Override specific values while keeping others random
- Perfect for testing, development, and prototyping


For extra info:
- Documentation: https://polyfactory.litestar.dev/
- GitHub: https://github.com/litestar-org/polyfactory
""")
print("=" * 80)

We cover advanced usage patterns such as explicit overrides, constant field values, and coverage testing scenarios. We show how to intentionally construct edge cases and variant instances for robust testing. This final step ties everything together by demonstrating how Polyfactory supports comprehensive, production-grade test data strategies.

In conclusion, we demonstrated how Polyfactory lets us create comprehensive, flexible test data with minimal boilerplate while still retaining fine-grained control over every field. We showed how to handle simple entities, complex nested structures, and Pydantic model validation, as well as explicit field overrides, within a single, consistent factory-based approach. Overall, we found that Polyfactory lets us move faster and test more confidently, as it reliably generates realistic datasets that closely mirror production-like scenarios without sacrificing clarity or maintainability.

