Python Pydantic: Validate Data Like a Pro

Advanced · 25 min · 6 exercises · 120 XP

Every application deals with messy data: user forms with missing fields, API responses with wrong types, config files with invalid values. Data validation is the practice of checking that data matches your expectations before you use it.

Pydantic is one of the most widely used libraries for this in Python. Rather than installing it straight away, this tutorial teaches the patterns behind it in pure Python. You'll build validation systems using dataclasses and __post_init__, which let you practice the same concepts that Pydantic is built around.

By the end, you'll know how to validate types, enforce constraints, build custom validators, create nested models, and serialize data — all patterns you'll recognize when you use Pydantic in a real project.

How Do You Validate Data in a Dataclass?

Python's dataclasses give you structured data containers. By adding a __post_init__ method, you can run validation logic right after the object is created. Conceptually, this is how Pydantic's validators work: they run when the object is constructed and reject bad input.

Basic validation with __post_init__
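A minimal sketch of this pattern. The User class, its fields, and the specific rules are hypothetical, chosen only to illustrate the idea:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int

    def __post_init__(self):
        # Called automatically once the generated __init__ has set the fields.
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        # bool is a subclass of int, so it is rejected explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise ValueError("age must be an integer")
        if self.age < 0:
            raise ValueError("age must be non-negative")


user = User(name="Alice", age=30)
print(user)  # User(name='Alice', age=30)

try:
    User(name="Bob", age=-5)
except ValueError as exc:
    print(f"Validation failed: {exc}")  # Validation failed: age must be non-negative
```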

__post_init__ runs automatically after __init__. If the data is invalid, we raise a ValueError with a clear message. Every other technique in this tutorial builds on this check-then-raise pattern.

What Is Type Coercion and Why Does It Matter?

Real-world data is messy. An API might send age as "25" (a string) instead of 25 (an integer). Type coercion means automatically converting data to the expected type when possible.

Type coercion in __post_init__
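One way this could look is sketched here. The Product class and its field names are assumptions made for illustration:

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float
    quantity: int

    def __post_init__(self):
        # Coerce first, so the constraint check below always sees numbers.
        try:
            self.price = float(self.price)
            self.quantity = int(self.quantity)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"could not coerce field: {exc}") from exc
        if self.price < 0:
            raise ValueError("price must be non-negative")


product = Product(name="Widget", price="9.99", quantity="3")
print(type(product.price).__name__, product.price)        # float 9.99
print(type(product.quantity).__name__, product.quantity)  # int 3
```

Note that int() is permissive in ways you may not want: int(3.7) silently truncates to 3, while int("3.7") raises ValueError. A stricter model would check for these cases explicitly.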

The string "9.99" was automatically converted to the float 9.99, and "3" became the integer 3. Pydantic performs the same kind of conversion in its default (lax) mode.

How Do You Build Reusable Validators?

When you have validation rules that apply to many fields or many classes, you can extract them into reusable validator functions. This mirrors Pydantic's @field_validator decorator pattern.

Reusable validator functions
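A sketch of what such validators might look like. The Contact class is hypothetical, and the email check is deliberately loose (it only looks for a local part, an "@", and a dot in the domain), so treat it as an illustration and not a real email validator:

```python
from dataclasses import dataclass


def validate_non_empty(value, field_name):
    """Strip surrounding whitespace and reject empty strings."""
    cleaned = str(value).strip()
    if not cleaned:
        raise ValueError(f"{field_name} must not be empty")
    return cleaned


def validate_email(value):
    """Lowercase the address and apply a loose shape check."""
    email = str(value).strip().lower()
    local, sep, domain = email.partition("@")
    if not sep or not local or "." not in domain:
        raise ValueError(f"invalid email: {value!r}")
    return email


@dataclass
class Contact:
    name: str
    email: str

    def __post_init__(self):
        # Each validator returns the cleaned value, so assign it back.
        self.name = validate_non_empty(self.name, "name")
        self.email = validate_email(self.email)


contact = Contact(name="  Alice  ", email="Alice@Example.COM")
print(contact)  # Contact(name='Alice', email='alice@example.com')
```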

Notice how each validator both validates and transforms the data — validate_email lowercases the email, validate_non_empty strips whitespace. This pattern of validate-then-transform is central to Pydantic.

How Do You Handle Nested Data?

Real data is often nested — an Order contains a Customer, which contains an Address. Pydantic handles this elegantly with nested BaseModels. We can do the same with nested dataclasses.

Nested dataclasses with validation
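A minimal sketch with two levels of nesting. The field names on Address and Customer are assumptions:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str

    def __post_init__(self):
        if not str(self.city).strip():
            raise ValueError("city must not be empty")


@dataclass
class Customer:
    name: str
    address: Address

    def __post_init__(self):
        # Accept a raw dict and promote it to an Address. Constructing the
        # Address also runs Address.__post_init__, so nested data is validated.
        if isinstance(self.address, dict):
            self.address = Address(**self.address)
        elif not isinstance(self.address, Address):
            raise ValueError("address must be an Address or a dict")


customer = Customer(
    name="Alice",
    address={"street": "1 Main St", "city": "Springfield"},
)
print(type(customer.address).__name__)  # Address
print(customer.address.city)            # Springfield
```

In this sketch, a dict with unexpected keys fails with a TypeError from Address(**...), not a ValueError. You could catch and re-raise it if you want a single error type.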

The key trick is in Customer.__post_init__: if address is a raw dictionary, we convert it to an Address object. This mirrors Pydantic's automatic dict-to-model conversion.

How Do You Convert Models to Dictionaries and JSON?

Pydantic models have .model_dump() and .model_dump_json() methods. We can add similar capabilities to our dataclasses using dataclasses.asdict() and the json module.

Serialization and deserialization
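A sketch of the round trip, using a hypothetical User model. The method names to_dict, to_json, and from_dict are this tutorial's own helpers, not part of the dataclasses module:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class User:
    name: str
    age: int

    def __post_init__(self):
        self.name = str(self.name).strip()
        self.age = int(self.age)
        if self.age < 0:
            raise ValueError("age must be non-negative")

    def to_dict(self):
        # asdict() recurses into nested dataclasses, lists, and dicts.
        return asdict(self)

    def to_json(self):
        return json.dumps(self.to_dict())

    @classmethod
    def from_dict(cls, data):
        # Going through the constructor means __post_init__ validates again.
        return cls(**data)


user = User(name="Alice", age="30")
payload = user.to_json()
print(payload)  # {"name": "Alice", "age": 30}

restored = User.from_dict(json.loads(payload))
print(restored == user)  # True
```

Because from_dict calls the normal constructor, data loaded from JSON is validated the same way as data passed in directly.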

The to_dict, to_json, and from_dict methods mirror Pydantic's model_dump(), model_dump_json(), and model_validate(). This pattern makes your data models easy to save, send over networks, and reconstruct.

Dataclass vs Pydantic: At a Glance

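Plain dataclass (no validation)

from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

# Accepts anything: type hints are not enforced at runtime
user = User(name=123, age="not a number")
# No error, but the data is wrong

Validated dataclass (Pydantic-style)

from dataclasses import dataclass

@dataclass
class User:
    name: str
    age: int

    def __post_init__(self):
        # Coerce first, then check constraints
        self.name = str(self.name).strip()
        self.age = int(self.age)  # raises ValueError for "not a number"
        if self.age < 0:
            raise ValueError("age must be non-negative")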

Practice Exercises

Basic Validated Model
Write Code

Create a Temperature dataclass with a single field celsius (float). In __post_init__, coerce celsius to float and validate that it's above absolute zero (-273.15). If invalid, raise ValueError.

Create two instances: Temperature(celsius='100') and Temperature(celsius=-10). Print each one's celsius value.

Multiple Field Validation
Write Code

Create a PasswordPolicy dataclass with fields min_length (int), require_uppercase (bool), and require_digits (bool).

Add a method validate_password(self, password: str) -> bool that returns True if the password meets all enabled requirements, False otherwise.

Create a policy with min_length=8, require_uppercase=True, require_digits=True.

Test it with these passwords and print the results:

  • "hello" (should be False)
  • "HelloWorld1" (should be True)
  • "ALLCAPS123" (should be True)

Predict the Output: Coercion Behavior
Predict Output

Read the code carefully and predict exactly what it prints.

from dataclasses import dataclass

@dataclass
class Score:
    value: float
    label: str

    def __post_init__(self):
        self.value = float(self.value)
        self.label = str(self.label).upper()
        if self.value > 100:
            self.value = 100.0

s1 = Score(value="85", label="math")
s2 = Score(value=150, label="science")
print(f"{s1.label}: {s1.value}")
print(f"{s2.label}: {s2.value}")

Nested Validated Models
Write Code

Create two dataclasses:

1. Item with name (str) and price (float). Validate that price >= 0.

2. Order with customer (str) and items (list). In __post_init__, convert any dicts in the items list to Item objects. Add a method total() that returns the sum of all item prices.

Create an order:

order = Order(
    customer='Alice',
    items=[{'name': 'Book', 'price': 12.99}, {'name': 'Pen', 'price': 1.50}]
)

Print the customer name and then the total.

Fix the Bug: Validation Order
Fix the Bug

This UserProfile class has a bug in its validation logic. The type coercion happens after the range check, so string inputs cause a TypeError instead of being validated properly. Fix the order of operations in __post_init__.

The code should print the profile when given age="25".

Build a Mini Validator Framework
Write Code

Build a ValidatedModel base class with a to_dict() method that returns a dictionary of all fields, and a from_dict(cls, data) classmethod that creates an instance from a dictionary.

Then create a Book class that inherits from ValidatedModel with fields title (str), pages (int), and rating (float). Validate in __post_init__ that pages > 0 and 0 <= rating <= 5.

Create a book from a dict, print its to_dict() output, and then reconstruct it with from_dict and print the title.
