Python Pydantic: Validate Data Like a Pro
Every application deals with messy data. User forms with missing fields, API responses with wrong types, config files with invalid values. Data validation is the practice of checking that data matches your expectations before you use it.
The Pydantic library is the most popular tool for this in Python. Before reaching for the library, though, it helps to learn the patterns behind it using pure Python. In this tutorial, you'll build validation systems using dataclasses and __post_init__, which mirror the core ideas behind Pydantic models (Pydantic's own implementation is different, but the concepts carry over).
By the end, you'll know how to validate types, enforce constraints, build custom validators, create nested models, and serialize data — all patterns you'll recognize when you use Pydantic in a real project.
How Do You Validate Data in a Dataclass?
Python's dataclasses give you structured data containers. By adding a __post_init__ method, you can run validation logic right after the object is created. Conceptually, this is close to how Pydantic's validators work.
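Here is a minimal sketch of that idea. The User class, its name and age fields, and the specific rules are illustrative assumptions, not a fixed recipe:

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    age: int

    def __post_init__(self):
        # Called by the generated __init__ once the fields are assigned.
        if not isinstance(self.name, str) or not self.name.strip():
            raise ValueError("name must be a non-empty string")
        # bool is a subclass of int, so exclude it explicitly.
        if not isinstance(self.age, int) or isinstance(self.age, bool):
            raise ValueError("age must be an integer")
        if self.age < 0:
            raise ValueError("age must not be negative")


user = User(name="Alice", age=30)
print(user)  # User(name='Alice', age=30)

try:
    User(name="Bob", age=-5)
except ValueError as exc:
    print(f"Validation failed: {exc}")  # Validation failed: age must not be negative
```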
__post_init__ runs automatically after the generated __init__ has assigned the fields. If the data is invalid, we raise a ValueError with a clear message. Every validation pattern in this tutorial builds on that check-and-raise step.
What Is Type Coercion and Why Does It Matter?
Real-world data is messy. An API might send age as "25" (a string) instead of 25 (an integer). Type coercion means automatically converting data to the expected type when possible.
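A small sketch of coercion in __post_init__ might look like the following. The Product class and its price and quantity fields are assumed names chosen for illustration:

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float
    quantity: int

    def __post_init__(self):
        # Try to convert each field to its declared type; fail loudly if we can't.
        try:
            self.price = float(self.price)
            self.quantity = int(self.quantity)
        except (TypeError, ValueError) as exc:
            raise ValueError(f"could not coerce field: {exc}") from exc


item = Product(name="Widget", price="9.99", quantity="3")
print(item.price, type(item.price).__name__)        # 9.99 float
print(item.quantity, type(item.quantity).__name__)  # 3 int
```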
The string "9.99" was automatically converted to the float 9.99, and "3" became the integer 3. Pydantic performs the same kind of conversion in its default (lax) mode; its strict mode rejects such inputs instead.
How Do You Build Reusable Validators?
When you have validation rules that apply to many fields or many classes, you can extract them into reusable validator functions. This mirrors Pydantic's @field_validator decorator pattern.
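One way to sketch this with plain functions is shown below. The Contact class and the deliberately simple email check are assumptions made for the example; real email validation is considerably stricter:

```python
from dataclasses import dataclass


def validate_non_empty(value, field_name: str) -> str:
    """Strip surrounding whitespace and reject empty strings."""
    cleaned = str(value).strip()
    if not cleaned:
        raise ValueError(f"{field_name} must not be empty")
    return cleaned


def validate_email(value) -> str:
    """Lowercase the address and check for a basic local@domain.tld shape."""
    cleaned = str(value).strip().lower()
    local, sep, domain = cleaned.partition("@")
    if not sep or not local or "." not in domain:
        raise ValueError(f"invalid email: {value!r}")
    return cleaned


@dataclass
class Contact:
    name: str
    email: str

    def __post_init__(self):
        # Each validator returns the cleaned value, so we assign it back.
        self.name = validate_non_empty(self.name, "name")
        self.email = validate_email(self.email)


contact = Contact(name="  Ada Lovelace ", email="ADA@Example.COM")
print(contact)  # Contact(name='Ada Lovelace', email='ada@example.com')
```

Because the validators are ordinary functions, any other dataclass can call them from its own __post_init__.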
Notice how each validator both validates and transforms the data — validate_email lowercases the email, validate_non_empty strips whitespace. This pattern of validate-then-transform is central to Pydantic.
How Do You Handle Nested Data?
Real data is often nested — an Order contains a Customer, which contains an Address. Pydantic handles this elegantly with nested BaseModels. We can do the same with nested dataclasses.
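The nesting can be sketched as follows. The field names (street, city, order_id) are hypothetical; only the Order, Customer, and Address structure comes from the description above:

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str

    def __post_init__(self):
        if not str(self.city).strip():
            raise ValueError("city must not be empty")


@dataclass
class Customer:
    name: str
    address: Address

    def __post_init__(self):
        # Promote a raw dict to an Address, which also runs Address validation.
        if isinstance(self.address, dict):
            self.address = Address(**self.address)


@dataclass
class Order:
    order_id: int
    customer: Customer

    def __post_init__(self):
        # Same trick one level up: a dict becomes a validated Customer.
        if isinstance(self.customer, dict):
            self.customer = Customer(**self.customer)


order = Order(
    order_id=1,
    customer={
        "name": "Alice",
        "address": {"street": "1 Main St", "city": "Springfield"},
    },
)
print(order.customer.address.city)  # Springfield
```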
The key trick is in Customer.__post_init__: if address is a raw dictionary, we convert it to an Address object. This mirrors Pydantic's automatic dict-to-model conversion.
How Do You Convert Models to Dictionaries and JSON?
Pydantic models have .model_dump() and .model_dump_json() methods. We can add similar capabilities to our dataclasses using dataclasses.asdict() and the json module.
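A minimal sketch of such helpers, assuming a hypothetical Product model with name and price fields:

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Product:
    name: str
    price: float

    def __post_init__(self):
        self.price = float(self.price)
        if self.price < 0:
            raise ValueError("price must be non-negative")

    def to_dict(self) -> dict:
        # asdict also recurses into nested dataclasses, lists, and dicts.
        return asdict(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict())

    @classmethod
    def from_dict(cls, data: dict) -> "Product":
        # Going through the constructor re-runs __post_init__ validation.
        return cls(**data)


book = Product(name="Book", price="12.99")
print(book.to_dict())  # {'name': 'Book', 'price': 12.99}
print(book.to_json())  # {"name": "Book", "price": 12.99}

restored = Product.from_dict(json.loads(book.to_json()))
print(restored == book)  # True
```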
The to_dict, to_json, and from_dict methods mirror Pydantic's model_dump(), model_dump_json(), and model_validate(). This pattern makes your data models easy to save, send over networks, and reconstruct.
Dataclass vs Pydantic: At a Glance
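Plain dataclass (no validation):

    from dataclasses import dataclass

    @dataclass
    class User:
        name: str
        age: int

    # Accepts anything: no checks!
    user = User(name=123, age="not a number")
    # No error, but data is wrong

Dataclass with Pydantic-style validation in __post_init__:

    @dataclass
    class User:
        name: str
        age: int

        def __post_init__(self):
            # Coerce first, then enforce the constraint.
            self.name = str(self.name).strip()
            self.age = int(self.age)
            if self.age < 0:
                raise ValueError("age < 0")

Practice Exercises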
Create a Temperature dataclass with a single field celsius (float). In __post_init__, coerce celsius to float and validate that it's above absolute zero (-273.15). If invalid, raise ValueError.
Create two instances: Temperature(celsius='100') and Temperature(celsius=-10). Print each one's celsius value.
Create a PasswordPolicy dataclass with fields min_length (int), require_uppercase (bool), and require_digits (bool).
Add a method validate_password(self, password: str) -> bool that returns True if the password meets all enabled requirements, False otherwise.
Create a policy with min_length=8, require_uppercase=True, require_digits=True.
Test it with these passwords and print the results:
"hello" (should be False)
"HelloWorld1" (should be True)
"ALLCAPS123" (should be True)
Read the code carefully and predict exactly what it prints.
    from dataclasses import dataclass

    @dataclass
    class Score:
        value: float
        label: str

        def __post_init__(self):
            self.value = float(self.value)
            self.label = str(self.label).upper()
            if self.value > 100:
                self.value = 100.0

    s1 = Score(value="85", label="math")
    s2 = Score(value=150, label="science")
    print(f"{s1.label}: {s1.value}")
    print(f"{s2.label}: {s2.value}")

Create two dataclasses:
1. Item with name (str) and price (float). Validate that price >= 0.
2. Order with customer (str) and items (list). In __post_init__, convert any dicts in the items list to Item objects. Add a method total() that returns the sum of all item prices.
Create an order:
    order = Order(
        customer='Alice',
        items=[{'name': 'Book', 'price': 12.99}, {'name': 'Pen', 'price': 1.50}]
    )

Print the customer name and then the total.
This UserProfile class has a bug in its validation logic. The type coercion happens *after* the range check, so string inputs cause a TypeError instead of being validated properly. Fix the order of operations in __post_init__.
The code should print the profile when given age="25".
Build a ValidatedModel base class with a to_dict() method that returns a dictionary of all fields, and a from_dict(cls, data) classmethod that creates an instance from a dictionary.
Then create a Book class that inherits from ValidatedModel with fields title (str), pages (int), and rating (float). Validate in __post_init__ that pages > 0 and 0 <= rating <= 5.
Create a book from a dict, print its to_dict() output, and then reconstruct it with from_dict and print the title.