Simplifying Data Validation in Python – Real Python

Pydantic’s primary way of defining data schemas is through models. A Pydantic model is an object, similar to a Python dataclass, that defines and stores data about an entity with annotated fields. Unlike dataclasses, Pydantic’s focus is centered around automatic data parsing, validation, and serialization. The best way to understand this is to create your own models, and that’s what you’ll do next.
Working With Pydantic BaseModels
Suppose you’re building an application used by a human resources department to manage employee information, and you need a way to verify that new employee information is in the correct form. For example, each employee should have an ID, name, email, date of birth, salary, department, and benefits selection. This is a perfect use case for a Pydantic model!
To define your employee model, you create a class that inherits from Pydantic’s BaseModel:

First, you import the dependencies you need to define your employee model. You then create an enum to represent the different departments in your company, and you’ll use this to annotate the department field in your employee model.
Then, you define your Pydantic model, Employee, which inherits from BaseModel and defines the names and expected types of your employee fields via annotations. Here’s a breakdown of each field you’ve defined in Employee and how Pydantic validates it when an Employee object is instantiated:

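employee_id: This is the UUID for the employee you want to store information for. By using the UUID annotation, Pydantic ensures this field is always a valid UUID. If you don’t pass an employee_id, the field falls back to the default you specified by calling uuid4(). Because that call runs only once, when the class is defined, every instance created without an explicit ID shares the same default value. The default_factory parameter covered later generates a fresh UUID for each instance.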
name: The employee’s name, which Pydantic expects to be a string.
email: Pydantic will ensure that each employee email is valid by using the third-party email-validator package under the hood.
date_of_birth: Each employee’s date of birth must be a valid date, as annotated by date from Python’s datetime module. If you pass a string into date_of_birth, Pydantic will attempt to parse and convert it to a date object.
salary: This is the employee’s salary, and it’s expected to be a float.
department: Each employee’s department must be one of HR, SALES, IT, or ENGINEERING, as defined in your Department enum.
elected_benefits: This field stores whether the employee has elected benefits, and Pydantic expects it to be a Boolean.

The simplest way to create an Employee object is to instantiate it as you would any other Python object. To do this, open a Python REPL and run the following code:
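As a rough sketch, written as a standalone script rather than a REPL session, this could look like the following. The model is repeated inline with email simplified to str so the snippet runs without extra dependencies, and the sample values are invented for illustration:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: str  # simplification: plain str instead of an email type
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool


# Hypothetical sample values; the date and department are passed as strings
employee = Employee(
    name="Chris DeTuma",
    email="cdetuma@example.com",
    date_of_birth="1998-04-02",
    salary=123_000.00,
    department="IT",
    elected_benefits=True,
)

print(employee)
print(type(employee.date_of_birth), type(employee.department))
```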

In this block, you import Employee and create an object with all of the required employee fields. Pydantic successfully validates and coerces the fields you passed in, and it creates a valid Employee object. Notice how Pydantic automatically converts your date string into a date object and your IT string to its respective Department enum.
Next, look at how Pydantic responds when you try to pass invalid data to an Employee instance:
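One way to reproduce this, sketched with made-up invalid values. The model is repeated inline with email simplified to str, so only the other six fields fail here:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, ValidationError


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: str  # simplification: plain str instead of an email type
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool


try:
    Employee(
        employee_id="123",            # not a valid UUID
        name=False,                   # not a string
        email="cdetuma@example.com",  # accepted: plain str in this sketch
        date_of_birth="1939804-02",   # not a parseable date
        salary="high paying",         # not a number
        department="PRODUCT",         # not a Department member
        elected_benefits=300,         # not interpretable as a Boolean
    )
except ValidationError as exc:
    validation_error = exc
    print(exc)  # one entry per failing field

# Collect the names of the fields that failed validation
failed_fields = {err["loc"][0] for err in validation_error.errors()}
```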

In this example, you created an Employee object with invalid data fields. Pydantic gives you a detailed error message for each field, telling you what was expected, what was received, and where you can go to learn more about the error.
This detailed validation is powerful because it prevents you from storing invalid data in Employee. This also gives you confidence that the Employee objects you instantiate without errors contain the data you’re expecting, and you can trust this data downstream in your code or in other applications.
Pydantic’s BaseModel is equipped with a suite of methods that make it easy to create models from other objects, such as dictionaries and JSON. For example, if you want to instantiate an Employee object from a dictionary, you can use the .model_validate() class method:
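A sketch of how this might look, with invented sample data and the model repeated inline (email simplified to str). The second half shows the error reported for a missing key:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, ValidationError


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: str  # simplification: plain str instead of an email type
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool


# Hypothetical employee record
new_employee_dict = {
    "name": "Chris DeTuma",
    "email": "cdetuma@example.com",
    "date_of_birth": "1998-04-02",
    "salary": 123_000.00,
    "department": "IT",
    "elected_benefits": True,
}

new_employee = Employee.model_validate(new_employee_dict)
print(new_employee)

# Dropping a required key produces a "missing" error for that field
incomplete = {k: v for k, v in new_employee_dict.items() if k != "name"}
try:
    Employee.model_validate(incomplete)
except ValidationError as exc:
    missing_error = exc.errors()[0]
    print(missing_error["type"], missing_error["loc"])
```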

Here, you create new_employee_dict, a dictionary with your employee fields, and pass it into .model_validate() to create an Employee instance. Under the hood, Pydantic validates each dictionary entry to ensure it conforms with the data you’re expecting. If any of the data is invalid, Pydantic will raise an error in the same way you saw previously. You’ll also be notified if any fields are missing from the dictionary.
You can do the same thing with JSON objects using .model_validate_json():
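For illustration, a sketch with a hypothetical JSON payload. The model is repeated inline with email simplified to str:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: str  # simplification: plain str instead of an email type
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool


# Hypothetical payload, e.g. the body of an HTTP request
new_employee_json = """
{
    "employee_id": "d2e7b773-926b-49df-939a-5e98cbb9c9eb",
    "name": "Eric Slogrenta",
    "email": "eslogrenta@example.com",
    "date_of_birth": "1990-01-02",
    "salary": 125000.0,
    "department": "HR",
    "elected_benefits": false
}
"""

# Parse and validate the JSON string in a single step
new_employee = Employee.model_validate_json(new_employee_json)
print(new_employee)
```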

In this example, new_employee_json is a valid JSON string that stores your employee fields, and you use .model_validate_json() to validate and create an Employee object from new_employee_json. While it may seem subtle, the ability to create and validate Pydantic models from JSON is powerful because JSON is one of the most popular ways to transfer data across the web. This is one of the reasons why FastAPI relies on Pydantic to create REST APIs.
You can also serialize Pydantic models as dictionaries and JSON:
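A short sketch of both calls, using an invented employee and the model repeated inline (email simplified to str):

```python
import json
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: str  # simplification: plain str instead of an email type
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool


# Hypothetical sample values
new_employee = Employee(
    name="Chris DeTuma",
    email="cdetuma@example.com",
    date_of_birth="1998-04-02",
    salary=123_000.00,
    department="IT",
    elected_benefits=True,
)

# Dictionary output keeps Python objects such as date and Department
employee_dict = new_employee.model_dump()

# JSON output is a str in which those values are encoded as strings
employee_json = new_employee.model_dump_json()

print(employee_dict)
print(employee_json)

# Decode the JSON again to inspect how the values were encoded
decoded = json.loads(employee_json)
```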

Here, you use .model_dump() and .model_dump_json() to convert your new_employee model to a dictionary and JSON string, respectively. Notice how .model_dump_json() returns a JSON string with date_of_birth and department stored as strings.
While Pydantic already validated these fields and converted your model to JSON, whoever uses this JSON downstream won’t know that date_of_birth needs to be a valid date and department needs to be a category in your Department enum. To solve this, you can create a JSON schema from your Employee model.
JSON schemas tell you what fields are expected and what values are represented in a JSON object. You can think of this as the JSON version of your Employee class definition. Here’s how you generate a JSON schema for Employee:
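A sketch of the call, with the model repeated inline (email simplified to str). The lookups at the end assume the layout Pydantic v2 uses for its generated schemas, with enum definitions under "$defs":

```python
import json
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = uuid4()
    name: str
    email: str  # simplification: plain str instead of an email type
    date_of_birth: date
    salary: float
    department: Department
    elected_benefits: bool


# The schema is returned as a plain dictionary
schema = Employee.model_json_schema()

# json.dumps() turns it into a string that other tools can consume
print(json.dumps(schema, indent=2))

department_values = schema["$defs"]["Department"]["enum"]
id_format = schema["properties"]["employee_id"]["format"]
dob_format = schema["properties"]["date_of_birth"]["format"]
```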

When you call .model_json_schema(), you get a dictionary representing your model’s JSON schema. The first entry you see shows you the values that department can take on. You also see information about how your fields should be formatted. For instance, according to this JSON schema, employee_id is expected to be a UUID and date_of_birth is expected to be a date.
You can convert your JSON schema to a JSON string using json.dumps(), which enables just about any programming language to validate JSON objects produced by your Employee model. In other words, not only can Pydantic validate incoming data and serialize it as JSON, but it also provides other programming languages with the information they need to validate your model’s data via JSON schemas.
With that, you now understand how to use Pydantic’s BaseModel to validate and serialize your data. Up next, you’ll learn how to use fields to further customize your validation.
Using Fields for Customization and Metadata
So far, your Employee model validates the data type of each field and ensures some of the fields, such as email, date_of_birth, and department, take on valid formats. However, let’s say you also want to ensure that salary is a positive number, name isn’t an empty string, and email contains your company’s domain name. You can use Pydantic’s Field class to accomplish this.
The Field class allows you to customize and add metadata to your model’s fields. To see how this works, take a look at this example:
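A sketch of an Employee model that uses Field, reconstructed from the parameter breakdown in this section. Two details are assumptions: example.com stands in for the company domain, and email is typed as plain str to keep the snippet free of extra dependencies:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, Field


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    # default_factory calls uuid4() anew for every instance
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    # example.com is a placeholder for the company domain
    email: str = Field(pattern=r".+@example\.com$")
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool
```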

Here, you import Field along with the other dependencies you used previously, and you assign default values to some of the Employee fields. Here’s a breakdown of the Field parameters you used to add additional validation and metadata to your fields:

default_factory: You use this to define a callable that generates default values. In the example above, you set default_factory to uuid4. This calls uuid4() to generate a random UUID for employee_id when needed. You can also use a lambda function for more flexibility.
frozen: This is a Boolean parameter you can set to make your fields immutable. This means, when frozen is set to True, the corresponding field can’t be changed after your model is instantiated. In this example, employee_id, name, and date_of_birth are made immutable using the frozen parameter.
min_length: You can control the length of string fields with min_length and max_length. In the example above, you ensure that name is at least one character long.
pattern: For string fields, you can set pattern to a regular expression that the field’s value must match. For instance, the regular expression used for email in the example above ensures that every email ends with your company’s domain name.
alias: You can use this parameter when you want to assign an alias to your fields. For example, you can allow date_of_birth to be called birth_date or salary to be called compensation. You can use these aliases when instantiating or serializing a model.
gt: This parameter, short for “greater than”, is used for numeric fields to set an exclusive lower bound. In this example, setting gt=0 ensures salary is always a positive number. Pydantic also has other numeric constraints, such as lt, which is short for “less than”.
repr: This Boolean parameter determines whether a field is displayed in the model’s field representation. In this example, you won’t see date_of_birth or salary when you print an Employee instance.

To see this extra validation in action, notice what happens when you try to create an Employee model with incorrect data:
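A sketch with invented bad data. The model is repeated inline under the same assumptions as before: example.com is a placeholder domain and email is a plain str:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: str = Field(pattern=r".+@example\.com$")  # placeholder domain
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool


# Hypothetical record that breaks three of the Field constraints
incorrect_employee_data = {
    "name": "",                         # shorter than min_length=1
    "email": "cdetuma@fakedomain.com",  # does not match the pattern
    "birth_date": "1998-04-02",
    "compensation": -10,                # not greater than 0
    "department": "IT",
    "elected_benefits": True,
}

try:
    Employee.model_validate(incorrect_employee_data)
except ValidationError as exc:
    error_types = {err["type"] for err in exc.errors()}
    print(exc)
```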

Here, you import your updated Employee model and attempt to validate a dictionary with incorrect data. In response, Pydantic gives you three validation errors saying the name needs to be at least one character, email should match your company’s domain name, and salary should be greater than zero.
Now notice the additional features you get when you validate correct Employee data:
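A sketch with invented valid data, again repeating the model inline (placeholder domain, email as plain str). The last line also shows the aliases being used on the way out, through the by_alias option of .model_dump():

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, Field


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: str = Field(pattern=r".+@example\.com$")  # placeholder domain
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool


# Hypothetical record that uses the aliases as keys
employee_data = {
    "name": "Clyde Harwell",
    "email": "charwell@example.com",
    "birth_date": "2000-06-12",
    "compensation": 100_000,
    "department": "ENGINEERING",
    "elected_benefits": True,
}

employee = Employee.model_validate(employee_data)

# Fields declared with repr=False are left out of the representation
employee_repr = repr(employee)
print(employee_repr)

# The values are still available as attributes under the field names
print(employee.salary, employee.date_of_birth)

# Serializing with by_alias=True emits the aliases as keys
dumped_by_alias = employee.model_dump(by_alias=True)
```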

In this block, you create a dictionary and an Employee model with .model_validate(). In employee_data, notice how you used birth_date instead of date_of_birth and compensation instead of salary. Pydantic recognizes these aliases and assigns their values to the correct field name internally.
Because you set repr=False, you can see that salary and date_of_birth aren’t displayed in the Employee representation. You have to explicitly access them as attributes to see their values. Lastly, notice what happens when you try to change a frozen field:
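A sketch of that last step, with the model repeated inline under the same assumptions (placeholder domain, email as plain str) and invented sample values:

```python
from datetime import date
from enum import Enum
from uuid import UUID, uuid4

from pydantic import BaseModel, Field, ValidationError


class Department(Enum):
    HR = "HR"
    SALES = "SALES"
    IT = "IT"
    ENGINEERING = "ENGINEERING"


class Employee(BaseModel):
    employee_id: UUID = Field(default_factory=uuid4, frozen=True)
    name: str = Field(min_length=1, frozen=True)
    email: str = Field(pattern=r".+@example\.com$")  # placeholder domain
    date_of_birth: date = Field(alias="birth_date", repr=False, frozen=True)
    salary: float = Field(alias="compensation", gt=0, repr=False)
    department: Department
    elected_benefits: bool


# Hypothetical sample values
employee = Employee(
    name="Clyde Harwell",
    email="charwell@example.com",
    birth_date="2000-06-12",
    compensation=100_000,
    department="IT",
    elected_benefits=True,
)

# department is not frozen, so reassignment is allowed
employee.department = Department.HR

# name is frozen, so reassignment raises a validation error
try:
    employee.name = "Andrew TuGrendele"
except ValidationError as exc:
    frozen_error = exc.errors()[0]
    print(exc)
```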

Here, you first change the value of department from IT to HR. This is perfectly acceptable because department isn’t a frozen field. However, when you try to change name, Pydantic gives you an error saying that name is a frozen field.
You now have a solid grasp of Pydantic’s BaseModel and Field classes. With these alone, you can define many different validation rules and metadata on your data schemas, but sometimes this isn’t enough. Up next, you’ll take your field validation even further with Pydantic validators.