In this post we’re going to cover the basics of working with Marshmallow, a Python library that lets us sanitize and validate data according to a schema.
Schemas are useful when we want to process user-provided data en masse rather than deal with each item individually.

Please note that this article targets version 2.x of the library.

from marshmallow import Schema, fields

class _Schema(Schema):
    class Meta:
        dateformat = "%Y-%m-%d"

Here we extend the default Schema class with our own, which outputs dates in a custom format. The inner Meta class supports a number of options for customizing a schema, such as excluding some of the fields (exclude) or explicitly listing the ones to include (fields). More can be read in the manual: help(marshmallow.Schema.Meta).
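To get a feel for what that format string produces, here is a minimal standard-library sketch. It assumes the dateformat value is used as an ordinary strftime pattern, and the sample datetime is made up for illustration:

```python
from datetime import datetime

# Hypothetical timestamp, used only to illustrate the format string.
published = datetime(2017, 3, 5, 14, 30)

# The same pattern we set as Meta.dateformat above.
formatted = published.strftime("%Y-%m-%d")
print(formatted)  # 2017-03-05
```

Note that the time of day is dropped, since the pattern only names year, month and day.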

class Slug(fields.Field):
    def _serialize(self, value, attr, obj):
        if value:
            return str(value).lower().replace(' ', '-')

Above we define a custom field that serializes its value by lowercasing it and replacing spaces with hyphens; empty values serialize to None. We use this field below in our crude blog schema.
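To see the transformation in isolation, the same logic can be pulled out into a plain function. The name slugify is hypothetical and not part of Marshmallow; this is only a sketch of what the field's _serialize body does:

```python
def slugify(value):
    """Mirror the body of Slug._serialize as a standalone function."""
    if value:
        # Lowercase the text and turn every space into a hyphen.
        return str(value).lower().replace(" ", "-")
    # Falsy input (None, empty string) yields None, as in the field.
    return None


print(slugify("Hello Marshmallow World"))  # hello-marshmallow-world
print(slugify("Hello, World!"))            # hello,-world!
print(slugify(""))                         # None
```

As the second call shows, punctuation is left untouched, so a real slug field would probably want stricter cleaning.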

class CommentSchema(_Schema):
    author = fields.Str()
    body = fields.Str()

class PostSchema(_Schema):
    slug = Slug(required=True)
    date = fields.DateTime(required=True)
    title = fields.Str(required=True)
    body = fields.Str(required=True)
    tags = fields.List(fields.Str())
    comments = fields.List(fields.Nested(CommentSchema()))

The last two are compound fields. Of these, comments is the more interesting one, as it is a list of comment schemas. To refer to another schema rather than a plain field type, we need to wrap it in the Nested field.

And there, in a nutshell, we have it. Now all that’s left is to run our blog post through the schema to sanitize it:

schema = PostSchema()
post = schema.dump(raw_post).data
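The snippet above never shows what raw_post looks like, so here is a hypothetical example of a suitable input. All values are invented, and the is_admin key is added on purpose to stand for data we don't want to leak. The set arithmetic at the end is plain Python, not a Marshmallow call:

```python
from datetime import datetime

# Hypothetical input; the keys mirror the fields declared on PostSchema.
raw_post = {
    "slug": "My First Post",
    "date": datetime(2017, 3, 5, 14, 30),
    "title": "My First Post",
    "body": "Hello, world.",
    "tags": ["python", "marshmallow"],
    "comments": [{"author": "alice", "body": "Nice post!"}],
    "is_admin": True,  # not declared on the schema
}

# Field names declared on PostSchema earlier in the post.
declared = {"slug", "date", "title", "body", "tags", "comments"}

# Keys present in the input but absent from the schema.
undeclared = set(raw_post) - declared
print(undeclared)  # {'is_admin'}
```

Dumping only emits the fields declared on the schema, so a key such as is_admin would not appear in post.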

Validation was left out here; we will cover it in another blog post.