Trying out Cerberus, a data validation library for Python

It was so easy I could have cried.

Python is dynamically typed, which makes for carefree programming, but checking whether a dictionary's contents are actually correct is a real chore.

data = {
    'id': 100,
    'name': 'ponkotsu',
    'boke': True
}

print(data['okozukai'])

KeyError: 'okozukai'

Sometimes a key is missing,

data = {
    'id': 100,
    'name': 'ponkotsu',
    'boke': True
}

print(len(data['id']))

TypeError: object of type 'int' has no len()

or a value has the wrong type.

Checking a dictionary's contents by hand is tedious, so I went looking for a library to do it.
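For comparison, here is roughly what a hand-written check for the dictionary above could look like. This is only a sketch: the function name check_user and the message strings are made up for illustration, and it covers just the three keys from the example.

```python
def check_user(data):
    """Return a list of problems found in data (empty when it looks valid)."""
    problems = []
    # Every expected key needs its own presence check and its own type check.
    if 'id' not in data:
        problems.append('id: required field')
    elif not isinstance(data['id'], int) or isinstance(data['id'], bool):
        # bool is a subclass of int in Python, so it is excluded explicitly.
        problems.append('id: must be an integer')
    if 'name' not in data:
        problems.append('name: required field')
    elif not isinstance(data['name'], str):
        problems.append('name: must be a string')
    # 'boke' is optional, so it is only checked when present.
    if 'boke' in data and not isinstance(data['boke'], bool):
        problems.append('boke: must be a boolean')
    return problems


print(check_user({'id': 100, 'name': 'ponkotsu', 'boke': True}))
# []
print(check_user({'id': 'aa'}))
# ['id: must be an integer', 'name: required field']
```

Every new key means another pair of if statements, which is exactly the busywork a schema-based validator takes over.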

Cerberus

http://docs.python-cerberus.org/en/stable/

Cerberus is a lightweight and extensible data validation library for Python.
(snip)
Cerberus provides type checking and other base functionality out of the box and is designed to be easily extensible, allowing for easy custom validation. It has no dependencies and is thoroughly tested under Python 2.6, Python 2.7, Python 3.3, Python 3.4, PyPy and PyPy3.

Detailed usage is explained thoroughly on the following page.

Cerberus Usage — Cerberus is a lightweight and extensible data validation library for Python

This post gives a quick introduction and covers the features you are most likely to use.

Basics

Define the data structure you want to validate as a schema, written as a dictionary. That's all there is to it.

import cerberus

schema = {
    'id': {
        'required': True,
        'type': 'integer',
        'min': 1,
        'max': 100
    },
    'name': {
        'required': True,
        'type': 'string',
        'minlength': 3,
        'maxlength': 10,
    },
    'boke': {
        'type': 'boolean'
    }
}

v = cerberus.Validator(schema)
print(v.validate({'id': 'aa'})) 
# False
print(v.errors) 
# {'id': 'must be of integer type', 'name': 'required field'}

print(v.validate({'id': 0})) 
# False
print(v.errors) 
# {'id': 'min value is 1', 'name': 'required field'}

print(v.validate({'name': ''})) 
# False
print(v.errors) 
# {'name': 'min length is 3', 'id': 'required field'}

print(v.validate({'id': 100, 'name': 'ponkotsu'})) 
# True

print(v.validate({'id': 100, 'name': 'ponkotsu', 'poke': True})) 
# False
print(v.errors) 
# {'poke': 'unknown field'}

That alone is enough to check the presence of keys, their types, value ranges and so on, and error messages are generated for you too.
Sometimes, though, a dictionary contains keys you don't want checked.

v.allow_unknown = True
print(v.validate({'id': 100, 'name': 'ponkotsu', 'poke': True})) # True

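In that case, enabling "allow_unknown" does the trick. The trade-off is that a misspelled key now passes as an unknown field, so any key that must be present needs 'required': True. Keys defined in the schema are still validated whenever they appear.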

Nested dictionaries

Nested dictionaries come up all the time in practice, and they are exactly the kind of thing that is a pain to check by hand.

import cerberus

schema = {
    'name': {
        'required': True,
        'type': 'string',
        'empty': False,
    },
    'skill': {
        'required': True,
        'type': 'dict',
        'schema': {
            'lang-c': {'type': 'boolean'},
        }
    },
}

v = cerberus.Validator(schema)
print(v.validate({'name': 'pomkotsu', 'skill': {'lang-c': True}})) 
# True

It handles this cleanly. A key that is not in the nested schema, however, gets rejected:

print(v.validate({'name': 'pomkotsu', 'skill': {'lang-c': True,'lang-c++': True}})) 
# False
print(v.errors)
# {'skill': {'lang-c++': 'unknown field'}}

So whenever a key is added to the dictionary, the schema has to be extended to match:

import cerberus

schema = {
    'name': {
        'required': True,
        'type': 'string',
        'empty': False,
    },
    'skill': {
        'required': True,
        'type': 'dict',
        'schema': {
            'lang-c':    {'type': 'boolean'},
            'lang-c++':  {'type': 'boolean'},
        }
    },
}

v = cerberus.Validator(schema)
print(v.validate({'name': 'pomkotsu', 'skill': {'lang-c': True, 'lang-c++': True}}))
# True

Redefining the schema every time is a pain.

When the dictionary keys are arbitrary strings

Data structures where the keys are arbitrary strings but every value has the same shape are common. Extremely common, in fact.

For those cases, use this:

import cerberus

schema = {
    'name': {
        'required': True,
        'type': 'string',
        'empty': False,
    },
    'skill': {
        'required': True,
        'type': 'dict',
        'valueschema': {'type': 'boolean'},
    },
}

v = cerberus.Validator(schema)
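print(v.validate({'name': 'pomkotsu', 'skill': {'lang-c': True, 'lang-python': True}}))
# True

Defining the field with "valueschema" makes this possible: every value is checked against the same rules, whatever its key is. Naturally, the values can be dictionaries as well.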

Dictionaries inside a list

With verbose data structures like these, you also run into dictionaries inside a list.

import cerberus

schema = {
    'name': {
        'required': True,
        'type': 'string',
        'empty': False,
    },
    'skill': {
        'required': True,
        'type': 'list',
        'schema': {
            'type': 'dict',
            'schema': {
                'lang': {
                    'type': 'string',
                    'empty': False,
                },
                'level': {
                    'type': 'integer',
                }
            }

        },
    },
}

v = cerberus.Validator(schema)
print(v.validate({'name': 'pomkotsu', 'skill': [{'lang': 'c', 'level': 1}]}))
# True