Generating Python from JSON Typedef schemas
JSON Type Definition, aka RFC 8927, is an easy-to-learn, standardized way to define a schema for JSON data. You can use JSON Typedef to portably validate data across programming languages, create dummy data, generate code, and more.
This article is about how you can use JSON Typedef to generate Python code from schemas. If you’re interested in generating code in other languages, see this article on jtd-codegen. The rest of this article focuses on using jtd-codegen with Python in particular.
Generating Python with jtd-codegen
As a prerequisite, you need to first install jtd-codegen. Installation instructions are available here.
You can generate Python with jtd-codegen using the --python-out option, whose value must be a directory that jtd-codegen can generate code into.
For example, if you have this schema in schemas/user.jtd.json:
{
  "properties": {
    "id": { "type": "string" },
    "createdAt": { "type": "timestamp" },
    "karma": { "type": "int32" },
    "isAdmin": { "type": "boolean" }
  }
}
Then you can generate Python code into the src/user directory by running:
jtd-codegen schemas/user.jtd.json --python-out src/user
Which will output something like:
📝 Writing Python code to: src/user
📦 Generated Python code.
📦 Root schema converted into type: User
And you should see code along these lines in src/user/__init__.py:
from dataclasses import dataclass
from typing import Any, Optional, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class User:
    created_at: 'str'
    id: 'str'
    is_admin: 'bool'
    karma: 'int'

    @classmethod
    def from_json(cls, data) -> 'User':
        return cls(
            _from_json(str, data.get("createdAt")),
            _from_json(str, data.get("id")),
            _from_json(bool, data.get("isAdmin")),
            _from_json(int, data.get("karma")),
        )

    def to_json(self):
        data = {}
        data["createdAt"] = _to_json(self.created_at)
        data["id"] = _to_json(self.id)
        data["isAdmin"] = _to_json(self.is_admin)
        data["karma"] = _to_json(self.karma)
        return data
Note: at the time of writing, generated code is not always formatted in a pretty way. If you require pretty-formatted code, it’s recommended that you run a code formatter on jtd-codegen-generated code.
Using generated Python code
jtd-codegen will always output code into an __init__.py inside the directory you specify with --python-out. In the previous example, we outputted code into src/user, so we can import it like so:
from .user import User
The generated Python code does not presume a particular JSON implementation. You can use jtd-codegen-generated types with the standard library’s json module, or you can use alternatives like orjson or simplejson.
The generated code is JSON-library independent because every generated class or enum has a from_json and to_json method. from_json takes in already-parsed JSON data, such as data returned from json.loads, and returns an instance of the type. to_json does the reverse, constructing data that you can pass to json.dumps.
Do not directly pass instances of jtd-codegen-generated types to JSON libraries. You will get the wrong JSON result if you do this. Always pass through from_json or to_json first.
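To make this warning concrete, here is a minimal sketch. The User class below is a hand-trimmed stand-in for the generated class (an assumption made to keep the snippet self-contained), and dataclasses.asdict stands in for what a dataclass-aware JSON library might do with the instance; the exact behavior of any particular library may differ.

```python
import json
from dataclasses import asdict, dataclass


# Hand-trimmed stand-in for a jtd-codegen-generated class (illustrative only).
@dataclass
class User:
    id: str
    is_admin: bool

    def to_json(self):
        return {"id": self.id, "isAdmin": self.is_admin}


user = User(id="abc", is_admin=True)

# The standard library's json module cannot serialize the instance directly.
try:
    json.dumps(user)
    direct_error = None
except TypeError as exc:
    direct_error = exc

# Serializing the dataclass fields as-is yields Python attribute names
# (is_admin), not the property names from the schema (isAdmin).
naive = json.dumps(asdict(user))

# Going through to_json yields the schema's property names.
correct = json.dumps(user.to_json())
```

Only the last form produces JSON that matches the schema the class was generated from.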
Here’s an example of how you’d use the User class we imported above:
import json
# To read in JSON, assuming you know the JSON is valid, do something like:
input_json = '...'
user = User.from_json(json.loads(input_json))
# To write out JSON, do something like:
output_json = json.dumps(user.to_json())
In the example above, we directly use from_json on parsed JSON data. This will only work correctly if the JSON input is valid. Otherwise, you may encounter runtime exceptions, and the types of the class’s attributes may be different from those indicated on their type annotations. If you do run into runtime exceptions, these exceptions will be Python-specific and low-level.
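As a small illustration of both failure modes, the sketch below feeds schema-violating input to a hand-trimmed stand-in for the generated User class. The stand-in is an assumption made for brevity; it is not the full generated code.

```python
from dataclasses import dataclass


# Hand-trimmed stand-in for the generated User class (illustrative only).
@dataclass
class User:
    id: str
    karma: int

    @classmethod
    def from_json(cls, data) -> "User":
        return cls(data.get("id"), data.get("karma"))


# Wrong value type: no exception is raised, but the attribute no longer
# matches its type annotation.
user = User.from_json({"id": "abc", "karma": "lots"})
karma_type = type(user.karma)

# Wrong top-level type: a low-level Python exception, not a validation error.
try:
    User.from_json(["not", "an", "object"])
    error = None
except AttributeError as exc:
    error = exc
```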
To prevent this from happening, you should first validate the JSON input using a JTD validation implementation, such as the jtd package. What you would do is:
- Parse the input using your preferred JSON backend.
- Validate that the parsed JSON is valid against the schema you generated your types from, using a JTD validation implementation. If there are validation errors, you can return those, because JTD validation errors are standardized and platform-independent.
- If the input is valid, then construct an instance of your generated type using its from_json method. Pass in the JSON you just validated to from_json.
This solution lets you produce portable validation errors and lets you be more deliberate about what inputs you do and don’t accept.
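The three steps above can be sketched as follows. To keep the snippet self-contained, check_user is a hypothetical, hand-written stand-in for a real JTD validator, and User is a trimmed stand-in for the generated class; in real code you would call your JTD validation library at step 2 and return its standardized errors.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, List


# Hand-trimmed stand-in for the generated User class (illustrative only).
@dataclass
class User:
    id: str
    is_admin: bool

    @classmethod
    def from_json(cls, data) -> "User":
        return cls(data.get("id"), data.get("isAdmin"))


def check_user(instance: Any) -> List[str]:
    """Hypothetical stand-in for a real JTD validator.

    Returns a list of error descriptions; an empty list means valid.
    """
    if not isinstance(instance, dict):
        return ["expected an object"]
    errors = []
    if not isinstance(instance.get("id"), str):
        errors.append("/id: expected string")
    if not isinstance(instance.get("isAdmin"), bool):
        errors.append("/isAdmin: expected boolean")
    return errors


def load_user(raw: str, validate: Callable[[Any], List[str]] = check_user):
    parsed = json.loads(raw)           # 1. parse with any JSON backend
    errors = validate(parsed)          # 2. validate against the schema
    if errors:
        return None, errors            #    report errors, do not construct
    return User.from_json(parsed), []  # 3. construct from validated data only


user, errors = load_user('{"id": "abc", "isAdmin": false}')
bad_user, bad_errors = load_user('{"id": 123}')
```

Passing the validator in as a parameter keeps the flow independent of any one validation library.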
Customizing Python output
Python code generation supports the following metadata properties shared across all languages supported by jtd-codegen:
- description customizes the documentation comment to put on a type or property in generated code. For example, this schema:
{
  "metadata": {
    "description": "A user in our system"
  },
  "properties": {
    "name": {
      "metadata": {
        "description": "The user's name"
      },
      "type": "string"
    },
    "isAdmin": {
      "metadata": {
        "description": "Whether the user is an admin"
      },
      "type": "boolean"
    }
  }
}
Generates into:
@dataclass
class User:
    """
    A user in our system
    """

    is_admin: 'bool'
    """
    Whether the user is an admin
    """

    name: 'str'
    """
    The user's name
    """

    @classmethod
    def from_json(cls, data) -> 'User':
        return cls(
            _from_json(bool, data.get("isAdmin")),
            _from_json(str, data.get("name")),
        )

    def to_json(self):
        data = {}
        data["isAdmin"] = _to_json(self.is_admin)
        data["name"] = _to_json(self.name)
        return data
- enumDescription is like description, but for the members of an enum. The keys of enumDescription should correspond to the values in the schema’s enum, and the values should be descriptions for those values. For example, this schema:
{
  "metadata": {
    "enumDescription": {
      "PENDING": "The job is waiting to be processed.",
      "IN_PROGRESS": "The job is being processed.",
      "DONE": "The job has been processed."
    }
  },
  "enum": ["PENDING", "IN_PROGRESS", "DONE"]
}
Generates into:
class Status(Enum):
    DONE = "DONE"
    """
    The job has been processed.
    """

    IN_PROGRESS = "IN_PROGRESS"
    """
    The job is being processed.
    """

    PENDING = "PENDING"
    """
    The job is waiting to be processed.
    """

    @classmethod
    def from_json(cls, data) -> 'Status':
        return cls(data)

    def to_json(self):
        return self.value
Additionally, Python code generation supports the following Python-specific options:
- pythonType overrides the type that jtd-codegen should generate. jtd-codegen will not generate any code for schemas with pythonType, and instead use the value of pythonType as-is.
It is your responsibility to ensure that the value of pythonType is valid code. jtd-codegen will not attempt to validate its value.
For example, this schema:
{
  "properties": {
    "name": { "type": "string" },
    "isAdmin": {
      "metadata": {
        "pythonType": "MyCustomType"
      },
      "type": "boolean"
    }
  }
}
Generates into:
@dataclass
class OverrideDemo:
    is_admin: 'bool'
    name: 'str'

    @classmethod
    def from_json(cls, data) -> 'OverrideDemo':
        return cls(
            _from_json(bool, data.get("isAdmin")),
            _from_json(str, data.get("name")),
        )

    def to_json(self):
        data = {}
        data["isAdmin"] = _to_json(self.is_admin)
        data["name"] = _to_json(self.name)
        return data
Generated Python code
This section details the sort of Python code that jtd-codegen will generate.
Code generated from “Empty” schemas
“Empty” schemas will be converted into a Python typing.Any:
{}
Generates into:
from dataclasses import dataclass
from typing import Any, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Empty:
    value: 'Any'

    @classmethod
    def from_json(cls, data) -> 'Empty':
        return cls(_from_json(Any, data))

    def to_json(self):
        return _to_json(self.value)
Note: jtd-codegen had to generate a custom type alias here, which is why the code has a bit of extra stuff. If you use “empty”, “type”, “ref”, “elements”, or “values” schemas at the top level of a schema, jtd-codegen has to emit type aliases in Python. In real-world schemas, this doesn’t happen very often.
Code generated from “Ref” schemas
“Ref” schemas will be converted into a reference to the definition being referred to:
{
  "definitions": {
    "example": { "type": "string" }
  },
  "ref": "example"
}
Generates into:
from dataclasses import dataclass
from typing import Any, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Ref:
    value: 'Example'

    @classmethod
    def from_json(cls, data) -> 'Ref':
        return cls(_from_json(Example, data))

    def to_json(self):
        return _to_json(self.value)

@dataclass
class Example:
    value: 'str'

    @classmethod
    def from_json(cls, data) -> 'Example':
        return cls(_from_json(str, data))

    def to_json(self):
        return _to_json(self.value)
Note: jtd-codegen had to generate a custom type alias here, which is why the code has a bit of extra stuff. If you use “empty”, “type”, “ref”, “elements”, or “values” schemas at the top level of a schema, jtd-codegen has to emit type aliases in Python. In real-world schemas, this doesn’t happen very often.
Code generated from “Type” schemas
“Type” schemas will be converted into the following types:
JSON Typedef type | Python type |
---|---|
boolean | bool |
string | str |
timestamp | str |
float32 | float |
float64 | float |
int8 | int |
uint8 | int |
int16 | int |
uint16 | int |
int32 | int |
uint32 | int |
For example,
{
  "properties": {
    "boolean": { "type": "boolean" },
    "string": { "type": "string" },
    "timestamp": { "type": "timestamp" },
    "float32": { "type": "float32" },
    "float64": { "type": "float64" },
    "int8": { "type": "int8" },
    "uint8": { "type": "uint8" },
    "int16": { "type": "int16" },
    "uint16": { "type": "uint16" },
    "int32": { "type": "int32" },
    "uint32": { "type": "uint32" }
  }
}
Generates into:
from dataclasses import dataclass
from typing import Any, Optional, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Type:
    boolean: 'bool'
    float32: 'float'
    float64: 'float'
    int16: 'int'
    int32: 'int'
    int8: 'int'
    string: 'str'
    timestamp: 'str'
    uint16: 'int'
    uint32: 'int'
    uint8: 'int'

    @classmethod
    def from_json(cls, data) -> 'Type':
        return cls(
            _from_json(bool, data.get("boolean")),
            _from_json(float, data.get("float32")),
            _from_json(float, data.get("float64")),
            _from_json(int, data.get("int16")),
            _from_json(int, data.get("int32")),
            _from_json(int, data.get("int8")),
            _from_json(str, data.get("string")),
            _from_json(str, data.get("timestamp")),
            _from_json(int, data.get("uint16")),
            _from_json(int, data.get("uint32")),
            _from_json(int, data.get("uint8")),
        )

    def to_json(self):
        data = {}
        data["boolean"] = _to_json(self.boolean)
        data["float32"] = _to_json(self.float32)
        data["float64"] = _to_json(self.float64)
        data["int16"] = _to_json(self.int16)
        data["int32"] = _to_json(self.int32)
        data["int8"] = _to_json(self.int8)
        data["string"] = _to_json(self.string)
        data["timestamp"] = _to_json(self.timestamp)
        data["uint16"] = _to_json(self.uint16)
        data["uint32"] = _to_json(self.uint32)
        data["uint8"] = _to_json(self.uint8)
        return data
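Because timestamp properties are typed as str in the table above, you may want to convert them to datetime yourself after calling from_json. The following is a minimal sketch using only the standard library. The helper name parse_timestamp is hypothetical and not part of the generated code, and the "Z" replacement is there because datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 onwards. It covers common RFC 3339 timestamps, not necessarily every valid form.

```python
from datetime import datetime, timedelta, timezone


def parse_timestamp(value: str) -> datetime:
    """Hypothetical helper: convert a timestamp string into a datetime."""
    # Normalize a trailing "Z" (UTC) into an explicit "+00:00" offset so
    # that older Python versions can parse it too.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


created_at = parse_timestamp("2021-01-02T03:04:05Z")
with_offset = parse_timestamp("2021-01-02T03:04:05+02:00")
```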
Code generated from “Enum” schemas
“Enum” schemas will be converted into a Python class that extends Enum:
{
"enum": ["PENDING", "IN_PROGRESS", "DONE"]
}
Generates into:
from enum import Enum
from typing import Any, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

class Enum(Enum):
    DONE = "DONE"
    IN_PROGRESS = "IN_PROGRESS"
    PENDING = "PENDING"

    @classmethod
    def from_json(cls, data) -> 'Enum':
        return cls(data)

    def to_json(self):
        return self.value
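A brief usage sketch for a generated enum follows. The class is renamed to Status here purely for readability (an assumption; the name depends on your schema), and the last part shows that an unknown value surfaces as a plain Python ValueError rather than a JTD validation error.

```python
from enum import Enum


# Stand-in for a generated enum class, renamed for readability.
class Status(Enum):
    DONE = "DONE"
    IN_PROGRESS = "IN_PROGRESS"
    PENDING = "PENDING"

    @classmethod
    def from_json(cls, data) -> "Status":
        return cls(data)

    def to_json(self):
        return self.value


# Round trip: parsed JSON string -> enum member -> JSON-ready string.
status = Status.from_json("DONE")
round_tripped = status.to_json()

# A value outside the enum raises ValueError from the Enum constructor.
try:
    Status.from_json("CANCELLED")
    error = None
except ValueError as exc:
    error = exc
```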
Code generated from “Elements” schemas
“Elements” schemas will be converted into a Python typing.List[T], where T is the type of the elements of the array:
{
  "elements": {
    "type": "string"
  }
}
Generates into:
from dataclasses import dataclass
from typing import Any, List, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Elements:
    value: 'List[str]'

    @classmethod
    def from_json(cls, data) -> 'Elements':
        return cls(_from_json(List[str], data))

    def to_json(self):
        return _to_json(self.value)
Note: jtd-codegen had to generate a custom type alias here, which is why the code has a bit of extra stuff. If you use “empty”, “type”, “ref”, “elements”, or “values” schemas at the top level of a schema, jtd-codegen has to emit type aliases in Python. In real-world schemas, this doesn’t happen very often.
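In practice, a top-level alias like this means the list sits behind a value attribute rather than being the object itself. A short sketch, using a simplified stand-in for the generated Elements class (the helper functions are inlined here as an assumption, to keep it self-contained):

```python
from dataclasses import dataclass
from typing import List


# Simplified stand-in for the generated wrapper class (illustrative only).
@dataclass
class Elements:
    value: List[str]

    @classmethod
    def from_json(cls, data) -> "Elements":
        return cls(list(data))

    def to_json(self):
        return list(self.value)


elements = Elements.from_json(["a", "b"])

# The parsed list is reached through .value, not by indexing the wrapper.
first = elements.value[0]

# to_json unwraps back to a plain list that a JSON library can serialize.
as_json = elements.to_json()
```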
Code generated from “Properties” schemas
“Properties” schemas will be converted into a Python @dataclass. Optional properties will be marked with typing.Optional, and will be omitted from JSON if set to None.
{
  "properties": {
    "name": { "type": "string" },
    "isAdmin": { "type": "boolean" }
  },
  "optionalProperties": {
    "middleName": { "type": "string" }
  },
  "additionalProperties": true
}
Generates into:
from dataclasses import dataclass
from typing import Any, Optional, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Properties:
    is_admin: 'bool'
    name: 'str'
    middle_name: 'Optional[str]'

    @classmethod
    def from_json(cls, data) -> 'Properties':
        return cls(
            _from_json(bool, data.get("isAdmin")),
            _from_json(str, data.get("name")),
            _from_json(Optional[str], data.get("middleName")),
        )

    def to_json(self):
        data = {}
        data["isAdmin"] = _to_json(self.is_admin)
        data["name"] = _to_json(self.name)
        if self.middle_name is not None:
            data["middleName"] = _to_json(self.middle_name)
        return data
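To illustrate how the optional property behaves in a round trip, here is a short sketch. The class is a condensed stand-in for the generated Properties class (the helper functions are inlined, which is an assumption made for brevity), and the sample values are made up.

```python
from dataclasses import dataclass
from typing import Optional


# Condensed stand-in for the generated Properties class (illustrative only).
@dataclass
class Properties:
    is_admin: bool
    name: str
    middle_name: Optional[str]

    @classmethod
    def from_json(cls, data) -> "Properties":
        return cls(data.get("isAdmin"), data.get("name"), data.get("middleName"))

    def to_json(self):
        data = {"isAdmin": self.is_admin, "name": self.name}
        # Optional properties are only written out when they are set.
        if self.middle_name is not None:
            data["middleName"] = self.middle_name
        return data


# A missing optional property becomes None and is omitted again on output.
without_middle = Properties.from_json({"name": "Ada", "isAdmin": True})

# A present optional property survives the round trip.
with_middle = Properties.from_json(
    {"name": "Ada", "isAdmin": True, "middleName": "K"}
)
```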
Code generated from “Values” schemas
“Values” schemas will be converted into a Python typing.Dict[str, T], where T is the type of the values of the object:
{
  "values": {
    "type": "string"
  }
}
Generates into:
from dataclasses import dataclass
from typing import Any, Dict, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Values:
    value: 'Dict[str, str]'

    @classmethod
    def from_json(cls, data) -> 'Values':
        return cls(_from_json(Dict[str, str], data))

    def to_json(self):
        return _to_json(self.value)
Note: jtd-codegen had to generate a custom type alias here, which is why the code has a bit of extra stuff. If you use “empty”, “type”, “ref”, “elements”, or “values” schemas at the top level of a schema, jtd-codegen has to emit type aliases in Python. In real-world schemas, this doesn’t happen very often.
Code generated from “Discriminator” schemas
“Discriminator” schemas will be converted into a Python @dataclass, and each mapping will be a Python @dataclass that extends the discriminator:
{
  "discriminator": "eventType",
  "mapping": {
    "USER_CREATED": {
      "properties": {
        "id": { "type": "string" }
      }
    },
    "USER_PAYMENT_PLAN_CHANGED": {
      "properties": {
        "id": { "type": "string" },
        "plan": { "enum": ["FREE", "PAID"] }
      }
    },
    "USER_DELETED": {
      "properties": {
        "id": { "type": "string" },
        "softDelete": { "type": "boolean" }
      }
    }
  }
}
Generates into:
from dataclasses import dataclass
from enum import Enum
from typing import Any, Union, get_args, get_origin

def _from_json(cls, data):
    if data is None or cls in [bool, int, float, str, object] or cls is Any:
        return data
    if get_origin(cls) is Union:
        return _from_json(get_args(cls)[0], data)
    if get_origin(cls) is list:
        return [_from_json(get_args(cls)[0], d) for d in data]
    if get_origin(cls) is dict:
        return { k: _from_json(get_args(cls)[1], v) for k, v in data.items() }
    return cls.from_json(data)

def _to_json(data):
    if data is None or type(data) in [bool, int, float, str, object]:
        return data
    if type(data) is list:
        return [_to_json(d) for d in data]
    if type(data) is dict:
        return { k: _to_json(v) for k, v in data.items() }
    return data.to_json()

@dataclass
class Discriminator:
    event_type: 'str'

    @classmethod
    def from_json(cls, data) -> 'Discriminator':
        return {
            "USER_CREATED": DiscriminatorUserCreated,
            "USER_DELETED": DiscriminatorUserDeleted,
            "USER_PAYMENT_PLAN_CHANGED": DiscriminatorUserPaymentPlanChanged,
        }[data["eventType"]].from_json(data)

    def to_json(self):
        pass

@dataclass
class DiscriminatorUserCreated(Discriminator):
    id: 'str'

    @classmethod
    def from_json(cls, data) -> 'DiscriminatorUserCreated':
        return cls(
            "USER_CREATED",
            _from_json(str, data.get("id")),
        )

    def to_json(self):
        data = { "eventType": "USER_CREATED" }
        data["id"] = _to_json(self.id)
        return data

@dataclass
class DiscriminatorUserDeleted(Discriminator):
    id: 'str'
    soft_delete: 'bool'

    @classmethod
    def from_json(cls, data) -> 'DiscriminatorUserDeleted':
        return cls(
            "USER_DELETED",
            _from_json(str, data.get("id")),
            _from_json(bool, data.get("softDelete")),
        )

    def to_json(self):
        data = { "eventType": "USER_DELETED" }
        data["id"] = _to_json(self.id)
        data["softDelete"] = _to_json(self.soft_delete)
        return data

class DiscriminatorUserPaymentPlanChangedPlan(Enum):
    FREE = "FREE"
    PAID = "PAID"

    @classmethod
    def from_json(cls, data) -> 'DiscriminatorUserPaymentPlanChangedPlan':
        return cls(data)

    def to_json(self):
        return self.value

@dataclass
class DiscriminatorUserPaymentPlanChanged(Discriminator):
    id: 'str'
    plan: 'DiscriminatorUserPaymentPlanChangedPlan'

    @classmethod
    def from_json(cls, data) -> 'DiscriminatorUserPaymentPlanChanged':
        return cls(
            "USER_PAYMENT_PLAN_CHANGED",
            _from_json(str, data.get("id")),
            _from_json(DiscriminatorUserPaymentPlanChangedPlan, data.get("plan")),
        )

    def to_json(self):
        data = { "eventType": "USER_PAYMENT_PLAN_CHANGED" }
        data["id"] = _to_json(self.id)
        data["plan"] = _to_json(self.plan)
        return data
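Since Discriminator.from_json returns an instance of whichever subclass matches the tag, calling code can branch with isinstance. The sketch below uses a condensed stand-in covering two of the three mappings (helpers inlined, which is an assumption for brevity); the describe function is hypothetical and only there to show the branching.

```python
from dataclasses import dataclass


# Condensed stand-ins for the generated classes (illustrative only).
@dataclass
class Discriminator:
    event_type: str

    @classmethod
    def from_json(cls, data) -> "Discriminator":
        # Pick the subclass from the tag, then let it parse the rest.
        return {
            "USER_CREATED": DiscriminatorUserCreated,
            "USER_DELETED": DiscriminatorUserDeleted,
        }[data["eventType"]].from_json(data)


@dataclass
class DiscriminatorUserCreated(Discriminator):
    id: str

    @classmethod
    def from_json(cls, data) -> "DiscriminatorUserCreated":
        return cls("USER_CREATED", data.get("id"))

    def to_json(self):
        return {"eventType": "USER_CREATED", "id": self.id}


@dataclass
class DiscriminatorUserDeleted(Discriminator):
    id: str
    soft_delete: bool

    @classmethod
    def from_json(cls, data) -> "DiscriminatorUserDeleted":
        return cls("USER_DELETED", data.get("id"), data.get("softDelete"))

    def to_json(self):
        return {
            "eventType": "USER_DELETED",
            "id": self.id,
            "softDelete": self.soft_delete,
        }


def describe(event: Discriminator) -> str:
    """Hypothetical consumer that branches on the concrete subclass."""
    if isinstance(event, DiscriminatorUserCreated):
        return f"created {event.id}"
    if isinstance(event, DiscriminatorUserDeleted):
        return f"deleted {event.id} (soft={event.soft_delete})"
    return "unknown event"


payload = {"eventType": "USER_DELETED", "id": "abc", "softDelete": True}
event = Discriminator.from_json(payload)
summary = describe(event)
```

Note that a tag missing from the mapping raises a KeyError in from_json, which is another reason to validate input before constructing instances.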