Page title
Section title
Generating Example Data from Schemas with jtd-fuzz
JSON Type Definition, aka RFC 8927, is an easy-to-learn, standardized way to define a schema for JSON data. You can use JSON Typedef to portably validate data across programming languages, create dummy data, generate code, and more.
jtd-fuzz
is a tool that can generate example data from a JSON Typedef schema.
This is also called “fuzzing”. It lives on GitHub
here.
This article will go over why jtd-fuzz
may be useful to you, how to install
it, and then go through an example of using jtd-fuzz
on a few schemas.
Why fuzzing schemas is useful
Generating example data from schemas is useful because you can use it to do fuzz testing, load testing, or to seed your system with interesting data.
-
Fuzz testing is when you run your system against randomized but valid data to try to find corner cases where your system crashes, hangs, or otherwise misbehaves. You can use
jtd-fuzz
to generate this randomized valid data. -
Load testing is when you send your system a lot of requests, and then see how it behaves when under stress. Oftentimes, load tests produce more interesting results when the requests are randomized, that way you exercize more branches in your code.
jtd-fuzz
can help generate these randomized requests. -
Seeding is when you initialize a system, usually a database, with some example data. This is useful because it lets you see how your APIs or UIs look when they have data inside of them, rather than working from a blank slate. Oftentimes, development or staging environment databases are filled with seed data.
jtd-fuzz
can help you generate seed data.
Installing jtd-fuzz
If you’re on macOS, the easiest way to install jtd-fuzz
is via Homebrew:
brew install jsontypedef/jsontypedef/jtd-fuzz
For all other platforms, you can install a prebuilt binary from the latest
release of
jtd-fuzz
.
Supported platforms are:
- Windows (x86_64-pc-windows-gnu.zip)
- macOS (x86_64-apple-darwin.zip)
- Linux (x86_64-unknown-linux-gnu.zip)
Finally, you can also install jtd-fuzz
from source. See the jtd-fuzz
repo for more information if
you go this route.
Using jtd-fuzz
You can always run jtd-fuzz --help
to get details, but at a high level here’s
how you usually use jtd-fuzz
:
-
First, you need a schema to generate examples from.
-
Then, you need to decide how many examples you want to generate. By default,
jtd-fuzz
will generate an infinite stream of examples, but you can override this by passing-n XXX
tojtd-fuzz
, whereXXX
is the number of examples you want. -
Then, you invoke
jtd-fuzz
on your schema. -
Optionally, you can use
fuzzHint
to generate specific sorts of fake data, if that makes sense for your use-case. -
Optionally, if you want to be able generate this exact dataset again later, you can use the
-s
flag to generate consistent results.
For example, if you want to generate five examples from this schema, which you
placed inside user.jtd.json
:
{
"properties": {
"username": {
"type": "string"
},
"bio": {
"type": "string"
}
}
}
Then you would run:
jtd-fuzz user.jtd.json -n 5
Which would output something like:
{"bio":"c0FW!","username":"L{0'%"}
{"bio":"2bf*s|%","username":"R q"}
{"bio":"2i","username":"z;L"}
{"bio":"","username":""}
{"bio":"(","username":""}
Using fuzzHint
In the example above, the username
and bio
properties were short, randomized
strings. That might not be the most useful behavior. Oftentimes, it’s more
convenient if fuzzed data uses realistic emails, names, or lorem
ipsum text.
jtd-fuzz
can generate this sort of data using the fuzzHint
metadata
property. Here’s an example of how
you can adapt the previous example to use fuzzHint
:
{
"properties": {
"username": {
"metadata": {
"fuzzHint": "en_us/internet/username"
},
"type": "string"
},
"bio": {
"metadata": {
"fuzzHint": "lorem/sentence"
},
"type": "string"
}
}
}
If you run jtd-fuzz
on that schema, you’ll get output like:
{"bio":"Error ut voluptatem nihil sunt laboriosam ut.","username":"obechtelar"}
{"bio":"Perspiciatis nostrum quidem laudantium esse dignissimos possimus.","username":"wrunolfsdottir"}
{"bio":"Vitae error ratione optio et.","username":"tcarroll76"}
{"bio":"Assumenda quaerat et.","username":"ogleichner"}
{"bio":"Officia vel aspernatur.","username":"rbeier"}
fuzzHint
will only work on schemas of {"type": "string"}
. Here are some
commonly-used values for fuzzHint
:
en_us/company/company_name
generates strings likeHayes, Murray, and Kiehn
en_us/internet/email
generates strings likealainatorphy@johnson.com
en_us/names/full_name
generates strings likeAlexa Wisozk
A full list of possible values for fuzzHint
is available
here.
Remember that metadata properties don’t affect
validation. What you put in
fuzzHint
will have no effect on what validators will consider valid input or
not.
Generating consistent results
By default, jtd-fuzz
will generate different output on every invocation. If
you run jtd-fuzz
on the same schema twice, you’ll get back two different sets
of results:
echo '{}' | jtd-fuzz -n 1
echo '{}' | jtd-fuzz -n 1
{"[jD|6W":null}
null
If you’d like to get consistent output from jtd-fuzz
, or be able to reproduce
its output, you can use the -s
option to provide a seed to its internal
pseudo-random number generator. For the same seed and schema, jtd-fuzz
will
output the same data every time:
echo '{}' | jtd-fuzz -n 1 -s 123
echo '{}' | jtd-fuzz -n 1 -s 123
48
48
The -s
option takes an integer between 0 and 2^64 - 1.
Seeding jtd-fuzz
can be useful if you’re using jtd-fuzz
to do automated
testing against a system. Your automated testing system can pass jtd-fuzz
a
randomly-generated seed, and if the automated tester finds a seed that reveals a
bug, it can output the seed it used. That way, developers can re-use that seed,
and try to reproduce the issue locally.
Note that jtd-fuzz
is only guaranteed to produce consistent output if you use
the same seed, schema, and version of jtd-fuzz
. Different versions on
jtd-fuzz
may output different results, even if you give them the same seed and
schema.
Section title
-
-
-
-
Tooling
-
Advanced Concepts
-
Language-Specific Documentation