Fake data at the speed of Rust.
A high-performance fake data generation library for Python, powered by Rust. Designed to be 50-100x faster than Faker for batch operations.
pip install forgery

git clone https://github.com/williajm/forgery.git
cd forgery
pip install maturin
maturin develop --release

from forgery import fake
# Generate 10,000 names in one fast call
names = fake.names(10_000)
# Single values work too
email = fake.email()
name = fake.name()
# Deterministic output with seeding
fake.seed(42)
data1 = fake.names(100)
fake.seed(42)
data2 = fake.names(100)
assert data1 == data2

- Batch-first design: Generate thousands of values in a single call
- 50-100x faster than Faker for batch operations
- Multi-locale support: 7 locales with locale-specific data
- Deterministic seeding: Reproducible output for testing
- Type hints: Full type stub support for IDE autocompletion
- Familiar API: Method names match Faker for easy migration
forgery supports 7 locales with locale-specific names, addresses, phone numbers, and more:
| Locale | Language | Country |
|---|---|---|
| en_US | English | United States (default) |
| en_GB | English | United Kingdom |
| de_DE | German | Germany |
| fr_FR | French | France |
| es_ES | Spanish | Spain |
| it_IT | Italian | Italy |
| ja_JP | Japanese | Japan |

from forgery import Faker
# Default locale is en_US
fake = Faker()
fake.names(5) # American names
# Use a different locale
german = Faker("de_DE")
german.names(5) # German names
japanese = Faker("ja_JP")
japanese.addresses(3) # Japanese addresses with prefecture

Each locale provides:
- Names: First names, last names, and full names in the local language
- Addresses: Cities, regions/states, postal codes in the correct format
- Phone numbers: Country-specific formats and country codes
- Companies: Local company names and job titles
- Colors: Color names in the local language
- SSN/National IDs: Country-specific formats (US SSN, UK NINO, DE Steuer-ID, etc.)
- License plates: Country-specific formats
from forgery import seed, names, emails, integers, uuids
from forgery import name, email, integer, uuid # single-value functions used below
seed(42) # Seed for reproducibility
# Batch generation (fast path)
names(1000) # list[str] of full names
emails(1000) # list[str] of email addresses
integers(1000, 0, 100) # list[int] in range
uuids(1000) # list[str] of UUIDv4
# Single values
name() # str
email() # str
integer(0, 100) # int
uuid() # str

from forgery import Faker
# Each instance has its own RNG state
fake1 = Faker()
fake2 = Faker()
fake1.seed(42)
fake2.seed(99)
# Generate independently
fake1.names(100)
fake2.emails(100)

| Batch | Single | Description |
|---|---|---|
| names(n) | name() | Full names (first + last) |
| first_names(n) | first_name() | First names |
| last_names(n) | last_name() | Last names |

| Batch | Single | Description |
|---|---|---|
| emails(n) | email() | Email addresses |
| safe_emails(n) | safe_email() | Safe domain emails (@example.com, etc.) |
| free_emails(n) | free_email() | Free provider emails (@gmail.com, etc.) |
| phone_numbers(n) | phone_number() | Phone numbers in (XXX) XXX-XXXX format |

| Batch | Single | Description |
|---|---|---|
| integers(n, min, max) | integer(min, max) | Random integers in range |
| floats(n, min, max) | float_(min, max) | Random floats in range (Note: float_ avoids shadowing Python's float builtin) |
| uuids(n) | uuid() | UUID v4 strings |
| md5s(n) | md5() | Random 32-char hex strings (MD5-like format, not cryptographic hashes) |
| sha256s(n) | sha256() | Random 64-char hex strings (SHA256-like format, not cryptographic hashes) |

| Batch | Single | Description |
|---|---|---|
| dates(n, start, end) | date(start, end) | Random dates (YYYY-MM-DD) |
| datetimes(n, start, end) | datetime_(start, end) | Random datetimes (ISO 8601). Note: datetime_ avoids shadowing Python's datetime module |
| dates_of_birth(n, min_age, max_age) | date_of_birth(min_age, max_age) | Birth dates for given age range |

| Batch | Single | Description |
|---|---|---|
| street_addresses(n) | street_address() | Street addresses (e.g., "123 Main Street") |
| cities(n) | city() | City names |
| states(n) | state() | State names |
| countries(n) | country() | Country names |
| zip_codes(n) | zip_code() | ZIP codes (5 or 9 digit) |
| addresses(n) | address() | Full addresses |

| Batch | Single | Description |
|---|---|---|
| companies(n) | company() | Company names |
| jobs(n) | job() | Job titles |
| catch_phrases(n) | catch_phrase() | Business catch phrases |

| Batch | Single | Description |
|---|---|---|
| urls(n) | url() | URLs with https:// |
| domain_names(n) | domain_name() | Domain names |
| ipv4s(n) | ipv4() | IPv4 addresses |
| ipv6s(n) | ipv6() | IPv6 addresses |
| mac_addresses(n) | mac_address() | MAC addresses |

| Batch | Single | Description |
|---|---|---|
| credit_cards(n) | credit_card() | Credit card numbers (valid Luhn) |
| credit_card_providers(n) | credit_card_provider() | Card network name (Visa, Mastercard, Amex, Discover) |
| credit_card_expires(n) | credit_card_expire() | Expiry date in MM/YY format |
| credit_card_security_codes(n) | credit_card_security_code() | CVV: 3 digits (Visa/MC/Discover) or 4 digits (Amex) |
| credit_card_fulls(n) | credit_card_full() | Complete card info dict (provider, number, expire, security_code, name) |
| ibans(n) | iban() | IBAN numbers (valid checksum) |
| bics(n) | bic() | BIC/SWIFT codes (8 or 11 characters) |
| bank_accounts(n) | bank_account() | Bank account numbers (8-17 digits) |
| bank_names(n) | bank_name() | Bank names (locale-specific) |
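
The "valid Luhn" note on credit card numbers refers to the standard Luhn checksum. The following is a minimal sketch of that check in plain Python; the helper name luhn_valid is illustrative and is not part of forgery's API.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit; double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # a widely used Visa test number
```

A check like this can be handy in tests that assert generated card numbers are well formed.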

| Batch | Single | Description |
|---|---|---|
| currency_codes(n) | currency_code() | ISO 4217 currency codes (e.g., "USD", "EUR") |
| currency_names(n) | currency_name() | Currency names in English (e.g., "United States Dollar") |
| currencies(n) | currency() | (code, name) tuples |
| prices(n, min, max) | price(min, max) | Prices with 2 decimal places |

| Batch | Single | Description |
|---|---|---|
| sort_codes(n) | sort_code() | UK sort codes (XX-XX-XX format) |
| uk_account_numbers(n) | uk_account_number() | UK account numbers (exactly 8 digits) |
| transaction_amounts(n, min, max) | transaction_amount(min, max) | Transaction amounts (2 decimal places) |
| transactions(n, balance, start, end) | - | Full transaction records with running balance |

| Batch | Single | Description |
|---|---|---|
| passwords(n, ...) | password(...) | Random passwords with configurable character sets |

Password options:
- length: Password length (default: 12)
- uppercase: Include uppercase letters (default: True)
- lowercase: Include lowercase letters (default: True)
- digits: Include digits (default: True)
- symbols: Include symbols (default: True)
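
To illustrate how the password options combine, here is a rough sketch that builds an alphabet from the enabled character sets and samples from it. The helper make_password and the use of string.punctuation as the symbol set are assumptions for illustration; forgery's actual character sets may differ.

```python
import random
import string


def make_password(rng, length=12, uppercase=True, lowercase=True,
                  digits=True, symbols=True):
    """Hypothetical helper: sample `length` characters from the enabled sets."""
    alphabet = ""
    if uppercase:
        alphabet += string.ascii_uppercase
    if lowercase:
        alphabet += string.ascii_lowercase
    if digits:
        alphabet += string.digits
    if symbols:
        alphabet += string.punctuation  # assumption: stand-in symbol set
    if not alphabet:
        raise ValueError("at least one character set must be enabled")
    return "".join(rng.choice(alphabet) for _ in range(length))


rng = random.Random(42)
pin = make_password(rng, length=6, uppercase=False, lowercase=False, symbols=False)
print(pin)  # six characters, digits only
```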

| Batch | Single | Description |
|---|---|---|
| sentences(n, word_count) | sentence(word_count) | Lorem ipsum sentences |
| paragraphs(n, sentence_count) | paragraph(sentence_count) | Lorem ipsum paragraphs |
| texts(n, min_chars, max_chars) | text(min_chars, max_chars) | Text blocks with length limits |

| Batch | Single | Description |
|---|---|---|
| colors(n) | color() | Color names |
| hex_colors(n) | hex_color() | Hex color codes (#RRGGBB) |
| rgb_colors(n) | rgb_color() | RGB tuples (r, g, b) |

| Batch | Single | Description |
|---|---|---|
| latitudes(n) | latitude() | Random latitude in [-90.0, 90.0] |
| longitudes(n) | longitude() | Random longitude in [-180.0, 180.0] |
| coordinates(n) | coordinate() | (latitude, longitude) tuples |

| Batch | Single | Description |
|---|---|---|
| user_agents(n) | user_agent() | Random browser user agent string (any browser) |
| chromes(n) | chrome() | Chrome user agent string |
| firefoxes(n) | firefox() | Firefox user agent string |
| safaris(n) | safari() | Safari user agent string |

| Batch | Single | Description |
|---|---|---|
| booleans(n, probability) | boolean(probability) | Random booleans (default: 50% True) |

| Batch | Single | Description |
|---|---|---|
| numerify_batch(pattern, n) | numerify(pattern) | Replace # with random digits (0-9) |
| letterify_batch(pattern, n) | letterify(pattern) | Replace ? with random lowercase letters (a-z) |
| bothify_batch(pattern, n) | bothify(pattern) | Replace # with digits and ? with lowercase letters |
| lexify_batch(pattern, n) | lexify(pattern) | Replace ? with random uppercase letters (A-Z) |
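
The placeholder rules in the table above can be modelled in a few lines of plain Python. This is only a sketch of the documented semantics using the standard random module, not forgery's implementation; the fill helper and the explicit rng argument are assumptions made so the example is self-contained.

```python
import random
import string


def fill(pattern, rng, replacements):
    """Replace each placeholder character with a random pick from its alphabet."""
    return "".join(
        rng.choice(replacements[ch]) if ch in replacements else ch
        for ch in pattern
    )


def numerify(pattern, rng):
    return fill(pattern, rng, {"#": string.digits})


def letterify(pattern, rng):
    return fill(pattern, rng, {"?": string.ascii_lowercase})


def bothify(pattern, rng):
    return fill(pattern, rng, {"#": string.digits, "?": string.ascii_lowercase})


def lexify(pattern, rng):
    # Only "?" is a placeholder here, so "#" characters pass through unchanged.
    return fill(pattern, rng, {"?": string.ascii_uppercase})


rng = random.Random(42)
print(numerify("###-###-####", rng))
print(lexify("???-###", rng))
```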
from forgery import Faker
fake = Faker()
fake.numerify("###-###-####") # "847-321-9056"
fake.letterify("??-??") # "kx-bp"
fake.bothify("??-####") # "mz-7314"
fake.lexify("???-###") # "QWR-###" (only ? is replaced)

| Batch | Single | Description |
|---|---|---|
| ean13s(n) | ean13() | EAN-13 barcodes (valid check digit) |
| ean8s(n) | ean8() | EAN-8 barcodes (valid check digit) |
| upc_as(n) | upc_a() | UPC-A barcodes (valid check digit) |
| upc_es(n) | upc_e() | UPC-E barcodes (valid check digit) |
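
For readers who want to verify generated barcodes, the EAN-13 check digit follows the standard GS1 rule: weight the first twelve digits alternately by 1 and 3, then pick the digit that brings the sum to a multiple of 10. A small sketch follows; the function names are illustrative and not part of forgery.

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for the first 12 digits."""
    if len(first12) != 12 or not first12.isdigit():
        raise ValueError("expected exactly 12 digits")
    # Positions are weighted 1, 3, 1, 3, ... from the left.
    total = sum(int(ch) * (3 if i % 2 else 1) for i, ch in enumerate(first12))
    return (10 - total % 10) % 10


def ean13_valid(code: str) -> bool:
    """Return True if a 13-digit code carries the correct check digit."""
    return (
        len(code) == 13
        and code.isdigit()
        and ean13_check_digit(code[:12]) == int(code[12])
    )


print(ean13_check_digit("400638133393"))  # 1
```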

| Batch | Single | Description |
|---|---|---|
| isbn10s(n) | isbn10() | ISBN-10 with hyphens (valid check digit, may end in X) |
| isbn13s(n) | isbn13() | ISBN-13 with hyphens (978/979 prefix, valid check digit) |

| Batch | Single | Description |
|---|---|---|
| file_names(n) | file_name() | File names with extension (e.g., "report.pdf") |
| file_extensions(n) | file_extension() | File extensions (e.g., "pdf", "csv") |
| mime_types(n) | mime_type() | MIME types (e.g., "application/pdf") |
| file_paths(n) | file_path_() | File paths (e.g., "/home/user/documents/report.pdf") |

| Batch | Single | Description |
|---|---|---|
| product_names(n) | product_name() | Product names (e.g., "Ergonomic Steel Chair") |
| product_categories(n) | product_category() | Product categories (e.g., "Electronics") |
| departments(n) | department() | Store departments (e.g., "Home & Garden") |
| product_materials(n) | product_material() | Product materials (e.g., "Cotton", "Steel") |

| Batch | Single | Description |
|---|---|---|
| ssns(n) | ssn() | Locale-specific national ID numbers |

Formats by locale:

| Locale | Format | Example |
|---|---|---|
| en_US | SSN (XXX-XX-XXXX) | "123-45-6789" |
| en_GB | NI Number (XX 99 99 99 X) | "AB 12 34 56 C" |
| de_DE | Steuer-ID (11 digits) | "12345678901" |
| fr_FR | NSS (15 digits with check key) | "185076923400145" |
| es_ES | DNI (8 digits + letter) | "12345678Z" |
| it_IT | Codice Fiscale (16 alphanumeric) | "RSSMRA85M01H501Z" |
| ja_JP | My Number (12 digits with check) | "123456789012" |

| Batch | Single | Description |
|---|---|---|
| license_plates(n) | license_plate() | Locale-specific license plates |
| vehicle_makes(n) | vehicle_make() | Vehicle manufacturers (e.g., "Toyota") |
| vehicle_models(n) | vehicle_model() | Vehicle models (e.g., "Camry") |
| vehicle_years(n) | vehicle_year() | Model years (1990-2026) |
| vins(n) | vin() | 17-character VINs (valid check digit, no I/O/Q) |

License plate formats by locale:

| Locale | Format | Example |
|---|---|---|
| en_US | ABC-1234 | "KHX-4829" |
| en_GB | AB12 CDE | "LM65 NXR" |
| de_DE | X AB 1234 | "B KL 3847" |
| fr_FR | AB-123-CD | "FG-482-HJ" |
| es_ES | 1234 ABC | "4829 FKH" |
| it_IT | AB 123 CD | "FG 482 HJ" |
| ja_JP | 300 12-34 | "500 38-47" |

| Batch | Single | Description |
|---|---|---|
| profiles(n) | profile() | Complete personal profiles (returns dict) |

Each profile dict contains: first_name, last_name, name, email, phone, address, city, state, zip_code, country, company, job, date_of_birth.
from forgery import Faker
fake = Faker()
fake.seed(42)
p = fake.profile()
# {"first_name": "Ryan", "last_name": "Grant", "name": "Ryan Grant",
# "email": "rgrant@example.com", "phone": "(555) 123-4567", ...}

For batch methods that select from finite lists (names, cities, countries, etc.), you can request unique values:
from forgery import Faker
fake = Faker()
fake.seed(42)
# Generate 50 unique names (no duplicates)
unique_names = fake.names(50, unique=True)
assert len(unique_names) == len(set(unique_names))
# Generate 20 unique cities
unique_cities = fake.cities(20, unique=True)
# Generate 50 unique countries
unique_countries = fake.countries(50, unique=True)

Important Notes:
- Unique generation raises ValueError if you request more unique values than are available in the underlying data set.
- Performance: Unique generation uses O(n) memory (it stores all outputs in a HashSet) and can take O(n × 100) time in the worst case because of retry logic. For very large unique batches, consider whether duplicates are actually a problem for your use case.
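
The cost profile described in the notes above follows from a retry-based approach. The sketch below is a toy model of that idea in plain Python, not forgery's Rust code; the helper unique_batch and the cap of 100 retries per value (inferred from the O(n × 100) figure) are assumptions.

```python
import random


def unique_batch(pool, n, rng, max_retries=100):
    """Draw n distinct values from pool, retrying when a duplicate comes up."""
    if n > len(set(pool)):
        raise ValueError("requested more unique values than the pool contains")
    seen = set()  # O(n) memory: every emitted value is remembered
    out = []
    for _ in range(n):
        for _ in range(max_retries):  # assumed retry cap per value
            candidate = rng.choice(pool)
            if candidate not in seen:
                seen.add(candidate)
                out.append(candidate)
                break
        else:
            raise RuntimeError("retry limit reached before finding a new value")
    return out


rng = random.Random(42)
pool = [f"city-{i}" for i in range(1000)]
cities = unique_batch(pool, 50, rng)
print(len(cities), len(set(cities)))  # 50 50
```

Retries become more frequent as n approaches the pool size, which is why large unique batches are the expensive case.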
Generate realistic bank transaction data with running balances:
from forgery import Faker
fake = Faker()
fake.seed(42)
# Generate 50 transactions from Jan to Mar 2024, starting with £1000 balance
txns = fake.transactions(50, 1000.0, "2024-01-01", "2024-03-31")
for txn in txns[:3]:
    print(f"{txn['date']} | {txn['transaction_type']:15} | {txn['amount']:>10.2f} | {txn['balance']:>10.2f}")
# 2024-01-03 | Card Payment | -42.50 | 957.50
# 2024-01-05 | Direct Debit | -125.00 | 832.50
# 2024-01-08 | Faster Payment | 1250.00 | 2082.50

Each transaction dict contains:
- reference: 8-character alphanumeric reference
- date: Transaction date (YYYY-MM-DD)
- amount: Transaction amount (negative for debits)
- transaction_type: e.g., "Card Payment", "Direct Debit", "Salary"
- description: Merchant or payee name
- balance: Running balance after transaction
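
The balance field is a running total: each value is the previous balance plus the signed amount. As a quick illustration, the three amounts from the sample output above reproduce its balances when accumulated from the opening balance of 1000.0. This uses only the standard library and hand-copied values, not forgery itself.

```python
from itertools import accumulate

opening_balance = 1000.0
amounts = [-42.50, -125.00, 1250.00]  # copied from the sample output above

# accumulate() yields the opening balance first, so drop that first element.
balances = [round(b, 2) for b in accumulate(amounts, initial=opening_balance)][1:]
print(balances)  # [957.5, 832.5, 2082.5]
```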
Generate entire datasets with a single call using schema definitions:
records() returns a list of dictionaries:
from forgery import records, seed
seed(42)
data = records(1000, {
"id": "uuid",
"name": "name",
"email": "email",
"age": ("int", 18, 65),
"salary": ("float", 30000.0, 150000.0),
"hire_date": ("date", "2020-01-01", "2024-12-31"),
"bio": ("text", 50, 200),
"status": ("choice", ["active", "inactive", "pending"]),
})
# data[0] = {"id": "88917925-...", "name": "Austin Bell", "age": 50, ...}

records_tuples() returns a list of tuples (faster, values in alphabetical key order):
from forgery import records_tuples, seed
seed(42)
data = records_tuples(1000, {
"age": ("int", 18, 65),
"name": "name",
})
# data[0] = (50, "Ryan Grant") # (age, name) - alphabetical order

records_arrow() returns a PyArrow RecordBatch for high-performance data processing:
import pyarrow as pa
from forgery import records_arrow, seed
seed(42)
batch = records_arrow(100_000, {
"id": "uuid",
"name": "name",
"age": ("int", 18, 65),
"salary": ("float", 30000.0, 150000.0),
})
# batch is a pyarrow.RecordBatch
print(batch.num_rows) # 100000
print(batch.num_columns) # 4
print(batch.schema)
# age: int64 not null
# id: string not null
# name: string not null
# salary: double not null
# Convert to pandas DataFrame
df = batch.to_pandas()
# Or to Polars DataFrame
import polars as pl
df_polars = pl.from_arrow(batch)

Note: Requires pyarrow to be installed: pip install pyarrow
The records_arrow() function generates data in columnar format, which is more efficient
for large batches and integrates seamlessly with the Arrow ecosystem (PyArrow, Polars,
pandas, DuckDB, etc.).
| Type | Syntax | Example |
|---|---|---|
| Simple types | "type_name" | "name", "email", "uuid", "int", "float" |
| Integer range | ("int", min, max) | ("int", 18, 65) |
| Float range | ("float", min, max) | ("float", 0.0, 100.0) |
| Text with limits | ("text", min_chars, max_chars) | ("text", 50, 200) |
| Date range | ("date", start, end) | ("date", "2020-01-01", "2024-12-31") |
| Choice | ("choice", [options]) | ("choice", ["a", "b", "c"]) |

All simple types from the generators above are supported: name, first_name, last_name, email, safe_email, free_email, phone, uuid, int, float, date, datetime, street_address, city, state, country, zip_code, address, company, job, catch_phrase, url, domain_name, ipv4, ipv6, mac_address, credit_card, iban, sentence, paragraph, text, color, hex_color, rgb_color, md5, sha256, latitude, longitude, coordinate, boolean, ssn, file_name, file_extension, mime_type, file_path, license_plate, vehicle_make, vehicle_model, vehicle_year, vin, ean13, ean8, upc_a, upc_e, isbn10, isbn13, product_name, product_category, department, product_material.
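
Since records_tuples() returns values in alphabetical key order, the column names for each tuple can be recovered by sorting the schema keys. The sketch below uses a hand-written row (taken from the earlier example comment) in place of real output, and assumes that "alphabetical" matches Python's default string ordering, which holds for lowercase ASCII keys.

```python
schema = {
    "name": "name",
    "age": ("int", 18, 65),
}

# Column order of the tuples: schema keys sorted alphabetically.
columns = sorted(schema)  # ['age', 'name']

# Stand-in for the output of records_tuples(); not generated here.
rows = [(50, "Ryan Grant")]

as_dicts = [dict(zip(columns, row)) for row in rows]
print(as_dicts)  # [{'age': 50, 'name': 'Ryan Grant'}]
```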
For large datasets (millions of records), async methods prevent blocking the Python event loop:
import asyncio
from forgery import records_async, seed
async def main():
    seed(42)
    records = await records_async(1_000_000, {
        "id": "uuid",
        "name": "name",
        "email": "email",
    })
    print(f"Generated {len(records)} records")

asyncio.run(main())

import asyncio
from forgery import records_tuples_async, seed
async def main():
    seed(42)
    records = await records_tuples_async(1_000_000, {
        "age": ("int", 18, 65),
        "name": "name",
    })
    return records

asyncio.run(main())

import asyncio
from forgery import records_arrow_async, seed
async def main():
    seed(42)
    batch = await records_arrow_async(1_000_000, {
        "id": "uuid",
        "name": "name",
        "salary": ("float", 30000.0, 150000.0),
    })
    return batch.to_pandas()

asyncio.run(main())

All async methods accept an optional chunk_size parameter (default: 10,000) that controls how frequently control is yielded to the event loop. Smaller chunks yield more frequently but have slightly higher overhead.
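
The effect of chunk_size can be pictured with a small asyncio sketch: work is done one chunk at a time, and the coroutine yields between chunks so other tasks get a turn. This is a toy model built on the standard library, not forgery's implementation; generate_ages and the ticker task are hypothetical names.

```python
import asyncio
import random


async def generate_ages(n, chunk_size=10_000, seed=42):
    """Generate n integers in chunks, yielding to the event loop after each."""
    rng = random.Random(seed)
    out = []
    for start in range(0, n, chunk_size):
        count = min(chunk_size, n - start)
        out.extend(rng.randint(18, 65) for _ in range(count))
        await asyncio.sleep(0)  # hand control back to the event loop
    return out


async def main():
    ticks = 0

    async def ticker():
        # Stands in for other work that needs the event loop.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    task = asyncio.create_task(ticker())
    ages = await generate_ages(50_000, chunk_size=10_000)
    task.cancel()
    return len(ages), ticks


count, ticks = asyncio.run(main())
print(count, ticks)
```

With a smaller chunk_size the ticker runs more often, at the cost of more switches between tasks.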
Note: Async methods use a snapshot of the RNG state at call time. The main Faker instance's RNG is not advanced, so calling the same async method twice with the same seed produces identical results. For unique results across multiple async calls, use different seeds or different Faker instances.
Arrow async chunking caveat: For records_arrow_async(), when n > chunk_size, the output differs from records_arrow() due to column-major RNG consumption within each chunk. If you need identical results to the sync version, set chunk_size >= n. The records_async() and records_tuples_async() methods always match their sync counterparts regardless of chunk size.
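
The caveat above comes down to the order in which random draws are consumed. The following toy model, which is an assumption-laden illustration rather than forgery's code, uses two columns of random floats: row-major generation consumes draws in the same order for any chunk size, while column-major generation interleaves columns per chunk and so depends on where the chunk boundaries fall.

```python
import random


def row_major(n, chunk_size, seed=42):
    """Each row draws its values in turn; chunking does not change the order."""
    rng = random.Random(seed)
    rows = []
    for start in range(0, n, chunk_size):
        count = min(chunk_size, n - start)
        rows.extend((rng.random(), rng.random()) for _ in range(count))
    return rows


def column_major(n, chunk_size, seed=42):
    """Within each chunk, fill all of column a, then all of column b."""
    rng = random.Random(seed)
    col_a, col_b = [], []
    for start in range(0, n, chunk_size):
        count = min(chunk_size, n - start)
        col_a.extend(rng.random() for _ in range(count))
        col_b.extend(rng.random() for _ in range(count))
    return col_a, col_b


print(row_major(10, 4) == row_major(10, 10))          # chunk size is irrelevant
print(column_major(10, 4) == column_major(10, 10))    # differs once n > chunk_size
print(column_major(10, 10) == column_major(10, 100))  # matches when chunk_size >= n
```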
Register your own data providers for domain-specific generation:
from forgery import Faker
fake = Faker()
# Register a uniform (equal probability) provider
fake.add_provider("team", ["Engineering", "Sales", "HR", "Marketing"])
# Generate values
team = fake.generate("team")
teams = fake.generate_batch("team", 100)

# Register a weighted provider (higher weights = more likely)
fake.add_weighted_provider("status", [
("active", 80), # 80% probability
("inactive", 20), # 20% probability
])
# Generate with weighted distribution
statuses = fake.generate_batch("status", 1000)
# Expect ~800 "active", ~200 "inactive"

Custom providers integrate seamlessly with records():
from forgery import Faker
fake = Faker()
fake.add_provider("team", ["Eng", "Sales", "HR"])
fake.add_weighted_provider("priority", [("high", 20), ("medium", 50), ("low", 30)])
data = fake.records(1000, {
"id": "uuid",
"name": "name",
"team": "team", # Custom provider
"priority": "priority", # Weighted custom provider
})

fake.has_provider("team") # Check if provider exists
fake.list_providers() # List all custom provider names
fake.remove_provider("team") # Remove a provider

from forgery import add_provider, generate, generate_batch, seed
seed(42)
add_provider("tier", ["gold", "silver", "bronze"])
tier = generate("tier")
tiers = generate_batch("tier", 100)

Note: Custom provider names cannot conflict with built-in types (e.g., "name", "email", "uuid").
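
To see what a weighted provider means in practice, the same (value, weight) pairs can be fed to the standard library's random.choices. This is a sketch of weighted sampling in general, with a hypothetical weighted_batch helper, and does not show how forgery implements it internally.

```python
import random
from collections import Counter


def weighted_batch(rng, weighted_items, n):
    """Draw n values; each value's chance is proportional to its weight."""
    values = [value for value, _ in weighted_items]
    weights = [weight for _, weight in weighted_items]
    return rng.choices(values, weights=weights, k=n)


rng = random.Random(42)
counts = Counter(weighted_batch(rng, [("active", 80), ("inactive", 20)], 10_000))
print(counts)  # roughly 80% "active", 20% "inactive"
```

Weights are relative, so (80, 20) and (4, 1) describe the same distribution.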
Benchmark generating 100,000 items:
Names:
forgery.names(): 0.015s
Faker.name(): 1.523s
Speedup: 101x
Emails:
forgery.emails(): 0.021s
Faker.email(): 2.134s
Speedup: 101x
Benchmark generating 1,000,000 items:
Names:
forgery.names(): 0.108s
Faker.name(): 47.111s
Speedup: 436x
Emails:
forgery.emails(): 0.167s
Faker.email(): 46.984s
Speedup: 281x
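
The speedup figures above are the baseline time divided by the forgery time. A minimal timing harness in that spirit is sketched below; it uses standard-library stand-ins for the two workloads, so its numbers say nothing about forgery or Faker. The project's own benchmark is tests/benchmarks/bench_vs_faker.py.

```python
import random
import time


def time_call(fn, repeats=3):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


rng = random.Random(42)
N = 100_000


def per_call():
    # Stand-in for calling a single-value generator N times.
    return [rng.randint(0, 100) for _ in range(N)]


def batch():
    # Stand-in for one batch call that produces N values.
    return rng.choices(range(101), k=N)


baseline = time_call(per_call)
candidate = time_call(batch)
print(f"per-call: {baseline:.3f}s  batch: {candidate:.3f}s  "
      f"speedup: {baseline / candidate:.1f}x")
```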
- seed(n) affects the default fake instance only
- Each Faker instance has its own independent RNG state
- Single-threaded determinism only: Results are reproducible within one thread
- No cross-version guarantee: Output may differ between forgery versions
forgery is NOT thread-safe. Each Faker instance maintains mutable RNG state.
For multi-threaded applications, create one Faker instance per thread:
from concurrent.futures import ThreadPoolExecutor
from forgery import Faker
def generate_names(seed: int) -> list[str]:
    fake = Faker() # Create per-thread instance
    fake.seed(seed)
    return fake.names(1000)

with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(generate_names, range(4)))

Do NOT share a Faker instance across threads.
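
If constructing a generator for every task is undesirable, one common pattern is to keep one instance per thread in threading.local storage and reseed it per task. The sketch below shows the pattern with random.Random standing in for a Faker instance; whether reusing a Faker this way suits your workload is an assumption to verify.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor

_local = threading.local()


def thread_rng():
    """Return this thread's private generator, creating it on first use."""
    if not hasattr(_local, "rng"):
        _local.rng = random.Random()  # stand-in for Faker()
    return _local.rng


def task(seed):
    rng = thread_rng()
    rng.seed(seed)  # reseed so each task is reproducible on its own
    return [rng.randint(0, 100) for _ in range(5)], id(rng)


with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(task, range(16)))

values = [v for v, _ in results]
distinct_rngs = {rng_id for _, rng_id in results}
print(len(distinct_rngs))  # at most 4: one generator per worker thread
```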
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Install maturin
pip install maturin
# Build and install locally
maturin develop --release
# Run tests
cargo test # Rust tests
pytest # Python tests
# Run benchmarks
python tests/benchmarks/bench_vs_faker.py

License: MIT