Pidgin learns schemas at runtime from sample JSON, strips keys entirely, and encodes each value with a type-specific encoder. No .proto files, no codegen, zero config.
Compression ratio: % of original JSON size (lower is better). Verified on identical data, single-threaded.
| Dataset | brotli | zstd | Proto+zstd | Pidgin |
|---|---|---|---|---|
| Users x1000 | 7.78 | 3.67 | 5.76 | 6.41 |
| Orders x500 | 10.52 | 6.41 | 6.97 | 7.96 |
| Events x5000 | 30.49 | 14.85 | 24.92 | 25.62 |
Best compression across all datasets. Speed competitive with Proto+zstd, faster than brotli on all except Users.
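To make the metric concrete, here is a minimal sketch of how "% of original JSON size" can be computed. It is not the benchmark harness behind the table: the sample records are made up, and zlib stands in for the compressors above only because it ships with Python.

```python
import json
import zlib


def ratio_percent(original: bytes, compressed: bytes) -> float:
    """Compressed size as a percentage of the original size (lower is better)."""
    return 100.0 * len(compressed) / len(original)


# Hypothetical sample data: 1000 user-like records with repeated keys.
records = [
    {"id": i, "email": f"user{i}@example.com", "plan": ["free", "pro"][i % 2]}
    for i in range(1000)
]

# Compact JSON is the baseline every compressor is measured against.
raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
packed = zlib.compress(raw, 9)

print(f"JSON bytes: {len(raw)}, zlib bytes: {len(packed)}")
print(f"ratio: {ratio_percent(raw, packed):.2f}% of original")
```

Swapping in another compressor only changes the line that produces `packed`; the baseline and the ratio calculation stay the same, which is what keeps the comparison on identical data.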
Two layers, each useful independently.
SchemaCodec: runtime-learned binary compression. Feed it sample JSON and it infers the schema: field names, types, enum domains, nesting. It then strips keys entirely and encodes values with type-specific encoders. A C extension provides native speed.
RatchetCipher: forward-secrecy encryption for persistent channels (WebSocket, SSE, IoT). Keys ratchet after every message, so a key compromised later cannot be used to decrypt earlier messages. Install with pip install pidgin[crypto].
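The ratcheting idea can be illustrated with a generic hash ratchet. This is a sketch of the general technique only, not RatchetCipher's actual construction: the class name `HashRatchet`, the HMAC labels, and the choice of HMAC-SHA256 are all assumptions, and no encryption is performed here.

```python
import hashlib
import hmac


class HashRatchet:
    """Toy symmetric ratchet: derives a fresh key per message, then advances."""

    def __init__(self, shared_secret: bytes) -> None:
        # Initial chain key derived from the shared secret.
        self._chain_key = hashlib.sha256(b"init" + shared_secret).digest()

    def next_message_key(self) -> bytes:
        # Derive the per-message key and the next chain key from the current
        # chain key using distinct labels, then overwrite the chain key.
        message_key = hmac.new(self._chain_key, b"message", hashlib.sha256).digest()
        self._chain_key = hmac.new(self._chain_key, b"chain", hashlib.sha256).digest()
        return message_key


# Both ends start from the same secret and step in lockstep.
sender = HashRatchet(b"demo shared secret")
receiver = HashRatchet(b"demo shared secret")

sender_keys = [sender.next_message_key() for _ in range(3)]
receiver_keys = [receiver.next_message_key() for _ in range(3)]
print(sender_keys == receiver_keys)
```

Because each step is a one-way function and the previous chain key is overwritten, someone who captures the state after message n has no practical way to recompute the keys used for messages before n.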
```python
from pidgin import SchemaCodec

# Learn schema from sample records (a list of dicts)
codec = SchemaCodec.learn(sample_records)

# Compress: ~10-19% of original JSON
compressed = codec.compress(records)

# Decompress: lossless round-trip
original = codec.decompress(compressed)
```
```python
from pidgin import SchemaCodec

# v1 schema
codec_v1 = SchemaCodec.learn(users_v1)

# Data gains new fields, types change
profile_v2 = codec_v1.profile.evolve(users_v2)
codec_v2 = SchemaCodec(profile_v2)

# v2 handles both old and new data shapes
codec_v2.compress(old_data)  # missing new fields → absent marker
codec_v2.compress(new_data)  # new fields encoded, removed fields skipped
```
Two-stage pipeline. Each stage is independent and composable.
Generic compressors (gzip, brotli, zstd) treat data as opaque byte streams. They find repeated byte patterns but cannot exploit structural knowledge. In a JSON array of 1000 objects, the key "email" appears 1000 times. A generic compressor reduces each repeat to a few bytes via backreferences, but Pidgin eliminates the repeats entirely: it knows the schema, so the key is encoded once in the profile header and each record contains only values in a fixed order. Combined with type-specific encoding (varints for integers, enum indices for categorical strings, nested sub-profiles for objects), this produces consistently smaller output.
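The three ideas in that paragraph (fixed field order instead of keys, varints for integers, enum indices for categorical strings) can be sketched in a few lines. This is an illustration under assumptions, not Pidgin's wire format: the field order, the `PLANS` enum domain, and the length-prefixed string encoding are all hypothetical.

```python
import json


def encode_varint(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position just past the varint)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


# Hypothetical schema: field order (id, plan, email) and enum domain are fixed.
PLANS = ["free", "pro", "enterprise"]


def encode_record(rec: dict) -> bytes:
    # No keys are written; position alone identifies each field.
    email = rec["email"].encode("utf-8")
    return (
        encode_varint(rec["id"])              # integer as varint
        + bytes([PLANS.index(rec["plan"])])   # enum as a one-byte index
        + encode_varint(len(email)) + email   # string as length + UTF-8 bytes
    )


def decode_record(buf: bytes, pos: int = 0) -> tuple[dict, int]:
    rec_id, pos = decode_varint(buf, pos)
    plan = PLANS[buf[pos]]
    pos += 1
    length, pos = decode_varint(buf, pos)
    email = buf[pos:pos + length].decode("utf-8")
    return {"id": rec_id, "plan": plan, "email": email}, pos + length


record = {"id": 300, "plan": "pro", "email": "a@example.com"}
encoded = encode_record(record)
decoded, _ = decode_record(encoded)
json_size = len(json.dumps(record, separators=(",", ":")))
print(f"{json_size} JSON bytes -> {len(encoded)} encoded bytes")
```

The decoder needs the same field order and enum list as the encoder, which is the role the shared profile plays in the description above.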
Complete public interface. All classes importable from pidgin.
| Class / Method | Description |
|---|---|
| SchemaCodec.learn(samples) | Learn schema from list of dicts, return codec |
| codec.compress(data) | Compress dict or list[dict] to binary bytes |
| codec.decompress(data) | Decompress binary back to dict or list[dict] |
| codec.profile | Access learned SchemaProfile |
| profile.evolve(new_samples) | Evolve schema -- backward + forward compatible |
| profile.diff(other) | Show changes between profile versions |
| profile.to_json() / from_json() | Serialize / deserialize profile for sharing |
| RatchetCipher(shared_secret) | Init encryption with shared secret |
| cipher.encrypt(data) | Encrypt bytes + ratchet key forward |
| cipher.decrypt(data) | Decrypt bytes + ratchet key forward |
| SecureChannel.create(name) | E2E encrypted channel (X25519 + SchemaCodec + Ratchet) |
One line to enable. Zero backend changes.
| Server | Config | Integration |
|---|---|---|
| Kong | pidgin = true | Lua FFI plugin |
| nginx | pidgin on; | Dynamic C module |
| Apache | PidginEnable On | Output filter |
| Caddy | pidgin | Go cgo middleware |
| Traefik | middleware config | Pure Go plugin |
| HAProxy | SPOE filter | External agent |
| FastAPI | add_middleware(PidginMiddleware) | Python |
| Django | MIDDLEWARE = [...] | Python |
All modules use libpidgin (portable C library). Auto-learns schemas, auto-compresses, auto-evolves on API changes. Profiles served at /.well-known/pidgin/.
API changes handled automatically. Zero downtime.
| API Change | What Happens |
|---|---|
| New field added | JSON fallback, then auto-evolve incorporates it as typed field |
| Field removed | ABSENT marker (1 byte), old clients unaffected |
| Field returns | Already in schema as nullable, encodes typed immediately |
| New enum value | Appended to list (old indices preserved) |
| Type widened (int to float) | Auto-widened safely |
| Schema drift detected | Auto-evolve, profile version bumped, new ETag |
Old clients with v1 profiles decode v2 data (unknown fields in JSON fallback). New clients with v2 profiles decode v1 data (missing fields as absent). Bidirectional compatibility guaranteed.
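A few of the rules in the table above (new fields appended, enum values appended with old indices preserved, int widened to float, missing fields read as absent) can be sketched with a toy profile. The dict-based profile layout and the helper names `infer_type`, `evolve`, and `to_row` are assumptions for illustration; they do not reflect Pidgin's internal representation.

```python
import copy

# Sentinel standing in for the ABSENT marker.
ABSENT = object()


def infer_type(value) -> str:
    """Very rough type inference for newly seen fields."""
    if isinstance(value, float):
        return "float"
    if isinstance(value, int):
        return "int"
    return "str"


def evolve(profile: dict, samples: list) -> dict:
    """Return a new profile covering the old shape and the new samples."""
    new_profile = copy.deepcopy(profile)  # the old profile stays valid as-is
    for sample in samples:
        for name, value in sample.items():
            spec = new_profile.get(name)
            if spec is None:
                # New field: appended last, so existing positions do not move.
                new_profile[name] = {"type": infer_type(value)}
            elif spec["type"] == "enum" and value not in spec["values"]:
                # New enum value: appended, so old indices keep their meaning.
                spec["values"].append(value)
            elif spec["type"] == "int" and isinstance(value, float):
                # Type widened from int to float.
                spec["type"] = "float"
    return new_profile


def to_row(record: dict, profile: dict) -> list:
    """Lay out a record in profile order, marking missing fields as ABSENT."""
    return [record.get(name, ABSENT) for name in profile]


v1 = {
    "status": {"type": "enum", "values": ["new", "paid"]},
    "total": {"type": "int"},
}
v2 = evolve(v1, [{"status": "refunded", "total": 19.99, "coupon": "SPRING"}])

old_row = to_row({"status": "new", "total": 5}, v2)
print(list(v2), v2["status"]["values"], v2["total"]["type"])
```

Appending rather than reordering is what lets an old record laid out under the new profile still line up: its existing fields keep their positions and the new trailing field is simply marked absent.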