Collections
Zatabase Collections provide a schema-flexible NoSQL document model layered on top of the relational engine. You can ingest raw JSON documents, define declarative projections to transform and type fields, and then query the projected data using standard SQL.
Creating a Collection
Create a collection with an optional projection that maps JSON fields to typed columns:

    curl -s -X POST https://your-project.zatabase.io/v1/collections \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "name": "events",
        "projection": {
          "event_type": "lowercase",
          "user_id": "text",
          "timestamp": "int",
          "amount": "float",
          "metadata": "text"
        }
      }' | jq

Projection Transforms
Projections define how raw JSON field values are transformed and stored in the underlying table. Each field maps to a transform type:
| Transform | Description | JSON Input | Stored Value |
|---|---|---|---|
| text | Store as-is (string) | "Hello" | "Hello" |
| lowercase | Lowercase the string | "HELLO" | "hello" |
| int | Parse as integer | "42" or 42 | 42 |
| float | Parse as float | "3.14" or 3.14 | 3.14 |
Fields present in the JSON but not in the projection are ignored. Fields in the projection but missing from a document are stored as NULL.
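To preview what a projection would store for a given document, the rules above can be emulated locally. This is a minimal Python sketch and not Zatabase's implementation: `TRANSFORMS` and `apply_projection` are hypothetical names, and two behaviors are assumptions because the text above does not cover them: JSON `null` is treated like a missing field, and an unparseable value raises an exception.

```python
from typing import Any, Callable, Dict, Optional


def _to_int(value: Any) -> int:
    # The table lists "42" or 42 as valid input; other types are rejected here (assumption).
    if isinstance(value, bool) or not isinstance(value, (int, str)):
        raise TypeError(f"cannot parse {value!r} as int")
    return int(value)


def _to_float(value: Any) -> float:
    # The table lists "3.14" or 3.14 as valid input; plain integers are also accepted here.
    if isinstance(value, bool) or not isinstance(value, (int, float, str)):
        raise TypeError(f"cannot parse {value!r} as float")
    return float(value)


# Hypothetical local mirror of the four documented transform types.
TRANSFORMS: Dict[str, Callable[[Any], Any]] = {
    "text": str,
    "lowercase": lambda v: str(v).lower(),
    "int": _to_int,
    "float": _to_float,
}


def apply_projection(doc: Dict[str, Any], projection: Dict[str, str]) -> Dict[str, Optional[Any]]:
    """Return the row a projection would produce for one JSON document."""
    row: Dict[str, Optional[Any]] = {}
    for field, transform in projection.items():
        value = doc.get(field)
        # Missing fields become NULL (None); null values are handled the same way (assumption).
        row[field] = None if value is None else TRANSFORMS[transform](value)
    # Fields in doc that the projection does not name are never copied, so they are ignored.
    return row
```

For example, `apply_projection({"event_type": "CLICK", "extra": 1}, {"event_type": "lowercase", "user_id": "text"})` keeps a lowercased `event_type`, drops `extra`, and sets `user_id` to `None`.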
Managing Projections
Get a collection’s current projection:

    curl -s https://your-project.zatabase.io/v1/collections/events/projection \
      -H "Authorization: Bearer $ZATABASE_TOKEN" | jq

Update a projection (existing data is not retroactively transformed):

    curl -s -X PUT https://your-project.zatabase.io/v1/collections/events/projection \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "event_type": "lowercase",
        "user_id": "text",
        "timestamp": "int",
        "amount": "float",
        "source": "lowercase"
      }' | jq

Listing Collections

    curl -s https://your-project.zatabase.io/v1/collections \
      -H "Authorization: Bearer $ZATABASE_TOKEN" | jq

Ingesting Data
The ingest endpoint supports three formats, auto-detected from the Content-Type header. All formats support optional gzip compression (add a Content-Encoding: gzip header).
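A client has to pair each body encoding with the matching headers. The sketch below shows one way to do that in Python using only the standard library. `encode_ingest_body` is a hypothetical helper, not part of any Zatabase SDK; the header values are the ones used in the curl examples in this section.

```python
import gzip
import json
from typing import Any, Dict, Iterable, Tuple


def encode_ingest_body(
    records: Iterable[Dict[str, Any]],
    ndjson: bool = True,
    compress: bool = False,
) -> Tuple[Dict[str, str], bytes]:
    """Serialize records and return (headers, body) for an ingest request."""
    if ndjson:
        # One JSON document per line; json.dumps never emits raw newlines by default.
        body = "".join(json.dumps(record) + "\n" for record in records).encode("utf-8")
        headers = {"Content-Type": "application/x-ndjson"}
    else:
        # A single JSON array of objects.
        body = json.dumps(list(records)).encode("utf-8")
        headers = {"Content-Type": "application/json"}
    if compress:
        # Gzip is layered on top of either format and signalled by Content-Encoding.
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return headers, body
```

The returned headers and bytes can be passed to any HTTP client, together with the `Authorization` header.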
NDJSON (Newline-Delimited JSON)
Best for streaming large datasets. Each line is an independent JSON document:

    curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/x-ndjson" \
      --data-binary @- <<'EOF'
    {"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0}
    {"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99}
    {"event_type": "SIGNUP", "user_id": "u_003", "timestamp": 1709000120, "amount": 0.0}
    EOF

NDJSON ingestion uses simd-json for high-performance parsing and processes each line as it arrives, keeping memory usage constant regardless of file size.
JSON Array
A single JSON array of objects:

    curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/json" \
      -d '[
        {"event_type": "CLICK", "user_id": "u_001", "timestamp": 1709000000, "amount": 0.0},
        {"event_type": "PURCHASE", "user_id": "u_002", "timestamp": 1709000060, "amount": 29.99}
      ]'

Gzip Compressed
Both NDJSON and JSON array formats support gzip compression for reduced transfer size:

    # Compress and ingest
    gzip -c events.ndjson | curl -s -X POST https://your-project.zatabase.io/v1/collections/events/ingest \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/x-ndjson" \
      -H "Content-Encoding: gzip" \
      --data-binary @-

Ingest Response
The response reports the number of successfully ingested records and any errors:

    {
      "inserted": 3,
      "errors": 0,
      "error_details": []
    }

If individual records fail (e.g., type conversion errors), the rest of the batch continues. Error details include the line number and error message for each failed record.
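Because a batch can partially succeed, a client should inspect the response body rather than rely on the HTTP status alone. The following Python sketch turns a response into a short report. The top-level keys (`inserted`, `errors`, `error_details`) come from the example above; the keys inside each error detail are not shown in this document, so `line` and `message` are assumed names, and `summarize_ingest` is a hypothetical helper.

```python
import json
from typing import List


def summarize_ingest(response_text: str) -> List[str]:
    """Return human-readable lines describing an ingest response."""
    report = json.loads(response_text)
    lines = [f"inserted={report['inserted']} errors={report['errors']}"]
    for detail in report.get("error_details", []):
        # ASSUMPTION: each detail is an object with "line" and "message" keys.
        # .get() keeps the sketch from crashing if the real field names differ.
        line_no = detail.get("line", "?")
        message = detail.get("message", json.dumps(detail))
        lines.append(f"line {line_no}: {message}")
    return lines
```

A caller could log these lines and retry only the failed records after correcting them.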
Querying Collection Data
Once data is ingested, the projected columns are available via standard SQL:

    # Find click events with a positive amount
    curl -s -X POST https://your-project.zatabase.io/v1/sql \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"query": "SELECT * FROM events WHERE event_type = '\''click'\'' AND amount > 0"}'

Since projections apply transforms at ingest time, the stored data reflects the transform. For example, with a lowercase projection on event_type, querying WHERE event_type = 'click' matches documents originally ingested as "CLICK".
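Embedding SQL string literals in a shell-quoted JSON body requires the `'\''` escape seen above. Building the request in a script avoids that quoting. The Python sketch below constructs, but does not send, an equivalent request with the standard library. `build_sql_request` is a hypothetical helper; the `/v1/sql` path and the `query` key are taken from the curl example.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build a POST request for the SQL endpoint without sending it."""
    return urllib.request.Request(
        url=f"{base_url}/v1/sql",
        # json.dumps handles all quoting, so the SQL can contain single quotes freely.
        data=json.dumps({"query": query}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_sql_request(
    "https://your-project.zatabase.io",
    "example-token",  # placeholder; read the real token from the environment
    "SELECT * FROM events WHERE event_type = 'click' AND amount > 0",
)
```

Passing `request` to `urllib.request.urlopen` would send it; that step is left out here so the sketch runs without network access.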
    # Find recent high-value events
    curl -s -X POST https://your-project.zatabase.io/v1/sql \
      -H "Authorization: Bearer $ZATABASE_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"query": "SELECT * FROM events WHERE amount > 10.0 ORDER BY timestamp DESC LIMIT 20"}'

Use Cases
- Event ingestion: Stream application events as NDJSON, project relevant fields, query with SQL
- Log aggregation: Ingest structured logs, project severity/service/timestamp, analyze with WHERE clauses
- Data lake queries: Ingest raw JSON exports from external systems, define projections for the fields you need
- ETL pipelines: Use collections as a staging area; ingest raw data, project to typed columns, then query or export
Performance Considerations
- NDJSON is preferred for large datasets because it processes line-by-line with constant memory
- Batch sizes of 1,000-10,000 records per HTTP request provide optimal throughput
- Gzip compression reduces network transfer by 5-10x for typical JSON payloads
- Projections are applied at write time, so reads are fast regardless of projection complexity
- Zatabase uses simd-json for NDJSON parsing, achieving multi-GB/s parse rates on modern CPUs
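The batch-size guidance above can be applied on the client by splitting a record stream into fixed-size chunks and sending one HTTP request per chunk. This is a minimal Python sketch: `iter_batches` is a hypothetical helper, and the default of 5,000 is an arbitrary point inside the 1,000-10,000 range rather than a documented recommendation.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(records: Iterable[T], size: int = 5000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        # islice pulls only `size` items at a time, so the full input is never held in memory.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each yielded batch can then be serialized as NDJSON and posted to the ingest endpoint; the final batch is smaller when the record count is not a multiple of `size`.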