A comprehensive guide to Ocient's architecture, storage technology, query engine, machine learning, geospatial capabilities, deployment options, and administration — drawn directly from official documentation.
Ocient is a modern, purpose-built hyperscale data warehouse engineered to run real-time OLAP analytics on the world's largest and most complex structured and semi-structured datasets using standard ANSI SQL.
Founded in 2016 by industry veterans, Ocient set out to solve a fundamental problem: traditional analytical databases don't scale cost-effectively to extremely large datasets, while Big Data solutions can scale but are expensive, complex, and too slow for interactive analysis. Ocient bridges that gap — delivering interactive query speeds at petabyte scale.
"Queries that used to take hours, or not run at all, now execute in seconds, and systems that used to fill up multiple data center racks now require up to 90% less space and energy." — Ocient Documentation
A purpose-built execution engine and I/O layer deliver 10×–50× faster performance than competing solutions on hyperscale datasets.
Full SQL dialect closely following PostgreSQL conventions. JDBC, pyocient, and ODBC drivers available. No proprietary query language to learn.
Designed to run on standard hardware. Supports on-premises, OcientCloud®, and public cloud (AWS, GCP) deployments.
OcientML® places the entire machine learning stack inside the database. Train and score models against petabytes of data with SQL.
OcientGeo® provides native geospatial and spatiotemporal analytics including point, polygon, and complex geographic analysis.
Up to 90% smaller data center footprint. OcientCloud® runs on 100% renewable energy in a LEED-certified facility.
Key differentiation: Ocient consolidates real-time analytics, traditional OLAP, ETL/ELT pipelines, geospatial analysis, and in-database machine learning onto a single unified platform — eliminating the cost and complexity of multiple specialized systems.
Ocient is a distributed system built around the Compute-Adjacent Storage Architecture™ (CASA) — co-locating NVMe SSD storage with compute resources to eliminate common bottlenecks.
Every Ocient system is composed of three distinct node roles, each with a specific responsibility. These roles are consistent across all environments — only the number of nodes varies.
SQL nodes are the entry point to the system. They receive incoming SQL from JDBC/pyocient clients, parse statements, and create an execution plan using one of two optimization methods. Once planned, the work is distributed to foundation nodes. SQL nodes also handle final aggregations and joins on intermediate result sets returned from foundation nodes, then package and return results to the client.
Administrators connect to SQL nodes via CLI or SQL client to issue DDL/DCL commands, which the node then propagates throughout the system. Multiple SQL nodes provide automatic load balancing across connections.
Foundation nodes are the heart of Ocient — they store user data in columnar format on NVMe SSDs and perform the bulk of query processing. The CASA principle means data and compute are co-located: when a query arrives, a foundation node processes as much of it as possible against its own local data before returning intermediate results upstream.
Foundation nodes contain the majority of storage in an Ocient system and are typically the most numerous node type. They connect to the SQL nodes over the 100 Gbps high-speed network, and every foundation node is connected to every SQL node.
Loader nodes handle the full ETL/ELT ingestion lifecycle: extracting from batch file sources (e.g., S3) or streaming sources (e.g., Kafka), transforming using SQL functions, indexing data, and loading into foundation nodes. They operate in a horizontal scale-out fashion, so adding more loader nodes increases ingestion throughput transparently.
Loader nodes also enforce exactly-once delivery guarantees — ensuring data is never duplicated even in the case of network or system failure during loading.
Two separate networks connect the nodes of an Ocient system. A 100 Gbps high-speed network handles query execution traffic between SQL nodes and foundation nodes, and data movement between loader nodes and foundation nodes. A 10 Gbps network handles administrative flows — DDL/DCL command propagation across the system.
Every foundation node is connected to every SQL node. All nodes are connected through these two networks, ensuring high throughput and low latency for every workload type.
CASA (Compute-Adjacent Storage Architecture): By co-locating NVMe drive storage with compute resources, Ocient avoids the network bottlenecks common in cloud-style separated storage/compute architectures. Data never needs to leave the node for first-pass processing.
Ocient stores data in a columnar format organized into segments and segment groups, protected by erasure coding for fault tolerance without the overhead of full data replication.
While tables are created and queried using familiar SQL syntax, Ocient stores data on disk in a highly compressed columnar format. Segments are the fundamental storage unit: they contain rows organized by column, along with embedded indexes and statistical metadata used to accelerate query processing.
As data is ingested by loader nodes, it is initially stored in row-based pages on foundation nodes for rapid ingestion throughput. As pages accumulate, loader nodes convert them into columnar segments — highly compressed structures that include data, multiple indexes, and metadata.
Multiple segments combine to form segment groups. A segment group has a fixed width (number of segments) and a defined number of parity blocks for resilience. Segment groups are physically stored in a storage cluster — a set of foundation nodes with an associated storage space.
When configuring an Ocient system, administrators define at least one storage space and storage cluster. At the storage space level, administrators set the width (number of segments per group) and parity_width (number of parity blocks), which determines the level of fault tolerance.
Ocient uses erasure coding — not data replication — for hardware fault tolerance. Erasure coding computes parity blocks that allow the system to reconstruct any missing data, without needing to store a second (or third) full copy of all data.
This design means an Ocient system requires significantly less storage than replication-based approaches, while still providing full recovery from hardware failures. The coding block is the smallest unit of recovery and the unit of parity calculation.
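For illustration (these widths are assumed, not Ocient defaults): a segment group carrying 16 data segments plus 2 parity blocks adds only 2/16 = 12.5% storage overhead, yet the group survives the loss of any two of its members. Triple replication with comparable resilience would add 200% overhead.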
Each table can designate a column as its TimeKey. This column partitions data on disk by time, enabling the system to rapidly skip irrelevant time ranges during query execution without reading unnecessary data. Since most analytical queries include a time filter, this is a critical performance mechanism.
In addition to the TimeKey, tables can specify a Clustering Key: one or more columns that are frequently queried together. The system subdivides time-partitioned segments further on disk according to these key columns, enabling fast lookup of records with matching key values within a partition.
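As a sketch of how these keys prune I/O (using the events table defined in the Indexing section below; predicate values are hypothetical), a query that filters on both the TimeKey column and a leading Clustering Key column lets the system skip entire time partitions and then seek directly to matching key values:
-- event_ts is the TimeKey: partitions outside January 2024 are skipped entirely
-- user_id leads the Clustering Key: matching rows are found without scanning the partition
SELECT event_type, COUNT(*) AS events
FROM events
WHERE event_ts >= TIMESTAMP '2024-01-01 00:00:00'
  AND event_ts < TIMESTAMP '2024-02-01 00:00:00'
  AND user_id = 42
GROUP BY event_type;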
Ocient uses SQL-defined data pipelines as the primary mechanism for ingesting data from both batch file sources and real-time streaming sources, with real-time transformation and exactly-once delivery semantics.
A data pipeline is a database object that defines end-to-end data processing: extraction from a source, optional transformations, and load into one or more Ocient tables. Pipelines are defined, started, stopped, and modified using standard DDL statements.
A pipeline definition contains three sections: the data source, the extract format, and the transformation/target specification:
-- Create a pipeline loading CSV data from S3 into the orders table
CREATE PIPELINE orders_pipeline
SOURCE S3
ENDPOINT 'https://s3.us-east-1.amazonaws.com'
BUCKET 'my-data-bucket'
FILTER 'orders/'
EXTRACT FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
INTO "public"."orders"
SELECT
$1 AS id,
$2 AS user_id,
$3 AS product_id,
TO_TIMESTAMP($4, 'YYYY-MM-DD HH24:MI:SS') AS order_ts,
CAST($5 AS DECIMAL(10,2)) AS amount;
-- Lifecycle management
PREVIEW PIPELINE orders_pipeline;
START PIPELINE orders_pipeline;
STOP PIPELINE orders_pipeline;
DROP PIPELINE orders_pipeline;
| State | Description |
|---|---|
| CREATED | Pipeline defined but never started. No tasks created, no files listed. |
| RUNNING | Actively processing data. At least one task is queued, running, or cancelling. |
| STOPPED | User-initiated stop. All tasks complete, failed, or cancelled. Position retained. |
| COMPLETED | All assigned work finished within error limits. All tasks complete. |
| FAILED | Error limits exceeded. At least one task failed. Pipeline will not retry. |
Exactly-Once Delivery: Loader nodes maintain pipeline position and deduplication state. If a pipeline is stopped and restarted, it continues from where it left off — no data is duplicated and no data is lost.
Scale-Out Loading: Pipelines execute across all available loader nodes in parallel. Work is partitioned across tasks (file chunks or Kafka partitions). Add more loader nodes to increase throughput without any configuration changes.
ELT Support: In addition to pipelines, Ocient supports CREATE TABLE AS SELECT (CTAS) and INSERT INTO … SELECT for ELT workflows that extract data and write results directly into new or existing tables.
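A minimal ELT sketch using the orders table from the pipeline example above (the rollup table name is hypothetical):
-- Materialize a daily rollup with CTAS
CREATE TABLE daily_order_totals AS
SELECT CAST(order_ts AS DATE) AS order_date, SUM(amount) AS total_amount
FROM "public"."orders"
GROUP BY 1;
-- Append newly loaded rows with INSERT INTO ... SELECT
INSERT INTO daily_order_totals
SELECT CAST(order_ts AS DATE), SUM(amount)
FROM "public"."orders"
WHERE order_ts >= CURRENT_DATE
GROUP BY 1;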
Ocient's query engine was built from scratch using modern design principles, deeply integrated with the storage and I/O layer to minimize on-disk reads and maximize parallel processing throughput.
When a user issues a SQL statement through a JDBC or pyocient client, the receiving SQL node parses the statement, builds an execution plan, and distributes the work to the foundation nodes.
The foundation nodes do as much work as possible locally (filtering, projection, local aggregation) before returning intermediate results to the SQL node, minimizing data movement across the network.
For each query, the Ocient system compiles a custom I/O pipeline for every relevant data segment. These pipelines are tailored to use any applicable keys or indexes to reduce the volume of data that must be read from disk.
This design means that query performance directly benefits from proper index configuration — the system does not perform table scans when indexes can be used to skip irrelevant data blocks.
SQL nodes use two complementary query optimization strategies. The cost-based optimizer analyzes data statistics and available indexes to construct efficient execution plans. For complex workloads, the system can also apply rule-based optimizations to rewrite query plans before execution.
INNER, LEFT/RIGHT OUTER, CROSS joins across large tables. Join pushdown to foundation nodes where possible.
Standard (SUM, AVG, COUNT, MIN, MAX) plus sorted aggregates and window/analytic functions.
Full OVER() clause support with PARTITION BY, ORDER BY, ROWS/RANGE frames, LAG, LEAD, RANK, DENSE_RANK, NTILE (see the sketch after this list).
CREATE/ALTER/DROP TABLE, VIEW, INDEX, SCHEMA, DATABASE. GRANT/REVOKE role-based access control.
Query JSON and complex data types including arrays, tuples, and IP addresses inline with SQL operators.
Configurable result set caching to avoid re-executing identical queries. Managed per database via DBA settings.
Assign priority levels to users and groups. Control resource allocation and query priority across concurrent workloads.
SQL dialect closely follows PostgreSQL conventions. Most PostgreSQL functions work identically in Ocient.
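A brief sketch of the window-function support noted above, again using the orders table from the pipeline example:
-- Rank each user's orders by amount and compare each order to the prior one
SELECT user_id, order_ts, amount,
       RANK() OVER (PARTITION BY user_id ORDER BY amount DESC) AS amount_rank,
       LAG(amount) OVER (PARTITION BY user_id ORDER BY order_ts) AS prev_amount
FROM "public"."orders";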
-- JDBC Connection String
jdbc:ocient://<sql_node_host>:4050/<database>
// Java JDBC example: open a connection to an Ocient SQL node
import java.sql.Connection;
import java.sql.DriverManager;
public class OcientConnect {
    public static void main(String[] args) throws Exception {
        // Register the Ocient JDBC driver, then connect (default port 4050)
        Class.forName("com.ocient.jdbc.JDBCDriver");
        Connection conn = DriverManager.getConnection(
            "jdbc:ocient://host:4050/mydb", "username", "password");
        conn.close();
    }
}
# Python pyocient Example
import pyocient
conn = pyocient.connect(
dsn="ocient://user:pass@host:4050/mydb"
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM events WHERE ts > '2024-01-01'")
rows = cursor.fetchall()
Multi-layer indexing is central to Ocient's performance model. Indexes are embedded in segments alongside the data, and the segment keys (TimeKey and Clustering Key) require no separate storage overhead.
Partitions all data in a table by time. Queries with time filters skip entire partitions, dramatically reducing I/O. Defined at table creation time and cannot be changed later. Recommended for any time-series dataset. No additional storage required.
Sorts and subdivides data within time partitions by one or more columns frequently queried together. Enables fast lookup without full partition scans. Ideal for high-cardinality filter columns like user_id, device_id, or IP address. No additional storage required.
Dense or sparse B-tree style index on numeric columns. Dramatically reduces I/O for equality and range queries on numeric columns that are not part of the clustering key. Useful for columns with medium-to-high cardinality.
Full text index on VARCHAR columns for exact-match and prefix queries. Enables fast lookup on string identifier columns without full segment scans.
An N-gram index on VARCHAR columns enables efficient LIKE queries with wildcard patterns (e.g., WHERE col LIKE '%substring%'). Particularly useful for log analysis and text search workloads.
Spatial index on ST_POINT and ST_POLYGON columns for bounding box and containment queries. Used by OcientGeo® functions to efficiently resolve geographic predicates without scanning entire segments.
-- Create table with TimeKey and Clustering Key
CREATE TABLE events (
event_id BIGINT,
user_id BIGINT,
event_ts TIMESTAMP,
event_type VARCHAR(64),
ip_addr IP,
payload VARCHAR(512)
)
TIMEKEY event_ts
CLUSTERING KEY (user_id, event_type);
-- Add secondary indexes after table creation
CREATE INDEX idx_ip ON events(ip_addr) USING NUMERIC;
CREATE INDEX idx_payload ON events(payload) USING NGRAM;
CREATE INDEX idx_event_type ON events(event_type) USING STRING;
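A few query shapes that exercise these secondary indexes (a sketch; predicate values are hypothetical):
-- The NGRAM index on payload serves the wildcard LIKE pattern
SELECT COUNT(*) FROM events WHERE payload LIKE '%timeout%';
-- The STRING index on event_type serves exact-match lookups
SELECT COUNT(*) FROM events WHERE event_type = 'login';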
Best Practice: Configure all keys and indexes before loading large amounts of data. Data loaded before index creation is stored without the index structures; re-indexing existing data requires a segment rebuild operation.
OcientML® places the entire machine learning stack inside the Ocient Hyperscale Data Warehouse, eliminating the need for separate ML tooling, data movement, or external model training infrastructure.
Key insight: Traditional ML workflows require extracting data from a data warehouse, loading it into a separate ML platform, training the model, and then returning predictions. OcientML® collapses this entire workflow into a single system — train and score directly against petabytes of fresh data using SQL.
Ordinary least squares regression for continuous numeric target prediction. Train and score in SQL.
Binary and multinomial classification. Outputs class probabilities alongside predictions.
Unsupervised cluster assignment. Assign cluster IDs to new data using a trained K-means model in SQL queries.
Instance-based classification and regression. Find nearest neighbors at query time across large datasets.
Interpretable classification and regression trees with configurable depth and split criteria.
Multi-layer perceptrons for both classification and regression tasks. Configurable layers and activation functions.
Probabilistic classifier based on Bayes' theorem. Fast and effective for text classification and anomaly detection.
Time series forecasting using autoregressive models. Forecast future values based on historical patterns.
Dimensionality reduction. Reduce feature space for downstream analysis or visualization.
Classification and regression with kernel support. Effective for high-dimensional data.
-- Train a logistic regression model
CREATE MODEL churn_model
TYPE LOGISTIC
TARGET churned
FEATURES (tenure_days, monthly_spend, support_tickets, last_login_days)
AS SELECT tenure_days, monthly_spend, support_tickets,
last_login_days, churned
FROM customer_features
WHERE training_set = TRUE;
-- Score new customers using the trained model
SELECT
customer_id,
PREDICT(churn_model, tenure_days, monthly_spend, support_tickets, last_login_days) AS churn_probability
FROM customers
WHERE active = TRUE;
-- K-Means clustering on user behavior
CREATE MODEL user_segments
TYPE KMEANS
CLUSTERS 5
AS SELECT page_views, session_duration, purchase_count
FROM user_behavior;
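Cluster assignment for new rows can then follow the same PREDICT-style pattern shown in the logistic regression example (a sketch; the exact invocation for K-means models is assumed):
-- Assign each user to one of the 5 trained segments
SELECT page_views, session_duration, purchase_count,
       PREDICT(user_segments, page_views, session_duration, purchase_count) AS segment_id
FROM user_behavior;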
Trained models are stored as database objects, subject to the same role-based access controls as tables and views. Models can be exported to BI tools via built-in connectors or scored directly inside dashboards using the SQL interface.
OcientGeo® provides a comprehensive suite of geospatial and spatiotemporal analysis capabilities built directly into the SQL engine — enabling complex geographic queries at petabyte scale without external GIS systems.
| Type | Description |
|---|---|
| ST_POINT | Single geographic coordinate (lon, lat) |
| ST_LINESTRING | Ordered sequence of points forming a line or path |
| ST_POLYGON | Closed polygon defined by a ring of coordinates |
| GEOGRAPHY | Generic geography type with spherical earth calculations |
| GEOHASH | Compact string encoding of a geographic location |
-- Find all events within 5km of a location
SELECT event_id, event_ts,
ST_Distance(location,
ST_GeoGPoint(-87.6298, 41.8781)) AS dist_m
FROM mobile_events
WHERE ST_DWithin(
location,
ST_GeoGPoint(-87.6298, 41.8781),
5000 -- meters
)
ORDER BY dist_m;
-- Count devices in a polygon region
SELECT COUNT(*) AS devices_in_zone
FROM device_pings
WHERE ST_Contains(
ST_Polygon_FromEWKT('POLYGON((-87.7 41.9,-87.5 41.9,...))'),
device_location
);
-- Aggregate by geohash grid cell
SELECT
ST_GeoHash(location, 6) AS cell,
COUNT(*) AS event_count
FROM events
GROUP BY cell
ORDER BY event_count DESC;
Network tower coverage analysis, customer location density mapping, spatiotemporal churn analysis.
Geo-targeted ad delivery, store visit attribution, location-based audience segmentation.
Territory management, satellite imagery analysis, geofence monitoring and compliance.
Fraud detection via location anomaly, risk exposure by geography, branch coverage modeling.
Ocient is designed for flexible deployment with consistent feature parity across all options. Every deployment includes 24/7 critical on-call support, subscription licensing, training, and access to updates.
Management Services: Regardless of deployment model, organizations can engage the Ocient Management Services team to handle system setup, 24/7 monitoring, software updates, and ongoing operations. This is offered as a fully managed layer on top of any deployment option.
Ocient provides a Simulator — a single-node development and testing environment — for developers and data teams to explore Ocient's SQL dialect, pipeline functionality, and ML capabilities without provisioning a full multi-node system. The Simulator is ideal for query development, schema design, and pipeline prototyping.
As a unified platform, Ocient consolidates security capabilities in one place, simplifying compliance and reducing the attack surface compared to multi-system analytics stacks.
Optional TLS/SSL encryption for data in transit between clients and SQL nodes. Encryption at rest configurable at the storage space level.
Native username/password authentication, Single Sign-On (SSO) integration, and configurable authentication policies per user or group.
SQL GRANT/REVOKE DCL statements for fine-grained table, schema, database, and system-level permissions. User groups and roles for scalable access management (see the sketch after this list).
Comprehensive log-level monitoring and auditing of all query and administrative activity. System catalog tables expose audit trails for compliance reporting.
SOC 2 Type 2 certified. OcientCloud® operates in a LEED-certified facility. Designed to support HIPAA, FedRAMP-aligned, and government-regulated workloads.
Separate admin network (10 Gbps) isolated from query traffic (100 Gbps). Configurable network access controls and firewall integration at the infrastructure level.
ML models created with OcientML® are governed by the same RBAC framework as database objects — access to train, score, or drop models is controlled via SQL GRANT/REVOKE.
User group priority settings control query scheduling and resource allocation. Prevent any single workload or user from monopolizing system resources.
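A minimal sketch of these access controls, assuming PostgreSQL-style GRANT/REVOKE syntax (role and object names hypothetical):
-- Grant read access on a table to a role, then revoke it
GRANT SELECT ON "public"."events" TO analyst_role;
REVOKE SELECT ON "public"."events" FROM analyst_role;
-- OcientML models are governed by the same GRANT/REVOKE framework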
Ocient provides comprehensive tooling for both Database Administrators (DBAs), who manage data structures, users, and performance, and System Administrators, who manage node configuration, monitoring, and maintenance.
DBAs interact with Ocient through standard SQL DDL and DCL commands via any connected SQL client. Key responsibilities include defining databases, schemas, tables, views, and indexes; managing users, groups, and permissions; and monitoring query and pipeline activity through the system catalog.
System administrators handle the physical infrastructure layer of an Ocient deployment, including node configuration, network setup, software updates, monitoring, and maintenance.
All operational metadata in Ocient is exposed through queryable system catalog tables (prefixed with sys_). DBAs and system admins can query these tables using standard SQL to monitor the state of the system in real time.
-- View all pipeline events
SELECT * FROM sys_pipeline_events
WHERE pipeline_name = 'orders_pipeline'
ORDER BY event_ts DESC;
-- View pipeline errors for troubleshooting
SELECT * FROM sys_pipeline_errors
WHERE error_ts > NOW() - INTERVAL '1 hour';
-- Check active queries and their state
SELECT query_id, user_name, state, elapsed_ms, query_text
FROM sys_queries
WHERE state = 'RUNNING';
-- Check node health and storage utilization
SELECT node_id, node_role, status, storage_used_gb, storage_total_gb
FROM sys_nodes;
Ocient provides a native Datadog integration that ships metrics in the OpenMetrics format to Datadog dashboards. Install the integration with:
# Install the Ocient integration via the Datadog Agent
datadog-agent integration install -t datadog-ocient==1.0.0
# ocient.d/conf.yaml
instances:
  - use_openmetrics: true
    openmetrics_endpoint: http://<master_node_host>:9090/metrics
Metrics include query performance counters, disk usage, database table statistics, loading throughput, and system health indicators.
Ocient integrates with the broader data ecosystem through standard SQL interfaces and purpose-built connectors for BI tools, data engineering platforms, and monitoring systems.
Official Java JDBC driver for connecting any JVM-based application, BI tool, or ETL framework to Ocient. Default port 4050. Used by Tableau, DBeaver, Metabase, and other JDBC-compatible tools.
com.ocient.jdbc.JDBCDriver
jdbc:ocient://host:4050/db
Official Python driver (DB-API 2.0 compliant). Enables Python applications, data science notebooks, and Airflow DAGs to connect to Ocient with native Python syntax.
pip install pyocient
import pyocient
conn = pyocient.connect(dsn="...")
Ocient provides a machine-readable documentation endpoint at https://docs.ocient.com/llms.txt for optimal content extraction by LLM-based AI tools. This enables AI coding assistants to provide accurate SQL and API suggestions when working with Ocient systems.
Real-time bidding analytics, attribution modeling, audience segmentation at billions of events per day.
Network visibility, data retention & regulatory compliance, CDR analysis, spatiotemporal network performance.
National security analytics, large-scale surveillance data processing, satellite imagery, intelligence workflows.
Real-time fraud detection, risk analytics, trade surveillance, regulatory reporting on massive transaction histories.
Log analytics, infrastructure performance monitoring, security event correlation at hyperscale data volumes.
Location intelligence, proximity analysis, spatiotemporal data fusion for mapping and territory analytics.
Core terminology used throughout the Ocient Hyperscale Data Warehouse documentation and product.