MirAI-V2

System Architecture Overview

System configuration and data flow of the on-premise AI Video Management System (VMS)

バージョン / Version	2.0.3
発行日 / Issue Date	2026-06-17
発行 / Issued by	MarkAny Co., Ltd.

改訂履歴 / Revision History

版数 / Rev.	発行日 / Date	改訂内容 / Description	承認 / Approved
1.0	2026-06-17	Initial release	—

目次 / Contents

1. System Overview
2. Architecture Diagram
3. Components
4. Network Configuration
5. Physical Deployment
6. Data Flow
7. Notes and Assumptions

1. System Overview

MirAI-V2 is an on-premise AI Video Management System (VMS) provided by MarkAny Co., Ltd. It ingests video from IP cameras (ONVIF / RTSP) and uses an AI inference engine running on an NVIDIA GPU to detect objects and events such as fire/smoke, fall-down, and intrusion in real time. It also provides recording, re-streaming, notification, and monitoring through a dedicated client. All processing, including AI inference, is completed within the on-premise server; video data is never sent to any external cloud.

The system is organized into three principal layers:

Camera layer: A fleet of IP cameras compliant with standard protocols (ONVIF / RTSP). The server connects to each camera and pulls the video via RTSP.
Server layer (MirAI Server): The core of the system. It handles video ingestion (GStreamer), the AI inference engine (GPU / TensorRT), the REST / WebSocket / TCP APIs, the PostgreSQL database, the RTSP re-streaming server, and the recording storage. It is implemented in Rust (Axum), with the AI inference engine running as an in-process DLL.
Client layer (MirAI Client): A dedicated monitoring application implemented in Qt / C++. It provides live grid viewing, AI pipeline editing, and review of notifications and incident logs. It connects to the server over encrypted channels (HTTPS / WSS / TCP).

The major subsystems of the server layer are as follows.

Subsystem	Role
Video ingestion	Pulls video from cameras via RTSP, then splits it with a GStreamer tee-based pipeline into a "recording / re-stream branch (passthrough)" and an "AI inference branch (decode)".
AI inference engine	Runs detection, classification, and tracking on the GPU (TensorRT 10.8 / ONNX Runtime) and generates events based on defined rules.
API services	Provides REST (CRUD operations), WebSocket (real-time sync of events and status), and TCP (binary transfer of video and images).
Database	Stores persistent data — users, camera settings, AI pipelines, incident logs, notification rules, audit trail, etc. — in PostgreSQL.
Recording / re-streaming	Generates recording files (split into segments) and re-delivers video to clients through the RTSP re-streaming server.

2. Architecture Diagram

The block diagram below shows the system components and the data/control flow between cameras, server, and client. The protocol and port are labeled on each arrow.

Figure 1. MirAI-V2 system architecture block diagram (data/control flow; protocols and ports labeled on arrows)

Video from each camera is split into two branches by a GStreamer tee. The "passthrough branch" (for recording and RTSP re-streaming) handles already-encoded packets to minimize CPU load. The "decode branch" (for AI inference) decodes via NVDEC (GPU hardware decoder), throttles the frame rate (default cap of 7–10 fps), and feeds frames to the AI engine.
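
The two-branch split described above can be sketched as a GStreamer pipeline description. This is a minimal illustration, not the product's actual pipeline: the function name `build_pipeline`, the file name pattern, and the sink name `ai_sink` are assumptions, and only the recording side of the passthrough branch is shown (the re-streaming side is omitted). The elements used (`rtspsrc`, `tee`, `splitmuxsink`, `nvh264dec` / `nvh265dec`, `videorate`, `appsink`) are standard GStreamer elements.

```python
def build_pipeline(rtsp_url: str, codec: str = "h264", ai_fps: int = 10,
                   segment_seconds: int = 60) -> str:
    """Return a gst-launch-style description with a passthrough and a decode branch.

    Hypothetical helper for illustration; the real MirAI pipeline may differ.
    """
    if codec not in ("h264", "h265"):
        raise ValueError(f"unsupported codec: {codec}")
    segment_ns = segment_seconds * 1_000_000_000  # splitmuxsink expects nanoseconds
    return (
        f'rtspsrc location="{rtsp_url}" protocols=tcp ! '
        f"rtp{codec}depay ! {codec}parse ! tee name=t "
        # Passthrough branch: encoded packets are written to segmented files
        # without decoding, which keeps CPU load low.
        f"t. ! queue ! splitmuxsink location=rec_%05d.mp4 max-size-time={segment_ns} "
        # Decode branch: NVDEC decode, then drop frames down to the AI rate cap.
        f"t. ! queue leaky=downstream ! nv{codec}dec ! "
        f"videorate drop-only=true max-rate={ai_fps} ! "
        f"appsink name=ai_sink drop=true max-buffers=1"
    )


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a placeholder camera.
    print(build_pipeline("rtsp://192.0.2.10:554/stream1"))
```

The leaky queue and `drop=true` on the AI branch are a design choice in this sketch: if inference falls behind, stale frames are discarded rather than stalling the recording branch.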

3. Components

The role and technology of each component are listed below.

Component	Role	Technology
MirAI Server (API)	Provides the REST API (all CRUD), WebSocket (event/status sync), and TCP (video/image transfer); handles authentication/authorization and camera lifecycle management.	Rust / Axum, JWT + RBAC, TLS
AI inference engine	Runs detection (YOLO/SSD family) → classification (fire, fall-down, etc.) → tracking (ByteTrack) → rule evaluation on the GPU to generate events. Runs as an in-process DLL within the server.	C++ / CUDA, TensorRT 10.8, ONNX Runtime
Video ingestion (Ingest)	Connects to cameras via RTSP to pull video; uses a tee to split into recording/re-stream and AI-inference branches, performing GPU decode and frame extraction.	GStreamer, NVDEC (nvh264dec / nvh265dec)
Database	Persists users, cameras, AI pipelines, incident logs, notification rules, audit trail, sessions, and system settings.	PostgreSQL (pool size 10, port 5432)
RTSP re-streaming server	Re-delivers ingested video to clients, providing a main (high-resolution) stream and a sub (low-resolution, for grid view) stream.	RTSP server (main 8554 / sub 8555)
Recording storage	Stores continuous / event recording files (split into 60-second segments), with built-in minimum-free-disk protection.	Dedicated recording drive (HDD / NVMe), ring buffer
MirAI Client	Live grid viewing, node-based AI pipeline editing, review of notifications and incident logs, multi-language UI (Japanese / English / Korean).	Qt 6 / C++ (dark theme)
IP cameras	Video sources. Existing standard-compliant camera assets can be reused.	ONVIF (auto-discovery) / RTSP (manual URL)
GPU (inference accelerator)	Hardware performing AI inference and video decode; determines the number of cameras that can be processed concurrently.	NVIDIA GPU, CUDA 12.8, TensorRT 10.8

On first startup, the AI engine builds GPU-optimized models (TensorRT engines) for your hardware. This takes approximately 15–30 minutes and occurs only once.
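
The one-time build can be pictured as a build-or-load cache. The sketch below is an assumption about how such a cache might be keyed, not a description of the MirAI implementation: `load_or_build`, `engine_path`, and the `build` callback are hypothetical names, and the callback stands in for the actual TensorRT build step. Keying on the GPU name and TensorRT version reflects the fact that TensorRT engines are generally tied to the GPU and library version they were built with.

```python
import hashlib
from pathlib import Path
from typing import Callable, Tuple


def engine_path(cache_dir: Path, model_bytes: bytes, gpu_name: str,
                trt_version: str) -> Path:
    """Derive a cache file name from the model, GPU, and TensorRT version."""
    digest = hashlib.sha256()
    for part in (model_bytes, gpu_name.encode(), trt_version.encode()):
        digest.update(len(part).to_bytes(8, "big"))  # length prefix avoids ambiguity
        digest.update(part)
    return cache_dir / f"{digest.hexdigest()[:16]}.engine"


def load_or_build(cache_dir: Path, model_bytes: bytes, gpu_name: str,
                  trt_version: str,
                  build: Callable[[bytes], bytes]) -> Tuple[bytes, bool]:
    """Return (engine_bytes, was_built). The slow build runs only on a cache miss."""
    path = engine_path(cache_dir, model_bytes, gpu_name, trt_version)
    if path.exists():
        return path.read_bytes(), False
    engine = build(model_bytes)  # hypothetical stand-in for the TensorRT build
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(engine)
    return engine, True
```

Under this scheme a changed GPU or TensorRT version produces a different cache key, so the engine would be rebuilt once for the new hardware.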

4. Network Configuration

MirAI-V2 uses the following ports and protocols for client/server communication and for communication with cameras. When the client and server are on different machines, allow inbound traffic for the relevant ports in the server-side firewall.

4.1 Service Ports (Client ↔ Server)

Purpose	Port	Protocol	Auth
REST API (all CRUD operations)	7878	HTTPS / TCP	Bearer JWT
Real-time sync (events, status changes)	7575	WSS (WebSocket over TLS) / TCP	Token in the first message
Binary file transfer (video clips, images)	7979	TCP	Token header
RTSP re-streaming (main)	8554	RTSP / TCP	—
RTSP re-streaming (sub: low-resolution)	8555	RTSP / TCP	—
Live video streaming	7676	UDP	—
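
As an illustration of opening these ports on a Windows server, the sketch below generates `netsh advfirewall` inbound rules from the service port table. The rule names and the helper `firewall_commands` are assumptions made for this example; the commands are only printed, not executed, and should be reviewed against the site's security policy before use.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class ServicePort:
    name: str       # firewall rule name (hypothetical naming)
    port: int
    transport: str  # "TCP" or "UDP"


# Values taken from the service port table in section 4.1.
SERVICE_PORTS = [
    ServicePort("MirAI REST API", 7878, "TCP"),
    ServicePort("MirAI WebSocket", 7575, "TCP"),
    ServicePort("MirAI file transfer", 7979, "TCP"),
    ServicePort("MirAI RTSP main", 8554, "TCP"),
    ServicePort("MirAI RTSP sub", 8555, "TCP"),
    ServicePort("MirAI live video", 7676, "UDP"),
]


def firewall_commands(ports: Sequence[ServicePort] = SERVICE_PORTS) -> List[str]:
    """Build one inbound-allow netsh command per service port."""
    return [
        f'netsh advfirewall firewall add rule name="{p.name}" dir=in action=allow '
        f"protocol={p.transport} localport={p.port}"
        for p in ports
    ]


if __name__ == "__main__":
    for command in firewall_commands():
        print(command)
```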

4.2 Server ↔ Cameras / Database

Purpose	Port	Protocol	Notes
Camera video ingestion (RTSP)	554	RTSP / TCP	Standard camera port (may vary by model)
Camera discovery / control (ONVIF)	80 / 443	HTTP / HTTPS	ONVIF device discovery and configuration
Database connection	5432	TCP (PostgreSQL)	Local (localhost) connection recommended

About TLS (encryption): MirAI-V2 enables TLS (TLS 1.3) by default and communicates over HTTPS (7878) and WSS (7575). On first startup, the server automatically generates a self-signed certificate, which clients accept on a Trust On First Use (TOFU) basis. Replacement with a certificate issued by the customer's Certificate Authority (CA) is also supported. Passwords are encrypted with AES-128-CBC in transit and hashed with bcrypt / Argon2id at rest.
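
Trust On First Use can be sketched as fingerprint pinning: the first certificate seen for an endpoint is remembered, and any later change is flagged. The class below is a generic illustration under that assumption; `TofuPinStore` is a hypothetical name, the pins are kept in memory only, and the actual behavior of the MirAI Client is not specified in this document.

```python
import hashlib
import hmac
from typing import Dict


class TofuPinStore:
    """Remember the SHA-256 fingerprint of the first certificate seen per endpoint."""

    def __init__(self) -> None:
        self._pins: Dict[str, str] = {}  # "host:port" -> hex fingerprint

    def check(self, endpoint: str, der_cert: bytes) -> str:
        """Return "pinned" on first contact, "trusted" on a match, else "mismatch"."""
        fingerprint = hashlib.sha256(der_cert).hexdigest()
        pinned = self._pins.get(endpoint)
        if pinned is None:
            self._pins[endpoint] = fingerprint  # first use: trust and remember
            return "pinned"
        if hmac.compare_digest(pinned, fingerprint):
            return "trusted"
        return "mismatch"  # certificate changed: require operator confirmation
```

In a Python client, the DER bytes could be obtained from `ssl.SSLSocket.getpeercert(binary_form=True)`. A "mismatch" result is expected once when the self-signed certificate is replaced with a CA-issued one.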

5. Physical Deployment

MirAI-V2 supports both a single-server configuration for smaller deployments and a multi-GPU configuration for high-density operation. Camera capacity scales with the number of GPUs.

5.1 Single-Server Configuration (Standard)

In the standard configuration, a single GPU-equipped Windows server runs the launcher (Windows service), the MirAI Server (API, AI engine, ingestion), PostgreSQL, and the recording storage all on one machine. Clients connect from the same machine or from separate machines over LAN / VPN.

Category	Configuration
Server machine	One GPU-equipped Windows server (workstation)
Running processes	Launcher (Windows service) / MirAI Server (API, AI engine DLL, GStreamer ingestion) / PostgreSQL
Clients	One or more client machines (same machine, or over LAN / VPN)
GPU	1× NVIDIA GPU

5.2 Multi-GPU Configuration (High-Density Operation)

For high-density operation with concurrent AI inference across many cameras, multi-GPU configurations (2 CPU / 4–8 GPU) are supported. One inference engine instance is assigned per GPU, and cameras are distributed across GPUs (via a frame router) to balance the load. Guidance on camera capacity by GPU memory (VRAM) is shown below.

GPU Memory (VRAM)	Recommended cameras (per GPU)	Inference batch size (guide)
8 GB	8 – 12	16
12 GB	12 – 20	24
16 GB	16 – 24	30
24 GB	24 – 32	32
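
The VRAM guidance above can be turned into a rough sizing estimate. The sketch below is illustrative only: `camera_capacity` is a hypothetical helper, falling back to the next lower VRAM tier for sizes not listed is an assumption, and multiplying the per-GPU range by the GPU count assumes capacity scales linearly, consistent with the per-GPU figures quoted in this section.

```python
from typing import Tuple

# Recommended cameras per GPU (low, high), keyed by VRAM in GB, from the table above.
VRAM_GUIDE = {8: (8, 12), 12: (12, 20), 16: (16, 24), 24: (24, 32)}


def camera_capacity(vram_gb: int, gpu_count: int = 1) -> Tuple[int, int]:
    """Return the (low, high) camera guidance for a server with identical GPUs."""
    if gpu_count < 1:
        raise ValueError("gpu_count must be at least 1")
    tiers = [tier for tier in VRAM_GUIDE if tier <= vram_gb]
    if not tiers:
        raise ValueError(f"no guidance for GPUs with {vram_gb} GB of VRAM")
    low, high = VRAM_GUIDE[max(tiers)]  # assumption: use the next lower listed tier
    return low * gpu_count, high * gpu_count


if __name__ == "__main__":
    print(camera_capacity(16, gpu_count=4))
```

These figures remain design guidance; the final numbers come from the PoC and sizing exercise.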

Server configuration	GPUs	Camera capacity (guide)
Single-GPU configuration	1	Per the table above, based on GPU VRAM
4-GPU configuration	4	~64 cameras (16 per GPU)
8-GPU configuration	8	~128 cameras (16 per GPU)

The tables above are design guidance; actual capacity varies with resolution, analyzed FPS, the AI features enabled, and camera frame rates. In a multi-GPU configuration, if any GPU fails, the affected cameras are reassigned to healthy GPUs and the system continues operating in a degraded mode. The final configuration and capacity are confirmed through a pre-deployment proof of concept (PoC) and sizing.
Specific server specification (CPU, GPU model/count, storage capacity): [To be finalized per sizing results]
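
The frame router and degraded-mode behavior described in this section can be sketched as a least-loaded assignment with reassignment on failure. This is a simplified model, not the product's algorithm: `FrameRouter` and its methods are hypothetical names, load is measured by camera count only, and GPU identifiers are assumed to be integers.

```python
from typing import Dict, Iterable, Set


class FrameRouter:
    """Assign cameras to the least-loaded healthy GPU (illustrative sketch)."""

    def __init__(self, gpu_ids: Iterable[int]) -> None:
        self.assignments: Dict[int, Set[str]] = {gpu: set() for gpu in gpu_ids}

    def _least_loaded(self) -> int:
        if not self.assignments:
            raise RuntimeError("no healthy GPU available")
        # Ties are broken by the lower GPU id to keep the result deterministic.
        return min(self.assignments, key=lambda g: (len(self.assignments[g]), g))

    def add_camera(self, camera_id: str) -> int:
        gpu = self._least_loaded()
        self.assignments[gpu].add(camera_id)
        return gpu

    def fail_gpu(self, gpu_id: int) -> Dict[str, int]:
        """Remove a failed GPU and move its cameras to the remaining GPUs."""
        orphans = sorted(self.assignments.pop(gpu_id))
        return {camera: self.add_camera(camera) for camera in orphans}


if __name__ == "__main__":
    router = FrameRouter([0, 1, 2])
    for i in range(6):
        router.add_camera(f"cam{i}")
    print(router.fail_gpu(1))   # cameras from GPU 1 are spread over GPUs 0 and 2
    print(router.assignments)
```

A production router would likely also weigh resolution, analyzed FPS, and enabled AI features, since the document notes that these affect per-GPU capacity.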

6. Data Flow

The end-to-end flow — from camera ingestion to display and notification on the client — is shown in numbered steps.

1. Video ingestion (camera → server): The server connects to each camera via RTSP (port 554) and pulls video using GStreamer. A tee splits the stream into a "recording / re-stream branch (passthrough)" and an "AI inference branch (decode)".
2. Decode and frame extraction (ingest → AI): On the AI branch, NVDEC (GPU hardware decoder) decodes the video, the frame rate is throttled for analysis (default cap of 7–10 fps), and frames are extracted.
3. AI inference (AI engine): Extracted frames are batch-processed on the GPU (TensorRT / ONNX Runtime), running detection → classification → tracking. In multi-GPU configurations, a frame router distributes cameras across GPUs.
4. Event evaluation (rules): Detection results are evaluated against the defined event rules and the per-event confidence thresholds used to suppress false positives (e.g., fire 0.45 / smoke 0.40 / fall-down 0.80); an incident is generated when the conditions are met.
5. Recording and persistence (passthrough branch → storage / DB): The encoded video on the passthrough branch is stored as recording files (60-second segments), and incident information (thumbnail, video path, type, confidence, timestamp) is written to PostgreSQL.
6. Notification and delivery (server → client): After applying notification rules (rate limiting, quiet hours, etc.), incidents are dispatched to the configured channels (Email / Webhook / SNMP / HTTP) and broadcast to connected clients over WebSocket (7575).
7. Client display (client): The client shows incidents received over WebSocket in the notification panel, fetches thumbnails and video clips over HTTPS (7878) / TCP (7979), and displays live video via RTSP re-streaming (8554 / 8555).
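
The event evaluation and notification steps above can be sketched as a threshold check followed by a simple rate limit. The threshold values are the examples quoted in this section; everything else is an assumption made for illustration, including the names `Detection` and `IncidentFilter`, the use of `>=` for the comparison, the 30-second minimum interval, and rate limiting per camera and event type.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Example thresholds quoted in this document; other event types are not covered here.
THRESHOLDS = {"fire": 0.45, "smoke": 0.40, "fall-down": 0.80}


@dataclass(frozen=True)
class Detection:
    camera_id: str
    event_type: str
    confidence: float
    timestamp: float  # seconds


class IncidentFilter:
    """Threshold check plus per-(camera, event type) notification rate limiting."""

    def __init__(self, thresholds: Optional[Dict[str, float]] = None,
                 min_interval_s: float = 30.0) -> None:
        self.thresholds = dict(THRESHOLDS if thresholds is None else thresholds)
        self.min_interval_s = min_interval_s  # assumed value, not from the product
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def is_incident(self, det: Detection) -> bool:
        threshold = self.thresholds.get(det.event_type)
        return threshold is not None and det.confidence >= threshold

    def should_notify(self, det: Detection) -> bool:
        if not self.is_incident(det):
            return False
        key = (det.camera_id, det.event_type)
        last = self._last_sent.get(key)
        if last is not None and det.timestamp - last < self.min_interval_s:
            return False  # suppressed by the rate limit
        self._last_sent[key] = det.timestamp
        return True
```

In this sketch a suppressed detection could still be recorded as an incident; only the outgoing notification is rate limited. Quiet hours and channel dispatch are left out.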

7. Notes and Assumptions

This is an on-premise system (installed on the customer's premises). AI inference is completed within the server; video data is not sent to any external cloud.
An NVIDIA GPU is required for AI inference. AI features will not operate on non-NVIDIA GPUs or on systems without a GPU.
The confirmed values in this document (ports, protocols, technologies, etc.) are based on the standard configuration of MirAI-V2 v2.0.3.
Capacity, frame-rate, and similar figures shown in the diagram and text are guidance and are finalized during sizing based on the customer's operating conditions (camera count, resolution, analyzed FPS, retention period, etc.). Please also refer to the separate "System Requirements" document.
The contents of this document are subject to change without notice due to product improvements and other factors.