MirAI-V2

System Architecture Overview

System configuration and data flow of the on-premise AI Video Management System (VMS)
Version: 2.0.3
Issue Date: 2026-06-17
Issued by: MarkAny Co., Ltd.

Revision History

Rev. | Date | Description | Approved
1.0 | 2026-06-17 | Initial release |

Contents

  1. System Overview
  2. Architecture Diagram
  3. Components
  4. Network Configuration
  5. Physical Deployment
  6. Data Flow
  7. Notes and Assumptions

1. System Overview

MirAI-V2 is an on-premise AI Video Management System (VMS) provided by MarkAny Co., Ltd. It ingests video from IP cameras (ONVIF / RTSP) and uses an AI inference engine running on an NVIDIA GPU to detect objects and events such as fire/smoke, fall-down, and intrusion in real time. It also provides recording, re-streaming, and notification, with monitoring through a dedicated client. All processing, including AI inference, is completed within the on-premise server; video data is never sent to any external cloud.

The major subsystems are as follows.

Subsystem | Role
Video ingestion | Pulls video from cameras via RTSP, then splits it with a GStreamer tee-based pipeline into a "recording / re-stream branch (passthrough)" and an "AI inference branch (decode)".
AI inference engine | Runs detection, classification, and tracking on the GPU (TensorRT 10.8 / ONNX Runtime) and generates events based on defined rules.
API services | Provides REST (CRUD operations), WebSocket (real-time sync of events and status), and TCP (binary transfer of video and images).
Database | Stores persistent data (users, camera settings, AI pipelines, incident logs, notification rules, audit trail, etc.) in PostgreSQL.
Recording / re-streaming | Generates recording files (split into segments) and re-delivers video to clients through the RTSP re-streaming server.

2. Architecture Diagram

The block diagram below shows the system components and the data/control flow between cameras, server, and client. The protocol and port are labeled on each arrow.

[Block diagram]
IP / ONVIF Cameras: ONVIF (auto-discovery) / RTSP (manual URL)
    ↓ RTSP ingest (554)
MirAI Server (Rust / Axum): integrates ingestion, AI inference, APIs, DB, and recording / re-streaming
    Ingest: GStreamer / NVDEC; tee, 2 branches: record / re-stream and AI decode
    AI Inference Engine (DLL): GPU / TensorRT 10.8, ONNX Runtime; detect → classify → track → rules
    API Services: REST (HTTP), WebSocket (events), TCP (file transfer)
    PostgreSQL Database (:5432): users, cameras, AI pipelines, incident logs, notifications, audit
    Recording Storage / RTSP Server: recording segments (60-second); RTSP re-streaming (:8554 / :8555)
Server ↔ Client: HTTPS (7878) / WSS (7575) / TCP (7979) / RTSP (8554, 8555)
MirAI Client (Qt / C++): live grid view, AI pipeline editing, notifications and incident logs, multi-language (Japanese / English / Korean)
Figure 1. MirAI-V2 system architecture block diagram (data/control flow; protocols and ports labeled on arrows)
Video from each camera is split into two branches by a GStreamer tee. The "passthrough branch" (for recording and RTSP re-streaming) handles already-encoded packets to minimize CPU load. The "decode branch" (for AI inference) decodes via NVDEC (GPU hardware decoder), throttles the frame rate (default cap of 7–10 fps), and feeds frames to the AI engine.
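
The frame-rate cap on the decode branch can be illustrated with a small timestamp-based throttle. This is a minimal sketch, not MirAI-V2's actual implementation: the class name `FrameThrottle`, its `max_fps` parameter, and the idea of deciding per frame from its presentation timestamp are assumptions made for illustration.

```python
class FrameThrottle:
    """Drop frames so that at most `max_fps` frames per second reach the AI engine.

    Hypothetical helper for illustration; not part of MirAI-V2.
    """

    def __init__(self, max_fps: float) -> None:
        if max_fps <= 0:
            raise ValueError("max_fps must be positive")
        self.min_interval = 1.0 / max_fps  # seconds between accepted frames
        self.next_due = None               # earliest timestamp of the next accepted frame

    def accept(self, pts: float) -> bool:
        """Return True if the frame with presentation time `pts` (seconds) should be analyzed."""
        if self.next_due is not None and pts < self.next_due:
            return False  # too soon after the last accepted frame: drop
        if self.next_due is None or pts - self.next_due > self.min_interval:
            # First frame, or the stream stalled: restart the schedule at this frame.
            self.next_due = pts + self.min_interval
        else:
            # Advance on a fixed grid so small timing jitter does not accumulate.
            self.next_due += self.min_interval
        return True
```

For example, feeding one second of a 30 fps stream through `FrameThrottle(max_fps=8)` passes 8 frames to analysis and drops the rest, while the passthrough branch is unaffected because it never decodes.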

3. Components

The role and technology of each component are listed below.

Component | Role | Technology
MirAI Server (API) | Provides the REST API (all CRUD), WebSocket (event/status sync), and TCP (video/image transfer); handles authentication/authorization and camera lifecycle management. | Rust / Axum, JWT + RBAC, TLS
AI inference engine | Runs detection (YOLO/SSD family) → classification (fire, fall-down, etc.) → tracking (ByteTrack) → rule evaluation on the GPU to generate events. Runs as an in-process DLL within the server. | C++ / CUDA, TensorRT 10.8, ONNX Runtime
Video ingestion (Ingest) | Connects to cameras via RTSP to pull video; uses a tee to split into recording/re-stream and AI-inference branches, performing GPU decode and frame extraction. | GStreamer, NVDEC (nvh264dec / nvh265dec)
Database | Persists users, cameras, AI pipelines, incident logs, notification rules, audit trail, sessions, and system settings. | PostgreSQL (pool size 10, port 5432)
RTSP re-streaming server | Re-delivers ingested video to clients, providing a main (high-resolution) stream and a sub (low-resolution, for grid view) stream. | RTSP server (main 8554 / sub 8555)
Recording storage | Stores continuous / event recording files (split into 60-second segments), with built-in minimum-free-disk protection. | Dedicated recording drive (HDD / NVMe), ring buffer
MirAI Client | Live grid viewing, node-based AI pipeline editing, review of notifications and incident logs, multi-language UI (Japanese / English / Korean). | Qt 6 / C++ (dark theme)
IP cameras | Video sources. Existing standard-compliant camera assets can be reused. | ONVIF (auto-discovery) / RTSP (manual URL)
GPU (inference accelerator) | Hardware performing AI inference and video decode; determines the number of cameras that can be processed concurrently. | NVIDIA GPU, CUDA 12.8, TensorRT 10.8
On first startup, the AI engine builds GPU-optimized models (TensorRT engines) for your hardware. This takes approximately 15–30 minutes and occurs only once.
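
The build-once behavior amounts to caching the built engine and rebuilding only when its inputs change. TensorRT engines are generally tied to the GPU and TensorRT version they were built with, so a cache key would plausibly include both. The sketch below shows one possible scheme; the function names, the `.engine` file naming, and the choice of key fields are assumptions for illustration and do not describe how MirAI-V2 actually stores its engines.

```python
import hashlib
from pathlib import Path


def engine_cache_path(cache_dir: Path, model_bytes: bytes, gpu_name: str, trt_version: str) -> Path:
    """Derive a file name that changes whenever the model, GPU, or TensorRT version changes.

    Hypothetical helper; the key fields are an assumption.
    """
    digest = hashlib.sha256()
    for part in (model_bytes, gpu_name.encode("utf-8"), trt_version.encode("utf-8")):
        digest.update(part)
        digest.update(b"\0")  # separator so adjacent fields cannot run together
    return cache_dir / f"{digest.hexdigest()[:16]}.engine"


def needs_build(engine_path: Path) -> bool:
    """A build is needed only when no non-empty cached engine exists for this key."""
    return not (engine_path.is_file() and engine_path.stat().st_size > 0)
```

With a scheme like this, the long build runs on first startup and again only after a model update, a GPU swap, or a TensorRT upgrade.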

4. Network Configuration

MirAI-V2 uses the following ports and protocols for client/server communication and for communication with cameras. When the client and server are on different machines, allow inbound traffic for the relevant ports in the server-side firewall.

4.1 Service Ports (Client ↔ Server)

Purpose | Port | Protocol | Auth
REST API (all CRUD operations) | 7878 | HTTPS / TCP | Bearer JWT
Real-time sync (events, status changes) | 7575 | WSS (WebSocket over TLS) / TCP | Token in the first message
Binary file transfer (video clips, images) | 7979 | TCP | Token header
RTSP re-streaming (main) | 8554 | RTSP / TCP |
RTSP re-streaming (sub: low-resolution) | 8555 | RTSP / TCP |
Live video streaming | 7676 | UDP |
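
After opening the firewall, it can be useful to confirm from a client machine that the TCP service ports are reachable. The following is a minimal sketch using only the Python standard library; the function name `check_ports` and the `SERVICE_PORTS` mapping are illustrative assumptions, and a successful TCP connect only shows that the port is open, not that authentication or TLS will succeed.

```python
import socket

# TCP service ports from section 4.1. The UDP entry is omitted because
# a TCP connect cannot probe it.
SERVICE_PORTS = {
    "REST API": 7878,
    "WebSocket": 7575,
    "File transfer": 7979,
    "RTSP main": 8554,
    "RTSP sub": 8555,
}


def check_ports(host: str, ports: dict[str, int], timeout: float = 2.0) -> dict[str, bool]:
    """Return {service name: True if a TCP connection to host:port succeeded}."""
    results = {}
    for name, port in ports.items():
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[name] = True
        except OSError:  # refused, timed out, unreachable, or name resolution failed
            results[name] = False
    return results
```

A call such as `check_ports("192.0.2.10", SERVICE_PORTS)` (the address is a placeholder) returns one boolean per service, which makes a blocked port easy to spot.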

4.2 Server ↔ Cameras / Database

Purpose | Port | Protocol | Notes
Camera video ingestion (RTSP) | 554 | RTSP / TCP | Standard camera port (may vary by model)
Camera discovery / control (ONVIF) | 80 / 443 | HTTP / HTTPS | ONVIF device discovery and configuration
Database connection | 5432 | TCP (PostgreSQL) | Local (localhost) connection recommended
About TLS (encryption): MirAI-V2 enables TLS (TLS 1.3) by default and communicates over HTTPS (7878) and WSS (7575). On first startup, the server automatically generates a self-signed certificate (Trust On First Use). Replacement with a certificate issued by the customer's Certificate Authority (CA) is also supported. Passwords are encrypted with AES-128-CBC in transit and hashed with bcrypt / Argon2id at rest.
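
Trust On First Use means the peer remembers the certificate it saw on the first connection and rejects a different one later. The sketch below illustrates that idea with a SHA-256 fingerprint stored in a small JSON file. It is an assumption about how a client could apply TOFU, not a description of the MirAI Client; the names `fingerprint`, `verify_tofu`, and the pin-file format are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def fingerprint(cert_der: bytes) -> str:
    """SHA-256 fingerprint of a DER-encoded certificate, as lowercase hex."""
    return hashlib.sha256(cert_der).hexdigest()


def verify_tofu(pin_file: Path, host: str, cert_der: bytes) -> bool:
    """Pin the certificate on first contact; afterwards accept only the pinned one."""
    pins = json.loads(pin_file.read_text()) if pin_file.exists() else {}
    seen = fingerprint(cert_der)
    if host not in pins:
        pins[host] = seen  # first use: trust and remember
        pin_file.write_text(json.dumps(pins, indent=2))
        return True
    return pins[host] == seen  # later uses: must match the stored pin
```

In Python, the DER bytes for a live server can be obtained with `ssl.PEM_cert_to_DER_cert(ssl.get_server_certificate((host, 7878)))`. When the self-signed certificate is replaced by a CA-issued one, a stored pin of this kind would have to be cleared or updated.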

5. Physical Deployment

MirAI-V2 supports both a single-server configuration for smaller deployments and a multi-GPU configuration for high-density operation. Adding GPUs scales camera capacity horizontally.

5.1 Single-Server Configuration (Standard)

In the standard configuration, a single GPU-equipped Windows server runs the launcher (Windows service), the MirAI Server (API, AI engine, ingestion), PostgreSQL, and the recording storage all on one machine. Clients connect from the same machine or from separate machines over LAN / VPN.

Category | Configuration
Server machine | One GPU-equipped Windows server (workstation)
Running processes | Launcher (Windows service) / MirAI Server (API, AI engine DLL, GStreamer ingestion) / PostgreSQL
Clients | One or more client machines (same machine, or over LAN / VPN)
GPU | 1× NVIDIA GPU

5.2 Multi-GPU Configuration (High-Density Operation)

For high-density operation with concurrent AI inference across many cameras, multi-GPU configurations (2 CPU / 4–8 GPU) are supported. One inference engine instance is assigned per GPU, and cameras are distributed across GPUs (via a frame router) to balance the load. Guidance on camera capacity by GPU memory (VRAM) is shown below.

GPU Memory (VRAM) | Recommended cameras (per GPU) | Inference batch size (guide)
8 GB | 8–12 | 16
12 GB | 12–20 | 24
16 GB | 16–24 | 30
24 GB | 24–32 | 32

Server configuration | GPUs | Camera capacity (guide)
Single-GPU configuration | 1 | Per the table above, based on GPU VRAM
4-GPU configuration | 4 | ~64 cameras (16 per GPU)
8-GPU configuration | 8 | ~128 cameras (16 per GPU)
The tables above are design guidance; actual capacity varies with resolution, analyzed FPS, the AI features enabled, and camera frame rates. In a multi-GPU configuration, if any GPU fails, the affected cameras are reassigned to healthy GPUs and the system continues operating in a degraded mode. The final configuration and capacity are confirmed through a pre-deployment proof of concept (PoC) and sizing.
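
The camera distribution and failover behavior described above can be sketched as a least-loaded assignment. This is an illustrative model only: the class `FrameRouter`, its methods, and the least-loaded policy are assumptions, since the document does not specify how the actual frame router balances cameras.

```python
class FrameRouter:
    """Assign cameras to GPUs, keeping per-GPU camera counts balanced.

    Hypothetical model for illustration; not MirAI-V2's frame router.
    """

    def __init__(self, gpu_ids: list[int]) -> None:
        self.assignments: dict[int, set[str]] = {gpu: set() for gpu in gpu_ids}

    def _least_loaded(self) -> int:
        # Ties are broken by the lowest GPU id so results are deterministic.
        return min(self.assignments, key=lambda gpu: (len(self.assignments[gpu]), gpu))

    def add_camera(self, camera_id: str) -> int:
        """Place a camera on the GPU that currently serves the fewest cameras."""
        gpu = self._least_loaded()
        self.assignments[gpu].add(camera_id)
        return gpu

    def fail_gpu(self, gpu_id: int) -> dict[str, int]:
        """Remove a failed GPU and move its cameras to the remaining healthy GPUs."""
        orphans = sorted(self.assignments.pop(gpu_id))
        if not self.assignments:
            raise RuntimeError("no healthy GPU left")
        return {camera: self.add_camera(camera) for camera in orphans}
```

With 64 cameras on 4 GPUs (16 each), failing one GPU leaves the survivors with 22, 21, and 21 cameras. That exceeds the 16-per-GPU guide figure, which is why operation after a failure is described as degraded: a real router would likely also lower the analyzed frame rate or enforce a per-GPU limit.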
Specific server specification (CPU, GPU model/count, storage capacity): [To be finalized per sizing results]

6. Data Flow

The end-to-end flow — from camera ingestion to display and notification on the client — is shown in numbered steps.

  1. Video ingestion (camera → server): The server connects to each camera via RTSP (port 554) and pulls video using GStreamer. A tee splits the stream into a "recording / re-stream branch (passthrough)" and an "AI inference branch (decode)".
  2. Decode and frame extraction (ingest → AI): On the AI branch, NVDEC (GPU hardware decoder) decodes the video, the frame rate is throttled for analysis (default cap of 7–10 fps), and frames are extracted.
  3. AI inference (AI engine): Extracted frames are batch-processed on the GPU (TensorRT / ONNX Runtime), running detection → classification → tracking. In multi-GPU configurations, a frame router distributes cameras across GPUs.
  4. Event evaluation (rules): Detection results are evaluated against the defined event rules and false-positive thresholds (e.g., fire 0.45 / smoke 0.40 / fall-down 0.80); an incident is generated when conditions are met.
  5. Recording and persistence (passthrough branch → storage / DB): The encoded video on the passthrough branch is stored as recording files (60-second segments), and incident information (thumbnail, video path, type, confidence, timestamp) is written to PostgreSQL.
  6. Notification and delivery (server → client): After applying notification rules (rate limiting, quiet hours, etc.), incidents are dispatched to the configured channels (Email / Webhook / SNMP / HTTP) and broadcast to connected clients over WebSocket (7575).
  7. Client display (client): The client shows incidents received over WebSocket in the notification panel, fetches thumbnails and video clips over HTTPS (7878) / TCP (7979), and displays live video via RTSP re-streaming (8554 / 8555).
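
Steps 4 and 6 above can be sketched as a threshold filter followed by a per-camera rate limiter. The thresholds are the example values quoted in step 4; everything else (the `Detection` record, the use of `>=` for the comparison, and a single fixed rate-limit window per camera and event type) is an assumption for illustration rather than the product's rule engine.

```python
from dataclasses import dataclass

# Example thresholds quoted in step 4; other event types would need their own entries.
THRESHOLDS = {"fire": 0.45, "smoke": 0.40, "fall-down": 0.80}


@dataclass(frozen=True)
class Detection:
    camera_id: str
    event_type: str
    confidence: float
    timestamp: float  # seconds


def to_incidents(detections: list[Detection]) -> list[Detection]:
    """Keep detections whose confidence meets the threshold for their event type."""
    return [
        d for d in detections
        if d.event_type in THRESHOLDS and d.confidence >= THRESHOLDS[d.event_type]
    ]


class RateLimiter:
    """Allow at most one notification per (camera, event type) within `window_s` seconds."""

    def __init__(self, window_s: float) -> None:
        self.window_s = window_s
        self.last_sent: dict[tuple[str, str], float] = {}

    def allow(self, incident: Detection) -> bool:
        key = (incident.camera_id, incident.event_type)
        last = self.last_sent.get(key)
        if last is not None and incident.timestamp - last < self.window_s:
            return False  # still inside the window: suppress the notification
        self.last_sent[key] = incident.timestamp
        return True
```

In this model an incident is still recorded even when its notification is suppressed, so the incident log stays complete while operators are not flooded with repeated alerts for the same camera.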

7. Notes and Assumptions