| Rev. | Date | Description | Approved |
|---|---|---|---|
| 1.0 | 2026-06-17 | Initial release | — |
MirAI-V2 is an on-premise AI Video Management System (VMS) provided by MarkAny Co., Ltd. It ingests video from IP cameras (ONVIF / RTSP) and runs an AI inference engine on an NVIDIA GPU to perform object detection and to detect events such as fire/smoke, fall-down, and intrusion in real time. It also provides recording, re-streaming, notification, and monitoring through a dedicated client. All processing, including AI inference, is completed within the on-premise server; video data is never sent to any external cloud.
The system is organized into three principal layers. Its major subsystems are as follows.
| Subsystem | Role |
|---|---|
| Video ingestion | Pulls video from cameras via RTSP, then splits it with a GStreamer tee-based pipeline into a "recording / re-stream branch (passthrough)" and an "AI inference branch (decode)". |
| AI inference engine | Runs detection, classification, and tracking on the GPU (TensorRT 10.8 / ONNX Runtime) and generates events based on defined rules. |
| API services | Provides REST (CRUD operations), WebSocket (real-time sync of events and status), and TCP (binary transfer of video and images). |
| Database | Stores persistent data — users, camera settings, AI pipelines, incident logs, notification rules, audit trail, etc. — in PostgreSQL. |
| Recording / re-streaming | Generates recording files (split into segments) and re-delivers video to clients through the RTSP re-streaming server. |
The block diagram below shows the system components and the data/control flow between cameras, server, and client. The protocol and port are labeled on each arrow.
The role and technology of each component are listed below.
| Component | Role | Technology |
|---|---|---|
| MirAI Server (API) | Provides the REST API (all CRUD), WebSocket (event/status sync), and TCP (video/image transfer); handles authentication/authorization and camera lifecycle management. | Rust / Axum, JWT + RBAC, TLS |
| AI inference engine | Runs detection (YOLO/SSD family) → classification (fire, fall-down, etc.) → tracking (ByteTrack) → rule evaluation on the GPU to generate events. Runs as an in-process DLL within the server. | C++ / CUDA, TensorRT 10.8, ONNX Runtime |
| Video ingestion (Ingest) | Connects to cameras via RTSP to pull video; uses a tee to split into recording/re-stream and AI-inference branches, performing GPU decode and frame extraction. | GStreamer, NVDEC (nvh264dec / nvh265dec) |
| Database | Persists users, cameras, AI pipelines, incident logs, notification rules, audit trail, sessions, and system settings. | PostgreSQL (pool size 10, port 5432) |
| RTSP re-streaming server | Re-delivers ingested video to clients, providing a main (high-resolution) stream and a sub (low-resolution, for grid view) stream. | RTSP server (main 8554 / sub 8555) |
| Recording storage | Stores continuous / event recording files (split into 60-second segments), with built-in minimum-free-disk protection. | Dedicated recording drive (HDD / NVMe), ring buffer |
| MirAI Client | Live grid viewing, node-based AI pipeline editing, review of notifications and incident logs, multi-language UI (Japanese / English / Korean). | Qt 6 / C++ (dark theme) |
| IP cameras | Video sources. Existing standard-compliant camera assets can be reused. | ONVIF (auto-discovery) / RTSP (manual URL) |
| GPU (inference accelerator) | Hardware performing AI inference and video decode; determines the number of cameras that can be processed concurrently. | NVIDIA GPU, CUDA 12.8, TensorRT 10.8 |
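
The tee-based split used by the video ingestion component can be illustrated with a GStreamer launch description. This is a minimal sketch, not the product's actual pipeline: the function name, the camera URL, the output file pattern, the use of `splitmuxsink` for segmenting, and `appsink` as the hand-off point to inference are all assumptions made for illustration. Only the tee split, the NVDEC decoders, and the 60-second segment length come from the tables above. The code only builds the description string; it does not start GStreamer.

```python
# Hypothetical sketch: element choices are assumptions, not MirAI-V2 internals.

SEGMENT_SECONDS = 60  # segment length stated in the component table

# Depayloader, parser, and NVDEC decoder per codec (standard GStreamer element names).
CODEC_ELEMENTS = {
    "h264": ("rtph264depay", "h264parse", "nvh264dec"),
    "h265": ("rtph265depay", "h265parse", "nvh265dec"),
}


def build_ingest_pipeline(rtsp_url: str, codec: str, record_pattern: str) -> str:
    """Return a pipeline description with a passthrough branch and a decode branch."""
    depay, parse, decoder = CODEC_ELEMENTS[codec]
    segment_ns = SEGMENT_SECONDS * 1_000_000_000  # splitmuxsink expects nanoseconds
    return (
        f'rtspsrc location="{rtsp_url}" protocols=tcp ! {depay} ! {parse} ! tee name=t '
        # Branch 1: record the compressed stream as-is (passthrough), split into segments.
        f't. ! queue ! splitmuxsink location="{record_pattern}" max-size-time={segment_ns} '
        # Branch 2: decode on the GPU and hand raw frames to the application for inference.
        f"t. ! queue ! {decoder} ! videoconvert ! appsink name=ai_frames"
    )


# Example with a placeholder camera address and a hypothetical file pattern.
print(build_ingest_pipeline("rtsp://192.0.2.10:554/stream1", "h264", "cam01_%05d.mp4"))
```

Keeping the recording branch on the compressed stream avoids re-encoding, so only the inference branch consumes decoder capacity.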
MirAI-V2 uses the following ports and protocols for client/server communication and for communication with cameras. When the client and server are on different machines, allow inbound traffic for the relevant ports in the server-side firewall.
| Purpose | Port | Protocol | Auth |
|---|---|---|---|
| REST API (all CRUD operations) | 7878 | HTTPS / TCP | Bearer JWT |
| Real-time sync (events, status changes) | 7575 | WSS (WebSocket over TLS) / TCP | Token in the first message |
| Binary file transfer (video clips, images) | 7979 | TCP | Token header |
| RTSP re-streaming (main) | 8554 | RTSP / TCP | — |
| RTSP re-streaming (sub: low-resolution) | 8555 | RTSP / TCP | — |
| Live video streaming | 7676 | UDP | — |
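
When the client and server run on different machines, the client-facing ports in the table above have to be opened on the server. The sketch below generates one inbound-allow `netsh advfirewall firewall add rule` command per port. The rule-naming scheme is hypothetical, and the script only prints the commands so they can be reviewed before anyone runs them in an elevated prompt.

```python
# Client-facing ports copied from the table above: (purpose, port, transport).
CLIENT_FACING_PORTS = [
    ("REST API", 7878, "TCP"),
    ("WebSocket sync", 7575, "TCP"),
    ("Binary transfer", 7979, "TCP"),
    ("RTSP main", 8554, "TCP"),
    ("RTSP sub", 8555, "TCP"),
    ("Live video", 7676, "UDP"),
]


def firewall_commands(ports=CLIENT_FACING_PORTS, prefix="MirAI-V2"):
    """Build one inbound-allow netsh command per port; nothing is executed here."""
    commands = []
    for purpose, port, transport in ports:
        rule_name = f"{prefix} {purpose}"  # hypothetical naming scheme
        commands.append(
            f'netsh advfirewall firewall add rule name="{rule_name}" '
            f"dir=in action=allow protocol={transport} localport={port}"
        )
    return commands


for command in firewall_commands():
    print(command)
```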

| Purpose | Port | Protocol | Notes |
|---|---|---|---|
| Camera video ingestion (RTSP) | 554 | RTSP / TCP | Standard camera port (may vary by model) |
| Camera discovery / control (ONVIF) | 80 / 443 | HTTP / HTTPS | ONVIF device discovery and configuration |
| Database connection | 5432 | TCP (PostgreSQL) | Local (localhost) connection recommended |
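
Before registering a camera, it can help to confirm that the server can actually reach the ports listed above. The following sketch uses only the Python standard library and checks whether a TCP connection can be established. The helper name is hypothetical, and a successful connection does not validate RTSP, ONVIF, or database credentials.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `tcp_port_open("192.0.2.10", 554)` would test a camera's RTSP port (the address is a placeholder), and `tcp_port_open("127.0.0.1", 5432)` would test the local PostgreSQL port.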
MirAI-V2 supports both a single-server configuration for smaller deployments and a multi-GPU configuration for high-density operation. Adding GPUs scales camera capacity horizontally.
In the standard configuration, a single GPU-equipped Windows server runs the launcher (Windows service), the MirAI Server (API, AI engine, ingestion), PostgreSQL, and the recording storage all on one machine. Clients connect from the same machine or from separate machines over LAN / VPN.
| Category | Configuration |
|---|---|
| Server machine | One GPU-equipped Windows server (workstation) |
| Running processes | Launcher (Windows service) / MirAI Server (API, AI engine DLL, GStreamer ingestion) / PostgreSQL |
| Clients | One or more client machines (same machine, or over LAN / VPN) |
| GPU | 1× NVIDIA GPU |
For high-density operation with concurrent AI inference across many cameras, multi-GPU configurations (2 CPU / 4–8 GPU) are supported. One inference engine instance is assigned per GPU, and cameras are distributed across GPUs (via a frame router) to balance the load. Guidance on camera capacity by GPU memory (VRAM) is shown below.
| GPU Memory (VRAM) | Recommended cameras (per GPU) | Inference batch size (guide) |
|---|---|---|
| 8 GB | 8 – 12 | 16 |
| 12 GB | 12 – 20 | 24 |
| 16 GB | 16 – 24 | 30 |
| 24 GB | 24 – 32 | 32 |
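
The distribution of cameras across GPUs can be illustrated with a small planning sketch. The document does not describe the frame router's actual policy, so the "most remaining headroom" rule below is an assumption, as are the function and variable names. The per-GPU limits are the upper bounds of the recommended ranges in the table above.

```python
# Upper bound of the recommended camera range per GPU, keyed by VRAM in GB.
MAX_CAMERAS_BY_VRAM_GB = {8: 12, 12: 20, 16: 24, 24: 32}


def assign_cameras(camera_ids, gpu_vram_gb):
    """Assign each camera to the GPU with the most remaining headroom.

    gpu_vram_gb lists the VRAM size of each GPU; the list index is the GPU index.
    Raises ValueError when the cameras exceed the combined recommended capacity.
    """
    capacity = [MAX_CAMERAS_BY_VRAM_GB[vram] for vram in gpu_vram_gb]
    assignment = {gpu: [] for gpu in range(len(gpu_vram_gb))}
    for camera in camera_ids:
        # Pick the GPU with the most free slots (ties go to the lowest index).
        gpu = max(assignment, key=lambda g: capacity[g] - len(assignment[g]))
        if len(assignment[gpu]) >= capacity[gpu]:
            raise ValueError("camera count exceeds recommended capacity")
        assignment[gpu].append(camera)
    return assignment


# Example: 30 cameras on one 16 GB GPU and one 8 GB GPU.
plan = assign_cameras([f"cam{i:02d}" for i in range(30)], gpu_vram_gb=[16, 8])
print({gpu: len(cams) for gpu, cams in plan.items()})
```

With mixed GPU sizes, this rule gives the larger GPU more cameras and leaves both GPUs with similar spare capacity.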

| Server configuration | GPUs | Camera capacity (guide) |
|---|---|---|
| Single-GPU configuration | 1 | Per the table above, based on GPU VRAM |
| 4-GPU configuration | 4 | ~64 cameras (16 per GPU) |
| 8-GPU configuration | 8 | ~128 cameras (16 per GPU) |
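
The multi-GPU figures follow from a planning value of 16 cameras per GPU (4 × 16 = 64, 8 × 16 = 128). A rough sizing helper under that assumption might look like the sketch below; the function name is hypothetical, and real capacity depends on GPU VRAM, resolution, and the AI pipelines in use.

```python
import math

CAMERAS_PER_GPU = 16  # planning figure used in the table above


def gpus_required(camera_count: int, cameras_per_gpu: int = CAMERAS_PER_GPU) -> int:
    """Return the number of GPUs needed, rounding up to a whole GPU."""
    if camera_count < 0 or cameras_per_gpu <= 0:
        raise ValueError("camera_count must be >= 0 and cameras_per_gpu must be > 0")
    return math.ceil(camera_count / cameras_per_gpu)


for cameras in (64, 100, 128):
    print(cameras, "cameras ->", gpus_required(cameras), "GPUs")
```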
The end-to-end flow, from camera ingestion to display and notification on the client, is shown below in numbered steps.