End-to-End Coordination of RAN and Edge Server for Latency-Critical Inference Serving over Cellular Networks (Journal Article)

Overview

abstract

  • The growing adoption of deep neural network (DNN) inference in mobile applications underscores the need for edge-assisted inference to meet the latency requirements of diverse DNN tasks, given constrained device capabilities. However, widespread deployment remains hindered by unreliable latency due to the shared nature of cellular networks, where concurrent workloads contend across the uplink, computation, and downlink stages. Although several approaches have been proposed, they primarily target specific tasks (e.g., video analytics) and, due to their uplink-centric designs, lack support for diverse DNN tasks. This paper presents CORA, a system that serves latency-critical DNN inference requests through end-to-end coordination between the RAN and the edge server. CORA dynamically adjusts the latency budgets across stages based on each DNN task's characteristics, balancing resource demands across the radio and compute domains to mitigate contention at each stage. It then aligns the resource schedulers with these per-stage budgets, enabling end-to-end coordination without requiring modifications to end hosts. We prototype and evaluate CORA on an over-the-air testbed with diverse DNN tasks. CORA serves 3.2× more requests within the latency target and reduces the 95th-percentile latency by 2.1× compared to baselines.
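To make the staged-budget idea concrete, below is a minimal, hypothetical Python sketch of splitting an end-to-end deadline across the uplink, computation, and downlink stages in proportion to a task's nominal per-stage cost. All names, link rates, and numbers are illustrative assumptions introduced here; this is not CORA's actual budgeting or scheduling algorithm, only an example of how per-task characteristics can drive per-stage budgets.

```python
# Hypothetical illustration of per-stage latency budgeting (not CORA's algorithm).
# A task's end-to-end deadline is divided across uplink, compute, and downlink
# in proportion to that task's nominal (uncontended) cost at each stage.
from dataclasses import dataclass


@dataclass
class TaskProfile:
    deadline_ms: float      # end-to-end latency target for the DNN task
    uplink_bytes: float     # input size sent over the RAN uplink
    compute_ms: float       # nominal inference time on the edge server
    downlink_bytes: float   # result size returned over the downlink


def split_budget(task: TaskProfile,
                 uplink_mbps: float = 50.0,
                 downlink_mbps: float = 100.0) -> dict:
    """Allocate the deadline across stages proportionally to nominal stage costs."""
    # Nominal per-stage times under the assumed link rates (bits / bits-per-ms).
    uplink_ms = task.uplink_bytes * 8 / (uplink_mbps * 1e3)
    downlink_ms = task.downlink_bytes * 8 / (downlink_mbps * 1e3)
    nominal = {"uplink": uplink_ms, "compute": task.compute_ms, "downlink": downlink_ms}
    total = sum(nominal.values())
    # Scale each stage's share so per-stage budgets sum to the deadline,
    # giving proportionally more slack to the stages that dominate this task.
    return {stage: task.deadline_ms * t / total for stage, t in nominal.items()}


if __name__ == "__main__":
    # Example: a vision task with a large input frame and a small label result.
    task = TaskProfile(deadline_ms=100.0, uplink_bytes=200_000,
                       compute_ms=30.0, downlink_bytes=1_000)
    print(split_budget(task))
```

Under such a split, an uplink-heavy task (e.g., a video-analytics frame) would receive a larger slice of the deadline for the radio stage, while a compute-heavy task would shift budget toward the edge server, which is the kind of per-task balance the abstract describes before the schedulers are aligned to the resulting budgets.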

publication date

  • November 24, 2025

Date in CU Experts

  • November 27, 2025 12:17 PM

Full Author List

  • Jin S; Kim S; Ha S; Lee K

author count

  • 4

Electronic International Standard Serial Number (EISSN)

  • 2834-5509

Additional Document Info

start page

  • 1

end page

  • 23

volume

  • 3

issue

  • CoNEXT4