Host PCI Optimization Tuning#
This document describes a set of host-level tuning steps commonly used to improve PCIe/DMA/P2P performance and reduce latency variance on Linux servers.
1. Hugepage Configuration#
Allocate 65,536 (64K) hugepages of 2 MB each, which reserves 128 GiB of memory.
Hugepages (i.e., “HugeTLB” pages) reduce TLB pressure and page-table walk overhead by using larger page sizes. For workloads that frequently touch large memory regions (e.g., DMA buffers, packet buffers, large pinned allocations), hugepages can:
Reduce CPU overhead from address translation (fewer TLB misses)
Improve latency stability by reducing page management overhead
Reduce fragmentation issues for large contiguous allocations (when allocated early)
One-time setting#
To apply immediately (effective until reboot):
echo 65536 | sudo tee /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
Notes#
This adjusts the number of 2MB HugeTLB pages.
The allocation can fail (partially or fully) if memory is fragmented; applying early after boot usually works best.
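As a rough sanity check on the numbers above, the following sketch computes how much memory a pool reserves and reads back how many pages the kernel actually allocated. The function names `hugepage_pool_gib` and `hugepages_total` are illustrative only and not part of any tool; the second helper assumes a meminfo-style file and defaults to /proc/meminfo.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) for checking a hugepage pool.

# Print the memory reserved by a pool, in GiB (integer division).
# $1 = number of pages, $2 = page size in KiB (2048 for 2 MB pages).
hugepage_pool_gib() {
    local pages=$1 size_kb=$2
    echo $(( pages * size_kb / 1024 / 1024 ))
}

# Print the HugePages_Total value from a meminfo-style file.
# $1 = file to read (defaults to /proc/meminfo).
hugepages_total() {
    awk '/^HugePages_Total:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

For example, `hugepage_pool_gib 65536 2048` prints 128. If `hugepages_total` reports fewer pages than requested, the allocation was only partially satisfied.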
Persistent setting (every boot)#
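Create a sysctl configuration file so the setting is applied at boot. vm.nr_hugepages controls the pool of default-size hugepages, which is 2 MB on x86_64 unless the default has been changed with the default_hugepagesz kernel parameter: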
sudo tee /etc/sysctl.d/90-hugepages.conf <<EOF
vm.nr_hugepages = 65536
EOF
Then apply it immediately without reboot:
sudo sysctl --system
2. Disable PCI ACS (Access Control Services)#
When to use this#
On some systems (e.g., NXT RNGD Server), devices are connected behind a PCIe switch. If peer-to-peer (P2P) data paths are expected, ACS enabled on the switch port upstream of the device can prevent or penalize P2P traffic.
Example scenario:
A device is located under a PCIe switch, and P2P throughput/latency matters.
Why ACS can impact P2P performance#
ACS provides isolation and routing control features (e.g., request/completion redirection, upstream forwarding control) that improve security/isolation and topology correctness in complex systems. However, when ACS enforces redirection, it can force transactions to be routed upstream (toward the root complex) rather than staying within the switch fabric. This can:
Increase hop count / latency
Reduce effective bandwidth
Add contention on upstream links
Prevent the most direct P2P path between endpoints under the same switch
Disabling ACS can therefore improve P2P performance by allowing more direct routing within the switch.
Warning
Disabling ACS reduces isolation between endpoints and may not be acceptable in multi-tenant / strict security environments. Apply only when your platform and use-case tolerate reduced PCIe isolation.
Supported Server and PCIe Switch Combinations#
The following table lists the server configurations for which disabling ACS has been validated and is officially supported.
| Server Platform | PCIe Switch | Vendor / Device ID | ACS Control Offset |
|---|---|---|---|
| NXT RNGD Server (Supermicro) | Broadcom / LSI PEX890xx PCIe Gen 5 Switch (rev b0) | 1000:c030 | 0x176 |
For the configuration above, ACS is disabled on the PCIe switch downstream port connected to the RNGD device in order to allow optimal PCIe P2P traffic within the switch fabric.
Run the following command:
sudo setpci -s ${PARENT_BDF} ${ACS_OFFSET}.W=0x0
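${PARENT_BDF} is the BDF of the PCIe switch port directly connected to the RNGD device (e.g., 0000:02:03.0), and ${ACS_OFFSET} is the offset of the ACS Control register in that port's configuration space (0x176 on the PEX890xx switch above: the ACS capability at 0x170 plus 0x6). To find the parent BDF, list the device's PCI path and confirm the port's identity: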
lspci -D -d 1ed2: -PP | head -n 1
0000:00:01.1/01:00.0/02:03.0/06:00.0 Processing accelerators: FuriosaAI, Inc. Device 0001 (rev 01)
lspci -D -s 0000:02:03.0 -nn
0000:02:03.0 PCI bridge [0604]: Broadcom / LSI PEX890xx PCIe Gen 5 Switch [1000:c030] (rev b0)
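The parent BDF can also be extracted from the path column programmatically. The sketch below assumes the path format shown above, where only the first component carries the PCI domain, and that the device has at least one parent. `parent_bdf_from_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): extract the parent port's BDF
# from the path column printed by "lspci -D -PP".

# $1 = path such as 0000:00:01.1/01:00.0/02:03.0/06:00.0
parent_bdf_from_path() {
    local path=$1
    local domain=${path%%:*}          # PCI domain, e.g. 0000
    local without_leaf=${path%/*}     # drop the device itself
    local parent=${without_leaf##*/}  # last remaining component
    # Components after the first omit the domain, so add it back if missing.
    # A full BDF contains two colons; a short one contains only one.
    case "$parent" in
        *:*:*) echo "$parent" ;;
        *)     echo "${domain}:${parent}" ;;
    esac
}
```

With the example output above, `parent_bdf_from_path "0000:00:01.1/01:00.0/02:03.0/06:00.0"` prints 0000:02:03.0, which can then be assigned to PARENT_BDF.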
Verify the result#
Run:
lspci -vv -s ${PARENT_BDF}
After disabling, you should observe flags similar to:
Capabilities: [170 v1] Access Control Services
ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans+
ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
This indicates that the corresponding ACS control bits are cleared.
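The same check can be scripted. The following sketch parses `lspci -vv` text on stdin; it assumes the output layout shown above and relies on the ACS Control register sitting 0x6 bytes past the start of the ACS extended capability. The function names `acs_ctl_offset` and `acs_ctl_enabled` are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative helpers (hypothetical names) that parse "lspci -vv" text on stdin.

# Print the ACS Control register offset: capability base + 0x6.
# Returns non-zero if no ACS capability line is found.
acs_ctl_offset() {
    local base
    base=$(sed -n 's/.*Capabilities: \[\([0-9a-fA-F]*\) v[0-9]*] Access Control Services.*/\1/p' | head -n 1)
    [ -n "$base" ] || return 1
    printf '0x%x\n' $(( 16#$base + 6 ))
}

# Print every ACSCtl flag that is still enabled (suffix "+"), one per line.
# Prints nothing when all control bits are cleared.
acs_ctl_enabled() {
    grep 'ACSCtl:' | tr -s ' \t' '\n' | grep '+$' || true
}
```

For instance, `lspci -vv -s ${PARENT_BDF} | acs_ctl_enabled` should print nothing once ACS has been disabled, and piping the same output to `acs_ctl_offset` yields 0x176 for a capability located at 0x170.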
Auto-apply with a udev rule#
You can use a udev rule to apply the ACS disable automatically when the device (port) is enumerated. The following snippet provides a template. Adjust the matching keys (vendor/device, BDF, or driver) for your environment.
Create a rules file such as /etc/udev/rules.d/99-furiosa.rules:
# Disable ACS on a specific upstream PCIe port to improve P2P performance.
SUBSYSTEM=="rngd_mgmt", ATTR{device_type}=="RNGD", ACTION=="add", ENV{BUSNAME}="$attr{busname}", RUN+="/usr/local/sbin/furiosa-acs-clear"
Create a script file such as /usr/local/sbin/furiosa-acs-clear:
#!/usr/bin/env bash
# Clear the ACS control bits on the PCIe switch port directly above an RNGD device.
# BUSNAME (the device's PCI address) is passed in by the udev rule.
set -euo pipefail

if [[ -z "${BUSNAME:-}" ]]; then
    echo "Error: BUSNAME is not set" >&2
    exit 1
fi

DEVICE_ROOT="/sys/bus/pci/devices"
DEVICE_PATH="${DEVICE_ROOT}/${BUSNAME}"
if [[ ! -e "${DEVICE_PATH}" ]]; then
    echo "Error: PCI device not found: ${DEVICE_PATH}" >&2
    exit 2
fi

# The sysfs entry is a symlink whose target encodes the full PCI hierarchy.
# "|| true" keeps "set -e" from aborting before the check below reports the error.
READLINK=$(readlink "${DEVICE_PATH}" 2>/dev/null || true)
if [[ -z "${READLINK}" ]]; then
    echo "Error: Unable to read link for ${DEVICE_PATH}" >&2
    exit 3
fi

# The parent directory of the link target is the switch port above the device.
PARENT_BDF=$(basename "$(dirname "${READLINK}")")
PARENT_PATH="${DEVICE_ROOT}/${PARENT_BDF}"
if [[ ! -e "${PARENT_PATH}" ]]; then
    echo "Error: Parent device not found: ${PARENT_PATH}" >&2
    exit 4
fi

if [[ ! -r "${PARENT_PATH}/vendor" || ! -r "${PARENT_PATH}/device" ]]; then
    echo "Error: Cannot read vendor/device IDs for ${PARENT_BDF}" >&2
    exit 5
fi

# Only touch the validated switch (Broadcom / LSI PEX890xx, 1000:c030).
VENDOR_ID=$(cat "${PARENT_PATH}/vendor")
DEVICE_ID=$(cat "${PARENT_PATH}/device")
if [[ "${VENDOR_ID,,}" != "0x1000" ]]; then
    echo "Error: Parent device ${PARENT_BDF} vendor_id is '${VENDOR_ID}', expected '0x1000'" >&2
    exit 6
fi
if [[ "${DEVICE_ID,,}" != "0xc030" ]]; then
    echo "Error: Parent device ${PARENT_BDF} device_id is '${DEVICE_ID}', expected '0xc030'" >&2
    exit 6
fi

# ACS Control register: ACS capability at 0x170 plus 0x6.
ACS_OFFSET="0x176"
CURRENT_ACS=$(setpci -s "${PARENT_BDF}" "${ACS_OFFSET}.W" || true)
if [[ -z "${CURRENT_ACS}" ]]; then
    echo "Error: Failed to read ACS register on ${PARENT_BDF}" >&2
    exit 7
fi

if [[ "${CURRENT_ACS}" == "0000" ]]; then
    echo "ACS already cleared on ${PARENT_BDF}"
else
    setpci -s "${PARENT_BDF}" "${ACS_OFFSET}.W=0x0"
    echo "ACS clear for ${PARENT_BDF} completed"
fi
3. tuned-adm: latency-performance Profile#
Overview#
The tuned daemon provides predefined system tuning profiles. The
latency-performance profile generally targets lower latency and reduced
jitter by adjusting CPU governor, power management, kernel scheduler-related
settings, and other system knobs.
Install tuned#
sudo apt update
sudo apt install -y tuned
sudo systemctl enable --now tuned
Set the profile to latency-performance#
sudo tuned-adm profile latency-performance
Confirm active profile:
sudo tuned-adm active
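For scripted checks (for example in a provisioning step), the active profile can be compared against the expected one. This is a minimal sketch that assumes the command prints a line of the form "Current active profile: NAME"; `tuned_profile_is` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): check "tuned-adm active" output on stdin.

# $1 = expected profile name. Returns 0 if it matches the reported profile.
tuned_profile_is() {
    local expected=$1 actual
    actual=$(sed -n 's/^Current active profile: *//p' | head -n 1)
    [ "$actual" = "$expected" ]
}
```

A typical invocation would be `tuned-adm active | tuned_profile_is latency-performance || echo "unexpected tuned profile" >&2`.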
Expected effects#
Common outcomes when using latency-performance include:
Lower latency variance (reduced jitter) under load
More consistent CPU frequency behavior (often favoring performance over power saving)
Reduced impact from aggressive power management states
Better tail latency for latency-sensitive PCIe/DMA-driven workloads
Note
The exact changes depend on distribution and tuned version. Review the tuned
profile contents under /usr/lib/tuned/latency-performance/ if you need a
precise list of applied knobs.
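To review those knobs in a compact form, the profile's INI-style configuration can be flattened into section.key=value lines. The sketch below assumes the profile is defined in a tuned.conf file inside that directory and that settings use simple key=value lines; `tuned_conf_settings` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Illustrative helper (hypothetical name): flatten an INI-style tuned.conf.

# $1 = path to the profile file, e.g. /usr/lib/tuned/latency-performance/tuned.conf
tuned_conf_settings() {
    awk '
        /^[ \t]*(#|$)/ { next }    # skip comments and blank lines
        /^\[.*\]$/ {               # remember the current [section]
            section = substr($0, 2, length($0) - 2)
            next
        }
        {
            key = $0; sub(/[ \t]*=.*/, "", key)      # text before "="
            val = $0; sub(/^[^=]*=[ \t]*/, "", val)  # text after "="
            print section "." key "=" val
        }
    ' "$1"
}
```

Running `tuned_conf_settings /usr/lib/tuned/latency-performance/tuned.conf` then lists one setting per line, which is convenient for comparing systems or tuned versions.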