NUMA Calculator: Analyze Memory Access Latency

Model the performance impact of a Non-Uniform Memory Access (NUMA) architecture on your system.

Calculator Inputs

  • Local Memory Latency: the time to access memory on the same NUMA node as the CPU. Typically 50–100 ns.
  • Remote Memory Latency: the time to access memory on a different NUMA node. Typically 1.5x–3x the local latency.
  • Local Access Percentage: the percentage of memory requests satisfied by local memory (0–100%).
Sample Output

With a local latency of 60 ns, a remote latency of 100 ns, and 95% local access, the calculator reports:

  • Average Memory Access Latency: 62.00 ns
  • NUMA Ratio: 1.67x
  • Remote Access Penalty: +40.00 ns
  • Performance vs 100% Local: -3.33%

Formula: Average Latency = (Local Latency × % Local) + (Remote Latency × % Remote)

Performance Visualizations

[Bar chart comparing local, remote, and average memory access latencies. Lower is better.]

[Table showing the impact of varying local access percentages on average latency (ns) and overall performance.]

What is NUMA and Why Does It Matter?

Non-Uniform Memory Access (NUMA) is a computer memory architecture used in multi-socket and multi-core systems. In a NUMA system, a processor can access its own local memory much faster than it can access remote memory (memory local to another processor). This time difference, however small, can have a significant impact on application performance, especially for memory-intensive workloads common in databases, scientific computing, and virtualization.

Understanding this architecture is crucial for system administrators, performance engineers, and developers. A misconfigured system or a “NUMA-unaware” application can suffer from unnecessary latency as its threads are forced to fetch data across the slower interconnects between CPUs. This is where a NUMA calculator becomes an indispensable tool. By modeling the effects of local versus remote memory hits, you can predict performance, justify hardware configurations, and understand the benefits of code optimization for memory locality. The term “non-uniform” directly refers to the fact that memory access times depend on the location of the memory relative to the processor.

Who Should Use a NUMA Calculator?

This NUMA calculator is designed for:

  • System Architects: To model and compare the performance of different server hardware before purchase.
  • Database Administrators: To understand the performance implications of CPU and memory allocation for their database instances.
  • Software Developers: To quantify the performance gains from optimizing code for data locality.
  • Virtualization Engineers: To properly size and configure virtual machines (vNUMA) to align with the physical hardware’s NUMA topology, avoiding performance penalties.

NUMA Calculator Formula and Mathematical Explanation

The core of this NUMA calculator is a weighted average formula that determines the effective memory access latency based on your system’s characteristics. The calculation is straightforward but powerful in its implications.

Step-by-Step Calculation

  1. Determine Remote Access Percentage: This is simply the inverse of the local access percentage.

    % Remote = 100% – % Local
  2. Calculate Weighted Local Latency: The portion of total latency contributed by fast, local memory accesses.

    Weighted Local = Local Latency × (% Local / 100)
  3. Calculate Weighted Remote Latency: The portion of total latency contributed by slower, remote memory accesses.

    Weighted Remote = Remote Latency × (% Remote / 100)
  4. Sum for Average Latency: The final result is the sum of the weighted local and remote latencies.

    Average Memory Access Latency = Weighted Local + Weighted Remote
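The four steps above can be sketched in a few lines of Python (the function name and return keys are illustrative, not part of the calculator itself):

```python
def numa_metrics(local_ns: float, remote_ns: float, local_pct: float) -> dict:
    """Compute effective NUMA memory-access metrics.

    local_ns   -- latency of a local-node access, in nanoseconds
    remote_ns  -- latency of a remote-node access, in nanoseconds
    local_pct  -- percentage of accesses served locally (0-100)
    """
    remote_pct = 100.0 - local_pct                       # Step 1
    weighted_local = local_ns * local_pct / 100          # Step 2
    weighted_remote = remote_ns * remote_pct / 100       # Step 3
    average = weighted_local + weighted_remote           # Step 4
    return {
        "average_ns": average,
        "numa_ratio": remote_ns / local_ns,
        "remote_penalty_ns": remote_ns - local_ns,
        # Slowdown relative to the ideal case where every access is local.
        "impact_pct": (average - local_ns) / local_ns * 100,
    }

# 70 ns local, 120 ns remote, 98% local access (Example 1 below).
m = numa_metrics(70, 120, 98)
print(f"{m['average_ns']:.2f} ns")  # 71.00 ns
print(f"{m['numa_ratio']:.2f}x")    # 1.71x
```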

Variables Table

| Variable | Meaning | Unit | Typical Range |
| --- | --- | --- | --- |
| Local Latency | Time to access memory in the same NUMA node. | nanoseconds (ns) | 50 – 120 ns |
| Remote Latency | Time to access memory in a different NUMA node. | nanoseconds (ns) | 80 – 300 ns |
| Local Access % | Percentage of memory hits served by local memory. | percent (%) | 50% – 100% |
| NUMA Ratio | Remote Latency ÷ Local Latency; a key measure of NUMA “cost”. | multiplier (x) | 1.2x – 3.0x |

Practical Examples (Real-World Use Cases)

Example 1: Well-Optimized Database Workload

A database administrator is running a large in-memory database on a 2-socket server. The application has been optimized and pinned correctly, ensuring high data locality.

  • Inputs:
    • Local Memory Latency: 70 ns
    • Remote Memory Latency: 120 ns
    • Local Access Percentage: 98%
  • NUMA Calculator Results:
    • Average Latency: 71.0 ns
    • NUMA Ratio: 1.71x
    • Performance Impact: -1.43% (compared to a theoretical 100% local access)
  • Interpretation: The performance penalty is minimal. The high local access percentage means the system is performing close to its theoretical best. The NUMA calculator validates that the optimization efforts were successful.

Example 2: Un-Optimized Virtualization Host

An engineer is running multiple high-performance virtual machines on a 4-socket server. The VMs were not configured with vNUMA, and processes are frequently being scheduled on cores far from their allocated memory.

  • Inputs:
    • Local Memory Latency: 85 ns
    • Remote Memory Latency: 200 ns
    • Local Access Percentage: 65%
  • NUMA Calculator Results:
    • Average Latency: 125.25 ns
    • NUMA Ratio: 2.35x
    • Performance Impact: -47.35% (compared to 100% local access)
  • Interpretation: The performance is severely degraded. The low local access rate and high NUMA ratio result in a nearly 50% increase in average memory latency. This data from the NUMA calculator provides a strong justification for re-configuring the VMs with proper NUMA topology awareness.
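As a sanity check, Example 2's numbers can be reproduced directly from the weighted-average formula (a short Python sketch; the variable names are mine):

```python
# Example 2 inputs: un-optimized virtualization host.
local_ns, remote_ns, local_pct = 85.0, 200.0, 65.0

avg = local_ns * local_pct / 100 + remote_ns * (100 - local_pct) / 100
ratio = remote_ns / local_ns
impact = -(avg - local_ns) / local_ns * 100  # negative = slower than 100% local

print(f"Average latency: {avg:.2f} ns")  # Average latency: 125.25 ns
print(f"NUMA ratio: {ratio:.2f}x")       # NUMA ratio: 2.35x
print(f"Impact: {impact:.2f}%")          # Impact: -47.35%
```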

How to Use This NUMA Calculator

This tool is designed for real-time analysis. Follow these steps to model your system’s performance.

  1. Enter Local Latency: Input the access time for memory on the same CPU socket. You can find this value in your server’s technical documentation or by using system profiling tools like Intel VTune.
  2. Enter Remote Latency: Input the access time for memory on a different CPU socket. This is often the most critical factor.
  3. Enter Local Access Percentage: Estimate the percentage of time your application successfully finds data in its local memory node. This is a measure of your application’s “NUMA-awareness.” A higher percentage is better.
  4. Analyze the Results: The calculator instantly updates.
    • The Average Memory Access Latency is your primary result. This is the effective latency your application experiences.
    • The NUMA Ratio shows how much more expensive a remote access is compared to a local one. A higher ratio means a greater penalty for remote access.
    • The Performance Impact quantifies the slowdown compared to a perfect scenario where all access is local. This highlights the cost of remote lookups.
  5. Consult the Table and Chart: Use the dynamic table to see how performance changes as locality improves. The chart provides a quick visual reference for the latency difference. This comprehensive approach is key to using a NUMA calculator effectively.
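The sensitivity table consulted in step 5 can be generated with a short loop (a sketch built on the same weighted-average formula; the latency values here are only placeholders):

```python
local_ns, remote_ns = 85.0, 200.0  # placeholder latencies in nanoseconds

rows = []
for local_pct in range(50, 101, 10):
    avg = local_ns * local_pct / 100 + remote_ns * (100 - local_pct) / 100
    impact = (avg - local_ns) / local_ns * 100  # % slower than all-local
    rows.append((local_pct, avg, impact))

print(f"{'Local %':>8} {'Avg (ns)':>10} {'Slowdown':>10}")
for pct, avg, impact in rows:
    print(f"{pct:>7}% {avg:>10.2f} {impact:>9.2f}%")
```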

Key Factors That Affect NUMA Performance

Several factors influence the real-world performance of a NUMA system. Optimizing these is key to minimizing latency.

CPU and Memory Affinity
This is the most critical factor. Ensuring a process or thread runs on a CPU core within the same NUMA node as its primary memory is paramount. Operating systems provide tools (e.g., `numactl` in Linux) to pin processes to specific nodes. Using a System Topology Guide can help visualize your server’s layout.
Application Workload
Applications with independent, parallelizable tasks that operate on distinct data sets are ideal for NUMA. Conversely, applications with a single, massive, randomly accessed data set will struggle and see significant performance loss, which a NUMA calculator can help predict.
Interconnect Speed (QPI/UPI)
The speed of the bus connecting the CPU sockets directly dictates the remote memory access latency. Faster interconnects (like Intel’s Ultra Path Interconnect) reduce the NUMA penalty. This value is a key input for any accurate NUMA calculator.
Memory Interleaving
Some BIOS settings allow for memory interleaving, which spreads memory addresses across all NUMA nodes. This creates a more “uniform” (but uniformly slow) access pattern. It can sometimes be a fallback for NUMA-unaware applications but generally prevents the high performance possible with true locality.
Operating System Scheduler
Modern operating systems are NUMA-aware. Their schedulers attempt to keep processes on their “home” node to avoid remote access. However, under heavy load, processes might be migrated, leading to performance degradation. Using a CPU Affinity Checker can help monitor this.
Virtualization Layer (vNUMA)
When using hypervisors like VMware or Hyper-V, the virtual machine’s presented CPU and memory (vNUMA) must align with the physical hardware’s NUMA boundaries. A mismatch is a common and severe source of performance issues. This is a critical area where a NUMA calculator helps in planning VM sizes.

Frequently Asked Questions (FAQ)

1. What is a “good” NUMA ratio?

A lower ratio is always better. Ratios between 1.2x and 1.7x are common in modern servers. A ratio above 2.0x indicates a significant performance penalty for remote access, making data locality even more critical.

2. How can I find my system’s local and remote latency?

You can use specialized tools like Intel’s Memory Latency Checker (MLC) or performance monitoring utilities like `perf` in Linux. Sometimes these values are also published in technical reviews of server hardware.

3. Does NUMA affect single-socket systems?

Traditionally, no. NUMA is an architecture for multi-processor systems; a classic single-socket system has a Uniform Memory Access (UMA) architecture, where latency is the same to all memory locations. Note, however, that some modern single-socket chiplet-based CPUs (such as AMD EPYC) can be configured to expose multiple NUMA nodes within one socket.

4. Can I ignore NUMA for my application?

You can, but it might cost you significant performance. Modern OS schedulers do a decent job, but for performance-critical applications, explicitly managing memory and process placement is crucial. Using this NUMA calculator can show you just how much performance you might be leaving on the table. For further reading, a Linux Performance Tuning guide is a great resource.

5. What does “node interleaving” in the BIOS do?

It effectively disables NUMA at the hardware level, presenting the OS with a single, large memory region. This averages out the latency, making all access slightly slower than local but faster than remote. It can be a “safe” option for legacy systems but prevents the peak performance achievable with NUMA-aware tuning.
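In the calculator's terms, two-node interleaving behaves roughly like a fixed 50% local access rate, since addresses alternate between nodes. A quick illustration (the latencies are illustrative, not measured):

```python
local_ns, remote_ns = 60.0, 100.0  # illustrative latencies

# Interleaving across two nodes means roughly half of all accesses are remote.
uniform_ns = 0.5 * local_ns + 0.5 * remote_ns
print(uniform_ns)  # 80.0 -- slower than local (60) but faster than remote (100)
```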

6. Is a higher local access percentage always better?

Yes. A higher percentage means your application is efficiently using the fast, local memory. The goal of NUMA optimization is to get this number as close to 100% as possible for critical code paths. A Memory Bandwidth Tool can help diagnose bottlenecks related to access patterns.

7. How does this NUMA calculator help with database performance?

Databases are often bound by memory latency. By modeling different scenarios, a DBA can make informed decisions about CPU pinning, memory allocation, and even schema design to promote data locality. For example, it can help justify why running a specific query workload benefits from a dedicated NUMA node. Check out our guide on Database Optimization for more.

8. What is the difference between Inter-Socket and Inter-Core latency?

This calculator focuses on Inter-Socket latency (access between different CPU packages). There is also a smaller latency difference when cores on the same CPU package access shared caches, which is a more granular level of performance tuning. For details, see our article on Inter-Core Latency.

© 2026 Date Calculators & Web Tools. For educational and modeling purposes only.


