Sunday, March 3, 2019

cpu cache - Write back or write through to main memory



write-through: data is written to main memory through the cache immediately.



write-back: data is written to main memory at a later time (when the cache line is evicted).
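To make the difference between the two policies concrete, here is a toy single-line cache model in C. This is purely my own illustrative sketch; real CPU caches implement this in hardware, per cache line.

    #include <stdio.h>
    #include <stdbool.h>

    enum policy { WRITE_THROUGH, WRITE_BACK };

    static int  main_memory = 0;   /* backing store for one "address"      */
    static int  cache_value = 0;   /* cached copy of that address          */
    static bool dirty = false;     /* only meaningful for write-back       */

    static void cpu_write(enum policy p, int value)
    {
        cache_value = value;            /* the cached copy is always updated    */
        if (p == WRITE_THROUGH)
            main_memory = value;        /* memory is updated immediately        */
        else
            dirty = true;               /* memory is updated later, on eviction */
    }

    static void evict(enum policy p)
    {
        if (p == WRITE_BACK && dirty) {
            main_memory = cache_value;  /* dirty line is written back now       */
            dirty = false;
        }
    }

    int main(void)
    {
        cpu_write(WRITE_BACK, 42);
        printf("after write : memory=%d cache=%d\n", main_memory, cache_value);
        evict(WRITE_BACK);
        printf("after evict : memory=%d cache=%d\n", main_memory, cache_value);
        return 0;
    }

With WRITE_BACK the first printf shows memory still at 0 while the cache holds 42; with WRITE_THROUGH both would show 42 right after the write.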




I have a shared memory region located on NUMA node 1. Suppose Process A, running on Node 0, modifies the contents of the shared memory, and then Process B, running on Node 1, wants to read those contents.



If the cache is in write-through mode, the contents modified by Process A will already be in the main memory of Node 1, since the write from Node 0 to Node 1's main memory passes through the L3 cache of Node 1. Process B can then get the contents modified by Process A from the L3 cache of Node 1, not from the main memory of Node 1.



If it is in write-back mode, then when Process B on Node 1 wants to read the contents of the shared memory, the cache line will still be in the L3 cache of Node 0, and getting it will cost more since it has to come from Node 0's cache.



I would like to know which mode an Intel(R) Xeon(R) CPU E5-2643 will choose,
or does the Xeon decide which mode to use on its own, with nothing the programmer can do?




Edit :



dmidecode -t cache



shows that the Xeon cache operational mode is write-back, which looks reasonable, referring to



http://www.cs.cornell.edu/courses/cs3410/2013sp/lecture/18-caches3-w.pdf


Answer



Cache coherency on Intel (and AMD) x86-64 NUMA architectures does not work like that of a RAID array... Instead of having a single write-through or write-back cache, the two or four processor packages use a snooping and transfer protocol to synchronize and share their L3 caches. OS-level support for controlling such things is generally very rough, even though NUMA has been mainstream for about ten years now.




Speaking specifically about Linux, control over the cache settings really boils down to a handful of process-level settings (see the sketch after this list):




  • What core(s) your code is allowed to run on.

  • Whether your process is allowed to allocate non-local node memory.

  • Whether your process interleaves all of its allocations between NUMA nodes.
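A minimal sketch of those three knobs using the libnuma C API (the library behind numactl); this is my own illustration, assumes libnuma is installed, and compiles with -lnuma:

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not supported on this system\n");
            return 1;
        }

        /* 1. Pin the process to the CPUs of node 0. */
        if (numa_run_on_node(0) != 0)
            perror("numa_run_on_node");

        /* 2. Allocate memory on a specific (possibly non-local) node. */
        size_t len = 1 << 20;
        void *on_node1 = numa_alloc_onnode(len, 1);

        /* 3. Interleave an allocation across all allowed nodes. */
        void *striped = numa_alloc_interleaved(len);

        printf("on_node1=%p striped=%p\n", on_node1, striped);

        if (on_node1) numa_free(on_node1, len);
        if (striped)  numa_free(striped, len);
        return 0;
    }

The same effects can be had without recompiling anything via numactl, which simply applies these policies to a whole process.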



By default, the Linux kernel will allocate process memory from the NUMA node the process is actively running on, falling back to allocations on the other node if there's memory pressure on the local node.
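One way to see this default "first touch" placement is to ask the kernel which node a page actually landed on: passing a NULL target-node array to move_pages(2) just reports current placement instead of migrating anything. A sketch (again my own illustration, compile with -lnuma):

    #define _GNU_SOURCE
    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        size_t page = 4096;
        char *buf = aligned_alloc(page, page);
        if (!buf)
            return 1;
        memset(buf, 0, page);          /* first touch: the page is faulted in here */

        void *pages[1]  = { buf };
        int   status[1] = { -1 };
        /* NULL node array: move_pages() only reports where each page lives. */
        if (move_pages(0, 1, pages, NULL, status, 0) == 0)
            printf("page is on node %d\n", status[0]);
        else
            perror("move_pages");

        free(buf);
        return 0;
    }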




You can control the pushing of data in and out of the L3 cache of the local node using x86 assembly primitives like LOCK, but in general you really, really, really should not care about anything more than your process running locally with its allocated memory.
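For completeness, this is the kind of primitive alluded to above: GCC/Clang's __atomic builtins emit LOCK-prefixed instructions on x86-64, and _mm_clflush explicitly evicts a cache line to memory. It is only a sketch; as said, you almost never need to do this by hand.

    #include <emmintrin.h>   /* _mm_clflush, _mm_mfence (SSE2) */
    #include <stdio.h>

    static int shared_counter = 0;

    int main(void)
    {
        /* LOCK-prefixed read-modify-write: an atomic, globally visible update. */
        __atomic_fetch_add(&shared_counter, 1, __ATOMIC_SEQ_CST);

        /* Push the cache line holding shared_counter out to memory. */
        _mm_clflush(&shared_counter);
        _mm_mfence();                  /* order the flush against later accesses */

        printf("counter=%d\n", shared_counter);
        return 0;
    }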



For more information on this, I'd encourage you to read some of the Linux documentation on NUMA, and possibly also Intel's (QPI is the name of the cache-sharing technology).



A good start for you would probably be the Linux 'numactl' manpage (https://linux.die.net/man/8/numactl).

