This is version 0.8 of LINUX_PMEM_API.txt.
NAME
Linux Persistent Memory Application Programming Interface
SYNOPSIS
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <pmem.h>
cc ... -lpmem
/* open a Persistent Memory file */
fd = open("/path/to/pm-aware-filesystem/file", O_RDWR);
/* map it */
base = mmap(NULL, len, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
/* optional, instead of mmap line above: */
base = pmem_map(fd, len);
/* ... use loads and stores to access PM at *base ... */
/* make changes persistent */
msync(addr, nbytes, MS_SYNC);
/* optional, instead of msync line above: */
pmem_persist(addr, nbytes, 0);
/* other interfaces described in this document... */
pmem_flush_cache(addr, len, flags);
pmem_fence();
pmem_drain_pm_stores();
/* and many other POSIX file APIs... */
NOTE: If libpmem has not been installed on this system, you'll
need to include "pmem.h" from a non-system path, and either
arrange for libpmem to be found using LD_LIBRARY_PATH or link
against the static version, libpmem.a.
DESCRIPTION
This document describes the Linux API for accessing Persistent
Memory (PM). The Linux Persistent Memory API follows the
"NVM.PM.FILE" programming model as described in the NVM Programming
Model Specification published by the SNIA NVM Programming Technical
Work Group (except as noted in the TODO section below). See
http://www.snia.org/ for more information.
The prerequisites to using this API are to install some Persistent
Memory in your system and then expose that Persistent Memory using
the appropriate device driver and a PM-aware file system such as PMFS
(for more information on PMFS, see https://github.com/linux-pmfs).
Of course, it is possible to develop code using these APIs without
having actual Persistent Memory hardware available, for example by
using volatile DRAM as "pretend" Persistent Memory. All the APIs
below will work, but things like power failure testing will not be
available during development.
POSIX file APIs
The PM-aware file system exposes Persistent Memory as
files, so POSIX file APIs work as expected. This includes
the obvious basic operations, such as:
open(2)
close(2)
read(2)
write(2)
fsync(2)
mmap(2)
msync(2)
and typically includes the most common operations for
managing files, such as:
chmod(2)
chown(2)
closedir(3)
creat(2)
lseek(2)
madvise(2)
mkdir(2)
mprotect(2)
opendir(3)
readdir(3)
readlink(2)
rename(2)
rmdir(2)
symlink(2)
utimes(2)
... and many more ...
In fact, most C library calls designed to operate on files
will work with Persistent Memory just like they work with
other file systems. Like any file system, a PM-aware file
system may or may not implement all possible interfaces,
returning "Not supported" for some of them.
Another aspect of a PM-aware file system is that Persistent
Memory files can be managed using common command-line
utilities like cp, mv, ls, and tar.
int open(const char *pathname, int flags, mode_t mode);
To begin accessing Persistent Memory, open(2) is used
just as it would be for any other file. The flags argument
specifies whether the file is opened read-only (O_RDONLY),
for read/write (O_RDWR), etc. See the open(2) man page
for details.
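
As a minimal sketch of this step, the helper below opens an existing
file read/write and also reports its size, since the size is needed
as the length for the mmap(2) call that follows. The name
open_pm_file() is hypothetical and is not part of libpmem.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * open_pm_file -- hypothetical helper, not part of libpmem.
 * Opens an existing file read/write and stores its size in *lenp.
 * Returns the file descriptor, or -1 with errno set on failure.
 */
int
open_pm_file(const char *path, size_t *lenp)
{
	struct stat st;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;		/* errno set by open(2) */
	if (fstat(fd, &st) < 0) {
		int saved = errno;	/* close(2) may overwrite errno */

		close(fd);
		errno = saved;
		return -1;
	}
	*lenp = (size_t)st.st_size;
	return fd;
}
```

The caller owns the returned descriptor and should close(2) it once
the file has been mapped or is no longer needed.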
void *mmap(void *addr, size_t length, int prot, int flags,
int fd, off_t offset);
The next step to using Persistent Memory is to map it
into the address space of the process using mmap(2).
For Persistent Memory, the flag MAP_SHARED is most
commonly used, providing the application with direct
access to the Persistent Memory. For traditional files,
using mmap this way will "demand-page" data in and out
of the system page cache as it is accessed. This results
in unpredictable latencies, which grow much longer whenever a
page fault occurs. But with Persistent Memory, the page cache
is avoided, providing direct, uniform access to the memory
itself.
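
The following sketch illustrates access through a MAP_SHARED mapping
using ordinary loads and stores. It increments a 64-bit counter kept
in the first 8 bytes of a file and then flushes the change. The name
bump_counter() and the counter layout are assumptions made for this
example only. The code behaves the same on a regular file, which is
convenient when no Persistent Memory hardware is available.

```c
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/*
 * bump_counter -- hypothetical example, not part of libpmem.
 * Maps the file at path with MAP_SHARED, increments the 64-bit
 * counter stored in its first 8 bytes, and flushes the change with
 * msync(2). Returns the new counter value, or -1 on error.
 */
int64_t
bump_counter(const char *path)
{
	int64_t result = -1;
	struct stat st;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (fstat(fd, &st) == 0 && (size_t)st.st_size >= sizeof(int64_t)) {
		size_t len = (size_t)st.st_size;
		void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_SHARED, fd, 0);

		if (base != MAP_FAILED) {
			int64_t *counter = base;

			*counter += 1;	/* a plain store, no write(2) */
			if (msync(base, len, MS_SYNC) == 0)
				result = *counter;
			munmap(base, len);
		}
	}
	close(fd);
	return result;
}
```

Because the mapping is shared, each call sees the value left by the
previous one, even though the file is unmapped and closed in between.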
The flag MAP_PRIVATE may be used, in which case reads will
come directly from Persistent Memory until a page is written
for the first time. At that point, a copy-on-write will
happen and the modified page will be located in volatile DRAM.
MAP_PRIVATE pages are never written back to Persistent Memory
and are discarded when the process calls munmap(2) or
exits.
See the man page mmap(2) for details on this interface.
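
The MAP_PRIVATE behavior described above can be checked with a short
sketch: modify a byte through a private mapping, unmap it, and read
the file back with pread(2). The function name is hypothetical.

```c
#include <sys/mman.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * private_store_reaches_file -- hypothetical example.
 * Changes the first byte of the file through a MAP_PRIVATE mapping
 * and reports whether the change is visible in the file afterwards.
 * Returns 1 if the file changed, 0 if it did not, -1 on error.
 */
int
private_store_reaches_file(const char *path, size_t len)
{
	char before, after;
	char *base;
	int fd = open(path, O_RDWR);

	if (fd < 0)
		return -1;
	if (pread(fd, &before, 1, 0) != 1) {
		close(fd);
		return -1;
	}
	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (base == MAP_FAILED) {
		close(fd);
		return -1;
	}
	base[0] = (char)(before ^ 1);	/* store triggers copy-on-write */
	munmap(base, len);		/* the private copy is discarded */
	if (pread(fd, &after, 1, 0) != 1) {
		close(fd);
		return -1;
	}
	close(fd);
	return after != before;
}
```

With MAP_PRIVATE the function is expected to return 0, since the
modified page lives only in the process's private copy.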
void *pmem_map(int fd, size_t len);
This is a convenience function that calls mmap(2) with the
appropriate arguments. Whether to use it or to call mmap
directly is purely a matter of programmer preference. Returns
NULL with errno set on failure (unlike mmap(2) itself, which
returns MAP_FAILED).
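
For illustration, a wrapper of this kind might look like the sketch
below. This is not the libpmem implementation. The name my_pmem_map()
is hypothetical, and the mmap arguments are assumed to match the ones
shown in the SYNOPSIS.

```c
#include <sys/mman.h>
#include <stddef.h>

/*
 * my_pmem_map -- hypothetical stand-in for pmem_map().
 * Maps len bytes of fd read/write with MAP_SHARED and converts the
 * MAP_FAILED return of mmap(2) into NULL. errno is left as set by
 * mmap(2).
 */
void *
my_pmem_map(int fd, size_t len)
{
	void *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
	    MAP_SHARED, fd, 0);

	return (base == MAP_FAILED) ? NULL : base;
}
```

Returning NULL lets callers use the familiar "if (base == NULL)"
check rather than comparing against MAP_FAILED.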
int msync(void *addr, size_t length, int flags);
Force any changes in the range [addr, addr+length) to be stored
durably in Persistent Memory. POSIX specifies (and Linux
enforces) that addr must be page-aligned, which may force
the call to flush more data than required. See pmem_persist()
below for an alternate solution. Note that the traditional
definition of msync is to search the "page cache" for dirty
pages. But for Persistent Memory, the page cache is not
involved and msync's effect is more related to flushing dirty
data from the processor's cache and other HW buffers to make
sure it is fully persistent.
WARNING: The semantics of msync(2) have always been to
check for unwritten changes and write them out.
Nothing prevents changes from being written out
before msync is called and no atomic or transactional
behavior is implied by the msync call. It is a
"flush" operation to catch anything that hasn't
been flushed already.
See the man page msync(2), and the related pages mprotect(2)
and madvise(2) for more information.
void pmem_persist(void *addr, size_t len, int flags);
Force any changes in the range [addr, addr+len) to be stored
durably in Persistent Memory. This is equivalent to calling
msync(2) as described above, but may be more efficient and will
avoid calling into the kernel if possible.
No flags have been defined for this call yet.
WARNING: Like msync(2) described above, there is nothing
atomic or transactional about this call. Any
unwritten stores in the given range will be written,
but some stores may have already been written by
virtue of normal cache eviction/replacement policies.
Correctly written code must not depend on stores
waiting until pmem_persist() is called to become
persistent -- they can become persistent at any time
before pmem_persist() is called.
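
One common way to live with this rule is to guard data with a flag
that is set only after the data itself has been made persistent. The
sketch below shows the idea. Everything in it is hypothetical (struct
record, record_publish(), count_persist()), and it assumes that an
aligned int store reaches Persistent Memory as a unit. The persist
step is passed in as a function pointer with the same shape as
pmem_persist(), so the example builds without libpmem.

```c
#include <stddef.h>
#include <string.h>

/* Same argument list as pmem_persist() in this document. */
typedef void (*persist_fn)(void *addr, size_t len, int flags);

/* Hypothetical record layout kept in Persistent Memory. */
struct record {
	char payload[56];
	int valid;	/* nonzero only after payload is persistent */
};

/*
 * record_publish -- hypothetical example.
 * The payload may reach PM at any time, so the valid flag is cleared
 * and persisted before the payload is touched, and set again only
 * after the payload has been persisted.
 */
void
record_publish(struct record *rec, const char *msg, persist_fn persist)
{
	rec->valid = 0;
	persist(&rec->valid, sizeof(rec->valid), 0);

	strncpy(rec->payload, msg, sizeof(rec->payload) - 1);
	rec->payload[sizeof(rec->payload) - 1] = '\0';
	persist(rec->payload, sizeof(rec->payload), 0);

	rec->valid = 1;
	persist(&rec->valid, sizeof(rec->valid), 0);
}

/* Stand-in for pmem_persist() that only counts calls (demo only). */
static int persist_calls;

static void
count_persist(void *addr, size_t len, int flags)
{
	(void)addr;
	(void)len;
	(void)flags;
	persist_calls++;
}
```

A program linked against libpmem would pass pmem_persist in place of
count_persist. After a crash, recovery code would trust the payload
only when valid is nonzero.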
void pmem_flush_cache(void *addr, size_t len, int flags);
void pmem_fence(void);
void pmem_drain_pm_stores(void);
These three functions provide partial versions of the
pmem_persist() function described above. pmem_persist()
can be thought of as this:
void pmem_persist(void *addr, size_t len, int flags)
{
    /* flush the processor caches */
    pmem_flush_cache(addr, len, flags);
    /* Persistent Memory store barrier */
    pmem_fence();
    /* wait for any PM stores to drain from HW buffers */
    pmem_drain_pm_stores();
}
These functions allow advanced programs to create their
own variations of pmem_persist(). For example, a program
that needs to flush several discontiguous ranges can call
pmem_flush_cache() for each range and then follow up by
calling pmem_fence() and pmem_drain_pm_stores() once.
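
The discontiguous-range case can be sketched as follows. To keep the
example buildable without libpmem, the three primitives are reached
through a table of function pointers. struct pm_ops, struct pm_range,
persist_ranges() and the stub_* functions are all hypothetical names
introduced for this example.

```c
#include <stddef.h>

/* One range to be made persistent. */
struct pm_range {
	void *addr;
	size_t len;
};

/*
 * Table of the three primitives. With libpmem available it would be
 * filled with pmem_flush_cache, pmem_fence and pmem_drain_pm_stores.
 */
struct pm_ops {
	void (*flush_cache)(void *addr, size_t len, int flags);
	void (*fence)(void);
	void (*drain_pm_stores)(void);
};

/*
 * persist_ranges -- hypothetical example.
 * Flushes each range, then issues a single fence and a single drain
 * for all of them.
 */
void
persist_ranges(const struct pm_ops *ops, const struct pm_range *r, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		ops->flush_cache(r[i].addr, r[i].len, 0);
	ops->fence();
	ops->drain_pm_stores();
}

/* Counting stubs so the sketch can be exercised without libpmem. */
static int nflush, nfence, ndrain;

static void
stub_flush(void *addr, size_t len, int flags)
{
	(void)addr;
	(void)len;
	(void)flags;
	nflush++;
}

static void
stub_fence(void)
{
	nfence++;
}

static void
stub_drain(void)
{
	ndrain++;
}
```

Compared with calling pmem_persist() once per range, this issues one
fence and one drain for n ranges instead of n of each.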
SEE ALSO
open(2), mmap(2), msync(2), mprotect(2), madvise(2)
TODO
- Add support for the get/set attributes actions defined by the SNIA
NVM.PM.FILE programming model.