
Understanding when and how to use Memory Mapped Files

Abhijit Mondal
17 min read · Feb 18, 2023


There have been several instances where I wished my file I/O was a bit faster, because at times it needed to be almost real time.

Sometimes we can get around this problem by using an in-memory cache (hash tables, trees etc.), but sometimes the volume of data is simply too large for in-memory processing.

Two problems where I found faster file I/O more beneficial than an in-memory cache:

  1. Commit Log replication in distributed systems.
  2. ML model training and inferencing.

One of the popular techniques for improving file based I/O is using memory mapped files.

Understanding Memory Mapped Files

In a standard file I/O operation, when we issue a read(bytes) command, the OS fetches the bytes from the file on disk, caches the data in a kernel space buffer and then makes a copy of the cached data in user space (the application’s address space).

The bytes are fetched as “pages” (usually 4KB).

The information about the “pages” is maintained in a data structure called the “page table”.

Since a file might be arbitrarily large, when a request is sent to fetch a page that is not found in the page table, a “page fault” occurs: the page is fetched from disk, added to the kernel buffer, copied to the application buffer and recorded in the page table.

One of the advantages of this approach is that pages already in the “page table” can be read without going to the disk.

Also the OS usually pre-fetches some additional pages.

Similarly, during writes the data is first updated in the application buffer, then copied to the kernel buffer and then scheduled to be “flushed” to disk.

Usually, the kernel buffer and the application buffer need not be aligned in the address space, i.e. the application buffer might occupy bytes 0 to 4095 of virtual memory while the kernel buffer occupies bytes 4096 to 8191.

Thus the application can fetch the next 4KB directly from the kernel buffer.

One way to speed up the process is to avoid copying pages from the kernel buffer to the application buffer. This can be achieved by mapping the kernel buffer and the application buffer to the same addresses in virtual memory.

This is exactly what the memory mapping technique achieves.
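As a minimal illustration (a sketch with error handling trimmed; “data.txt” is just a placeholder name), mapping an existing file read-only with the POSIX mmap API looks like this:

#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("data.txt", O_RDONLY);
    struct stat st;
    fstat(fd, &st);   // the file size becomes the mapping length

    // The returned pointer behaves like an in-memory array over the file
    char *p = (char *) mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    // Accessing p[i] triggers a page fault only for pages not already cached
    printf("first byte: %c\n", p[0]);

    munmap(p, st.st_size);
    close(fd);
    return 0;
}

Once the mapping exists, reads and writes are just pointer accesses; no explicit read()/write() calls or extra buffer copies are needed.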

Thus if we update a page spanning bytes 4096 to 8191, the same changes are visible to the kernel buffer without copying. The kernel can later schedule the “dirty page” to be flushed to the disk.

Usually the write-back to disk is asynchronous, so when there are lots of writes per second there is a risk of data loss. We can make it synchronous by using the “sync”, “fsync” or “msync” system calls.

Another disadvantage is that when multiple processes have their own memory maps and we have limited RAM, the OS needs to “move out” some mapped pages and then move them back in when required.

Which memory maps are “thrashed” is determined by the OS itself.

Thus for very large files, random reads and writes can cause frequent page faults, and as a result the advantage gained by using memory maps might not be significant.

Thus memory maps are most useful when:

  1. Reads and writes happen at contiguous locations, e.g. append-only log files.
  2. Data is mostly accessed sequentially and in small batches, e.g. a feature store for ML models where we train the model per epoch using mini-batches.
  3. The application can live with some data loss (application logs etc.).

That said, even with random access one might see some performance improvement over standard I/O if the locations accessed are “close” enough.

For example, given multiple profile records in a file, consider a query to fetch all profiles living in City A with age ≤ 25. If the records are not grouped by city, the matching record indices will be scattered all over the file.

But if they are grouped by city, the records will be located “very close” to each other: although not all records for City A will have age ≤ 25, the “distance” between the first and the last such record can be small enough to fit within a few “pages”.

Implementing a simple Logger using mmap in C++

In distributed systems, each node in a replica set needs to be in sync, sooner or later.

In order to persist the chain of commands sent to a service, the commands are written to a write-ahead log (WAL) file, i.e. a command is first written to the file and only then are any changes made to the in-memory state.

This ensures reliability of the data: if the node crashes, the in-memory state is lost, but we can recover it by replaying the commands sequentially from the WAL file.

But if a node crashes and the system is allowed to continue even while a few nodes are down (read: quorum), then the log files updated on the running nodes go out of sync with the failed nodes.

When those nodes are up and running again, the logs from the healthy nodes need to be synced to the “reborn” nodes. It might happen that the log files on the “reborn” nodes have a lot of catching up to do.

Thus we have the following requirements:

  1. The healthy nodes should be able to read and send their own logs in a paginated fashion very fast. Since the size of the log entries could be significant, it is convenient to send them in chunks over HTTP(S).
  2. The “reborn” nodes should be able to update their own logs with the logs from the healthy nodes quickly.

Let’s declare a C++ class:

class CommitLogMemoryMap {
public:

    const char* filename = "commit-log-mmap.txt";
    std::vector<long> positions;
    int fd;
    std::mutex m;
    char *mmap_obj;
};

We are maintaining a vector of log entry positions.

The i-th entry corresponds to the last byte position + 1 of the i-th log entry in the original file. For example, if the log contains the entries "abc", "de" and "fghi", positions would be {3, 5, 9}, and entry i occupies bytes [positions[i-1], positions[i]) (with the start taken as 0 for the first entry).

fd is the file descriptor for the original file.

Since the log files are accessed concurrently by multiple threads, we guard read and write operations on the shared data structures with a mutex.

Finally mmap_obj is the actual memory mapping for our log file.
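Note that the snippets below assume the usual POSIX and standard library headers in a single translation unit; roughly the following (the exact set may vary with your toolchain):

#include <fcntl.h>      // open
#include <sys/mman.h>   // mmap, munmap, msync
#include <sys/stat.h>   // stat
#include <unistd.h>     // lseek, write, close, sysconf, ftruncate
#include <cstdio>       // perror
#include <cstdlib>      // exit
#include <cstring>      // strlen
#include <chrono>       // benchmarking
#include <iostream>
#include <mutex>
#include <random>       // random string generation
#include <string>
#include <vector>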

In our constructor, we initialize our memory mapping.

CommitLogMemoryMap() {
    if (file_exists(filename)) {
        // Open file in R+W mode if it already exists
        fd = open(filename, O_RDWR, 0644);
    }
    else {
        // Create file if it does not exist
        fd = open(filename, O_RDWR | O_CREAT, 0644);
    }

    if (fd == -1) {
        handle_error("Error opening file for writing");
    }

    // For mmap we need to stretch the file
    if (lseek(fd, MAX_MMAP_LENGTH, SEEK_SET) == -1) {
        close(fd);
        handle_error("Error getting to end of file");
    }

    // After stretching, write 1 byte at the end
    if (write(fd, "", 1) == -1) {
        close(fd);
        handle_error("Error writing last byte of the file");
    }

    // Create mmap object
    mmap_obj = (char *) mmap(NULL, MAX_MMAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    if (mmap_obj == MAP_FAILED)
        handle_error("mmap");

    close(fd);
}

When creating a memory map with a new file, there is nothing to be mapped if the file is empty. Thus we need a way to first put “dummy” data in the file for the memory mapping to work.

Thus we put the ‘\0’ (null) character as dummy data.

Note that in C++ we can add the ‘\0’ character only at the last byte position after “stretching” the file to our desired size, but in Python we have to add ‘\0’ at all byte positions from 0 to max_length.
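As an aside, an alternative way to stretch the file before mapping (not what the constructor above does) is ftruncate, which extends the file with zero bytes in one call; a minimal sketch, assuming fd is an open, writable descriptor:

// Alternative sketch: stretch the file with ftruncate instead of
// lseek() + writing a single byte at the end
if (ftruncate(fd, MAX_MMAP_LENGTH) == -1) {
    close(fd);
    handle_error("Error stretching file with ftruncate");
}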

Here we are defining that the maximum size of one log file is 250MB.

#define handle_error(msg) \
    do { perror(msg); exit(EXIT_FAILURE); } while (0)

// Maximum length of mmap'd file: 250MB
#define MAX_MMAP_LENGTH (1024*1024*250)

// Check if file exists or not
bool file_exists(const char *filename) {
    struct stat buffer;
    return (stat(filename, &buffer) == 0);
}

Next we look at how we can implement the write function:

int writefile(std::vector<std::string> messages, long start=__LONG_MAX__) {
    long start_pos, end_pos;

    {
        std::lock_guard<std::mutex> l{m};

        // Trim start
        start = (start < positions.size()) ? start : positions.size();

        // Add each new log entry
        for (int i = start; i < start + messages.size(); i++) {
            std::string message = messages[i - start];

            // Get start and end byte positions in the file
            start_pos = (i > 0) ? positions[i-1] : 0;
            end_pos = start_pos + message.length();

            // Write the message into the mmap'd region
            long j = 0;
            for (size_t k = start_pos; k < end_pos; k++) {
                mmap_obj[k] = message[j++];
            }

            // Update positions vector with new entries and add entries
            // if required
            if (i < positions.size()) {
                if (i > 0) positions[i] = positions[i-1] + message.length();
                else positions[i] = message.length();
            }
            else {
                if (positions.size() > 0) positions.push_back(positions.back() + message.length());
                else positions.push_back(message.length());
            }
        }

        // Delete trailing elements of the positions vector
        if (positions.size() > start + messages.size()) {
            positions.resize(start + messages.size());
        }
    }

    return 0;
}

For each log entry, we start writing at the end of the previous log entry and then update the positions vector with the length of the new log entry.

Note how we are using mmap_obj just like an in-memory array.

In Raft-like consensus protocols, log entries can be overwritten: older entries may have been created by a previous leader that crashed before replicating them to a majority of nodes, so the new leader can have different entries at the same indices.

Thus we have a parameter ‘start’ to start overwriting the log from that index onwards.

‘start’ is the index of the line for a log entry and ‘start_pos’ is the corresponding starting byte position.

We might end up with a log file which contains corrupted text between two log entries. This can happen when the last byte position of the new entry is smaller than the last byte position of the previous entry at that index.

One way to handle this is to truncate the log file after the last entry.

We can truncate because in Raft when a log is overwritten from index start onwards, it is assumed that all entries after start are incorrect and should be either overwritten or deleted.

For e.g. before overwriting:

1. Log entry 1
2. Log entry 2

Now “Log entry 1” is overwritten with “Entry 1”, which has fewer bytes than “Log entry 1”. Thus after overwriting:

1. Entry 1ry 1
2. Log entry 2

The text “ry 1” after “Entry 1” in line 1 is unwanted.
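A minimal sketch of such a cleanup is shown below. Instead of shrinking the file, this hypothetical helper (not part of the class above) simply blanks out the stale bytes after the last valid entry, assuming old_end is the end byte position of the log before the overwrite:

// Hypothetical helper: overwrite leftover bytes after the last valid entry
// with '\0' so the on-disk log does not show stale text
void clear_stale_bytes(long old_end) {
    long new_end = positions.empty() ? 0 : positions.back();
    for (long k = new_end; k < old_end; k++) {
        mmap_obj[k] = '\0';
    }
}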

One of the questions you can ask is what if the size of my log file goes beyond 250MB?

In such a scenario (which is very common), we can create a new file whenever the size of the current file exceeds max_length. Thus we need to initialize a new mmap_obj with the new file.

For a single thread, we would not need the mmap_objs of the older files, because at any point in time we would be writing only to the latest log file. But this is often not the case, as multiple threads may need to read logs from different files depending on how far behind a follower node is.
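A rough sketch of how such a rollover could look inside CommitLogMemoryMap (hypothetical; the file-naming scheme and bookkeeping here are assumptions, not part of the code above):

// Hypothetical rollover sketch: unmap the current file and map a fresh one
// once the next write would cross MAX_MMAP_LENGTH
void rollover(int file_index) {
    munmap(mmap_obj, MAX_MMAP_LENGTH);

    std::string next_file = "commit-log-mmap-" + std::to_string(file_index) + ".txt";
    fd = open(next_file.c_str(), O_RDWR | O_CREAT, 0644);
    if (fd == -1) handle_error("Error opening next log file");

    // Stretch the new file, as in the constructor
    if (lseek(fd, MAX_MMAP_LENGTH, SEEK_SET) == -1 || write(fd, "", 1) == -1) {
        close(fd);
        handle_error("Error stretching next log file");
    }

    mmap_obj = (char *) mmap(NULL, MAX_MMAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mmap_obj == MAP_FAILED) handle_error("mmap");

    close(fd);

    // entries of the new file start from byte 0
    positions.clear();
}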

The next function we need is for reading log entries from line start to line end.

std::string readfile(long start, long end=__LONG_MAX__) {
    {
        std::lock_guard<std::mutex> l{m};

        // Nothing to read yet
        if (positions.empty()) return "";

        // Get start and end line numbers
        start = (start < positions.size()) ? start : positions.size();
        end = (end < positions.size()-1) ? end : positions.size()-1;

        if (end < start) return "";

        // Get start and end byte positions in the file
        long start_pos = 0;
        if (start > 0) start_pos = positions[start-1];
        long end_pos = positions[end];

        // Read from the mmap'd region
        std::string str((char *)mmap_obj + start_pos, (char *)mmap_obj + end_pos);

        return str;
    }
}

Again as before, we find the starting and ending byte positions in the log file for reading the data and use mmap_obj like an array to extract the data.

A few things to observe:

When mmap_obj is updated, the write-back is asynchronous, i.e. the write may not reach the disk immediately; the kernel schedules it later. With a large number of update requests, there is a risk of data loss. We can overcome this by adding an msync call in the writefile method.

for (int i = start; i < start + messages.size(); i++) {
    std::string message = messages[i - start];

    // Get start and end byte positions in the file
    start_pos = (i > 0) ? positions[i-1] : 0;
    end_pos = start_pos + message.length();

    // Write the message into the mmap'd region
    long j = 0;
    for (size_t k = start_pos; k < end_pos; k++) {
        mmap_obj[k] = message[j++];
    }

    // This ensures the write to disk happens before moving on,
    // but it will reduce the performance of writes.
    // msync() expects a page-aligned address, so round start_pos
    // down to a page boundary before flushing.
    off_t sync_offset = start_pos & ~(sysconf(_SC_PAGE_SIZE) - 1);
    msync(mmap_obj + sync_offset, end_pos - sync_offset, MS_SYNC);

    // Update positions vector with new entries and add entries
    // if required
    if (i < positions.size()) {
        if (i > 0) positions[i] = positions[i-1] + message.length();
        else positions[i] = message.length();
    }
    else {
        if (positions.size() > 0) positions.push_back(positions.back() + message.length());
        else positions.push_back(message.length());
    }
}

During reading, if there are lots of threads/processes reading from different log files, we can create smaller mmap_objs for each file.

Assume we will be reading, on average, 100 log entries at a time to sync them with a follower node, and that each log entry is around 500 bytes. Then we read around 50000 bytes, i.e. roughly 49 KB.

49KB can be read by reading from 13 pages (each of size 4KB).

But we will need 14 pages when the starting byte position is near the end of a page boundary as we will see below.

Mapping 14 pages takes up only 57344 bytes per file, whereas memory mapping the full file takes up 250MB per file.

std::string readfile(long start, long end=__LONG_MAX__) {
    {
        std::lock_guard<std::mutex> l{m};

        // Nothing to read yet
        if (positions.empty()) return "";

        // Get start and end line numbers
        start = (start < positions.size()) ? start : positions.size();
        end = (end < positions.size()-1) ? end : positions.size()-1;

        if (end < start) return "";

        long start_pos = 0;

        if (start > 0) start_pos = positions[start-1];
        long end_pos = positions[end];

        // Instead of memory mapping the entire file of 250MB,
        // fetch 57344 bytes starting at page offset pa_offset the first time.
        // Thus if start_pos is 4095, then pa_offset = 0.
        // Reading 50000 bytes will then require reading up to byte 54095,
        // i.e. starting at offset 0 we will require reading 14 pages.
        off_t pa_offset = start_pos & ~(sysconf(_SC_PAGE_SIZE) - 1);

        if (mmap_obj == NULL) {
            mmap_obj = (char *) mmap(NULL, 57344, PROT_READ | PROT_WRITE, MAP_SHARED, fd, pa_offset);
        }

        // Read from the mmap'd region, subtracting the page offset
        std::string str((char *)mmap_obj + start_pos - pa_offset, (char *)mmap_obj + end_pos - pa_offset);

        return str;
    }
}

Note that the above assumes we will never read more than 53249 bytes at a time from a log file. If a read is larger than that, we will miss the bytes that fall beyond the mapped region.

pa_offset is 0 when start_pos is between 0 and 4095, 4096 when start_pos is between 4096 and 8191, and so on. Thus we also need to subtract pa_offset when indexing into mmap_obj.
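To make the alignment arithmetic concrete, here are a couple of sample values (assuming a 4KB page size):

// pa_offset is start_pos rounded down to a multiple of the page size:
// start_pos = 4095  ->  pa_offset = 4095 & ~4095 = 0
// start_pos = 5000  ->  pa_offset = 5000 & ~4095 = 4096
// start_pos = 8200  ->  pa_offset = 8200 & ~4095 = 8192
off_t pa_offset = start_pos & ~(sysconf(_SC_PAGE_SIZE) - 1);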

Now let’s compare this with standard I/O based logging as shown below:

class CommitLog {
public:

    const char* filename = "commit-log.txt";
    std::vector<long> positions;
    int fd;
    std::mutex m;

    CommitLog() {
        if (file_exists(filename)) {
            // Open file for read/write if it already exists
            fd = open(filename, O_RDWR, 0644);
        }
        else {
            // Create file if it does not exist
            fd = open(filename, O_RDWR | O_CREAT, 0644);
        }

        if (fd == -1) {
            handle_error("Error opening file for writing");
        }

        close(fd);
    }

    int writefile(std::vector<std::string> messages, long start=__LONG_MAX__) {
        long start_pos, end_pos;

        {
            std::lock_guard<std::mutex> l{m};

            // Get starting byte position
            start = (start < positions.size()) ? start : positions.size();
            start_pos = (start > 0) ? positions[start-1] : 0;

            // Open file for reading + writing
            fd = open(filename, O_RDWR, 0644);

            // Move to the start position byte
            if (lseek(fd, start_pos, SEEK_SET) == -1) {
                close(fd);
                handle_error("Error calling lseek()");
            }

            for (int i = start; i < start + messages.size(); i++) {
                const char *msg = messages[i - start].c_str();

                // Write new log entry after the last entry
                if (write(fd, msg, strlen(msg)) == -1) {
                    close(fd);
                    handle_error("Error writing file");
                }

                // Update positions vector with new entries and add entries
                // if required
                if (i < positions.size()) {
                    if (i > 0) positions[i] = positions[i-1] + strlen(msg);
                    else positions[i] = strlen(msg);
                }
                else {
                    if (positions.size() > 0) positions.push_back(positions.back() + strlen(msg));
                    else positions.push_back(strlen(msg));
                }
            }

            // Delete trailing elements of the positions vector
            if (positions.size() > start + messages.size()) {
                positions.resize(start + messages.size());
            }

            close(fd);
        }

        return 0;
    }

    std::string readfile(long start, long end=__LONG_MAX__) {
        {
            std::lock_guard<std::mutex> l{m};

            // Nothing to read yet
            if (positions.empty()) return "";

            // Get start and end line numbers
            start = (start < positions.size()) ? start : positions.size();
            end = (end < positions.size()-1) ? end : positions.size()-1;

            if (end < start) return "";

            // Get start and end byte positions in the file
            long start_pos = 0;
            if (start > 0) start_pos = positions[start-1];
            long end_pos = positions[end];

            fd = open(filename, O_RDONLY, 0644);

            // Move to the start_pos byte position in the file
            if (lseek(fd, start_pos, SEEK_SET) == -1) {
                close(fd);
                handle_error("Error calling lseek()");
            }

            char out[end_pos - start_pos];

            // Read end_pos-start_pos bytes beginning at start_pos
            if (read(fd, out, end_pos - start_pos) == -1) {
                close(fd);
                handle_error("Error calling read()");
            }

            close(fd);

            // The buffer is not null-terminated, so construct the string
            // from the exact byte count instead of strlen()
            std::string str(out, end_pos - start_pos);
            return str;
        }
    }
};

Benchmarking the performance of both approaches is shown below:

// Generate a random string of length max_length
std::string generate(int max_length) {
    std::string possible_characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    std::random_device rd;
    std::mt19937 engine(rd());
    std::uniform_int_distribution<> dist(0, possible_characters.size()-1);
    std::string ret = "";

    for (int i = 0; i < max_length; i++) {
        int random_index = dist(engine); // get index between 0 and possible_characters.size()-1
        ret += possible_characters[random_index];
    }
    return ret;
}

int main(int argc, char *argv[]) {
    std::vector<std::string> data;
    int i;
    long n = 100000;

    for (i = 0; i < n; i++) {
        std::string h = generate(50) + '\n';
        data.push_back(h);
    }

    auto start = std::chrono::high_resolution_clock::now();
    CommitLog log;
    for (i = 0; i < n; i += 10) {
        std::vector<std::string> subdata(data.begin()+i, data.begin()+i+10);
        log.writefile(subdata);
    }
    auto stop = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << duration.count() << std::endl;

    // std::cout << log.readfile(0) << std::endl;

    start = std::chrono::high_resolution_clock::now();
    for (i = 0; i < n; i += 10) {
        log.readfile(i, i+9);
    }
    stop = std::chrono::high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << duration.count() << std::endl;

    std::cout << std::endl;

    start = std::chrono::high_resolution_clock::now();
    CommitLogMemoryMap log2;
    for (i = 0; i < n; i += 10) {
        std::vector<std::string> subdata(data.begin()+i, data.begin()+i+10);
        log2.writefile(subdata);
    }
    stop = std::chrono::high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << duration.count() << std::endl;

    // std::cout << log2.readfile(0) << std::endl;

    start = std::chrono::high_resolution_clock::now();
    for (i = 0; i < n; i += 10) {
        log2.readfile(i, i+9);
    }
    stop = std::chrono::high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
    std::cout << duration.count() << std::endl;
}

Standard I/O:

For 100K log entries, each 50 bytes long, the time taken to write the entries 10 at a time is around 153 microseconds.

Reading all the log entries 10 at a time takes 21 microseconds.

With the memory mapped file:

For 100K log entries, each 50 bytes long, the time taken to write the entries 10 at a time is around 52 microseconds.

Reading all the log entries 10 at a time takes 1.7 microseconds.

Thus with the memory mapped file we get a speedup of roughly 3x for writes and over 10x for reads.

Using a memory mapped file as a feature store for Logistic Regression

To train a machine learning classification model, we need to create feature vectors out of the training and testing data. Then we run the gradient descent optimization algorithm to minimize the logistic loss of the predictions w.r.t. the actual labels of the data.

I will not go deep into how a Logistic Regression model is trained using gradient descent; that is covered in many places on the internet, for example:

An Introduction to Logistic Regression — Analytics Vidhya

Logistic regression — Wikipedia

In most applications the size of the feature matrix is small enough to fit in memory, and thus the vectors can be fetched directly from memory. But with billions of data points, each having thousands of features, loading them all into memory will definitely cause OOM errors.

For example, assuming 64-bit (8-byte) floats and 1000 features, each vector is 8000 bytes. With a billion vectors, the total size is 8000 billion bytes, or about 7.3 TB.

To work around this, we can persist the feature vectors in a file on disk and then use memory mapping to load the features in mini-batches for the gradient descent optimization.

Here is an implementation in Python:

The following class deals with the creation and retrieval of feature vectors. For this demo we create random feature vectors using Numpy, although one could use any classification dataset from Kaggle.

Since the size of the file can be very large, we write to the file in batches of 100MB at a time. If we have, say, 50GB of data, trying to create the file by writing 50GB of null characters in one go would throw an OOM error.

Whenever the data exceeds 100MB, we extend the file by another 100MB and resize the memory mapping.

Unlike C++, in Python we need to fill all empty byte positions with null character initially before we can do memory mapping on the file.

import math
import mmap
import os
from threading import Lock

import numpy as np

class Feature:
    def __init__(self, file='features.txt', dim=100):
        self.lock = Lock()
        self.positions = []
        self.file = file
        self.dim = dim
        self.batch_len = 1024*1024*100
        self.batch = 0

        # if the file does not exist, then create it and fill it with
        # a null character
        if os.path.isfile(self.file):
            f = open(self.file, 'r+b')
        else:
            f = open(self.file, 'w+b')
            f.write(b'\0')
            # flush the write to disk because the write is buffered;
            # if not flushed, mmap will throw an error because it
            # will see an empty file.
            f.flush()

        # create the memory mapping for the file
        self.mmap_obj = mmap.mmap(f.fileno(), length=0, access=mmap.ACCESS_WRITE)
        f.close()

    def insert(self):
        # use random vectors as feature vectors
        data = np.random.rand(1, self.dim)[0]
        data = ','.join([str(x) for x in data])
        data = data.encode() + b','

        with self.lock:
            # add the new feature vector at the end (append)
            offset = self.positions[-1] if len(self.positions) > 0 else 0

            # if the current size of the memory mapped file is less than
            # the size after the data is written, then extend the file
            # by self.batch_len
            if offset + len(data) >= self.batch*self.batch_len:
                f = open(self.file, 'r+b')
                f.seek(offset)

                # if batch_len is sufficiently large we do not need to flush
                f.write(b'\0'*self.batch_len)
                self.batch += 1

                # resize the existing memory mapping
                self.mmap_obj.resize(self.batch*self.batch_len)
                f.close()

            self.mmap_obj[offset:offset+len(data)] = data
            self.positions += [self.positions[-1] + len(data)] if len(self.positions) > 0 else [len(data)]

    def get(self, start, end):
        # get the vectors from index start to index end (inclusive)
        with self.lock:
            start = min(start, len(self.positions))
            end = min(end, len(self.positions)-1)

            if end < start:
                return b''

            start_pos = self.positions[start-1] if start > 0 else 0
            end_pos = self.positions[end]

            # end_pos-start_pos bytes are assumed to fit in memory
            return self.mmap_obj[start_pos:end_pos]

Next we define the class for training the LR classifier:

class StreamingLogisticRegression:
    def __init__(self, reg_lambda, epochs, batch_size, learning_rate, n, m, feature_obj):
        self.weights = []
        self.bias = 0
        self.reg_lambda = reg_lambda
        self.epochs = epochs
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.n = n
        self.m = m
        # feature_obj is instance of Feature class
        self.feature_obj = feature_obj

    def train(self, labels):
        # initialize random weights
        self.weights = init_weights(self.m)

        # run gradient descent
        self.weights, self.bias = gradient_descent(labels,
                                                   self.weights,
                                                   self.bias,
                                                   self.epochs,
                                                   self.batch_size,
                                                   self.learning_rate,
                                                   self.reg_lambda,
                                                   self.n,
                                                   self.m,
                                                   self.feature_obj)

The following function is used to initialize the weights of the features:

def init_weights(n):
    return np.random.random(n)

The function where we are running the gradient descent algorithm:

def gradient_descent(labels, weights, bias, num_epochs, batch_size, learning_rate, reg_lambda, n, m, feature_obj):
    # probabilities for the n examples
    probs = np.zeros(n)

    for epoch in range(num_epochs):
        i = 0
        while True:
            # get batch
            start, end = i*batch_size, min((i+1)*batch_size, n)
            data = feature_obj.get(start, end-1).decode().split(",")
            data = np.array([float(x) for x in data if x != '']).reshape(-1, m)

            # compute the dot product of the weights and feature values
            h = bias + np.dot(weights, data.T)

            # compute the sigmoid probabilities
            probs[start:end] = 1.0/(1.0+np.exp(-h))

            # labels for the batch
            sub_labels = labels[start:end]

            # value of the gradient of loss w.r.t. the weights
            sums = 2*reg_lambda*weights + np.dot((probs[start:end]-sub_labels), data)

            # update the weights
            weights -= learning_rate*sums
            s = np.sum(probs[start:end]-sub_labels) + 2*reg_lambda*bias

            # update bias
            bias -= learning_rate*s

            i += 1

            if end == n:
                break

        # compute loss
        curr_loss = loss(probs, labels, weights, bias, reg_lambda, n)
        print(epoch, curr_loss)

    return [weights, bias]

def loss(probs, labels, weights, bias, reg_lambda, n):
    # compute logistic loss for current feature weights and bias
    l = 0
    for i in range(n):
        l += -labels[i]*math.log(probs[i]) if probs[i] != 0 else 0
        l += -(1-labels[i])*math.log(1-probs[i]) if probs[i] != 1 else 0

    l += reg_lambda*(sum([w*w for w in weights]) + bias*bias)

    return l

The main block for running the code:

if __name__ == '__main__':
    n, m = 10000000, 1000

    # binary classification with random 0/1 labels
    labels = np.random.randint(0, 2, n)
    feature_obj = Feature(dim=m)

    # insert each feature vector
    for i in range(n):
        feature_obj.insert()

    # train LR with 100 epochs, batch size 64 and learning rate 0.01
    reg = StreamingLogisticRegression(0, 100, 64, 0.01, n, m, feature_obj)
    reg.train(labels)

The good thing is that, if disk size permits, you can experiment with billions of feature vectors without going OOM.

Whenever the LR algorithm tries to fetch a batch of vectors that is not present in the page table (due to swapping and thrashing), the pages are fetched again from disk and the page table is updated. During this step, the training might slow down.

The code is available here:

funktor/memory_mapped_file: Codes to understand memory mapped files in C++ and Python (github.com)

