Using memory-mapped files in VC++ to process large files

xiaoxiao2021-04-10  501

Introduction

File handling is one of the most basic functions of an application. The Win32 API and MFC both provide functions and classes for it; commonly used ones include the Win32 API functions CreateFile(), WriteFile() and ReadFile(), and the CFile class provided by MFC. In general these meet the needs of most situations. However, some specialized application areas require mass storage of tens of GB, hundreds of GB or even several TB, and handling files of that size with the usual file-processing methods is clearly not practical. At present, such large files are generally handled by means of memory-mapped files. This article discusses that Windows core programming technique.

Memory-mapped files

Memory-mapped files are somewhat similar to virtual memory. With a memory-mapped file you can reserve a region of address space and commit physical storage to that region; the difference is that the physical storage comes from a file that already exists on disk rather than from the system's page file. The file must be mapped before it can be operated on, after which it behaves as if the entire file had been loaded from disk into memory. It follows that when a file stored on disk is processed through a memory-mapped file, the application no longer needs to perform I/O operations on the file, which means it no longer needs to request and allocate buffers for it. All file caching is managed directly by the system. Because the steps of loading file data into memory, writing data back from memory to the file, and freeing the memory block are eliminated, memory-mapped files play a very important role when processing files with large amounts of data. In addition, real-world systems often need to share data between multiple processes. If the amount of data is small, there are many flexible ways to do this, but if the volume of shared data is huge, a memory-mapped file is needed. In fact, memory-mapped files are the most effective way to share data between multiple processes on the same machine.

A memory-mapped file is not a simple file I/O operation; it actually relies on a Windows core programming technique, namely memory management. A deeper understanding of memory-mapped files therefore requires a clear understanding of the memory management mechanism of the Windows operating system. Memory management is a complex topic that is beyond the scope of this article; interested readers can refer to other books on the subject. The general procedure for using a memory-mapped file is given below.

First, create or open a file kernel object with the CreateFile() function. This object identifies the file on disk that will be used as the memory-mapped file. Calling CreateFile() tells the operating system where the physical storage for the file mapping is located, but it only specifies the path of the file; the length of the mapping has not yet been specified. To specify how much physical storage the file-mapping object requires, create a file-mapping kernel object with CreateFileMapping(), which tells the system the size of the file and the way it will be accessed. After the file-mapping object has been created, a region of address space must be reserved for the file data, and the file data must be committed as the physical storage mapped to that region. The MapViewOfFile() function does this: under the system's management, it maps all or part of the file-mapping object into the address space of the process. From this point on, the memory-mapped file is used and processed in essentially the same way as file data that has been loaded into memory in the usual manner. When work with the memory-mapped file is finished, a series of operations is needed to clean up and release the resources that were used. This part is relatively simple: UnmapViewOfFile() removes the view of the file data from the address space of the process, and CloseHandle() closes the file-mapping object and the file object created earlier.

Functions related to memory-mapped files

The API functions used with memory-mapped files are mainly those mentioned above. They are introduced one by one below:

HANDLE CreateFile(LPCTSTR lpFileName,
    DWORD dwDesiredAccess,
    DWORD dwShareMode,
    LPSECURITY_ATTRIBUTES lpSecurityAttributes,
    DWORD dwCreationDisposition,
    DWORD dwFlagsAndAttributes,
    HANDLE hTemplateFile);

CreateFile() is also frequently used in ordinary file operations to create and open files. When working with memory-mapped files, it creates or opens a file kernel object and returns its handle. When calling it, set the parameters dwDesiredAccess and dwShareMode according to whether the data will be read or written and how the file is to be shared; incorrect settings will cause the corresponding later operations to fail.

HANDLE CreateFileMapping(HANDLE hFile,
    LPSECURITY_ATTRIBUTES lpFileMappingAttributes,
    DWORD flProtect,
    DWORD dwMaximumSizeHigh,
    DWORD dwMaximumSizeLow,
    LPCTSTR lpName);

The CreateFileMapping() function creates a file-mapping kernel object. The parameter hFile specifies the handle of the file to be mapped into the process address space (the handle is obtained from the return value of CreateFile()). Because the physical storage of a memory-mapped file is actually a file stored on disk rather than memory allocated from the system's page file, the system does not reserve a region of address space for it on its own initiative, nor does it automatically map the file's storage to that region. So that the system can determine which protection attributes to apply to the pages, they must be set through the parameter flProtect. The protection attributes PAGE_READONLY, PAGE_READWRITE and PAGE_WRITECOPY indicate, respectively, that once the file-mapping object is mapped the file data can be read, can be read and written, or can be read and written copy-on-write. PAGE_READONLY requires that CreateFile() was called with GENERIC_READ; PAGE_READWRITE requires that CreateFile() was called with GENERIC_READ | GENERIC_WRITE; PAGE_WRITECOPY requires that CreateFile() was called with GENERIC_READ, with or without GENERIC_WRITE. The DWORD parameters dwMaximumSizeHigh and dwMaximumSizeLow are also quite important: together they specify the maximum size of the file in bytes. Because the two parameters form a 64-bit value, the maximum supported file length is 16 EB, which meets the requirements of almost any large-file processing scenario.
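The 64-bit size described above has to be passed as two separate DWORD values. The following is a minimal sketch of that split in portable C++, not a wrapper around the real API: the names HighLow, split64 and join64 are hypothetical helpers introduced here for illustration, and std::uint32_t stands in for DWORD.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 32-bit halves that a 64-bit size or
// offset is split into (dwMaximumSizeHigh / dwMaximumSizeLow, or
// dwFileOffsetHigh / dwFileOffsetLow).
struct HighLow {
    std::uint32_t high;  // upper 32 bits
    std::uint32_t low;   // lower 32 bits
};

// Recombine the two halves into the original 64-bit value.
inline std::uint64_t join64(HighLow parts) {
    return (static_cast<std::uint64_t>(parts.high) << 32) | parts.low;
}

// Split a 64-bit value into its high and low 32-bit halves.
inline HighLow split64(std::uint64_t value) {
    const HighLow parts{
        static_cast<std::uint32_t>(value >> 32),
        static_cast<std::uint32_t>(value & 0xFFFFFFFFu)
    };
    // Round-trip check: no bits may be lost by the split.
    assert(join64(parts) == value);
    return parts;
}
```

For a 30 GB mapping (30 * 1024 * 1024 * 1024 bytes), split64 yields high = 7 and low = 0x80000000. Passing 0 as the high half, which is only correct for sizes below 4 GB, would silently truncate such a size.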

LPVOID MapViewOfFile(HANDLE hFileMappingObject,
    DWORD dwDesiredAccess,
    DWORD dwFileOffsetHigh,
    DWORD dwFileOffsetLow,
    DWORD dwNumberOfBytesToMap);

The MapViewOfFile() function maps the file data into the address space of the process. The parameter hFileMappingObject is the file-mapping object handle returned by CreateFileMapping(). The parameter dwDesiredAccess specifies once more how the file data will be accessed, and it must be compatible with the protection attribute passed to CreateFileMapping(). Setting the protection repeatedly may seem redundant, but it gives the application finer control over how the data is protected. MapViewOfFile() can map all or part of a file. When mapping, you specify the offset into the data file and the length to map. The file offset is given by the 64-bit value formed from the DWORD parameters dwFileOffsetHigh and dwFileOffsetLow, and it must be an integer multiple of the operating system's allocation granularity. For the Windows operating system the allocation granularity is 64 KB. It can also be obtained dynamically with the following code:

SYSTEM_INFO sinf;
GetSystemInfo(&sinf);
DWORD dwAllocationGranularity = sinf.dwAllocationGranularity;

The parameter dwNumberOfBytesToMap specifies how many bytes of the data file to map. Note in particular that on Windows 9x, if MapViewOfFile() cannot find a region large enough to hold the entire file-mapping object, it returns NULL. On Windows 2000, however, MapViewOfFile() only needs to find a region large enough for the requested view, regardless of the size of the entire file-mapping object.
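Because the view offset must be a multiple of the allocation granularity, a program that wants to reach an arbitrary byte position usually maps from the nearest boundary below it and then skips forward inside the view. The sketch below shows only that arithmetic in portable C++. ViewPlan, planView and kAssumedGranularity are hypothetical names, and the 64 KB default is an assumption; on a real system the value would come from GetSystemInfo().

```cpp
#include <cassert>
#include <cstdint>

// Assumed allocation granularity (64 KB). On a real system, read
// SYSTEM_INFO::dwAllocationGranularity via GetSystemInfo() instead.
constexpr std::uint64_t kAssumedGranularity = 64 * 1024;

// Hypothetical result type for planning a view.
struct ViewPlan {
    std::uint64_t viewOffset;  // aligned file offset to map the view at
    std::uint64_t delta;       // bytes from the view start to the wanted byte
};

// Round wantedOffset down to a granularity boundary and report how far
// into the view the wanted byte then lies.
inline ViewPlan planView(std::uint64_t wantedOffset,
                         std::uint64_t granularity = kAssumedGranularity) {
    assert(granularity > 0);
    const std::uint64_t delta = wantedOffset % granularity;
    return ViewPlan{wantedOffset - delta, delta};
}
```

With the assumed 64 KB granularity, planView(100000) gives viewOffset = 65536 and delta = 34464. The view would be mapped at file offset 65536 and the wanted byte read at the returned base pointer plus 34464. The view length has to be increased by delta so that the wanted range is still covered.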

After the data mapped into the process address space has been processed, the view of the file data must be released with the UnmapViewOfFile() function, whose prototype is as follows:

BOOL UnmapViewOfFile(LPCVOID lpBaseAddress);

Its only parameter, lpBaseAddress, specifies the base address of the region to be released and must be set to the value returned by MapViewOfFile(). Every call to MapViewOfFile() must be matched by a call to UnmapViewOfFile(); otherwise the reserved region is not released until the process terminates. In addition, the file kernel object and the file-mapping kernel object created earlier must be released with CloseHandle() before the process terminates, otherwise resources will leak.
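One common C++ way to make sure that every acquisition is paired with its release is a scope guard whose destructor runs the cleanup. The sketch below is a generic illustration, not part of the Win32 API: ScopeGuard and demoCleanupOrder are hypothetical names, and the cleanup actions only record strings where a real program would call UnmapViewOfFile() and CloseHandle().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scope guard: runs the given cleanup action when it goes
// out of scope, including on early return or exception.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> cleanup)
        : cleanup_(std::move(cleanup)) {
        assert(cleanup_ && "a cleanup action is required");
    }
    ~ScopeGuard() { cleanup_(); }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> cleanup_;
};

// Demonstrates the release order. The strings stand in for the real
// calls, which are deliberately not made here.
inline std::vector<std::string> demoCleanupOrder() {
    std::vector<std::string> log;
    {
        // Declared in acquisition order: mapping object first, then view.
        ScopeGuard closeMapping([&log] { log.push_back("CloseHandle(mapping)"); });
        ScopeGuard unmapView([&log] { log.push_back("UnmapViewOfFile(view)"); });
        log.push_back("use view");
    }  // guards run here in reverse order of declaration
    return log;
}
```

Because destructors run in reverse order of declaration, the view is unmapped before the mapping handle is closed, which mirrors the cleanup order described above.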

Besides these essential API functions, other auxiliary functions should be chosen as the situation requires. For example, to improve speed the system caches the data pages of a memory-mapped file and does not update the file's disk image immediately while a mapped view is being processed. To address this, consider using the FlushViewOfFile() function, which forces the system to write some or all of the modified data back to the disk image, ensuring that all updates are saved to disk in a timely manner.

Application example: processing a large file with a memory-mapped file

A concrete example follows to show how memory-mapped files are used. The example receives data from a port and stores it on disk in real time; because the amount of data is large (tens of GB), a memory-mapped file is used. The code below is part of the main code of the worker thread MainProc, which starts when the program starts running. When data arrives at the port, the event hEvent[0] is signaled; once WaitForMultipleObjects() detects this event, the received data is saved to disk. When reception ends, the event hEvent[1] is signaled, and its handler releases the resources and closes the file. The implementation of this thread function is given below:

......

// Create the file kernel object; its handle is saved in hFile
HANDLE hFile = CreateFile("Recv1.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);
// Create the file-mapping kernel object; its handle is saved in hFileMapping
HANDLE hFileMapping = CreateFileMapping(hFile, NULL, PAGE_READWRITE,
    0, 0x4000000, NULL);
// Release the file kernel object
CloseHandle(hFile);
// Set the size, offset and other parameters
__int64 qwFileSize = 0x4000000;
__int64 qwFileOffset = 0;   // total number of bytes written so far
__int64 qwViewBase = 0;     // file offset at which the current view starts
__int64 T = 600 * sinf.dwAllocationGranularity;
DWORD dwBytesInBlock = 1000 * sinf.dwAllocationGranularity;
// Map the file data into the address space of the process
PBYTE pbFile = (PBYTE)MapViewOfFile(hFileMapping,
    FILE_MAP_ALL_ACCESS,
    (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
while (bLoop)
{
    // Wait for event hEvent[0] or event hEvent[1]
    DWORD ret = WaitForMultipleObjects(2, hEvent, FALSE, INFINITE);
    ret -= WAIT_OBJECT_0;
    switch (ret)
    {
    // The receive-data event was signaled
    case 0:
        // Receive data from the port and save it to the memory-mapped file;
        // the write position is relative to the start of the current view
        nReadLen = syio_Read(port[1], pbFile + (qwFileOffset - qwViewBase), QueueLen);
        qwFileOffset += nReadLen;
        // When the view is 60% full, open a new view further on to prevent overflow
        if (qwFileOffset > T)
        {
            T = qwFileOffset + 600 * sinf.dwAllocationGranularity;
            UnmapViewOfFile(pbFile);
            // A view must start on an allocation-granularity boundary, so round down
            qwViewBase = qwFileOffset - (qwFileOffset % sinf.dwAllocationGranularity);
            pbFile = (PBYTE)MapViewOfFile(hFileMapping,
                FILE_MAP_ALL_ACCESS,
                (DWORD)(qwViewBase >> 32), (DWORD)(qwViewBase & 0xFFFFFFFF), dwBytesInBlock);
        }
        break;
    // The terminate event was signaled
    case 1:
        bLoop = FALSE;
        // Remove the view of the file data from the address space of the process
        UnmapViewOfFile(pbFile);
        // Close the file-mapping object
        CloseHandle(hFileMapping);
        break;
    }
}

...
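One detail that the excerpt above leaves open is what happens when a new view is opened near the end of the mapping. A view that would extend past the end of the file-mapping object is expected to make MapViewOfFile() fail, so the requested length normally has to be limited to the bytes that remain. The following portable sketch shows only that calculation. The function name clampViewLength is hypothetical, and the figures in the usage note are taken from the constants in the listing, assuming a 64 KB allocation granularity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: number of bytes to request for a view that starts
// at viewOffset, would like blockBytes, and must not extend past the end
// of a mapping that is mappingSize bytes long.
inline std::uint64_t clampViewLength(std::uint64_t viewOffset,
                                     std::uint64_t blockBytes,
                                     std::uint64_t mappingSize) {
    assert(viewOffset <= mappingSize);
    return std::min(blockBytes, mappingSize - viewOffset);
}
```

With a mapping of 0x4000000 bytes and a block of 1000 * 65536 bytes, a view at offset 0 can use the full 65536000 bytes, whereas a view starting at 600 * 65536 can use only the remaining 27787264 bytes.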

If the handler for the terminate event simply calls UnmapViewOfFile() and CloseHandle(), the actual size of the file will not be recorded correctly. That is, if the memory-mapped file was created with a size of 30 GB and only 14 GB of data was received, the saved file will still be 30 GB long after the program above finishes. In other words, once processing is complete, the file must be restored to its actual size, again by means of a memory-mapped file. The main code for this is as follows:

// Create another file kernel object
hFile2 = CreateFile("Recv.zip",
    GENERIC_WRITE | GENERIC_READ,
    FILE_SHARE_READ,
    NULL,
    CREATE_ALWAYS,
    FILE_FLAG_SEQUENTIAL_SCAN,
    NULL);
// Create another file-mapping kernel object with the actual data length
// (both halves of the 64-bit length are passed, so sizes above 4 GB are not truncated)
hFileMapping2 = CreateFileMapping(hFile2,
    NULL,
    PAGE_READWRITE,
    (DWORD)(qwFileOffset >> 32),
    (DWORD)(qwFileOffset & 0xFFFFFFFF),
    NULL);
// Close the file kernel object
CloseHandle(hFile2);
// Map the file data into the address space of the process
// (this assumes the whole file fits in a single view)
pbFile2 = (PBYTE)MapViewOfFile(hFileMapping2,
    FILE_MAP_ALL_ACCESS,
    0, 0, (SIZE_T)qwFileOffset);
// Copy the data from the original memory-mapped file to this one
// (pbFile must be a view that covers the received data from file offset 0)
memcpy(pbFile2, pbFile, (size_t)qwFileOffset);
// Remove the views of the file data from the address space of the process
UnmapViewOfFile(pbFile);
UnmapViewOfFile(pbFile2);
// Close the file-mapping objects
CloseHandle(hFileMapping);
CloseHandle(hFileMapping2);
// Delete the temporary file
DeleteFile("Recv1.zip");

Conclusion

Practical testing shows that memory-mapped files perform well when processing files with large amounts of data, with a clear advantage over the usual file-processing methods based on the CFile class or on ReadFile() and WriteFile().

When reposting, please credit the original address: https://www.9cbs.com/read-133462.html
