Reading file over network slow due to extra reads

I'm reading a file, and I read either a row of data (1600 sequential reads of 17 bytes) or a column of data (1600 reads of 17 bytes at a stride of 1600*17 = 27,200 bytes). The file is on either a local drive or a remote drive. I do the reads 10 times, so in each case I expect to read 272,000 bytes of data.
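The two access patterns come down to offset arithmetic. The following is a minimal sketch, assuming a row-major grid of 1600 records of 17 bytes per row starting at offset 0; the function names are hypothetical, and the real file may have a header or a different layout.

```python
RECORD_SIZE = 17          # bytes per read, from the question
RECORDS_PER_ROW = 1600    # records in one row, from the question
ROW_STRIDE = RECORD_SIZE * RECORDS_PER_ROW  # 27,200 bytes between column entries


def row_offsets(row, base=0):
    """Offsets of the 1600 back-to-back records in one row (sequential reads)."""
    start = base + row * ROW_STRIDE
    return [start + i * RECORD_SIZE for i in range(RECORDS_PER_ROW)]


def column_offsets(col, base=0, rows=1600):
    """Offsets of one record per row, each ROW_STRIDE bytes after the last."""
    start = base + col * RECORD_SIZE
    return [start + r * ROW_STRIDE for r in range(rows)]


# Bytes the application actually asks for: 10 passes of 1600 reads of 17 bytes.
REQUESTED_BYTES = 10 * 1600 * RECORD_SIZE
```

Both patterns request the same number of bytes; only the distance between consecutive reads differs.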

On the local drive, I see what I expect. On the remote drive, I also see what I expect when reading sequentially, but when reading a column I see a ton of extra reads being done. They are 32,768 bytes long and don't seem to be used, but they make the amount of data read jump from 272,000 bytes to anywhere from 79 MB to 106 MB. Here is the output from Process Monitor:

1:39:39.4624488 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,390,069, Length: 17
1:39:39.4624639 PM  DiskSpeedTest.exe   89628   FASTIO_CHECK_IF_POSSIBLE    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Operation: Read, Offset: 9,390,069, Length: 17
1:39:39.4624838 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,388,032, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
1:39:39.4633839 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,417,269, Length: 17
1:39:39.4634002 PM  DiskSpeedTest.exe   89628   FASTIO_CHECK_IF_POSSIBLE    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Operation: Read, Offset: 9,417,269, Length: 17
1:39:39.4634178 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,444,469, Length: 17
1:39:39.4634324 PM  DiskSpeedTest.exe   89628   FASTIO_CHECK_IF_POSSIBLE    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Operation: Read, Offset: 9,444,469, Length: 17
1:39:39.4634529 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,441,280, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal
1:39:39.4642199 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,471,669, Length: 17
1:39:39.4642396 PM  DiskSpeedTest.exe   89628   FASTIO_CHECK_IF_POSSIBLE    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Operation: Read, Offset: 9,471,669, Length: 17
1:39:39.4642582 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,498,869, Length: 17
1:39:39.4642764 PM  DiskSpeedTest.exe   89628   FASTIO_CHECK_IF_POSSIBLE    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Operation: Read, Offset: 9,498,869, Length: 17
1:39:39.4642922 PM  DiskSpeedTest.exe   89628   ReadFile    \BCCDC01BCC-raid3SeisWareInc Temp DirBPepers_TempProjectsPT_4HorizonsBaseName3D_1RR_AP SUCCESS Offset: 9,498,624, Length: 32,768, I/O Flags: Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal

Notice the extra reads of 32,768 bytes with I/O Flags set to Non-cached, Paging I/O, Synchronous Paging I/O, Priority: Normal. These extra reads are what take it from 272 KB to 106 MB and are causing the slowness. They don't happen when reading from a local file, or when I'm reading a row so that the reads are all sequential.

I've tried setting FILE_FLAG_RANDOM_ACCESS, but it doesn't seem to help. Any ideas on what is causing these extra reads and how to make them stop?

The tests are being run on a 64-bit Vista system. I can provide source code for a program that demonstrates the problem, as well as a console program that runs the tests.

Answers

You might be running into oplock (opportunistic lock) issues over SMB. Typically, when reading or saving a file over the network, Windows pulls the full file over to the client, works on it, and sends back the changes. When you are working with flat-file databases or files, this can cause unnecessary reads across an SMB file share.

I'm not sure whether there is a way to just pull over the whole file, read the rows from the local copy, and then push back the changes.

You'll read some nightmares about oplocks and flat-file databases.

http://msdn.microsoft.com/en-us/library/aa365433%28VS.85%29.aspx

Not sure if this solves your problem, but it might get you pointed in the right direction. Good luck!

I found the answer to this. Windows does file reads through the page cache, so when I read 17 bytes it first has to transfer a full page of 32 KB over the network, and only then can it copy the 17 bytes I want out of the page cache. Nasty result on performance!
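A rough model of this behavior, inferred from the Process Monitor trace rather than from any documented cache policy: assume each miss fetches 32,768 bytes starting at the 4 KB boundary at or below the requested offset, and that only the most recently fetched block is kept. The function name and the single-block assumption are hypothetical.

```python
PAGE = 4096            # paging reads in the trace start on 4 KB boundaries
FETCH = 32 * 1024      # each paging read in the trace is 32,768 bytes
STRIDE = 1600 * 17     # 27,200 bytes between consecutive column reads
READ = 17              # bytes requested per read


def simulate_column(start=0, reads=1600):
    """Return the offsets of the 32 KB fetches triggered by one column pass."""
    fetch_offsets = []
    cached_start, cached_end = 0, -1   # empty cache: nothing is covered yet
    for k in range(reads):
        off = start + k * STRIDE
        covered = cached_start <= off and off + READ <= cached_end
        if not covered:
            cached_start = off - off % PAGE   # round down to a page boundary
            cached_end = cached_start + FETCH
            fetch_offsets.append(cached_start)
    return fetch_offsets


fetches = simulate_column()
bytes_fetched = len(fetches) * FETCH   # bytes transferred for one column pass
```

Under these assumptions every second read misses, which matches the alternating pattern in the trace: a pass of 1600 reads asks for 27,200 bytes but transfers 800 blocks, about 26 MB. Ten passes with no reuse at all would come to roughly 262 MB, so the 79 MB to 106 MB observed suggests that some blocks are still cached between passes.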

The same thing actually happens the first time the reads are done on a local file, since in that case it still loads a full page at a time into the page cache. But the second time I run the test locally, the file's pages are already in the page cache, so I don't see it. And if SuperFetch is turned on and I've been running these tests for a while, Windows starts loading the file into the cache before I even run my test application, so again I don't see the page reads being done.

So the operating system is doing a lot of things behind the scenes that make it tough to get good performance testing done!

I see this all the time, and it's out of your control: the network does what it wants.

If you know the file is going to be less than 1MB, just pull the whole thing into memory.
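A minimal sketch of that approach, assuming fixed 17-byte records in row-major order with no header; the function name and parameters are hypothetical. The file is fetched with one large sequential read, and the column is then sliced out of the in-memory copy.

```python
RECORD_SIZE = 17  # bytes per record, from the question


def read_column_in_memory(path, col, records_per_row):
    """Read the whole file once, then pick one record per row from memory."""
    with open(path, "rb") as f:
        data = f.read()                      # single sequential read of the file
    stride = RECORD_SIZE * records_per_row   # bytes from one row to the next
    start = col * RECORD_SIZE                # offset of the column in row 0
    return [data[off:off + RECORD_SIZE]
            for off in range(start, len(data), stride)]
```

This trades memory for fewer round trips, so it only makes sense when the file comfortably fits in RAM.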

My guess is that the OS is doing its own read-ahead of the file on the off chance that you need the data at a later point. If it's not hurting you, then it shouldn't matter.

Check out the caching behavior section of the CreateFile API documentation.

You may like to try FILE_FLAG_NO_BUFFERING to see if it stops the extra reads. Be warned, though: using this flag may slow your application down. Normally you use this flag if you understand how to stream data off the disk as fast as you can and the OS caching is only getting in the way.

You may also be able to get the same sort of behavior with local files as with the network file if you use the FILE_FLAG_SEQUENTIAL_SCAN flag. This flag hints to the Windows cache manager what you will be doing, and the cache manager will try to get the data for you ahead of time.

I think SMB always transfers a block, rather than a small set of bytes.

Some information on block size negotiation can be found here. http://support.microsoft.com/kb/q223140

So you are seeing a read that copies the relevant block, followed by the local read(s) of 17 bytes within the block. (If you look at the pattern, there are some pairs of 17-byte reads where both reads fall within the same block.)

The fix obviously depends on how much control you have over the application and on the size and structure of the database. (For example, if the database had one column per file, all the reads would be sequential. If you used a database server, you wouldn't be using SMB, and so on.)

If it's any consolation, iTunes performs abysmally when using a network drive too.
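One way to get the sequential-read benefit of the one-column-per-file idea is to store the records column by column. This is a sketch of a variation, not what the answer literally describes: it keeps a single file but in column-major order, and it assumes fixed 17-byte records with no header. Both function names are hypothetical.

```python
RECORD_SIZE = 17  # bytes per record, from the question


def transpose_records(data, rows, cols):
    """Return a column-major copy of a row-major block of fixed-size records."""
    out = bytearray(len(data))
    for r in range(rows):
        for c in range(cols):
            src = (r * cols + c) * RECORD_SIZE   # position in row-major input
            dst = (c * rows + r) * RECORD_SIZE   # position in column-major output
            out[dst:dst + RECORD_SIZE] = data[src:src + RECORD_SIZE]
    return bytes(out)


def column_span(col, rows):
    """(offset, length) of one whole column in the column-major layout."""
    return col * rows * RECORD_SIZE, rows * RECORD_SIZE
```

With 1600 rows, `column_span` gives a single contiguous 27,200-byte range per column, so a column read becomes one sequential read instead of 1600 scattered ones. The cost is that row reads become the scattered case.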




