Directory.GetFiles VS Directory.EnumerateFiles

Where I work, we have fairly large archives of files due to the large volume of messages received from various clients. Most of these messages are received through a VLTrader or ASP(M)X front-end. Eventually they are archived onto the file system according to some pre-determined process. The teams supporting these archives had grown concerned about the ever-increasing amount of storage required for these files. There are thousands of directories (nested many levels deep) and hundreds of thousands of files, potentially millions.

I was asked to help come up with a solution to this problem. The app had to accept two settings when run: the root directory to scan and an age threshold in days, which is compared against each file's date. That way the team could specify, for example, that all files older than 90 days should be deleted.
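Here is a minimal sketch of how those two settings might be read and turned into a cutoff date. It assumes a command line of the form "purge <rootDirectory> <days>". The post doesn't say how the real app received its settings, so PurgeOptions, Parse and CutoffUtc are hypothetical names used only for illustration.

```csharp
using System;
using System.Globalization;
using System.IO;

// Hypothetical settings type: assumes the app is started as "purge <rootDirectory> <days>".
public sealed class PurgeOptions
{
    public string RootDirectory { get; private set; }
    public int DaysToKeep { get; private set; }

    private PurgeOptions(string rootDirectory, int daysToKeep)
    {
        RootDirectory = rootDirectory;
        DaysToKeep = daysToKeep;
    }

    // Files whose date is earlier than this instant are candidates for deletion.
    public DateTime CutoffUtc(DateTime nowUtc)
    {
        return nowUtc.AddDays(-DaysToKeep);
    }

    public static PurgeOptions Parse(string[] args)
    {
        if (args == null || args.Length != 2)
            throw new ArgumentException("Usage: purge <rootDirectory> <days>");

        // NumberStyles.None accepts digits only, so "-5" or "ninety" are rejected.
        int days;
        if (!int.TryParse(args[1], NumberStyles.None, CultureInfo.InvariantCulture, out days))
            throw new ArgumentException("days must be a whole, non-negative number");

        string root = Path.GetFullPath(args[0]);
        if (!Directory.Exists(root))
            throw new DirectoryNotFoundException("Root directory not found: " + root);

        return new PurgeOptions(root, days);
    }
}
```

With 90 as the second argument, CutoffUtc(DateTime.UtcNow) gives the instant 90 days before the run. Validating the root up front means a typo in the path fails immediately instead of partway through a long scan.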

My initial reaction was to use the excellent (and very convenient) System.IO.Directory.GetFiles and System.IO.Directory.GetDirectories methods to get arrays of the files and directories I would need to process. So I wrote a quick app using these methods and watched the IOPS go crazy for a while, then drop to nothing, then go crazy again. All the while, not much was being accomplished. The issue, as anyone who has tried to “browse” a large tree in Windows Explorer can tell you, is that reading the properties of an entire tree, including the number and size of its directories and files, is an expensive process. GetFiles makes this worse: it does not return until it has walked the whole tree and built the complete array in memory, so not a single file can be checked until the scan is finished.

After doing a bit more research, I came upon the Directory.EnumerateFiles method, which (you guessed it) returns an enumerable collection of file names in a specified path, as opposed to Directory.GetFiles, which returns an array of file names. With EnumerateFiles you can start working on the first names before the whole collection has been read; with GetFiles you have to wait for the entire array. When checking a path with thousands of directories and hundreds of thousands of files, the difference is huge. In fact, you don’t need nearly that many files and directories to see a dramatic difference. EnumerateFiles is only available in .NET 4.0 and above. I have seen others suggest ways of doing something similar with the Win32 API, but it was much easier for me to make sure .NET 4.0 was available than to implement something on top of the Win32 API.
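To see the difference on your own tree, a rough probe like the one below measures how long each method takes to produce its first path. FirstPathProbe and its two methods are hypothetical names. The timings are only indicative, because the operating system caches directory metadata after the first walk, so whichever probe runs second starts with a warm cache.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical probe (not from the original app): time until the first path is in hand.
public static class FirstPathProbe
{
    // GetFiles walks the whole tree and fills an array before it returns.
    public static Tuple<string, TimeSpan> WithGetFiles(string root)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string[] all = Directory.GetFiles(root, "*", SearchOption.AllDirectories);
        string first = all.FirstOrDefault();
        sw.Stop();
        return Tuple.Create(first, sw.Elapsed);
    }

    // EnumerateFiles yields lazily; FirstOrDefault stops after the first match.
    public static Tuple<string, TimeSpan> WithEnumerateFiles(string root)
    {
        Stopwatch sw = Stopwatch.StartNew();
        string first = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
                                .FirstOrDefault();
        sw.Stop();
        return Tuple.Create(first, sw.Elapsed);
    }
}
```

On a large tree, the EnumerateFiles probe should return after reading only as much of the tree as it needs to find one file, while the GetFiles probe takes roughly as long as a full scan.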

Usage is simply:

// Requires "using System.IO;". Paths are yielded one at a time as the tree is walked.
foreach (string file in Directory.EnumerateFiles(rootDirectory, "*", SearchOption.AllDirectories))
    ShouldDeleteFile(file);
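The body of ShouldDeleteFile isn't shown, so here is one possible version, as a sketch only. It assumes that "the date on the file" means its last-write time in UTC and that the cutoff is passed in explicitly; the AgeFilter class, the two-argument signature and FindExpiredFiles are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class AgeFilter
{
    // Hypothetical: treats "the date on the file" as its last-write time (UTC).
    public static bool ShouldDeleteFile(string path, DateTime cutoffUtc)
    {
        return File.GetLastWriteTimeUtc(path) < cutoffUtc;
    }

    // Streams candidate paths as EnumerateFiles yields them; nothing is deleted here.
    public static IEnumerable<string> FindExpiredFiles(string root, DateTime cutoffUtc)
    {
        foreach (string file in Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories))
        {
            if (ShouldDeleteFile(file, cutoffUtc))
                yield return file;
        }
    }
}
```

Keeping the age check separate from the delete makes it easy to do a dry run first, for example by logging FindExpiredFiles(root, DateTime.UtcNow.AddDays(-90)) before deleting anything.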

When using these methods, make sure the account running them has the proper permissions on the entire tree. Otherwise you may get an exception (typically UnauthorizedAccessException) partway through, and because it is thrown from inside the enumeration, it ends the whole loop. See this post at Stack Overflow for more information. Speaking of permissions: part of my requirement was to delete all files more than 90 days old and all directories that were empty. To avoid any potential conflicts with permissions and/or file attributes such as read-only, the application runs as an administrator and calls

File.SetAttributes(filePath, FileAttributes.Normal);  // resets all attributes; File.Delete throws on read-only files

on every file before deleting it. I’m not sure what performance penalty this may result in. I’ll have to research and see what the hit would be.
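Here is a sketch of the two cleanup steps in that requirement, under a few assumptions. Purge, DeleteFile and DeleteEmptyDirectories are hypothetical names. DeleteFile only rewrites attributes when the read-only flag is actually set, which is one possible way to avoid a SetAttributes write on every file (I haven't measured the difference). Empty directories are removed deepest path first, so a parent that becomes empty once its children are gone is removed in the same pass; this step is meant to run after the old files have been deleted.

```csharp
using System;
using System.IO;
using System.Linq;

public static class Purge
{
    // Rewrites attributes only when the read-only bit is set, so most files cost
    // one attribute read instead of an extra attribute write.
    public static void DeleteFile(string path)
    {
        FileAttributes attrs = File.GetAttributes(path);
        if ((attrs & FileAttributes.ReadOnly) != 0)
            File.SetAttributes(path, FileAttributes.Normal);
        File.Delete(path);
    }

    // Removes empty directories below root (root itself is kept). A child's path is
    // always longer than its parent's, so sorting by length visits children first.
    public static int DeleteEmptyDirectories(string root)
    {
        int removed = 0;
        string[] dirs = Directory.GetDirectories(root, "*", SearchOption.AllDirectories)
                                 .OrderByDescending(d => d.Length)
                                 .ToArray();
        foreach (string dir in dirs)
        {
            if (!Directory.EnumerateFileSystemEntries(dir).Any())
            {
                Directory.Delete(dir); // non-recursive: throws rather than deleting contents
                removed++;
            }
        }
        return removed;
    }
}
```

Sorting needs the complete directory list, so this step does hold every directory path in memory. With thousands of directories rather than millions of files, that list is usually far smaller than the file list.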

2 thoughts on “Directory.GetFiles VS Directory.EnumerateFiles”

  1. I ran into this today. I have millions of files that I need to enumerate, and I originally wrote my app to target .NET 4.0 using DirectoryInfo.EnumerateFiles because it seemed like the right method to use. I later found out that my deployment server only had .NET 3.5, so I had to go back and replace the call with DirectoryInfo.GetFiles. The app almost immediately crashed with an OutOfMemoryException, so I either have to rewrite my code to go through the directory files one small chunk at a time, or get them to upgrade the target server’s framework version.

  2. You could try running it from a separate app server that has .NET 4.0 on it, connecting to the target server through a share. Make sure that permissions are set appropriately.
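For the .NET 3.5 situation in the first comment, one way to go through the tree “one small chunk at a time” is to walk it yourself with a stack. Each step lists a single directory using the non-recursive Directory.GetFiles(string) and Directory.GetDirectories(string) overloads, which exist in .NET 3.5. This is a sketch, not the commenter’s code: ChunkedWalker, WalkFiles and the onSkipped callback are hypothetical names. Catching exceptions per directory also means one inaccessible folder is reported and skipped instead of ending the whole run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not the commenter's code. Uses only APIs that exist in .NET 3.5.
public static class ChunkedWalker
{
    // Depth-first walk that lists one directory per step, so memory holds a single
    // directory's listing plus the directories still waiting to be visited.
    public static IEnumerable<string> WalkFiles(string root, Action<string, Exception> onSkipped)
    {
        Stack<string> pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            string[] files;
            string[] subdirs;
            try
            {
                files = Directory.GetFiles(dir);          // this directory only
                subdirs = Directory.GetDirectories(dir);  // immediate children only
            }
            catch (UnauthorizedAccessException ex)
            {
                // Report and skip the folder instead of aborting the whole walk.
                if (onSkipped != null) onSkipped(dir, ex);
                continue;
            }
            catch (IOException ex)
            {
                // Includes DirectoryNotFoundException (e.g. deleted mid-walk) and PathTooLongException.
                if (onSkipped != null) onSkipped(dir, ex);
                continue;
            }

            foreach (string sub in subdirs)
                pending.Push(sub);
            foreach (string file in files)
                yield return file;
        }
    }
}
```

Because WalkFiles is an iterator, files are handed out as each directory is read, which is close to how EnumerateFiles behaves on .NET 4.0. A directory holding an enormous number of files is still listed in one array, so this bounds memory per directory, not per file.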
