Showing posts with label Files/IO. Show all posts

Friday, February 9, 2007

Search and Replace Texts in all Files and Subfolders using C#

Recently I needed to replace a string in a whole bunch of HTML template files that I had been working on. I found a couple of text editors that attempted to do this, but only after loading the entire lot of files into memory. When the number of files ran into the hundreds, they failed miserably. I also tried loading the files into a project in the Visual Studio 2005 IDE and using its Find & Replace in Current Project feature. This failed to find text containing line breaks, which eventually forced me to drop the idea. After an hour of intense googling for the right tool, I decided to write my own Search & Replace application in C#, and I had it working in 15 minutes. The application can search and replace text in all files of a folder and its subfolders, filtered by extension.

Instead of using the string.Replace() method, I wrote a custom method that finds and replaces strings in a loop. This gives me the flexibility to keep track of the number of replacements as well as the files that were actually affected. The Directory.GetFiles() method, called with SearchOption.AllDirectories, returns the complete paths of all files matching the filter in the specified folder and all its subfolders.

Here is the code:

private void btnReplace_Click(object sender, EventArgs e)
{
    //The text to search for and its replacement come from the two text boxes.
    string find = txtFind.Text;
    string replaceWith = txtReplace.Text;
    int replaced = 0;

    //An empty search string matches at every position, so refuse it up front.
    if (find.Length == 0)
    {
        MessageBox.Show("Enter some text to search for.");
        return;
    }

    //Get all the files from the root directory filtered by a filter text.
    string[] fileList = Directory.GetFiles(@"C:\Documents and Settings\tganesan\Desktop\FindnReplace", "*.txt", SearchOption.AllDirectories);

    //Loop through each file, call the ReplaceText() method
    //and rewrite the file only if something was replaced.
    foreach (string file in fileList)
    {
       string content;
       //The using blocks close the streams even if an exception is thrown.
       using (StreamReader sr = new StreamReader(file))
       {
           content = sr.ReadToEnd();
       }

       if (ReplaceText(ref content, find, replaceWith, ref replaced))
       {
           using (StreamWriter sw = new StreamWriter(file))
           {
               sw.Write(content);
           }
           //TODO: Add the files to a collection that were affected.
       }
    }
    MessageBox.Show("Total replacements = " + replaced);
}


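/// <summary>
/// This method loops through the file content 
/// and replaces the text wherever it is found.
/// </summary>
/// <param name="content">The file content; updated in place.</param>
/// <param name="oldValue">The text to search for.</param>
/// <param name="newValue">The text to put in its place.</param>
/// <param name="replaced">Running total of replacements; incremented per match.</param>
/// <returns>true if at least one replacement was made.</returns>
private bool ReplaceText(ref string content, string oldValue, string newValue, ref int replaced)
{
    bool isReplaced = false;

    //An empty search string would match at every position and never terminate.
    if (string.IsNullOrEmpty(oldValue))
        return false;

    //Ordinal comparison matches the exact characters, so the match
    //is always oldValue.Length characters long.
    int startIndex = content.IndexOf(oldValue, 0, StringComparison.Ordinal);
    while (startIndex != -1)
    {
        content = content.Remove(startIndex, oldValue.Length);
        content = content.Insert(startIndex, newValue);
        replaced += 1;
        isReplaced = true;

        //Resume the search after the inserted text. Searching from the same
        //position again would loop forever when newValue contains oldValue.
        startIndex = content.IndexOf(oldValue, startIndex + newValue.Length, StringComparison.Ordinal);
    }
    return isReplaced;
}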
Don't forget to back up your files before running this code, because the files are overwritten in place.
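
The TODO in the click handler and the backup advice can be handled together. The following is only a minimal sketch, not part of the original application: the class name SearchReplaceSketch, the method name ReplaceInTree and the ".bak" suffix are all assumptions made for illustration. It uses string.Replace() for brevity instead of the counting loop, copies each file before overwriting it, and returns the list of affected files. Note that File.WriteAllText() writes UTF-8, which may differ from the encoding of the original file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SearchReplaceSketch
{
    // Replaces 'find' with 'replaceWith' in every file under 'root' that matches 'pattern'.
    // Returns the paths of the files that changed; each of them gets a ".bak" copy first.
    public static List<string> ReplaceInTree(string root, string pattern, string find, string replaceWith)
    {
        List<string> affected = new List<string>();
        if (string.IsNullOrEmpty(find))
            return affected; // nothing sensible to search for

        foreach (string file in Directory.GetFiles(root, pattern, SearchOption.AllDirectories))
        {
            string content = File.ReadAllText(file);
            if (content.IndexOf(find, StringComparison.Ordinal) < 0)
                continue; // no match, leave the file untouched

            File.Copy(file, file + ".bak", true); // overwrite an older backup if present
            File.WriteAllText(file, content.Replace(find, replaceWith));
            affected.Add(file);
        }
        return affected;
    }
}
```

Calling SearchReplaceSketch.ReplaceInTree(folder, "*.txt", "old", "new") leaves untouched files without a backup, so the ".bak" files double as a record of what was changed.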

Tuesday, January 30, 2007

Split and Merge files in C#

In this post I'll show you how to split a file into a user-specified number of chunks and later merge them back together. You will find this very helpful if you have very large text files, greater than a GB, that cannot be opened in your "lousy Notepad". These could be crucial log files from your enterprise applications that keep accruing data over time if left unattended.

The code example below is generalized to split any file, irrespective of its format.

private void btnSplit_Click(object sender, EventArgs e)
{
    string inputFile = txtInputFile.Text; // Substitute this with your Input File 
    int numberOfFiles = Convert.ToInt32(txtChunks.Text);
    string baseFileName = Path.GetFileNameWithoutExtension(inputFile);
    string extension = Path.GetExtension(inputFile);

    using (FileStream fs = new FileStream(inputFile, FileMode.Open, FileAccess.Read))
    {
        // Chunk size in bytes, rounded up. A long is used because a chunk of a very large file may exceed 2 GB.
        long sizeOfEachFile = (fs.Length + numberOfFiles - 1) / numberOfFiles;
        // Copy through a small buffer instead of holding a whole chunk in memory.
        byte[] buffer = new byte[64 * 1024];

        for (int i = 1; i <= numberOfFiles; i++)
        {
            // Pad the chunk number to 5 digits so that the names sort in split order.
            string chunkName = baseFileName + "." + i.ToString().PadLeft(5, '0') + extension + ".tmp";
            using (FileStream outputFile = new FileStream(Path.Combine(Path.GetDirectoryName(inputFile), chunkName), FileMode.Create, FileAccess.Write))
            {
                long remaining = sizeOfEachFile;
                int bytesRead;
                // Read() may return fewer bytes than requested, so loop until the chunk is complete.
                while (remaining > 0 && (bytesRead = fs.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining))) > 0)
                {
                    outputFile.Write(buffer, 0, bytesRead);
                    remaining -= bytesRead;
                }
            }
        }
    }
}
private void btnMerge_Click(object sender, EventArgs e)
{
    string outPath = txtInputFolder.Text; // Substitute this with your Input Folder 
    string[] tmpFiles = Directory.GetFiles(outPath, "*.tmp");
    // GetFiles() does not guarantee an order, so sort the chunks by name explicitly.
    Array.Sort(tmpFiles, StringComparer.Ordinal);
    FileStream outputFile = null;
    string prevFileName = "";
    byte[] buffer = new byte[1024];

    foreach (string tempFile in tmpFiles)
    {
        // "name.00001.ext.tmp" becomes "name.00001.ext" here.
        string fileName = Path.GetFileNameWithoutExtension(tempFile);
        int firstDot = fileName.IndexOf('.');
        if (firstDot < 0)
            continue; // not a chunk produced by the split method

        // Note: this assumes the parent's base name contains no dots.
        string baseFileName = fileName.Substring(0, firstDot);
        string extension = Path.GetExtension(fileName);

        if (!prevFileName.Equals(baseFileName))
        {
            if (outputFile != null)
                outputFile.Close();
            // Path.Combine supplies the path separator; FileMode.Create truncates an existing file.
            outputFile = new FileStream(Path.Combine(outPath, baseFileName + extension), FileMode.Create, FileAccess.Write);
        }

        int bytesRead;
        using (FileStream inputTempFile = new FileStream(tempFile, FileMode.Open, FileAccess.Read))
        {
            while ((bytesRead = inputTempFile.Read(buffer, 0, buffer.Length)) > 0)
                outputFile.Write(buffer, 0, bytesRead);
        }

        File.Delete(tempFile);
        prevFileName = baseFileName;
    }
    // outputFile is still null when the folder contained no chunk files.
    if (outputFile != null)
        outputFile.Close();
}
 
The split method is straightforward: you set the number of files to split into, and the input is divided into chunks of equal size (the last chunk may be smaller when the file size is not an exact multiple of the chunk size). Each chunk is named after its parent file, numbered, and given the extension ".tmp". If you're splitting a text file with no intention of merging the pieces later, you can replace the ".tmp" extension with ".txt".
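
The chunk size used by the split code is the file length divided by the number of files, rounded up. A small sketch of that arithmetic follows; the class ChunkMath and its two methods are hypothetical helpers introduced only for illustration, and they use integer arithmetic so that lengths beyond 2 GB are handled without rounding.

```csharp
using System;

public static class ChunkMath
{
    // Size of every chunk except possibly the trailing ones: ceil(fileLength / chunks).
    public static long ChunkSize(long fileLength, int chunks)
    {
        if (chunks <= 0)
            throw new ArgumentOutOfRangeException("chunks");
        return (fileLength + chunks - 1) / chunks;
    }

    // Number of bytes that chunk number 'index' (1-based) actually receives.
    public static long BytesInChunk(long fileLength, int chunks, int index)
    {
        long size = ChunkSize(fileLength, chunks);
        long start = size * (index - 1);
        return Math.Max(0, Math.Min(size, fileLength - start));
    }
}
```

For a 10-byte file split into 3 chunks this gives sizes of 4, 4 and 2 bytes. For a 5-byte file split into 4 chunks it gives 2, 2, 1 and 0, so with small files and many chunks the trailing chunk files can be empty.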

The Merge method above is in fact a "Merge All" method. It merges all the files with the extension ".tmp" in the specified directory and re-creates each parent file. That is why the chunk names retain the original file name and extension.
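
Recovering the parent file name from a chunk name can be sketched as below. This is an illustration under stated assumptions rather than the code used above: the class ChunkName, the method TryParse and the regular expression are hypothetical, and the pattern assumes the naming scheme of the split code (base name, a dot, five digits, the optional original extension, then ".tmp"). Unlike cutting at the first dot, it tolerates dots inside the base name and rejects unrelated ".tmp" files.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class ChunkName
{
    // Base name, then ".", exactly five digits, then an optional extension.
    private static readonly Regex Pattern =
        new Regex(@"^(?<base>.+?)\.(?<index>[0-9]{5})(?<ext>\.[^.]*)?$");

    // Returns false when 'tmpPath' does not look like a chunk written by the split step.
    public static bool TryParse(string tmpPath, out string parentName, out int index)
    {
        parentName = null;
        index = 0;

        // Strip the trailing ".tmp" (and any directory part) before matching.
        Match m = Pattern.Match(Path.GetFileNameWithoutExtension(tmpPath));
        if (!m.Success)
            return false;

        parentName = m.Groups["base"].Value + m.Groups["ext"].Value;
        index = int.Parse(m.Groups["index"].Value);
        return true;
    }
}
```

With this sketch, "my.file.00001.txt.tmp" parses to the parent "my.file.txt" with index 1, while a stray "notes.tmp" is reported as not being a chunk.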

The Directory.GetFiles() method returns an array of all the file paths in the directory. In practice the names often come back in ascending order, but the documentation does not guarantee any order, so sort the array if the order matters. Either way, the names are compared as strings: if you have file names like Testfile1.txt, Testfile2.txt, ... Testfile100.txt, their sorted order would be something like Testfile1.txt, Testfile10.txt, Testfile100.txt, Testfile2.txt, Testfile20.txt, Testfile3.txt, ... because the numbers are just characters. The merge would then stitch the chunks together in the wrong order. This can be addressed by left-padding the number with zeros while splitting, preferably to a width of 5 characters. The file names would then be something like Testfile00001.txt, Testfile00002.txt, ... Testfile00100.txt.
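
The ordering problem is easy to reproduce without touching the file system. The sketch below is illustrative only; SortOrderDemo, Sorted and Padded are hypothetical names. It sorts names with an ordinal string comparison and builds zero-padded names the same way the split code does with PadLeft().

```csharp
using System;

public static class SortOrderDemo
{
    // Returns a sorted copy; names are compared character by character, not numerically.
    public static string[] Sorted(string[] names)
    {
        string[] copy = (string[])names.Clone();
        Array.Sort(copy, StringComparer.Ordinal);
        return copy;
    }

    // Builds a name whose number is left-padded with zeros to 'width' digits.
    public static string Padded(string prefix, int number, int width, string extension)
    {
        return prefix + number.ToString().PadLeft(width, '0') + extension;
    }
}
```

Sorting Testfile2.txt, Testfile10.txt and Testfile1.txt this way yields Testfile1.txt, Testfile10.txt, Testfile2.txt. With Padded("Testfile", n, 5, ".txt") the names become Testfile00001.txt, Testfile00002.txt and Testfile00010.txt, which sort in numeric order.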

All the "tmp" files are deleted from the folder after they are merged.