Looking for a way to read and search a file fast in C#

I have a 100 MB text file, and I need to check every line for a specific word. I am looking for a fast way to do it.
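As a point of reference, a plain single-threaded scan is sketched below. It is not the asker's code. The class `LineSearch` and method `FindLines` are hypothetical names. File.ReadLines streams the file lazily instead of loading the whole thing into a string[]. Note that the match is a substring match (IndexOf with an ordinal comparison), not the whole-word comparison the asker's code does after Split.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: a single-threaded baseline to measure other approaches against.
public static class LineSearch
{
    // Yields every line containing `word`, with its 1-based line number.
    // File.ReadLines reads one line at a time, so the whole file is never
    // held in memory at once, unlike File.ReadAllLines.
    public static IEnumerable<(int LineNumber, string Line)> FindLines(string path, string word)
    {
        int lineNumber = 0;
        foreach (string line in File.ReadLines(path))
        {
            lineNumber++;
            // Ordinal comparison skips culture-sensitive matching rules.
            if (line.IndexOf(word, StringComparison.Ordinal) >= 0)
                yield return (lineNumber, line);
        }
    }
}
```

Timing this against the threaded version is a cheap way to find out whether threads buy anything at all.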

What I did was split the file into 10 parts, each handled by its own thread:

public void ParseTheFile(BackgroundWorker bg)
    {

        Lines = File.ReadAllLines(FilePath);
        this.size = Lines.Length;
        chankSise=size/10;

        reports reportInst = new reports(bg,size);

        ParserThread [] ParserthreadArray = new ParserThread[10];

        for (int i = 0; i <ParserthreadArray.Length; i++)
        {
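            ParserthreadArray[i] = new ParserThread(reportInst);
            // The last chunk also takes the remainder; otherwise up to 9 trailing lines are never checked.
            int length = (i == ParserthreadArray.Length - 1) ? size - i * chankSise : chankSise;
            ParserthreadArray[i].Init(SubArray(Lines, i * chankSise, length), OutputPath);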

        }

        Thread oThread0 = new Thread(ParserthreadArray[0].run);
        oThread0.IsBackground = true;
        Thread oThread1 = new Thread(ParserthreadArray[1].run);
        oThread1.IsBackground = true;
        Thread oThread2 = new Thread(ParserthreadArray[2].run);
        oThread2.IsBackground = true;
        Thread oThread3 = new Thread(ParserthreadArray[3].run);
        oThread3.IsBackground = true;
        Thread oThread4 = new Thread(ParserthreadArray[4].run);
        oThread4.IsBackground = true;
        Thread oThread5 = new Thread(ParserthreadArray[5].run);
        oThread5.IsBackground = true;
        Thread oThread6 = new Thread(ParserthreadArray[6].run);
        oThread6.IsBackground = true;
        Thread oThread7 = new Thread(ParserthreadArray[7].run);
        oThread7.IsBackground = true;
        Thread oThread8 = new Thread(ParserthreadArray[8].run);
        oThread8.IsBackground = true;
        Thread oThread9 = new Thread(ParserthreadArray[9].run);
        oThread9.IsBackground = true;

        oThread0.Start();
        oThread1.Start();
        oThread2.Start();
        oThread3.Start();
        oThread4.Start();
        oThread5.Start();
        oThread6.Start();
        oThread7.Start();
        oThread8.Start();
        oThread9.Start();

        oThread0.Join();
        oThread1.Join();
        oThread2.Join();
        oThread3.Join();
        oThread4.Join();
        oThread5.Join();
        oThread6.Join();
        oThread7.Join();
        oThread8.Join();
        oThread9.Join();
    }

This is the Init method:

public void Init(string [] olines,string outputPath)
    {
        Lines = olines;
        OutputPath = outputPath+"/"+"ThreadTemp"+threadID;
    }

This is the SubArray method:

public string [] SubArray(string [] data, int index, int length)
    {
        string [] result = new string[length];
        Array.Copy(data, index, result, 0, length);
        return result;
    }

And each thread does this:

 public void run()
    {

        if (!System.IO.Directory.Exists(OutputPath))
        {
            System.IO.Directory.CreateDirectory(OutputPath);
            DirectoryInfo dir = new DirectoryInfo(OutputPath);
            dir.Attributes |= FileAttributes.Hidden;
        }



        this.size = Lines.Length;
        foreach (string line in Lines)
        {



            bgReports.sendreport(allreadychecked);

            allreadychecked++;
            hadHandlerOrEngine = false;
            words = line.Split(' ');
            if (words.Length>4)
            {
                for (int i = 5; i < words.Length; i++)
                {
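                    // || short-circuits; the bounds check avoids reading past the last word.
                    if ((words[i] == "Handler" || words[i] == "Engine") && i + 1 < words.Length)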
                    {

                        hadHandlerOrEngine = true;
                        string num = words[1 + i];
                        int realnum = int.Parse(num[0].ToString());
                        cuurentEngine = (realnum);
                        if (engineArry[realnum] == false)
                        {
                            File.Create(OutputPath + "/" + realnum + ".txt").Close();
                            engineArry[realnum] = true;

                        }
                        TextWriter tw = new StreamWriter(OutputPath + "/" + realnum + ".txt", true);
                        tw.WriteLine(line);
                        tw.Close();

                        break;
                    }
                }

            }

            if (hadHandlerOrEngine == false)
            {
                if (engineArry[cuurentEngine] == true)
                {
                    TextWriter tw = new StreamWriter(OutputPath + "/" + cuurentEngine + ".txt", true);
                    tw.WriteLine(line);
                    tw.Close();
                }

            }

        }
    }

My question is: is there any way to make this run faster?

Best answer

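Your code doesn't show your Init method, but at the moment it looks like each thread will actually be checking all the lines. Additionally, it looks like all of them may be trying to write to the same files, and not doing so in an exception-safe way (with using statements) either.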
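One way to address the exception-safety point is sketched below, under the assumption that the per-engine output layout stays the same. `EngineWriters` is a hypothetical class, not part of the asker's code. It keeps one StreamWriter open per engine number instead of reopening the file for every line, and disposes all of them when the enclosing using block exits, even if an exception is thrown.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: one open StreamWriter per engine number, all disposed together.
public sealed class EngineWriters : IDisposable
{
    private readonly string outputPath;
    private readonly Dictionary<int, StreamWriter> writers = new Dictionary<int, StreamWriter>();

    public EngineWriters(string outputPath)
    {
        this.outputPath = outputPath;
        Directory.CreateDirectory(outputPath); // no-op if the directory already exists
    }

    public void WriteLine(int engine, string line)
    {
        StreamWriter writer;
        if (!writers.TryGetValue(engine, out writer))
        {
            // append: false truncates on first use, like File.Create in the original code.
            writer = new StreamWriter(Path.Combine(outputPath, engine + ".txt"), false);
            writers.Add(engine, writer);
        }
        writer.WriteLine(line);
    }

    // Plays the role of engineArry[n] in the original: has this engine been seen yet?
    public bool HasEngine(int engine) => writers.ContainsKey(engine);

    public void Dispose()
    {
        foreach (StreamWriter writer in writers.Values)
            writer.Dispose(); // flushes buffered text and closes the file
        writers.Clear();
    }
}
```

In run(), the loop would then sit inside `using (var writers = new EngineWriters(OutputPath)) { ... }` and call `writers.WriteLine(realnum, line)` in place of the open/write/close sequence.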

EDIT: Okay, we can now see Init, but we can't see SubArray. Presumably it just copies a chunk of the array.
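If copying the chunks is a concern, ArraySegment<T> can describe each chunk as an offset and count over the original array without copying anything. The sketch below is an alternative to SubArray, not the asker's method. `Chunking.Split` is a hypothetical name. It also gives the last segment the remainder, so no lines are dropped when the line count is not a multiple of the chunk count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: split an array into non-copying ArraySegment views.
public static class Chunking
{
    public static List<ArraySegment<string>> Split(string[] data, int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));
        var segments = new List<ArraySegment<string>>(count);
        int chunkSize = data.Length / count;
        for (int i = 0; i < count; i++)
        {
            int offset = i * chunkSize;
            // The last segment absorbs data.Length % count extra elements.
            int length = (i == count - 1) ? data.Length - offset : chunkSize;
            segments.Add(new ArraySegment<string>(data, offset, length)); // no copy
        }
        return segments;
    }
}
```

A worker can walk its segment with `for (int j = seg.Offset; j < seg.Offset + seg.Count; j++)` over `seg.Array[j]`, which works on every .NET version (ArraySegment only implements IEnumerable<T> from .NET 4.5 on).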

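How slow is this if you avoid using threads to start with? Is it definitely too slow? What is your performance target? It seems unlikely that using 10 threads is going to help, given that by this point the whole file is already in memory. (You should also try to avoid repeating the code for all the threads; why not use a collection for that?)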
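The collection suggestion might look roughly like this: a minimal sketch that replaces the ten oThread0..oThread9 variables with a list. `ThreadRunner.RunAll` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: start one background thread per work item, then wait for all.
public static class ThreadRunner
{
    public static void RunAll(IEnumerable<Action> work)
    {
        var threads = new List<Thread>();
        foreach (Action action in work)
        {
            var thread = new Thread(() => action()) { IsBackground = true };
            threads.Add(thread);
            thread.Start();
        }
        // Join only after every thread has been started, so they run concurrently.
        foreach (Thread thread in threads)
            thread.Join();
    }
}
```

With the asker's array this becomes `ThreadRunner.RunAll(Array.ConvertAll(ParserthreadArray, p => (Action)p.run));`. The no-threads baseline the answer asks about is simply one parser whose run() is called directly with all the lines.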

Other answers

You are probably I/O-bound, so I'd guess that multiple threads aren't going to help much. (Odds are your program spends most of its time here: Lines = File.ReadAllLines(FilePath); and not that much time actually parsing. You should measure, though.) In fact, your SubArray splitting is possibly slower than if you just passed the whole thing to a single parser thread.
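To do the measuring this answer recommends, the read and the scan can be timed separately with Stopwatch. This is a sketch; `Timing.Measure` is a hypothetical name, and the scan is a simple substring check rather than the asker's full parsing. Run it more than once: later runs may hit the OS file cache and make the read look cheaper than a cold read.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: time the read phase and the scan phase independently.
public static class Timing
{
    public static (TimeSpan Read, TimeSpan Scan, int Matches) Measure(string path, string word)
    {
        var sw = Stopwatch.StartNew();
        string[] lines = File.ReadAllLines(path); // the call the answer suspects dominates
        TimeSpan read = sw.Elapsed;

        sw.Restart();
        int matches = 0;
        foreach (string line in lines)
            if (line.IndexOf(word, StringComparison.Ordinal) >= 0)
                matches++;
        TimeSpan scan = sw.Elapsed;

        return (read, scan, matches);
    }
}
```

If Read is much larger than Scan, adding threads to the scan cannot help much.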

I would look into MemoryMappedFile (if you are on .NET 4), which should help with some of the I/O without having to make copies of all the source data.
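A minimal sketch of reading the file through a MemoryMappedFile (.NET 4 or later) is below. `MappedSearch.CountMatches` is a hypothetical name. Whether this beats a plain FileStream for a single sequential pass is something to measure, not assume. Also note that CreateFromFile with a capacity of 0 throws for an empty file.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Hypothetical helper: count lines containing `word` via a memory-mapped view.
public static class MappedSearch
{
    public static int CountMatches(string path, string word)
    {
        int matches = 0;
        // Capacity 0 means "the size of the file on disk".
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read))
        using (var view = mmf.CreateViewStream(0, 0, MemoryMappedFileAccess.Read))
        using (var reader = new StreamReader(view, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // The view may be padded with '\0' bytes up to a page boundary; those
                // bytes never contain `word`, so they do not affect the count.
                if (line.IndexOf(word, StringComparison.Ordinal) >= 0)
                    matches++;
            }
        }
        return matches;
    }
}
```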

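I'd like to suggest something that may be useful. As others have said, there is no point in putting multiple threads on reading your file, because the work is mostly I/O activity, and in that case the requests just queue up on the disk. However, you can certainly issue asynchronous I/O requests, which can be completed by any available I/O completion thread.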
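The answer does not name an API. One way to issue asynchronous reads in today's .NET is sketched below: open the FileStream with useAsync: true and await StreamReader.ReadLineAsync. Both APIs require .NET 4.5 or later, so they postdate the question. `AsyncSearch.CountMatchesAsync` is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper: scan the file with asynchronous (overlapped) reads.
public static class AsyncSearch
{
    public static async Task<int> CountMatchesAsync(string path, string word)
    {
        int matches = 0;
        // useAsync: true requests overlapped I/O, so no thread blocks while a read is pending.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                                           bufferSize: 64 * 1024, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                if (line.IndexOf(word, StringComparison.Ordinal) >= 0)
                    matches++;
            }
        }
        return matches;
    }
}
```

This frees the calling thread (for example, a UI or BackgroundWorker thread) while the disk is busy. It does not make the disk itself any faster.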

Now, when it comes to processing the file, I would recommend that you use memory-mapped files. Memory-mapped files are ideal for scenarios where an arbitrary chunk (view) of a considerably larger file needs to be accessed repeatedly or separately. In your scenario, memory-mapped files can help you split and reassemble the file if the chunks arrive or are processed out of order. I have no handy examples at the moment; have a look at the article Memory Mapped Files.
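Accessing "an arbitrary chunk (view)" of a mapped file can be sketched with CreateViewAccessor, which maps only the requested byte range. `MappedChunks.ReadChunk` is a hypothetical name. Byte offsets will generally not fall on line boundaries, and a chunk may end in the middle of a multi-byte UTF-8 character. A real splitter would therefore need to extend each chunk to the next '\n' before decoding it; that step is not shown here.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

// Hypothetical helper: copy one byte range of a mapped file out through its own view.
public static class MappedChunks
{
    public static byte[] ReadChunk(MemoryMappedFile mmf, long offset, int length)
    {
        var buffer = new byte[length];
        // Each caller (e.g. each worker thread) can open its own view of the same map.
        using (var accessor = mmf.CreateViewAccessor(offset, length, MemoryMappedFileAccess.Read))
        {
            accessor.ReadArray(0, buffer, 0, length);
        }
        return buffer;
    }
}
```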
