Question

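I have two programs that do basically the same thing: find the length of the longest line in a file. My file has about 8,000 lines, and my C code is a lot more basic than my C++ code (of course!). The C program takes about 2 seconds to run, while the C++ program takes 10 seconds (I'm testing both on the same file). But why? I expected it to take the same amount of time, or a little more, but not 8 seconds more!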

My code in C:

#include <stdio.h>
#include <stdlib.h> 
#include <string.h>

#if _DEBUG
    #define DEBUG_PATH "../Debug/"
#else
    #define DEBUG_PATH ""
#endif

const char FILE_NAME[] = DEBUG_PATH "data.noun";

int main()
{   
    int sPos = 0;
    int maxCount = 0;
    int cPos = 0;
    int ch;
    FILE *in_file;              

    in_file = fopen(FILE_NAME, "r");
    if (in_file == NULL) 
    {
        printf("Cannot open %s\n", FILE_NAME);
        exit(8);
    }       

    while (1) 
    {
        ch = fgetc(in_file);
        if(ch == 0x0A || ch == EOF) // '\n' (0x0A) or end of file
        {           
            if ((cPos - sPos) > maxCount)
                maxCount = (cPos - sPos);

            if(ch == EOF)
                break;

            sPos = cPos;
        }
        else
            cPos++;
    }

    fclose(in_file);

    printf("Max line length: %i\n",  maxCount); 

    getchar(); // getch() is non-standard (<conio.h>, which is not included); getchar() is standard C
    return (0);
}

My code in C++:

#include <iostream>
#include <fstream>
#include <stdio.h>
#include <string>

using namespace std;

#ifdef _DEBUG
    #define FILE_PATH "../Debug/data.noun"
#else
    #define FILE_PATH "data.noun"
#endif

int main()
{
    string fileName = FILE_PATH;
    string s = "";
    ifstream file;
    int size = 0;

    file.open(fileName.c_str());
    if(!file)
    {
        printf("could not open file!");
        return 0;
    }

    while(getline(file, s) )
            size = (s.length() > size) ? s.length() : size;
    file.close();

    printf("biggest line in file: %i", size);   

    getchar();
    return 0;
}

Answer 1

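The C++ version constantly allocates and deallocates instances of std::string. Memory allocation is a costly operation. On top of that, the constructors and destructors have to run.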

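The C version, however, uses constant memory and does only what is necessary: it reads single characters, sets the line-length counter to the new value (if it is higher) at each newline, and that's it.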
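To make the contrast concrete, here is a minimal C++ sketch (not part of the original answer) that does what the C version does: it pulls characters straight from the stream buffer through `std::istreambuf_iterator` and only counts them, so no `std::string` is built per line. The function name `max_line_length_counting` is hypothetical.

```cpp
#include <cassert>   // assert, used in the usage example
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>   // std::istringstream, handy for trying it out

// Counts characters per line without storing any line, so memory use stays
// constant however long the lines are. A '\r' before '\n' is counted, as in
// the C program.
std::size_t max_line_length_counting(std::istream& in) {
    std::size_t maxLen = 0;
    std::size_t cur = 0;  // length of the line currently being scanned
    for (std::istreambuf_iterator<char> it(in), end; it != end; ++it) {
        if (*it == '\n') {
            if (cur > maxLen) maxLen = cur;
            cur = 0;
        } else {
            ++cur;
        }
    }
    // The last line may not end with '\n'.
    return cur > maxLen ? cur : maxLen;
}
```

For example, `std::istringstream in("ab\nabcd\nx");` followed by `max_line_length_counting(in)` yields 4. An `std::ifstream` opened in binary mode can be passed the same way.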

Answer 2

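My guess is that this is a problem with the compile options you are using, the compiler itself, or the file system. I just compiled both versions (with optimizations on) and ran them against a 92,000-line text file: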

c++ version:  113 ms
c version:    179 ms

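And I suspect the reason the C++ version is faster is that fgetc is most likely the slower approach. fgetc does use buffered I/O, but it makes a function call to retrieve every single character. I've tested it before, and fgetc is not as fast as making one call that reads the entire line (e.g., compared to fgets).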
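A minimal sketch of the whole-line approach with `std::fgets`, assuming a 4096-byte buffer (an arbitrary choice) and a hypothetical function name. Because `fgets` stops when the buffer is full, an overlong line arrives in several chunks, so the length is accumulated until a chunk ends in `'\n'`.

```cpp
#include <cassert>   // assert, used in the usage example
#include <cstddef>
#include <cstdio>
#include <cstring>

// Reads up to one line per call instead of one character per call.
// Note: strlen() would stop early on embedded NUL bytes, so this assumes text input.
std::size_t max_line_length_fgets(std::FILE* f) {
    char buf[4096];               // chunk size is an arbitrary choice
    std::size_t maxLen = 0;
    std::size_t cur = 0;          // length of the current (possibly partial) line
    while (std::fgets(buf, sizeof buf, f) != nullptr) {
        std::size_t n = std::strlen(buf);
        if (n > 0 && buf[n - 1] == '\n') {
            cur += n - 1;         // don't count the newline itself
            if (cur > maxLen) maxLen = cur;
            cur = 0;
        } else {
            cur += n;             // line continues in the next chunk (or is the last line)
        }
    }
    return cur > maxLen ? cur : maxLen;
}
```

Call it on a `FILE*` from `std::fopen(path, "rb")`; lines longer than the buffer are still measured correctly because their chunks are summed.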

Answer 3

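So in a few comments I echoed other people's answers that the problem was likely the extra copying done by your C++ version, where it copies each line into a string. But I wanted to test that.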

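First I implemented the fgetc and getline versions and timed them. I confirmed that in debug mode the getline version is slower: about 130 ms vs. 60 ms for the fgetc version. This is unsurprising given the conventional wisdom that iostreams are slower than stdio. However, in my experience iostreams get a significant speed-up from optimization, and this was confirmed when I compared my release-mode times: about 20 ms using getline and 48 ms with fgetc.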

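That getline with iostreams is faster than fgetc, at least in release mode, runs counter to the reasoning that copying all that data must be slower than not copying it. I'm not sure what the optimizer is able to avoid, and I didn't really go looking for an explanation, but it would be interesting to understand what is being optimized away. edit: When I looked at the programs with a profiler it wasn't obvious how to compare the performance, since the different methods looked so different from one another.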

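Anyway, I wanted to see whether I could get a faster version by avoiding the copy, using the get() method on the fstream object to do exactly what the C version does. When I did this I was quite surprised to find that fstream::get() was much slower than both the fgetc and getline methods in both debug and release: about 230 ms in debug and 80 ms in release.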

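To narrow down where the slowdown comes from, I wrote another version, this time using the stream_buf attached to the fstream object and its snextc() method. This version is by far the fastest: 25 ms in debug and 6 ms in release.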

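I'm guessing that what makes fstream::get() so much slower is that it constructs a sentry object for every call. Although I haven't tested this, I can't see that get() does much beyond getting the next character from the stream_buf, apart from these sentry objects.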
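One way to probe the guess that per-call sentry objects are what slows `fstream::get()` down is to construct a single `std::istream::sentry` for the whole scan and then read from the stream buffer directly. This is an untimed sketch with a hypothetical function name. It uses `sbumpc()`, which returns the current character and then advances; `snextc()` advances first, so a loop built on it never sees the very first character.

```cpp
#include <cassert>   // assert, used in the usage example
#include <cstddef>
#include <istream>
#include <sstream>   // std::istringstream, handy for trying it out
#include <string>    // std::char_traits

// One sentry for the whole scan instead of one per get() call; after that,
// characters come straight from the streambuf with no per-character checks.
std::size_t max_line_length_one_sentry(std::istream& in) {
    using traits = std::char_traits<char>;
    std::istream::sentry guard(in, /*noskipws=*/true);  // don't skip leading whitespace
    if (!guard) return 0;                                // stream was not usable

    std::streambuf* buf = in.rdbuf();
    std::size_t maxLen = 0;
    std::size_t cur = 0;
    for (traits::int_type ch = buf->sbumpc();
         !traits::eq_int_type(ch, traits::eof());
         ch = buf->sbumpc()) {
        if (traits::eq_int_type(ch, traits::to_int_type('\n'))) {
            if (cur > maxLen) maxLen = cur;
            cur = 0;
        } else {
            ++cur;
        }
    }
    in.setstate(std::ios_base::eofbit);  // the whole input has been consumed
    return cur > maxLen ? cur : maxLen;
}
```

Timing this against the per-character `get()` loop on the same file would show how much of the gap the sentry accounts for.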

Anyway, the moral of the story is that if you want fast I/O you're probably best off using the high-level iostream functions rather than stdio, and for really fast I/O, access the underlying stream_buf directly. edit: actually this moral may only apply to MSVC; see the update at the bottom for results from a different toolchain.

Notes:

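I used VS2010 and chrono from Boost 1.47 for timing. I built 32-bit binaries (Boost.Chrono seemed to require it, since I couldn't find a 64-bit build of that library). I didn't tweak the compile options, but they may not be completely standard, since I did this in a scratch project I keep around.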
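Anyone repeating these measurements today can use the standard `<chrono>` header (C++11 and later) instead of Boost.Chrono. The `time_call` helper below is a hypothetical sketch (it needs C++14 for the deduced return type): it runs a callable once and returns its result together with the elapsed microseconds measured by `std::chrono::steady_clock`.

```cpp
#include <cassert>   // assert, used in the usage example
#include <chrono>
#include <utility>

// Runs f() once and returns {result, elapsed microseconds}. steady_clock is
// monotonic, so the measurement is not affected by wall-clock adjustments.
// Assumes f returns a (non-void) value, e.g. the computed max line length.
template <typename F>
auto time_call(F&& f) {
    const auto begin = std::chrono::steady_clock::now();
    auto result = std::forward<F>(f)();
    const auto end = std::chrono::steady_clock::now();
    const long long us = static_cast<long long>(
        std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count());
    return std::make_pair(std::move(result), us);
}
```

Wrap one of the line-length loops in a lambda that returns the maximum it found, then print `.first` (the result) and `.second` (the time in microseconds).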

The file I tested with is the 1.1 MB, 20,000-line plain-text version of a work by Frédéric Bastiat from Project Gutenberg: http://www.gutenberg.org/ebooks/35390

Release mode times:

fgetc time is: 48150 microseconds
snextc time is: 6019 microseconds
get time is: 79600 microseconds
getline time is: 19881 microseconds

Debug mode times:

fgetc time is: 59593 microseconds
snextc time is: 24915 microseconds
get time is: 228643 microseconds
getline time is: 130807 microseconds

My fgetc() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    FILE *cin = fopen("D:/bames/automata/pg35390.txt","rb");
    assert(cin);
    unsigned maxLength = 0;
    unsigned i = 0;
    int ch;
    while(1) {
        ch = fgetc(cin);
        if(ch == 0x0A || ch == EOF) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(ch==EOF)
                break;
        } else {
            ++i;
        }
    }
    fclose(cin);
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "fgetc time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

My getline() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    // size_type matches line.size(), so std::max deduces a single type
    // (with 'unsigned' this only compiles where size_t is unsigned int, e.g. 32-bit builds)
    std::string::size_type maxLength = 0;
    std::string line;
    while(std::getline(fin,line)) {
        maxLength = std::max(line.size(),maxLength);
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "getline time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

My fstream::get() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    unsigned maxLength = 0;
    unsigned i = 0;
    while(1) {
        int ch = fin.get();
        if((fin.good() && ch == 0x0A) || fin.eof()) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(fin.eof())
                break;
        } else {
            ++i;
        }
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "get time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

My filebuf::snextc() version:

{
    auto begin = boost::chrono::high_resolution_clock::now();
    std::ifstream fin("D:/bames/automata/pg35390.txt",std::ios::binary);
    std::filebuf &buf = *fin.rdbuf();
    unsigned maxLength = 0;
    unsigned i = 0;
    while(1) {
        // snextc() advances first and then reads, so the very first character
        // of the file is never examined (sbumpc() would return it)
        int ch = buf.snextc();
        if(ch == 0x0A || ch == std::char_traits<char>::eof()) {
            maxLength = std::max(i,maxLength);
            i = 0;
            if(ch == std::char_traits<char>::eof())
                break;
        } else {
            ++i;
        }
    }
    auto end = boost::chrono::high_resolution_clock::now();
    std::cout << "max line is: " << maxLength << '\n';
    std::cout << "snextc time is: " << boost::chrono::duration_cast<boost::chrono::microseconds>(end-begin) << '\n';
}

update:

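I redid the tests on OS X using clang (trunk) with libc++. The results for the iostream-based implementations stayed roughly the same (with optimization turned on): fstream::get() is much slower than std::getline(), which is much slower than filebuf::snextc(). But the performance of fgetc() improved relative to getline(), and fgetc() became the faster of the two. Perhaps the copying done by getline() becomes an issue with this toolchain, whereas it wasn't with MSVC? Or maybe Microsoft's CRT implementation of fgetc() is just bad?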

Anyway, here are the times (I used a larger file, 5.3 MB):

Using:

fgetc time is: 39004 microseconds
snextc time is: 19374 microseconds
get time is: 145233 microseconds
getline time is: 67316 microseconds

Using:

fgetc time is: 44061 microseconds
snextc time is: 92894 microseconds
get time is: 184967 microseconds
getline time is: 209529 microseconds

-O2

fgetc time is: 39356 microseconds
snextc time is: 21324 microseconds
get time is: 149048 microseconds
getline time is: 63983 microseconds

-O3

fgetc time is: 37527 microseconds
snextc time is: 22863 microseconds
get time is: 145176 microseconds
getline time is: 67899 microseconds

Answer 4

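You're not comparing apples to apples. Your C program does no copying of data from the FILE* buffer into your program's memory. It also operates on raw files.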

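Your C++ program has to walk over each line more than once: once in the stream code to find where the line it returns to you ends, and again when the characters are copied into the std::string. (The later call to s.length() is cheap, since std::string stores its length.)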

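You could potentially improve the performance of your C program, for example by using getc_unlocked if it is available to you. But the biggest win comes from not having to copy your data.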
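A minimal sketch of the getc_unlocked suggestion, assuming a POSIX system: `getc_unlocked`, `flockfile` and `funlockfile` are POSIX functions, not ISO C or C++, and MSVC does not provide them. Locking the stream once with `flockfile` is the documented way to use the unlocked variants safely if other threads might touch the same `FILE`. The function name is hypothetical.

```cpp
#include <cassert>   // assert, used in the usage example
#include <cstddef>
#include <cstdio>
#include <stdio.h>   // POSIX: getc_unlocked, flockfile, funlockfile

// Same counting loop as the C program, but each character is fetched with
// getc_unlocked, which skips the per-call stream locking that fgetc performs.
std::size_t max_line_length_unlocked(FILE* f) {
    std::size_t maxLen = 0;
    std::size_t cur = 0;
    flockfile(f);  // take the stream lock once for the whole loop
    int ch;
    while ((ch = getc_unlocked(f)) != EOF) {
        if (ch == '\n') {
            if (cur > maxLen) maxLen = cur;
            cur = 0;
        } else {
            ++cur;
        }
    }
    funlockfile(f);
    return cur > maxLen ? cur : maxLen;  // last line may lack '\n'
}
```

This still reads one character per call; it only removes the locking overhead, so it does not address the copying difference the answer describes.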

EDIT: edited in response to a comment by bames53

Answer 5

2 seconds for just 8,000 lines? I don't know how long your lines are, but chances are you are doing something very wrong.

This trivial Python program executes almost instantly with El Quijote downloaded from Project Gutenberg (40006 lines, 2.2MB):

import sys
print(max(len(s) for s in sys.stdin))  # len(s) includes the trailing '\n'

Timing:

~/test$ time python maxlen.py < pg996.txt
76

real    0m0.034s
user    0m0.020s
sys     0m0.010s

You could improve your C code by buffering the input instead of reading it character by character.
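A minimal sketch of block-buffered reading with `std::fread`, assuming a 64 KiB block (an arbitrary choice) and a hypothetical function name. The counting logic is the same as the original C loop; only the number of library calls changes, since each `fread` fills a whole block that is then scanned in memory.

```cpp
#include <cassert>   // assert, used in the usage example
#include <cstddef>
#include <cstdio>
#include <vector>

// Reads the file in fixed-size blocks and scans each block in memory.
std::size_t max_line_length_fread(std::FILE* f) {
    std::vector<char> buf(64 * 1024);  // block size is an arbitrary choice
    std::size_t maxLen = 0;
    std::size_t cur = 0;  // length of the current line, which may span blocks
    std::size_t n;
    while ((n = std::fread(buf.data(), 1, buf.size(), f)) > 0) {
        for (std::size_t i = 0; i < n; ++i) {
            if (buf[i] == '\n') {
                if (cur > maxLen) maxLen = cur;
                cur = 0;
            } else {
                ++cur;
            }
        }
    }
    return cur > maxLen ? cur : maxLen;  // last line may lack '\n'
}
```

Open the file with `std::fopen(path, "rb")` so that the byte counts match the C program's binary-level counting.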

As for why the C++ version is slower than the C one, it is probably related to building the string objects and then calling the length method. In C you are just counting the characters as you go.

Answer 6

I tried compiling and running your programs against 40K lines of C++ source, and both completed in about 25 ms. I can only conclude that your input files have extremely long lines, possibly 10K-100K characters per line. In that case the C version doesn't suffer any performance penalty from the long lines, while the C++ version has to keep growing the string and copying the old data into the new buffer. If it had to grow a sufficient number of times, that could account for the excessive performance difference.

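The key point here is that the two programs aren't doing the same thing, so you can't really compare the results. If you could provide the input file, we might be able to give more details.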

You could probably use tellg and ignore to do this faster in C++.
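A minimal sketch of doing this with `ignore`, so that no line is ever copied into a `std::string`. Instead of subtracting two `tellg()` positions, it reads `gcount()`, which reports how many characters the last `ignore()` call extracted (including the `'\n'` delimiter when one was found). That swap is an assumption of this sketch: it avoids `tellg()` returning -1 once the stream has reached end of file. The function name is hypothetical.

```cpp
#include <cassert>   // assert, used in the usage example
#include <cstddef>
#include <istream>
#include <limits>
#include <sstream>   // std::istringstream, handy for trying it out

// Skips each line with ignore() and uses gcount() to learn its length, so the
// characters are never stored anywhere.
std::size_t max_line_length_ignore(std::istream& in) {
    std::size_t maxLen = 0;
    while (in.good()) {
        // With the maximum count, ignore() stops only at '\n' or end of file.
        in.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
        std::size_t len = static_cast<std::size_t>(in.gcount());
        if (!in.eof() && len > 0) --len;  // '\n' was extracted; don't count it
        if (len > maxLen) maxLen = len;
    }
    return maxLen;
}
```

As with the original programs, a `'\r'` before the `'\n'` is counted as part of the line.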

Answer 7

The C++ program builds string objects from the lines, while the C program just reads characters and looks at them.


Thanks for the upvotes, but after the discussion I now think this answer is wrong. It was a reasonable first guess, but in this case it seems that the different (and very slow) execution times are caused by other things.

Answer 8

I'm all right with the theory, folks. But let's get empirical.

I generated a file with 13 million lines of text to work with.

~$ for i in {0..1000}; do cat /etc/* | strings; done &> huge.txt

The original code, edited to read from stdin (which shouldn't affect performance much), took almost 2 minutes.

C++ code:

#include <iostream>
#include <stdio.h>
#include <string>

using namespace std;

int main(void)
{
    string s = "";
    int size = 0;

    while (getline(cin, s)) {
        size = ((int)s.length() > size) ? (int)s.length() : size;
    }
    printf("Biggest line in file: %i\n", size);

    return 0;
}

C++ timing:

~$ time ./cplusplus < huge.txt
real    1m53.122s
user    1m29.254s
sys     0m0.544s

C code:

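#define _POSIX_C_SOURCE 200809L  /* exposes getline() and ssize_t */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

int main(void)
{
    char *line = NULL;
    size_t cap = 0;          /* getline() expects a size_t *, not an int * */
    ssize_t read, max = 0;

    while ((read = getline(&line, &cap, stdin)) != -1) {
        if (read > 0 && line[read - 1] == '\n')
            read--;          /* don't count the newline itself */
        if (max < read)
            max = read;
    }
    free(line);
    printf("Biggest line in file %ld\n", (long)max);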

    return 0;
}

C timing:

~$ time ./ansic < huge.txt
real    0m4.015s
user    0m3.432s
sys     0m0.328s

Do the math yourself...

update:
