Compare 2 Large Text Files return matches to output.txt

For two days I have been trying to find a way to compare two text files in Python or PowerShell and write the lines that appear in both files to an output file.

Example:

Text1.txt:
bird
human
dog
hotdog
film

Text2.txt:
human
mercedes
dog
crown
clown

Output.txt:
human
dog

I'm a beginner and hope you can help me.

In PowerShell I find many SideIndicator examples, but only with a listing like:

Text1 Text2
bird <=
human ==
mercedes =>

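That shows which entry is duplicated in which file, but such a listing is too long for several thousand lines. In PowerShell I only found examples that write the differences to output.txt.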

My test in PowerShell:

compare-object -PassThru (get-content File2.txt) (get-content File1.txt | Select-Object -Unique)

Answers
def compare_files(file1, file2, output_file):
    # Open both inputs for reading and the output file for writing.
    with open(file1, "r") as f1, open(file2, "r") as f2, open(output_file, "w") as output:
        # Sets remove duplicate lines and make the comparison efficient.
        lines1 = set(f1.read().splitlines())
        lines2 = set(f2.read().splitlines())
        common_lines = lines1.intersection(lines2)
        for line in common_lines:
            output.write(line + "\n")

# Usage example
compare_files("Text1.txt", "Text2.txt", "Output.txt")

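In this example, we define a function compare_files that takes three parameters: file1 (the path to the first text file), file2 (the path to the second text file), and output_file (the path to the output file).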

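Inside the function, we use the open function to open the files in the appropriate modes (r for reading, w for writing). We read the contents of both files with the read method, split the contents into lines with splitlines, and convert them to sets for efficient comparison.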

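We then use the intersection method to find the lines common to both sets. Finally, we iterate over the common lines and write them to the output file. Because sets are unordered, the output lines are not guaranteed to follow the order of either input file.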

You can customize the file paths (Text1.txt, Text2.txt, and Output.txt) to match your actual file names and locations.

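Note that this Python solution assumes the text files are not very large and fit comfortably in memory. If you are working with very large files, you may need a different approach to process them efficiently.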
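One possible approach for larger inputs, sketched here rather than taken from the answer above, is to load only the smaller file into a set and stream the larger file line by line. The function name write_common_lines, its parameter names, and the UTF-8 encoding are assumptions made for illustration.

```python
def write_common_lines(small_path, large_path, output_path):
    """Write each line that occurs in both files once, in the order
    it first appears in the larger file. Returns the number written."""
    # Only the smaller file is held in memory as a set.
    # The encoding is an assumption; adjust it to match your files.
    with open(small_path, "r", encoding="utf-8") as small:
        wanted = {line.rstrip("\n") for line in small}

    written = set()  # matches already written, to avoid duplicates
    with open(large_path, "r", encoding="utf-8") as large, \
            open(output_path, "w", encoding="utf-8") as out:
        # Iterating over the file object reads one line at a time.
        for line in large:
            line = line.rstrip("\n")
            if line in wanted and line not in written:
                out.write(line + "\n")
                written.add(line)
    return len(written)
```

Calling write_common_lines("Text1.txt", "Text2.txt", "Output.txt") with the files from the question would write human and dog. Memory use grows with the smaller file and the number of matches, not with the size of the larger file.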

Using File.ReadLines and LINQ Intersect:

[System.Linq.Enumerable]::Intersect(
    [System.IO.File]::ReadLines('absolute\path\to\file1.txt'),
    [System.IO.File]::ReadLines('absolute\path\to\file2.txt')) |
    Set-Content path\to\intersectedLines.txt

It is worth noting that this intersection of lines is case-sensitive. If you need a case-insensitive comparison, use the following:

[System.Linq.Enumerable]::Intersect(
    [System.IO.File]::ReadLines('absolute\path\to\file1.txt'),
    [System.IO.File]::ReadLines('absolute\path\to\file2.txt'),
    [System.StringComparer]::InvariantCultureIgnoreCase) |
    Set-Content path\to\intersectedLines.txt
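
For readers following the Python route instead, a roughly comparable case-insensitive match can be sketched with str.casefold. This is an illustration under assumptions: the function name common_lines_ignore_case and the UTF-8 encoding are hypothetical choices, and casefold is not guaranteed to agree with InvariantCultureIgnoreCase for every string.

```python
def common_lines_ignore_case(path1, path2):
    """Return the lines of path2 that also occur in path1, ignoring case.
    Each match is reported once, spelled as it first appears in path2."""
    # Case-folded keys from the first file; the encoding is an assumption.
    with open(path1, "r", encoding="utf-8") as f1:
        keys = {line.rstrip("\n").casefold() for line in f1}

    seen = set()   # folded keys that were already reported
    matches = []
    with open(path2, "r", encoding="utf-8") as f2:
        for line in f2:
            text = line.rstrip("\n")
            key = text.casefold()
            if key in keys and key not in seen:
                seen.add(key)
                matches.append(text)
    return matches
```

The returned list can then be written to the output file one line at a time, as in the Python answer above.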



