How to use ReadAllText when file encoding unknown

I'm reading a file with ReadAllText:

    String[] values = File.ReadAllText(@"c:\c\file.txt").Split(';');

    int i = 0;

    foreach (String s in values)
    {
        System.Console.WriteLine("output: {0} {1} ", i, s);
        i++;
    }

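When I read some files, I sometimes get the wrong characters back (for some non-ASCII characters). The output shows "?" because there is a problem with the encoding: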

    output: 0 TEST
    output: 1 A??O?

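One solution is to set the encoding in ReadAllText, for example ReadAllText(@"c:\c\file.txt", Encoding.UTF8), which could fix the problem. But what if I still get "?" in the output? What if I don't know the encoding of the file, or every file has a different encoding? What would be the best way to handle this in C#? Thank you.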

Best answer

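The only way to do this reliably is to look for a byte order mark (BOM) at the start of the text file. This marker primarily indicates the endianness of the character encoding used, but it also identifies the encoding itself (e.g. UTF-8, UTF-16, UTF-32). Unfortunately, this method only works for Unicode-based encodings. Older encodings have no such marker, and much less reliable methods must be used for them.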
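To make these marks concrete, here is a small sketch that prints the BOM bytes .NET associates with each Unicode encoding, using Encoding.GetPreamble. The class name BomTable and the helper PreambleHex are made up for this illustration.

```csharp
using System;
using System.Text;

public static class BomTable
{
    // Formats an encoding's byte order mark (its "preamble") as hex, e.g. "EF-BB-BF".
    // Returns an empty string for encodings that define no BOM.
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    public static void Main()
    {
        Console.WriteLine("UTF-8:     " + PreambleHex(Encoding.UTF8));                 // EF-BB-BF
        Console.WriteLine("UTF-16 LE: " + PreambleHex(Encoding.Unicode));              // FF-FE
        Console.WriteLine("UTF-16 BE: " + PreambleHex(Encoding.BigEndianUnicode));     // FE-FF
        Console.WriteLine("UTF-32 LE: " + PreambleHex(Encoding.UTF32));                // FF-FE-00-00
        Console.WriteLine("UTF-32 BE: " + PreambleHex(new UTF32Encoding(true, true))); // 00-00-FE-FF
        // ASCII defines no BOM, so this prints an empty value.
        Console.WriteLine("ASCII:     " + PreambleHex(Encoding.ASCII));
    }
}
```

Note that the UTF-16 LE mark (FF FE) is a prefix of the UTF-32 LE mark (FF FE 00 00), so any hand-written check has to test the longer mark first.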

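The StreamReader type supports detecting these marks to determine the encoding. You simply need to pass a flag as the second constructor argument: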

    new System.IO.StreamReader("path", true)

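You can then check the value of streamReader.CurrentEncoding to determine the encoding used by the file. Note, however, that CurrentEncoding only reflects the detected encoding after the first read, and that if no byte order mark is present, this constructor falls back to UTF-8. To choose a different fallback such as Encoding.Default, use the overload that also takes an Encoding.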
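As a rough sketch of this approach, the helper below reads a whole file with BOM detection turned on and reports which encoding the reader ended up using. It uses the StreamReader(string, Encoding, bool) overload so that the fallback for BOM-less files is explicit. The names BomAwareReader and ReadAllTextDetect, and the temporary sample file, are assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomAwareReader
{
    // Reads the whole file, honoring a BOM if one is present.
    // "fallback" is used only when the file carries no BOM.
    public static string ReadAllTextDetect(string path, Encoding fallback, out Encoding used)
    {
        using (var reader = new StreamReader(path, fallback, true))
        {
            string text = reader.ReadToEnd();
            // CurrentEncoding is only meaningful after the first read.
            used = reader.CurrentEncoding;
            return text;
        }
    }

    public static void Main()
    {
        // Hypothetical sample file, written as UTF-16 LE with a BOM.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "TEST;\u00C4\u00D6\u00DC", Encoding.Unicode);

        Encoding used;
        string text = ReadAllTextDetect(path, Encoding.UTF8, out used);

        int i = 0;
        foreach (string s in text.Split(';'))
        {
            Console.WriteLine("output: {0} {1} ({2})", i, s, used.WebName);
            i++;
        }
        File.Delete(path);
    }
}
```

This only helps for files that actually carry a BOM. For anything else the result depends entirely on the fallback you pass in.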

See the Code Project solution for detecting the encoding.

Answers

You have to check the file's encoding first. Try this:

    // Assumes "using System.IO;" and a string variable filePath naming the file to inspect.
    System.Text.Encoding enc = null;
    System.IO.FileStream file = new System.IO.FileStream(filePath,
        FileMode.Open, FileAccess.Read, FileShare.Read);
    if (file.CanSeek)
    {
        byte[] bom = new byte[4]; // Get the byte-order mark, if there is one
        int n = file.Read(bom, 0, 4); // n can be less than 4 for very short files
        if (n >= 3 && bom[0] == 0xef && bom[1] == 0xbb && bom[2] == 0xbf)
            enc = System.Text.Encoding.UTF8; // utf-8
        else if (n >= 4 && bom[0] == 0xff && bom[1] == 0xfe && bom[2] == 0 && bom[3] == 0)
            enc = System.Text.Encoding.UTF32; // utf-32 little-endian (test before utf-16le)
        else if (n >= 4 && bom[0] == 0 && bom[1] == 0 && bom[2] == 0xfe && bom[3] == 0xff)
            enc = new System.Text.UTF32Encoding(true, true); // utf-32 big-endian
        else if (n >= 2 && bom[0] == 0xff && bom[1] == 0xfe)
            enc = System.Text.Encoding.Unicode; // utf-16 little-endian
        else if (n >= 2 && bom[0] == 0xfe && bom[1] == 0xff)
            enc = System.Text.Encoding.BigEndianUnicode; // utf-16 big-endian
        else
            enc = System.Text.Encoding.ASCII; // no BOM found: fall back to ASCII

        // Now reposition the file cursor back to the start of the file
        file.Seek(0, System.IO.SeekOrigin.Begin);
    }
    else
    {
        // The file cannot be randomly accessed, so you need to decide what to set the default to
        // based on the data provided. If you're expecting data from a lot of older applications,
        // default your encoding to Encoding.ASCII. If you're expecting data from a lot of newer
        // applications, default your encoding to Encoding.Unicode. For binary files, which are
        // single byte-based, use Encoding.ASCII, even though you'll probably never need the
        // encoding then, since the Encoding classes are really meant to get strings from the
        // byte array that is the file.

        enc = System.Text.Encoding.ASCII;
    }
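The snippet above falls back to ASCII when no BOM is found, and Encoding.ASCII decodes every byte above 0x7F as "?", which is the symptom from the question. One common heuristic for BOM-less files, which is not part of the answer above, is to try a strict UTF-8 decode first and only then fall back to a legacy single-byte encoding. The sketch below assumes ISO-8859-1 as that fallback purely for illustration; the names NoBomHeuristic and DecodeUtf8OrFallback are hypothetical.

```csharp
using System;
using System.Text;

public static class NoBomHeuristic
{
    // Decodes BOM-less bytes as UTF-8 when they form valid UTF-8,
    // otherwise decodes them with the supplied fallback encoding.
    public static string DecodeUtf8OrFallback(byte[] bytes, Encoding fallback)
    {
        // Second argument (throwOnInvalidBytes) = true: invalid sequences
        // raise an exception instead of silently becoming U+FFFD.
        var strictUtf8 = new UTF8Encoding(false, true);
        try
        {
            return strictUtf8.GetString(bytes);
        }
        catch (DecoderFallbackException)
        {
            return fallback.GetString(bytes);
        }
    }

    public static void Main()
    {
        // Assumption: legacy files are ISO-8859-1. Pick whatever matches your data.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        byte[] validUtf8 = Encoding.UTF8.GetBytes("A\u00C4");   // 41 C3 84
        byte[] legacyBytes = new byte[] { 0x41, 0xC4 };          // not valid UTF-8

        // Both calls yield the same two-character string, "A" followed by U+00C4.
        string a = DecodeUtf8OrFallback(validUtf8, latin1);
        string b = DecodeUtf8OrFallback(legacyBytes, latin1);
        Console.WriteLine(a == b);
    }
}
```

This is a guess, not detection: any byte sequence is valid in a single-byte encoding, so the fallback can still produce the wrong characters if the file uses a different code page.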

In my case, I was creating a simple JSON file and was getting the same error. The problem came from creating the file with Visual Studio (2019 at the moment).

I am sure you can find some setting in the VS options to deal with this issue. However, the quickest way I've found was to create the same file with the same content using Notepad++. You can set the encoding in Notepad++ from the Encoding menu at the top, and I believe other text editors offer a similar setting.




