C+++ 中的程序需要读取一个以 utf-8 编码的文件。 不幸的是, 使用 char* 它不能获得扩展字符( 等), wchar_ t* 错误地解读了它们。 我管理它的算法是 :
(1) 1) 创建新文件
(2) 将其名称改为[原名]Utf-16
(3) (3) 将原始文件复制为新文件, 同时转换
(4) 提取数据。
(5) 在不再需要临时文件时,删除此临时文件。
我被困在3: 3, 是否有某处的函数 比如“FileUTF8toUTF16”?
C+++ 中的程序需要读取一个以 utf-8 编码的文件。 不幸的是, 使用 char* 它不能获得扩展字符( 等), wchar_ t* 错误地解读了它们。 我管理它的算法是 :
(1) 1) 创建新文件
(2) 将其名称改为[原名]Utf-16
(3) (3) 将原始文件复制为新文件, 同时转换
(4) 提取数据。
(5) 在不再需要临时文件时,删除此临时文件。
我被困在3: 3, 是否有某处的函数 比如“FileUTF8toUTF16”?
这是我用的东西
int nLenWide = MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)(pData + nOffset),
(int)(nDataLen - nOffset), NULL, 0);
if (MultiByteToWideChar(CP_UTF8, 0, (LPCSTR)(pData + nOffset),
(int)(nDataLen - nOffset),
str.GetBuffer(nLenWide), nLenWide) != nLenWide)
{
str.ReleaseBuffer(0);
ASSERT(false);
return str;
}
str.ReleaseBuffer(nLenWide);
return str;
在其中 pData
是实际 utf-8 数据的 > bYTE
指针, noffset
通常为 3 (BOM)。
Maybe this will help you: Conversion between Unicode UTF-16 and UTF-8 in C++/Win32 | C++
守则:
//////////////////////////////////////////////////////////////////////////////
//
// *** Routines to convert between Unicode UTF-8 and Unicode UTF-16 ***
//
// By Giovanni Dicanio <giovanni.dicanio AT gmail.com>
//
// Last update: 2010, January 2nd
//
//
// These routines use ::MultiByteToWideChar and ::WideCharToMultiByte
// Win32 API functions to convert between Unicode UTF-8 and UTF-16.
//
// UTF-16 strings are stored in instances of CStringW.
// UTF-8 strings are stored in instances of CStringA.
//
// On error, the conversion routines use AtlThrow to signal the
// error condition.
//
// If input string pointers are NULL, empty strings are returned.
//
//
// Prefixes used in these routines:
// --------------------------------
//
// - cch : count of characters (CHAR s or WCHAR s)
// - cb : count of bytes
// - psz : pointer to a NUL-terminated string (CHAR* or WCHAR*)
// - str : instance of CString(A/W) class
//
//
//
// Useful Web References:
// ----------------------
//
// WideCharToMultiByte Function
// http://msdn.microsoft.com/en-us/library/dd374130.aspx
//
// MultiByteToWideChar Function
// http://msdn.microsoft.com/en-us/library/dd319072.aspx
//
// AtlThrow
// http://msdn.microsoft.com/en-us/library/z325eyx0.aspx
//
//
// Developed on VC9 (Visual Studio 2008 SP1)
//
//
//////////////////////////////////////////////////////////////////////////////
namespace UTF8Util
{
//----------------------------------------------------------------------------
// FUNCTION: ConvertUTF8ToUTF16
// DESC: Converts Unicode UTF-8 text to Unicode UTF-16 (Windows default).
//----------------------------------------------------------------------------
CStringW ConvertUTF8ToUTF16( __in const CHAR * pszTextUTF8 )
{
//
// Special case of NULL or empty input string
//
if ( (pszTextUTF8 == NULL) || (*pszTextUTF8 == ) )
{
// Return empty string
return L"";
}
//
// Consider CHAR s count corresponding to total input string length,
// including end-of-string ( ) character
//
const size_t cchUTF8Max = INT_MAX - 1;
size_t cchUTF8;
HRESULT hr = ::StringCchLengthA( pszTextUTF8, cchUTF8Max, &cchUTF8 );
if ( FAILED( hr ) )
{
AtlThrow( hr );
}
// Consider also terminating
++cchUTF8;
// Convert to int for use with MultiByteToWideChar API
int cbUTF8 = static_cast<int>( cchUTF8 );
//
// Get size of destination UTF-16 buffer, in WCHAR s
//
int cchUTF16 = ::MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
MB_ERR_INVALID_CHARS, // error on invalid chars
pszTextUTF8, // source UTF-8 string
cbUTF8, // total length of source UTF-8 string,
// in CHAR s (= bytes), including end-of-string
NULL, // unused - no conversion done in this step
0 // request size of destination buffer, in WCHAR s
);
ATLASSERT( cchUTF16 != 0 );
if ( cchUTF16 == 0 )
{
AtlThrowLastWin32();
}
//
// Allocate destination buffer to store UTF-16 string
//
CStringW strUTF16;
WCHAR * pszUTF16 = strUTF16.GetBuffer( cchUTF16 );
//
// Do the conversion from UTF-8 to UTF-16
//
int result = ::MultiByteToWideChar(
CP_UTF8, // convert from UTF-8
MB_ERR_INVALID_CHARS, // error on invalid chars
pszTextUTF8, // source UTF-8 string
cbUTF8, // total length of source UTF-8 string,
// in CHAR s (= bytes), including end-of-string
pszUTF16, // destination buffer
cchUTF16 // size of destination buffer, in WCHAR s
);
ATLASSERT( result != 0 );
if ( result == 0 )
{
AtlThrowLastWin32();
}
// Release internal CString buffer
strUTF16.ReleaseBuffer();
// Return resulting UTF16 string
return strUTF16;
}
//----------------------------------------------------------------------------
// FUNCTION: ConvertUTF16ToUTF8
// DESC: Converts Unicode UTF-16 (Windows default) text to Unicode UTF-8.
//----------------------------------------------------------------------------
CStringA ConvertUTF16ToUTF8( __in const WCHAR * pszTextUTF16 )
{
//
// Special case of NULL or empty input string
//
if ( (pszTextUTF16 == NULL) || (*pszTextUTF16 == L ) )
{
// Return empty string
return "";
}
//
// Consider WCHAR s count corresponding to total input string length,
// including end-of-string (L ) character.
//
const size_t cchUTF16Max = INT_MAX - 1;
size_t cchUTF16;
HRESULT hr = ::StringCchLengthW( pszTextUTF16, cchUTF16Max, &cchUTF16 );
if ( FAILED( hr ) )
{
AtlThrow( hr );
}
// Consider also terminating
++cchUTF16;
//
// WC_ERR_INVALID_CHARS flag is set to fail if invalid input character
// is encountered.
// This flag is supported on Windows Vista and later.
// Don t use it on Windows XP and previous.
//
#if (WINVER >= 0x0600)
DWORD dwConversionFlags = WC_ERR_INVALID_CHARS;
#else
DWORD dwConversionFlags = 0;
#endif
//
// Get size of destination UTF-8 buffer, in CHAR s (= bytes)
//
int cbUTF8 = ::WideCharToMultiByte(
CP_UTF8, // convert to UTF-8
dwConversionFlags, // specify conversion behavior
pszTextUTF16, // source UTF-16 string
static_cast<int>( cchUTF16 ), // total source string length, in WCHAR s,
// including end-of-string
NULL, // unused - no conversion required in this step
0, // request buffer size
NULL, NULL // unused
);
ATLASSERT( cbUTF8 != 0 );
if ( cbUTF8 == 0 )
{
AtlThrowLastWin32();
}
//
// Allocate destination buffer for UTF-8 string
//
CStringA strUTF8;
int cchUTF8 = cbUTF8; // sizeof(CHAR) = 1 byte
CHAR * pszUTF8 = strUTF8.GetBuffer( cchUTF8 );
//
// Do the conversion from UTF-16 to UTF-8
//
int result = ::WideCharToMultiByte(
CP_UTF8, // convert to UTF-8
dwConversionFlags, // specify conversion behavior
pszTextUTF16, // source UTF-16 string
static_cast<int>( cchUTF16 ), // total source string length, in WCHAR s,
// including end-of-string
pszUTF8, // destination buffer
cbUTF8, // destination buffer size, in bytes
NULL, NULL // unused
);
ATLASSERT( result != 0 );
if ( result == 0 )
{
AtlThrowLastWin32();
}
// Release internal CString buffer
strUTF8.ReleaseBuffer();
// Return resulting UTF-8 string
return strUTF8;
}
} // namespace UTF8Util
//////////////////////////////////////////////////////////////////////////////
I m getting this linker error. I know a way around it, but it s bugging me because another part of the project s linking fine and it s designed almost identically. First, I have namespace LCD. Then I ...
I have been searching for sample code creating iterator for my own container, but I haven t really found a good example. I know this been asked before (Creating my own Iterators) but didn t see any ...
Is there an equivalent to tidy for HTML code for C++? I have searched on the internet, but I find nothing but C++ wrappers for tidy, etc... I think the keyword tidy is what has me hung up. I am ...
I m new to C++ and am wondering how much time I should invest in learning how to implement template classes. Are they widely used in industry, or is this something I should move through quickly?
Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...
Why is it when i do the following i get errors when relating to with wchar_t? namespace Foo { typedef std::wstring String; } Now i declare all my strings as Foo::String through out the program, ...
I cannot figure out how to marshal a C++ CBitmap to a C# Bitmap or Image class. My import looks like this: [DllImport(@"test.dll", CharSet = CharSet.Unicode)] public static extern IntPtr ...
Is it possible to check with the means of pure X11/Xlib only whether the given window is iconified/minimized, and, if it is, how?