English 中文(简体)
Dealing with eacute and other special characters using Oracle, PHP and Oci8
原标题:

Hi I am trying to store names into an Oracle database and fetch them back using PHP and oci8.

However, if I insert the é directly into the Oracle database and use oci8 to fetch it back I just receive an e

Do I have to encode all special characters (including é) into html entities (ie: é) before inserting into database ... or am I missing something ?

Thx


UPDATE: Mar 1 at 18:40

found this function: http://www.php.net/manual/en/function.utf8-decode.php#85034

function charset_decode_utf_8($string) {
    if(@!ereg("[200-237]",$string) && @!ereg("[241-377]",$string)) {
        return $string;
    }
$string = preg_replace("/([340-357])([200-277])([200-277])/e"," &# .((ord( \1 )-224)*4096 + (ord( \2 )-128)*64 + (ord( \3 )-128)). ; ",$string);
$string = preg_replace("/([300-337])([200-277])/e"," &# .((ord( \1 )-192)*64+(ord( \2 )-128)). ; ",$string);
return $string;
}

seems to work, although not sure if its the optimal solution


UPDATE: Mar 8 at 15:45

Oracle s character set is ISO-8859-1.
in PHP I added:

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1");

to force the oci8 connection to use that character set. Retrieving the é using oci8 from PHP now worked ! (for varchars, but not CLOBs had to do utf8_encode to extract it )
So then I tried saving the data from PHP to Oracle ... and it doesnt work..somewhere along the way from PHP to Oracle the é becomes a ?


UPDATE: Mar 9 at 14:47

So getting closer. After adding the NLS_LANG variable, doing direct oci8 inserts with é works.

The problem is actually on the PHP side. By using ExtJs framework, when submitting a form it encodes it using encodeURIComponent.
So é is sent as %C3%A9 and then re-encoded into é.
However it s length is now 2 (strlen($my_sent_value) = 2) and not 1. And if in PHP I try: $my_sent_value == é = FALSE

I think if I am able to re-encode all these characters in PHP back into lengths of byte size 1 and then inserting them into Oracle, it should work.

Still no luck though


UPDATE: Mar 10 at 11:05

I keep thinking I am so close (yet so far away).

putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9"); works very sporadicly.

I created a small php script to test:

header( Content-Type: text/plain; charset=ISO-8859-1 );
putenv("NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P9");
$conn= oci_connect("user", "pass", "DB");
$stmt = oci_parse($conn, "UPDATE temp_tb SET string_field =  |é| ");
oci_execute($stmt, OCI_COMMIT_ON_SUCCESS);

After running this once and loggin into the Oracle Database directly I see that STRING_FIELD is set to |¿|. Obviously not what I had come to expect from my previous experience.
However, if I refresh that PHP page twice quickly.... it worked !!!
In Oracle I correctly saw |é|.

It seems like maybe the environment variable is not being correctly set or sent in time for the first execution of the script, but is available for the second execution.

My next experiment is to export the variable into PHP s environment, however, I need to reset Apache for that...so we ll see what happens, hopefully it works.

最佳回答

This is what I finally ended up doing to solve this problem:

Modified the profile of the daemon running PHP to have:

NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1

So that the oci8 connection uses ISO-8859-1.

Then in my PHP configuration set the default content-type to ISO-8859-1:

default_charset = "iso-8859-1"

When I am inserting into an Oracle Table via oci8 from PHP, I do:

utf8_decode($my_sent_value)

And when receiving data from Oracle, printing the variable should just work as so:

echo $my_received_value

However when sending that data over ajax I have had to use:

utf8_encode($my_received_value)
问题回答

I presume you are aware of these facts:

  • There are many different character sets: you have to pick one and, of course, know which one you are using.
  • Oracle is perfectly capable of storing text without HTML entities (é). HTML entities are used in, well, HTML. Oracle is not a web browser ;-)

You must also know that HTML entities are not bind to a specific charset; on the contrary, they re used to represent characters in a charset-independent context.

You indistinctly talk about ISO-8859-1 and UTF-8. What charset do you want to use? ISO-8859-1 is easy to use but it can only store text in some latin languages (such as Spanish) and it lacks some common chars like the € symbol. UTF-8 is trickier to use but it can store all characters defined by the Unicode consortium (which include everything you ll ever need).

Once you ve taken the decision, you must configure Oracle to hold data in such charset and choose an appropriate column type. E.g., VARCHAR2 is fine for plain ASCII, NVARCHAR2 is good for UTF-8.

If you have different charsets between the server side code (php in this case) and the Oracle database, you should set server side code charset in the Oracle connection, then Oracle made the conversion.

Example: Let s assume:

  • php charset utf-8 (default).
  • Oracle charset AMERICAN_AMERICA.WE8ISO8859P1

In the connection to Oracle made by php you should set UTF8 (third parameter).

oci_pconnect("USER", "PASS", "URL"),"UTF8");

Doing this, you write code in utf-8 (not doing any conversion at all) and get utf-8 from the database through this connection.

So you could write something like SELECT * FROM SOME_TABLE WHERE TEXT = SOME TEXT LIKE áéíóú Ñ and also get utf-8 text as a result.

According to the php documentation, by default, Oracle client (oci_pconnect) takes the NLS_LANG environment variable from the Operating system. Some debian based systems has no NLS_LANG enviromental variable, so I think Oracle client use it s own charset (AMERICAN_AMERICA.WE8ISO8859P1) if we don t specify the third parameter.

If you really cannot change the character set that oracle will use then how about Base64 encoding your data before storing it in the database. That way, you can accept characters from any character set and store them as ISO-8859-1 (because Base64 will output a subset of the ASCII character set which maps exactly to ISO-8859-1). Base64 encoding will increase the length of the string by, on average, 37%

If your data is only ever going to be displayed as HTML then you might as well store HTML entities as you suggested, but be aware that a single entity can be up to 10 characters per unencoded character e.g. ϑ is ϑ

I had to face this problem : the LatinAmerican special characters are stored as "?" or "¿" in my Oracle database ... I can t change the NLS_CHARACTER_SET because we re not the database owners.

So, I found a workaround :

1) ASP.NET code Create a function that converts string to hexadecimal characters:

    public string ConvertirStringAHex(String input)
    {
        Encoding encoding = System.Text.Encoding.GetEncoding("ISO-8859-1");
        Byte[] stringBytes = encoding.GetBytes(input);
        StringBuilder sbBytes = new StringBuilder(stringBytes.Length);
        foreach (byte b in stringBytes)
        {
            sbBytes.AppendFormat("{0:X2}", b);
        }
        return sbBytes.ToString();
    }

2) Apply the function above to the variable you want to encode, like this

     myVariableHex = ConvertirStringZHex( myVariable );

In ORACLE, use the following:

 PROCEDURE STORE_IN_TABLE( iTEXTO IN VARCHAR2 )
 IS
 BEGIN
   INSERT INTO myTable( SPECIAL_TEXT )  
   VALUES ( UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW( iTEXTO ));
   COMMIT;
 END;

Of course, iTEXTO is the Oracle parameter which receives the value of "myVariableHex" from ASP.NET code.

Hope it helps ... if there s something to improve pls don t hesitate to post your comments.

Sources: http://www.nullskull.com/faq/834/convert-string-to-hex-and-hex-to-string-in-net.aspx https://forums.oracle.com/thread/44799





相关问题
Export tables from SQL Server to be imported to Oracle 10g

I m trying to export some tables from SQL Server 2005 and then create those tables and populate them in Oracle. I have about 10 tables, varying from 4 columns up to 25. I m not using any constraints/...

Connecting to Oracle 10g with ODBC from Excel VBA

The following code works. the connection opens fine but recordset.recordCount always returns -1 when there is data in the table. ANd If I try to call any methods/properties on recordset it crashes ...

How to make a one to one left outer join?

I was wondering, is there a way to make a kind of one to one left outer join: I need a join that matches say table A with table B, for each record on table A it must search for its pair on table B, ...

Insert if not exists Oracle

I need to be able to run an Oracle query which goes to insert a number of rows, but it also checks to see if a primary key exists and if it does, then it skips that insert. Something like: INSERT ALL ...

How can I store NULLs in NOT NULL field?

I just came across NULL values in NOT-NULL fields in our test database. How could they get there? I know that NOT-NULL constraints can be altered with NOVALIDATE clause, but that would change table s ...

Type reference scope

I m studying databases and am currently working on a object-relational DB project and I ve encountered a small problem with the number of possible constraints in an object table. I m using "Database ...

OracleParameter and DBNull.Value

we have a table in an Oracle Database which contains a column with the type Char(3 Byte). Now we use a parameterized sql to select some rows with a DBNull.Value and it doesn t work: OracleCommand ...

热门标签