#1 |
|
SitePoint Mentor
![]() Join Date: Aug 2003
Location: Southern California
Posts: 2,730
|
ms word curly quotes
I seem to be having a problem when someone pastes from MS Word into a script and the curly quotes are saved to the database. Instead of showing up properly, they come out as random, unknown characters, and thus far I have been unable to parse out the quotes. Does anyone know how to do this?
|
|
|
|
|
|
#2 |
|
Test cases complete. 0 fails.
![]() Join Date: Feb 2001
Location: Melbourne Australia
Posts: 6,721
|
This will be due to a problem with the character encoding. Almost all character encodings share the same first 128 characters (ASCII), but curly quotes fall outside that range, so their byte values differ according to the character encoding used.
If you're seeing two or three nonsense characters for each occurrence of one curly quote character, then it could be that you're storing it as UTF-8 but viewing it as ISO-8859-1. What's the character encoding of the page that you're viewing it on?
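A minimal sketch of that mismatch, in Python rather than the Perl used in this thread. The helper name `view_as` is hypothetical, and Windows-1252 (`cp1252`) stands in for what browsers typically show when a page is labelled ISO-8859-1:

```python
# Sketch: how one curly quote becomes several junk characters when it is
# stored in one encoding and displayed in another.
LEFT_QUOTE = "\u201c"  # left curly double quote


def view_as(text, stored_as, viewed_as):
    """Encode text one way, then decode the same bytes another way."""
    return text.encode(stored_as).decode(viewed_as)


# In UTF-8 the quote takes three bytes.
utf8_bytes = LEFT_QUOTE.encode("utf-8")

# Reading those three bytes as Windows-1252 yields three unrelated characters.
garbled = view_as(LEFT_QUOTE, "utf-8", "cp1252")

# In Windows-1252 itself the same quote is a single byte outside ASCII.
cp1252_byte = LEFT_QUOTE.encode("cp1252")

print(utf8_bytes, repr(garbled), cp1252_byte)
```

The point is only that the same character has different byte values in different encodings, so the bytes must be read back with the encoding they were written in.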
|
|
|
|
|
#3 |
|
SitePoint Mentor
![]() Join Date: Aug 2003
Location: Southern California
Posts: 2,730
|
They are standard pages using the ISO set, but I imagine MySQL (which is storing the data) is not using the same set. Is there any way to remove the characters, or to force a character set before saving the data?
|
|
|
|
|
|
#4 |
|
Test cases complete. 0 fails.
![]() Join Date: Feb 2001
Location: Melbourne Australia
Posts: 6,721
|
Unless you have a funky version of MySQL it will store your characters in the same way as they are input and output.
Are you viewing the characters in a browser, in phpMyAdmin, or in a shell? What matters is how they appear in the browser. If they are incorrect, then in your browser go to "view" -> "encoding" and change the encoding until you find one where it looks right. Once you have found this, you will know what encoding the characters were entered in. It's at this time that you will realise that PHP has almost nonexistent support for converting between character encodings, and you may be tempted to give up on non-ASCII characters.
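The same trial-and-error can be done on the raw bytes outside the browser. A rough sketch in Python, where the function name `try_encodings`, the candidate list, and the sample bytes are all assumptions made for illustration:

```python
# Sketch: decode the same raw bytes with several candidate encodings and
# keep the ones that work, much like switching encodings in the browser.
def try_encodings(raw, candidates=("utf-8", "cp1252", "iso-8859-1")):
    """Return {encoding name: decoded text} for every candidate that decodes."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return results


# Hypothetical sample: text wrapped in Windows-1252 curly quotes (0x93, 0x94).
sample = b"\x93joint heritage\x94"

for name, text in try_encodings(sample).items():
    print(name, repr(text))
```

A candidate that decodes without error is not necessarily the right one (ISO-8859-1 accepts any byte sequence), so the output still has to be checked by eye.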
|
|
|
|
|
#5 |
|
SitePoint Evangelist
![]() ![]() ![]() ![]() Join Date: May 2003
Location: nyc
Posts: 463
|
You might need to connect to MySQL with 'utf-8' appended to the DB URL:
//localhost/<yourdbname>?useUnicode=true&characterEncoding=UTF-8
Besides that, you have to make sure that your forms are set to accept data input as UTF-8 and that your output pages are set to display data as UTF-8 as well.
Good luck,
James
|
|
|
|
|
#6 |
|
SitePoint Member
Join Date: Aug 2004
Location: US NorthWest
Posts: 9
|
Check out David Wheeler's Quoter. Basically, you need to look for certain values in the submitted text and replace them with HTML "
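As a rough illustration of the look-and-replace idea (not a description of how Quoter itself works), here is a Python sketch that maps the four common curly quotes to named HTML entities. `ENTITY_MAP` and `quotes_to_entities` are hypothetical names, and the input is assumed to be already decoded text:

```python
# Sketch: replace curly quotes with named HTML entities.
# Keys are Unicode code points; values are the entity strings to emit.
ENTITY_MAP = {
    0x2018: "&lsquo;",  # left single quote
    0x2019: "&rsquo;",  # right single quote / apostrophe
    0x201C: "&ldquo;",  # left double quote
    0x201D: "&rdquo;",  # right double quote
}


def quotes_to_entities(text):
    """Return text with curly quotes swapped for HTML entities."""
    return text.translate(ENTITY_MAP)


print(quotes_to_entities("\u201cjoint heritage\u201d"))
```

Because the entities are plain ASCII, the result survives any of the encodings discussed in this thread.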
|
|
|
|
|
|
#7 |
|
SitePoint Enthusiast
![]() Join Date: Sep 2004
Location: UK
Posts: 78
|
My guess would be that he's checking the database contents using the shell. The effects on a Windows shell and a UNIX shell are different, but they stem from the same problem.
I think it is the shell's handling of the characters that is at fault, and not MySQL. MySQL defaults to Latin 1, which should be perfectly fine for your average db application in Western Europe. As people have already said, try it out in phpMyAdmin. If that gives the same (or still an incorrect) result, then you really will have to start messing about with character sets.
|
|
|
|
|
#8 |
|
SitePoint Mentor
![]() Join Date: Aug 2003
Location: Southern California
Posts: 2,730
|
Actually the data is all being passed via web script from an html textarea to perl to mysql back to perl and output as html. I will try the quote transforming and also take a look at the actual data in mysql to see where it's going south.
|
|
|
|
|
|
#9 |
|
SitePoint Mentor
![]() Join Date: Aug 2003
Location: Southern California
Posts: 2,730
|
Ok, looking at the MySQL data via ssh I am seeing some weird characters directly in the SQL table like:
Code:
joint heritage. |
|
|
|
|
|
#10 |
|
SitePoint Enthusiast
![]() Join Date: Sep 2004
Location: UK
Posts: 78
|
Ted,
Do you have something like PHPMyAdmin or equivalent? I ask only because I'm fairly certain that the shell will display certain characters incorrectly even though the data may be ok. I'm now out of my depth on this topic unfortunately. Good luck! |
|
|
|
|
|
#11 |
|
SitePoint Mentor
![]() Join Date: Aug 2003
Location: Southern California
Posts: 2,730
|
Looking via phpMyAdmin I can still see the misformatted characters. To be clear, this data is coming from Word (with its odd formatting) into an HTML form, saved to MySQL via Perl and then viewed again (with PHP, ssh and Perl, which all show the bad characters).
|
|
|
|
|
|
#12 |
|
SitePoint Mentor
![]() Join Date: Aug 2003
Location: Southern California
Posts: 2,730
|
I've tried a few regexp lines to no avail... any other ideas on the character encoding?
|
|
|
|
|
|
#13 |
|
SitePoint Member
Join Date: Sep 2004
Location: Massachusetts
Posts: 11
|
I've had to deal with the same problem recently. The curly quotes from MS Word, when pasted into an HTML textarea, appear as straight slanted quotes. Then when you submit the form and display on a web page they should appear as the straight up and down quotes. Instead I was getting little square boxes.
My solution (we use CFMX/MySQL) was to use a ColdFusion Replace() function. The curly quotes in MS Word are Unicode characters 8220 (left quote, U+201C) and 8221 (right quote, U+201D); they are outside the ASCII range. The quote you want is ASCII character 34. So, in the same template where I have the textarea boxes, I also use the Replace() function. You mentioned you are using Perl. I'm not familiar enough with Perl, but I would think there is a replace function you could write up that would do the same thing I'm doing in CFMX. Here's the syntax I'm using in CFMX (without the starting and ending brackets): cfset form.abstract=#replace(form.abstract, chr(8220), chr(34), "all")# where "abstract" is the name of the textarea. I have a similar line for character 8221. Hope this is helpful.
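For comparison, roughly the same replacement sketched in Python (not ColdFusion or Perl). `straighten_quotes` is a hypothetical name, and handling the single curly quotes 8216 and 8217 is an addition that the post above does not mention:

```python
# Sketch: replace curly quotes with their straight ASCII equivalents,
# mirroring the chr(8220) -> chr(34) replacement described above.
def straighten_quotes(text):
    """Return text with curly double and single quotes made straight."""
    replacements = {
        chr(8220): chr(34),  # left double quote  -> "
        chr(8221): chr(34),  # right double quote -> "
        chr(8216): chr(39),  # left single quote  -> ' (assumed extra case)
        chr(8217): chr(39),  # right single quote -> ' (assumed extra case)
    }
    for curly, straight in replacements.items():
        text = text.replace(curly, straight)
    return text


print(straighten_quotes("\u201cjoint heritage\u201d"))
```

This only works if the form data has already been decoded correctly; if the text arrives as mis-decoded bytes, characters 8220 and 8221 will not be present to match.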
|
|
|
|
|
#14 |
|
SitePoint Enthusiast
![]() Join Date: Sep 2004
Location: UK
Posts: 78
|
rvanderth,
Your solution is similar to the one I eventually used myself; however, I used the SQL function REPLACE. The logic is the same. Another tip: opening the pasted MS Word text in xEmacs will show you the codes that Word has used, i.e. 8220, as rvanderth writes, would be shown as /220.
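A sketch of the SQL REPLACE approach, run here through Python's built-in `sqlite3` module so it is self-contained. SQLite stands in for MySQL (both provide a three-argument REPLACE function), and the table and column names are hypothetical:

```python
import sqlite3

# Sketch: clean up curly quotes that are already stored, using SQL REPLACE.
# An in-memory SQLite database stands in for the MySQL table in the thread.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, abstract TEXT)")
conn.execute(
    "INSERT INTO articles (abstract) VALUES (?)",
    ("\u201cjoint heritage\u201d",),
)

# One UPDATE per curly quote character, each replaced with a straight quote.
for curly in ("\u201c", "\u201d"):
    conn.execute(
        "UPDATE articles SET abstract = REPLACE(abstract, ?, ?)",
        (curly, '"'),
    )

row = conn.execute("SELECT abstract FROM articles").fetchone()
print(row[0])
```

Passing the characters as bound parameters avoids having to type the curly quotes into the SQL text itself, where the shell or editor encoding could mangle them again.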
|
|
|