SitePoint Sponsor

User Tag List

Results 1 to 12 of 12
  1. #1
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Form is inserting unwanted characters into DB

    I have a form in which users are submitting information ... when I am pulling the info from the db to display in a webpage, I am getting this character...(&#8206) ‎ ..tacked on to the end of some of the text entries as you can see below. It happens when someone copy and pastes their entry from another web page ... I think.

    Mike's Electric Vehicles‎(&#8206) ‎
    3100 Airport Avenue
    San Diego CA 92110
    (919) 456-4890‎(&#8206) ‎

    How can I program the form to ignore this entry or how do I replace it with " "

    thanks in advance for any help!
    Last edited by idf; Oct 30, 2008 at 17:55.

  2. #2
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,290
    Mentioned
    198 Post(s)
    Tagged
    3 Thread(s)
    It might not work, but an easy thing to try would be to trim() the input variables.

  3. #3
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    It might not work, but an easy thing to try would be to trim() the input variables.
    I added trim() and I think it may have worked ... I am waiting to see when it happens again so I can track down exactly what might be causing it.

    Even if you type that entity like this right here .... ↚ .... it will get stripped from this post. You should see 4 dots a space where I typed the entity and then 4 more dots. So it is hard to even find help on this problem ...

    it is definitely happening when a user copies text from either another form or from another web page.

    thank you for your help!

  4. #4
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,290
    Mentioned
    198 Post(s)
    Tagged
    3 Thread(s)
    Unicode #8206 is "Right to Left". That is, if I wanted to write something like Hebrew in the middle of English (I suppose I'd need a Left to Right after it, whatever that one is).
    But I highly doubt users are intending this, most likely a copy/paste character encoding thing.
    The r2l is a control character, and PHP trim() strips whitespace, but I don't know if it considers conttrol characters as whitespace characters.
    Hopefully it does.

  5. #5
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    Unicode #8206 is "Right to Left". That is, if I wanted to write something like Hebrew in the middle of English (I suppose I'd need a Left to Right after it, whatever that one is).
    But I highly doubt users are intending this, most likely a copy/paste character encoding thing.
    The r2l is a control character, and PHP trim() strips whitespace, but I don't know if it considers conttrol characters as whitespace characters.
    Hopefully it does.
    I have not seen it again after I put the trim() function in place but just to keep this post useful to someone else running into this I will confirm this after a few more days of data entry...

    thanks again for your help!

  6. #6
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    Unicode #8206 is "Right to Left". That is, if I wanted to write something like Hebrew in the middle of English (I suppose I'd need a Left to Right after it, whatever that one is).
    But I highly doubt users are intending this, most likely a copy/paste character encoding thing.
    The r2l is a control character, and PHP trim() strips whitespace, but I don't know if it considers conttrol characters as whitespace characters.
    Hopefully it does.
    Ok that character is back again ... is there another way that you know of that might work... like something to strip it off before it posts to the database?

    Also, I have about 200 records in the DB with that character tacked on the end of phone numbers and or the title of the business that registered. Is there SQL command that will remove them or do I have to edit each record by hand?

    thanks again for your help!

  7. #7
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,290
    Mentioned
    198 Post(s)
    Tagged
    3 Thread(s)
    Ideally the RTL character should be removed before it's inserted into the table. But you will want to make sure it's removed after it's retrieved from the table too, unless you can clean up existing entries (once you're sure it won't get back in again). I'm not advanced in MySQL, so there may be some way to do it using "pure" MySQL. But other than that (if there's a way) you could do a SELECT, use PHP to clean it up, and then a subsequent UPDATE. A lot of resource usage, but it may be the only way. If you post a thread in the MySQL forum, http://www.sitepoint.com/forums/forumdisplay.php?f=182 hopefully someone more knowledgeable than I can come up with a query that can strip out the RTLs without using PHP as a middle-man. Something like a recursive
    UPDATE field SET field = [field w/o RTLs]
    sure would be nice.

  8. #8
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    Ideally the RTL character should be removed before it's inserted into the table. But you will want to make sure it's removed after it's retrieved from the table too, unless you can clean up existing entries (once you're sure it won't get back in again). I'm not advanced in MySQL, so there may be some way to do it using "pure" MySQL. But other than that (if there's a way) you could do a SELECT, use PHP to clean it up, and then a subsequent UPDATE. A lot of resource usage, but it may be the only way. If you post a thread in the MySQL forum, http://www.sitepoint.com/forums/forumdisplay.php?f=182 hopefully someone more knowledgeable than I can come up with a query that can strip out the RTLs without using PHP as a middle-man. Something like a recursive
    UPDATE field SET field = [field w/o RTLs]
    sure would be nice.
    Ok Cool I will look to the MySql forum to try and remove it from the DB. Do you know what code I would need to use to remove it before it goes into the DB?

    Is it like string replace function or something like that? I have a data company inputting data into my website and it is coming from something they are doing....

    thanks again for your help!

  9. #9
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,290
    Mentioned
    198 Post(s)
    Tagged
    3 Thread(s)
    I've been trying to get a L2R (unicode 8206 is ltr not rtl as I posted before, :d'oh:) into a string to test with, but no luck yet.
    I was thinking you might be able to use ctype, but that won't work for your needs with mixed strings. This leaves using a regex replace. One possible problem is deciding what to replace with. If they are always at the end of a line, should they be replaced with a space or a newline?

    Maybe something like this would work, but I haven't been able to test it since I can't get a string containing the character.

    PHP Code:
    <?php
    $string 
    'Whatever, the ltr control character [?] and the rest.';
    $pattern '/(\x{200E})/';
    $replacement ' ';
    echo 
    preg_replace($pattern$replacement$string);
    ?>

  10. #10
    SitePoint Wizard lorenw's Avatar
    Join Date
    Feb 2005
    Location
    was rainy Oregon now sunny Florida
    Posts
    1,104
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    I was getting strange chars and wrote this to clean them up, change it to suite your needs

    PHP Code:
    $sqlaa = @mysql_query("SELECT * FROM users_data WHERE user_county LIKE 'Â%'"); 
        if(
    mysql_numrows($sqlaa) == '0'){
        echo 
    'nothing to clean up';
        }
        else{
            while ( 
    $row mysql_fetch_array($sqlaa) ) { 

    //        echo  $row["ct"].'<br>';
            
    if ( preg_match('/^Â/'$row["user_county"]) ) {  
            echo  
    $row["user_county"].' is changed to<br>';
                
    //            $out = ereg_replace("[^[:alpha:]]", "", $row["ct"]);
                
    $out=preg_replace("/[^a-z \d]/i"""$row["user_county"]);
                
    $out trim($out);
                echo  
    $out.'<br><br>';
                
    $uid $row["user_id"];
                      
    $sql2 "UPDATE users_data SET
                              user_county = '
    $out
                              WHERE user_id = '
    $uid'";
                      if (
    mysql_query($sql2)) {
    //                    echo('<p>Database was cleaned up.</p>');
                      
    } else {
                        die(
    '<p>Error cleaning Database: ' .
                            
    mysql_error() . '</p>');
                      }
                        
                }
    // preg_match
    //            else {echo 'Nothing to clean up';}
            
    }// end sql while
        
    }//else 
    hth
    What I lack in acuracy I make up for in misteaks

  11. #11
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    I've been trying to get a L2R (unicode 8206 is ltr not rtl as I posted before, :d'oh into a string to test with, but no luck yet.
    I was thinking you might be able to use ctype, but that won't work for your needs with mixed strings. This leaves using a regex replace. One possible problem is deciding what to replace with. If they are always at the end of a line, should they be replaced with a space or a newline?

    Maybe something like this would work, but I haven't been able to test it since I can't get a string containing the character.

    PHP Code:
    <?php
    $string 
    'Whatever, the ltr control character [?] and the rest.';
    $pattern '/(\x{200E})/';
    $replacement ' ';
    echo 
    preg_replace($pattern$replacement$string);
    ?>
    Ok ... let me see if that works... the character comes at the end of each string.

    thank you for your help.

  12. #12
    SitePoint Enthusiast
    Join Date
    Apr 2008
    Posts
    49
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by lorenw View Post
    I was getting strange chars and wrote this to clean them up, change it to suite your needs

    PHP Code:
    $sqlaa = @mysql_query("SELECT * FROM users_data WHERE user_county LIKE '%'"); 
        if(
    mysql_numrows($sqlaa) == '0'){
        echo 
    'nothing to clean up';
        }
        else{
            while ( 
    $row mysql_fetch_array($sqlaa) ) { 

    //        echo  $row["ct"].'<br>';
            
    if ( preg_match('/^/'$row["user_county"]) ) {  
            echo  
    $row["user_county"].' is changed to<br>';
                
    //            $out = ereg_replace("[^[:alpha:]]", "", $row["ct"]);
                
    $out=preg_replace("/[^a-z \d]/i"""$row["user_county"]);
                
    $out trim($out);
                echo  
    $out.'<br><br>';
                
    $uid $row["user_id"];
                      
    $sql2 "UPDATE users_data SET
                              user_county = '
    $out
                              WHERE user_id = '
    $uid'";
                      if (
    mysql_query($sql2)) {
    //                    echo('<p>Database was cleaned up.</p>');
                      
    } else {
                        die(
    '<p>Error cleaning Database: ' .
                            
    mysql_error() . '</p>');
                      }
                        
                }
    // preg_match
    //            else {echo 'Nothing to clean up';}
            
    }// end sql while
        
    }//else 
    hth
    Currently the string's of text that I am trying to fix all look like this.

    Smith Automotive‎
    (866) 455-0376‎

    Maxwell Town North Mitsubishi‎
    (512) 451-7411‎

    etc. etc.

    Thank you for your function I will try it!


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •