
  1. #1
    Rio (SitePoint Zealot)
    Join Date
    Nov 2001
    Location
    United Kingdom
    Posts
    171

    Form validation to detect multibyte characters

    Hello there,

    I'm trying to make a form that only accepts single-byte characters.

    Is there a simple way to detect multibyte characters in the form fields on submission and show an alert dialogue box if one is found? I'm not very good with JavaScript and I've been trying all day without much success.

    Thanks,

    Rio
    ~~ My website - Can You Chopstick? ~~
    http://www.canyouchopstick.com/

  2. #2
    kyberfabrikken (SitePoint Wizard)
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    The encoding (character set) decides whether a specific character is encoded as a single byte or as multiple bytes. For example, a character such as é is encoded as a single byte in ISO-8859-1, but as 2 bytes in UTF-8. So to know how many bytes a character will be encoded with, you need to know which character set you're going to transport the text in.
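
    As a rough illustration of the point above, the sketch below counts the bytes a string would occupy in UTF-8 and in ISO-8859-1. The function names `utf8ByteLength` and `latin1ByteLength` are made up for this example, and it assumes an environment where the standard `TextEncoder` is available (modern browsers and recent Node.js).

```javascript
// Number of bytes the text occupies when encoded as UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Number of bytes the text occupies in ISO-8859-1, which uses exactly one
// byte per character but can only represent code points 0-255.
// Returns null when the text contains a character outside that range.
function latin1ByteLength(text) {
  for (const ch of text) {
    if (ch.codePointAt(0) > 0xFF) {
      return null; // not representable in ISO-8859-1
    }
  }
  return text.length;
}

console.log(utf8ByteLength("\u00e9"));   // é in UTF-8: 2
console.log(latin1ByteLength("\u00e9")); // é in ISO-8859-1: 1
```

    The same character gives a different byte count depending on the encoding, so "multibyte" only has a meaning once the encoding is fixed.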

  3. #3
    Rio (SitePoint Zealot)
    Join Date
    Nov 2001
    Location
    United Kingdom
    Posts
    171
    Hi there,

    Thanks for the input. It seems far more complicated than I thought.

    The form in question is a very simple registration for a newsletter. I've put together a regex like the one below and matched it against the input, which seems to work so far.

    Does it look right to you?

    Code:
    var sb = new RegExp("[a-zA-Z0-9_|\<|\>|\"|\'|\%|\;|\(|\)|\&|\+|\-|\,|\.\|#|\?|\*]", "i");

  4. #4
    kyberfabrikken (SitePoint Wizard)
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    I guess what you're asking is how to strip out characters that are not in the lower range (0-127) of UTF-8 (sometimes referred to as ASCII). You could use the following regex:
    Code:
    // Remove every character outside the ASCII range (code points 0-127).
    var lowerAscii = function(txt) {
      return txt.replace(/[^\x00-\x7F]/g, "");
    };
    The actual number of bytes used to encode characters still depends on the encoding; however, most variable-length encodings (such as UTF-8) encode these characters with a single byte.
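
    If the goal is to reject the input rather than strip it, the same character class can be used to detect offending characters. This is only a minimal sketch: `firstNonAscii` and `validateAsciiOnly` are hypothetical names, and the shape of the `fields` argument (an object mapping field names to their values) is an assumption about how the form data is collected.

```javascript
// Hypothetical helper: returns the first character outside the ASCII range
// (0-127) found in the value, or null when there is none.
function firstNonAscii(value) {
  const match = /[^\x00-\x7F]/.exec(value);
  return match ? match[0] : null;
}

// Hypothetical validator: `fields` is assumed to map field names to values.
// Returns an error message for the first offending field, or null when
// every value contains only ASCII characters.
function validateAsciiOnly(fields) {
  for (const [name, value] of Object.entries(fields)) {
    const bad = firstNonAscii(value);
    if (bad !== null) {
      return 'Field "' + name + '" contains a non-ASCII character: ' + bad;
    }
  }
  return null;
}
```

    In a browser, a submit handler could pass a non-null message to `alert()` and return `false` to cancel the submission.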

