  1. #1
    SitePoint Enthusiast
    Join Date
    Jan 2007
    Posts
    50

    Bi-Lingual site issues

    Hello,

    I came across this thread about best practices for bi-lingual (or multi-lingual) sites:

    http://www.sitepoint.com/forums/showthread.php?t=448179

    in this forum but rather than replying to it, I have opened this thread to ask some specific questions about some of the solutions proposed there - though it's no requirement to read that thread in order to reply.

    1. When we talk about keeping language data in PHP files, are we assuming that the encoding will be ASCII? I don't think Unicode/UTF-8 PHP files would compile (i.e. work). So does this approach not work if I want to use UTF-8, or am I missing something? What is the recommended encoding for a multi-lingual website: UTF-8, or is it just simpler overall to use ASCII (if we are not mixing languages, that is)?

    2. If I have everything in the database, even every little menu/form word, such as login, register, about us, services, products, name, username, password etc, how do I query for these?
    a. One big query to get all words/phrases into a PHP structure and then use that as I am rendering the entire page?
    b. Make a query per item? This seems a bit excessive to me, because I know that making a couple of queries per page is not that bad but how about 20, 30 or more queries? Or is it ok as long as it's just a single connection?
    c. Some sort of compromise where I query once for a subset of the translated text that fills up most of the page and if I need to print a larger body of text query specifically?


    So I am thinking maybe the best "compromise" would be to put the little words/phrases that make up the site (menus/forms etc) into some php files and then bigger items such as descriptions, news articles etc into the database. I would not hesitate to throw everything in the database except I have not convinced myself that I have a tried, tested, practical and efficient scheme for rendering the pages.
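
The file-based half of this compromise could be sketched roughly as below. The keys, the translations, and the helper name `t()` are made up for illustration, and the two arrays stand in for what would normally be separate files (say `lang/en.php` and `lang/el.php`, each returning an array and loaded with `require`). The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical language data; in practice each inner array would live in its
// own file that returns the array.
$languages = [
    'en' => ['login' => 'Log in', 'register' => 'Register', 'about_us' => 'About us'],
    'el' => ['login' => 'Σύνδεση', 'register' => 'Εγγραφή'],
];

// Look up a short UI string: requested language first, then English,
// then the key itself so a missing translation is easy to spot.
function t(array $languages, string $lang, string $key): string
{
    return $languages[$lang][$key] ?? $languages['en'][$key] ?? $key;
}

echo t($languages, 'el', 'login'), "\n";    // Σύνδεση
echo t($languages, 'el', 'about_us'), "\n"; // About us (falls back to English)
echo t($languages, 'el', 'services'), "\n"; // services (no translation at all)
```

Larger items such as descriptions and news articles would still come from the database; only the small, rarely changing strings go through a lookup like this.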

    What do you think?


    Thanks!
    John

  2. #2
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    To question 1:
    UTF-8-encoded PHP files do indeed run just fine - in fact one of my sites is done entirely in UTF-8, from the static HTML headers to the PHP scripts themselves, even on down to the database. Everything is UTF-8, and I've not had any problems. Just be aware that with any multi-byte encoding scheme (like UTF-8), you'll need to use multi-byte-aware string functions (not an issue in my particular implementation, as everything I'm doing could be done just as easily in ASCII). See http://www.php.net/manual/en/ref.mbstring.php
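
To illustrate the point about multi-byte-aware functions, here is a small sketch. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the sample word is only an example.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is available.
$word = 'Ελληνικά'; // 8 characters, each taking 2 bytes in UTF-8

$byteCount = strlen($word);             // counts bytes: 16
$charCount = mb_strlen($word, 'UTF-8'); // counts characters: 8

// substr() works on bytes, so this cuts the second character in half.
$firstThreeBytes = substr($word, 0, 3);
// mb_substr() works on characters and returns "Ελλ".
$firstThreeChars = mb_substr($word, 0, 3, 'UTF-8');

echo $byteCount, ' bytes, ', $charCount, " characters\n";
echo $firstThreeChars, "\n";
```

For plain ASCII text the byte-based and character-based functions give the same results, which is why the difference only shows up once non-ASCII text is involved.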

    To question 2:
    Here's how my particular multi-lingual site is beginning to come together (still in the planning stages, though I've started some of the core code already):
    As I process a page, I build a list of the string identifiers that are used on that page. When I'm ready to finally generate the output, I turn that list into a comma-delimited list of strings and run one single MySQL query using the IN syntax. This gives me all the strings I need, and I only had to make a single query to get them all. An alternative approach I'm considering is to store separate copies of my templates, one set for each language. This would make managing the common strings easier (no need to query the database for them), but it complicates site maintenance (multiple copies of otherwise identical files have to be maintained).
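
A rough sketch of the single-query idea follows. It is not kromey's actual code: the table and column names (`site_strings`, `string_id`, `lang`, `text`) and the function name are hypothetical, and it swaps the hand-built comma-delimited list for `?` placeholders so the values can be bound through a prepared statement.

```php
<?php
/**
 * Build one parameterised SELECT covering every string identifier used on a page.
 * Table and column names are hypothetical.
 *
 * @param string[] $ids identifiers collected while processing the page
 * @return array SQL text at index 0, values to bind at index 1
 */
function buildStringsQuery(array $ids, string $lang): array
{
    $ids = array_values(array_unique($ids)); // drop repeated identifiers
    if ($ids === []) {
        throw new InvalidArgumentException('No string identifiers collected.');
    }
    // One "?" per identifier, e.g. "?, ?, ?"
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    $sql = "SELECT string_id, text FROM site_strings"
         . " WHERE lang = ? AND string_id IN ($placeholders)";

    return [$sql, array_merge([$lang], $ids)];
}

[$sql, $params] = buildStringsQuery(['login', 'register', 'login', 'about_us'], 'en');
echo $sql, "\n";

// With an existing PDO connection in $pdo (not created in this sketch):
// $stmt = $pdo->prepare($sql);
// $stmt->execute($params);
// $strings = $stmt->fetchAll(PDO::FETCH_KEY_PAIR); // string_id => text
```

However many strings a page uses, this stays at one round trip to the database per page.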

    Anyway, that's my two cents.
    PHP questions? RTFM
    MySQL questions? RTFM

  3. #3
    SitePoint Enthusiast
    Join Date
    Jan 2007
    Posts
    50
    Hi kromey,

    Thanks for your response. How do you save PHP files as UTF-8? I tried Eclipse (and even NetBeans and jEdit) but could find no way to choose an encoding when saving a file. I could only do it with Notepad, by choosing Save As... Unicode, but that did not work: Apache served the file as text and did not invoke PHP on it.


    John

  4. #4
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    > An alternative approach I'm considering is to store separate copies of my
    > templates, one set for each language.

    That is how I do it as well. There are a number of advantages to this approach, one being that you can more easily vary the layout on a per-locale basis, for example.

    > How do you save php files as utf-8?

    In your favourite editor (I use jEdit) you can select which encoding to use, so read up on your editor's documentation. As for using UTF-8, you need to declare this encoding in your templates' meta data.

    Notepad, as far as I know, is basic ASCII only and doesn't support Unicode.

    You are better off using the same encoding in all of your forms as well and, just to be sure, sending the encoding in your HTTP headers too. In the majority of browsers the header that is sent overrides the encoding specified in the template, but that is what you want anyway: the encoding specified in the template is a fail-safe, a fallback if you like.
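
As a minimal sketch of declaring the encoding in both places (the HTTP header and the template), something along these lines would do. The function name and the page content are invented for the example.

```php
<?php
// Sketch: declare UTF-8 in the HTTP header and again in the template.
// renderPage() is a hypothetical helper, not part of any framework.
function renderPage(string $title, string $body): string
{
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html>\n<head>\n"
         // Fallback declaration inside the template itself.
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
         . "<title>$safeTitle</title>\n"
         . "</head>\n<body>\n$body\n</body>\n</html>\n";
}

// header() only works before any output has been sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

echo renderPage('Καλώς ήρθατε', '<p>Welcome</p>');
```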

    Then there is your database. It's better to specify UTF-8 there as well when you create your database schema, i.e.

    Code:
    -- example for MySQL
    create table ... (
    ... ) engine=innodb, charset=utf8, auto_increment=1;
    Have a look at Wikipedia as well.

  5. #5
    SitePoint Enthusiast
    Join Date
    Jan 2007
    Posts
    50
    UTF-8 does indeed work (using jEdit). I am now trying to figure it out in Eclipse as well. Notepad does allow you to "Save As" in Unicode (it does not mention which encoding exactly), but the result does not appear to be UTF-8; maybe it is UTF-16 or some kind of Microsoft Unicode, which does not work. This is what threw me off.
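
One way to check what an editor actually wrote is to look at the first bytes of the file for a byte-order mark. The sketch below assumes, as guessed above, that Notepad's "Unicode" option produces UTF-16 with a BOM; the function name is made up. In such a file every ASCII character is followed by a zero byte, which would explain why PHP never recognises the opening tag.

```php
<?php
// Sketch: report which byte-order mark, if any, a string of bytes starts with.
// FF FE is also how a UTF-32LE BOM begins; this sketch does not tell them apart.
function detectBom(string $bytes): string
{
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 with BOM';
    }
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }
    return 'no BOM';
}

// The five characters of the PHP opening tag as UTF-16LE with a BOM:
// each ASCII byte is followed by a zero byte.
$utf16 = "\xFF\xFE" . "<\x00?\x00p\x00h\x00p\x00";

echo detectBom($utf16), "\n";           // UTF-16LE
echo detectBom('<?php echo 1;'), "\n";  // no BOM
```

For a real file, the first few bytes can be read with `file_get_contents($path, false, null, 0, 4)` and passed to the function.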

    I am still wondering whether to implement this as a files+DB mix or as pure DB. I guess there is no one "correct" answer, so I will just make a decision and move on.


    Thanks
    John

