Serializing PHP data structures for Javascript


Currently messing with some code that serializes PHP data structures into Javascript and back, as a means for easy data exchange.

Going from Javascript to PHP is fairly easily done (see user comments at http://www.php.net/serialize) by generating strings PHP is capable of unserializing e.g;


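String.prototype.toPHP=function() {
    var s = this
    // Escape backslashes first so later escapes are not doubled
    s=s.replace(/\\/g, "\\\\")
    // Escape double quotes
    s=s.replace(/"/g, "\\\"")
    // Turn line feeds into a literal \n and drop carriage returns
    s=s.replace(/\n/g, "\\n")
    s=s.replace(/\r/g, "")
    return 's:'+s.length+':"'+s+'";';
}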

That modifies the Javascript String class so you can call toPHP() on any string and get a PHP-serialized representation of it.

Going the other way, with what I’ve currently got, if I have a simple PHP array like;


$a = array('x','y');

a “serialized” string ready for “unpacking” by Javascript, would be;


new Function('var a = new Array();a[0]="x";a[1]="y";return a');

If I’ve got that string in a Javascript variable called “serializedData” (perhaps fetched via XMLHttpRequest), I can use it like;


// Creates a function object
unserializedDataFunc = eval(serializedData);

// Get back the data from the function object
unserializedData = unserializedDataFunc();

The reason for using a Function object combined with eval() is so I can avoid all naming conflicts in Javascript. I'm not 100% sure this is the best mechanism to “serialize” a data structure for Javascript, as escaping quotes is likely to be a headache. It does keep the data completely anonymous until I specifically assign it to a variable, but has anyone got a better approach? I want to avoid all XML formats, BTW, to have as few “layers” as possible between PHP and Javascript, keeping overhead low and the data easy to work with.
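To make the quote-escaping headache concrete, here is a minimal sketch of the round trip. It builds in Javascript the same kind of string the PHP side would emit, then unpacks it with eval(). The helper names quoteForJs and toFunctionSource are hypothetical (they are not part of the code above), and the escaping only covers backslashes, quotes and line breaks.

```javascript
// Hypothetical helper: quote a value as a double-quoted JavaScript string literal.
function quoteForJs(value) {
  return '"' + String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r') + '"';
}

// Hypothetical helper: build the "serialized" source for a flat list of strings.
function toFunctionSource(items) {
  var body = 'var a = new Array();';
  for (var i = 0; i < items.length; i++) {
    body += 'a[' + i + ']=' + quoteForJs(items[i]) + ';';
  }
  body += 'return a';
  // The body is itself wrapped in a single-quoted literal,
  // so backslashes and single quotes need a second round of escaping.
  var literal = "'" + body.replace(/\\/g, '\\\\').replace(/'/g, "\\'") + "'";
  return 'new Function(' + literal + ');';
}

// Round trip: build the string, eval it to a function, call it for the data.
var serializedData = toFunctionSource(['x', 'say "hi"', "it's"]);
var unserializedDataFunc = eval(serializedData);
var unserializedData = unserializedDataFunc();
```

The two layers of escaping (once for the value, once for the function body) are where most of the trouble is likely to come from.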

Another thing proving tricky is mapping PHP arrays, which can be both indexed and associative e.g.;


$a = array('x','y','z','a'=>1,'b'=>2);

The Javascript Array object only keeps track of indexed elements – I can add properties to it e.g.


var a = new Array();
a.push('x');
a.push('y');
a.push('z');
a["a"] = 1;
a["b"] = 2;

But I need a for loop for the indexed elements and a for..in loop for the properties. It’s tempting to implement a new PHPArray class in Javascript, but I think that’s heading for trouble. I'm not sure it’s really a problem, as it’s rare you’d actually need to iterate over both indexed and associative elements, but if anyone has a better idea, I would love to hear it.
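A rough sketch of what walking such a mixed array could look like, without introducing a new class. The helper names isArrayIndex and eachPhpStyle are hypothetical. The hasOwnProperty check matters here because anything added to a prototype (like toPHP above) would otherwise show up in the for..in loop.

```javascript
var a = ['x', 'y', 'z'];
a['a'] = 1;
a['b'] = 2;

// Hypothetical helper: true when a property name is a canonical array index such as "0" or "12".
function isArrayIndex(key) {
  return String(Number(key) >>> 0) === key;
}

// Hypothetical helper: visit indexed elements first, then the named properties.
function eachPhpStyle(arr, callback) {
  for (var i = 0; i < arr.length; i++) {
    callback(i, arr[i]);
  }
  for (var key in arr) {
    // Skip inherited properties and the indexes already visited above.
    if (Object.prototype.hasOwnProperty.call(arr, key) && !isArrayIndex(key)) {
      callback(key, arr[key]);
    }
  }
}

var seen = [];
eachPhpStyle(a, function (key, value) {
  seen.push(key + '=' + value);
});
// seen is now ['0=x', '1=y', '2=z', 'a=1', 'b=2']
```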

The general mission here, BTW, is a simple mechanism to get XUL talking to PHP.


  • Tim

    I’ve used a similar mechanism with some scripts called JSRS (Javascript Remoting S? – forget the actual name). Anyway, it used WDDX to translate backwards and forwards between Javascript and PHP (or ASP or Perl or whatever language you have handy).

    http://www.ashleyit.com/rs/main.htm

    I know that WDDX is an XML based format, but it may give you some ideas on how to solve some of your problems. From memory, it doesn’t allow you to use text based indexes in the arrays, but I’m not 100% sure on that one.

    I ended up removing it from my project as it was causing problems with cross-browser compatibility, and the entity translation required by the XML format was a headache to deal with as well.

  • Herman

    What about using WDDX? There are libraries for js and php as well as other common languages.

  • Alan Knowles

    to HTML_Javascript_Convert works pretty well for this.
    .. this is the embedded version in flexy.
    http://cvs.php.net/co.php/pear/HTML_Template_Flexy/examples/tests/flexy_tojavascript.html?r=1.1
    http://cvs.php.net/co.php/pear/HTML_Template_Flexy/examples/tests/results1/flexy_tojavascript.html.en.php?r=1.1

    I have a project ongoing with XUL – and we are using Flexy with great success (as long as I fix all the bugs..) –

    There is even a XUL test in the testsuite..
    http://cvs.php.net/co.php/pear/HTML_Template_Flexy/examples/tests/namespaces.html?r=1.1

  • http://www.phppatterns.com HarryF

    Alan – thanks for the tip off. Had no idea HTML_Javascript had PHP to Javascript type conversion. Any chance you’ll write up some of your experiences with XUL sometime?

    Re: wddx – there’s an old SDK (from the days of Allaire) at http://www.openwddx.org/downloads/, with implementations for most languages. Alan again has put together PEAR::XML_WDDX as a substitute for PHP’s WDDX extension. All that said, I still don’t want to use WDDX, partly because of overhead and partly because I think PHP and Javascript can be brought much closer together with “native” serialization – I’m not worried about other languages in this case.

  • http://www.phppatterns.com HarryF

    PS: Some prototype code is now up here: http://www.sitepoint.com/forums/showthread.php?t=165471

  • jayboots

    Doesn’t this sound like a job for YAML?

  • http://www.phppatterns.com HarryF

    Doesn’t this sound like a job for YAML?

    Almost yes. I notice there’s this http://sourceforge.net/projects/yaml-javascript. Not sure what the current status of YAML is in PHP (believe there’s an extension floating around somewhere).

    But think what I’m doing brings PHP and Javascript even closer than YAML can, meaning there’s a very low overhead for serializing / unserializing

  • timmy

    ya want to have a look at neuromancer. it's for coldfusion but gives very high performance for serializing/deserializing js arrays. not sure if someone's done something similar in php :(

  • Anonymous

    this JS Object
    http://www.devpro.it/javascript_id_102.html
    is maybe the faster serializer / unserializer javascript version.

    Regards

  • Varrah at _nospam_ mail dot ru

    this JS Object
    http://www.devpro.it/javascript_id_102.html
    is maybe the faster serializer / unserializer javascript version.

    Be careful: this script converts Unicode strings incorrectly. JavaScript's str.length counts symbols, whereas PHP's serialize() and unserialize() functions count bytes, so strings containing UTF-8 non-Latin-1 characters (Eastern and Northern European languages, Asian languages, etc.) will not be unserialized correctly by PHP after such JavaScript serialisation.
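One way to illustrate the mismatch is a minimal sketch that counts UTF-8 bytes instead of relying on str.length. It assumes the string reaches PHP encoded as UTF-8. The names utf8ByteLength and serializeString are hypothetical and not taken from any of the scripts discussed here.

```javascript
// Hypothetical helper: number of bytes the string occupies when encoded as UTF-8.
function utf8ByteLength(str) {
  var bytes = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    if (code < 0x80) {
      bytes += 1;                       // plain ASCII
    } else if (code < 0x800) {
      bytes += 2;                       // e.g. Cyrillic
    } else if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length &&
               str.charCodeAt(i + 1) >= 0xDC00 && str.charCodeAt(i + 1) <= 0xDFFF) {
      bytes += 4;                       // surrogate pair: one character outside the BMP
      i++;                              // skip the low surrogate
    } else {
      bytes += 3;                       // rest of the BMP, e.g. Kanji
    }
  }
  return bytes;
}

// Hypothetical helper: PHP-style serialized string with a byte-based length prefix.
function serializeString(str) {
  return 's:' + utf8ByteLength(str) + ':"' + str + '";';
}

var cyrillic = serializeString('ЯЯЯ'); // length prefix is 6, while 'ЯЯЯ'.length is 3
```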

  • andr3a

    Hello Varrah,
    I am the author of that JS Object. Could you please give me a detailed description of the problem you’ve encountered? Is there a JS/PHP serialize/unserialize function, object or class that does not have this problem? (It may be useful, for me, to understand what you’re speaking about.)

    Thank you for debug, best regards.

    andr3a at _nospam_ 3site dot it

  • Varrah

    Hi again

    here’s a test-script I’ve written:
    http://neworld.spb.ru/tmp/serialize/serial.php

    Several notes:
    1. I’ve set UTF-8 as the default page encoding, as well as the form accept-charset, don’t waste your time trying the script with other encodings, it can give other results, and actually the question is not in the encoding.
    2. When only Western Latin symbols are used, all three results agree, so try something harder (you can use this link: http://www.guntherkrauss.de/computer/xml/daten/edicode.html). In practice, Cyrillic symbols (e.g. ЯЯЯ, three reversed-R letters, named “ya” in Russian) give different results in my (Varrah’s) and Andrea’s JS serialization.
    3. What’s worse, Japanese symbols (especially Kanji rather than Kana, e.g. 車) give a different result in all three methods: PHP, mine and Andrea’s.

    The problem is that PHP counts the number of bytes, which is not what JavaScript returns from str.length.
    I’ve tried to handle it by taking the char code of each Unicode character and counting how many bytes are needed to store that code, but this comes out incorrect. I believe there’s some other, more convenient way to get it right, i.e. the same as in PHP. If anybody here knows this way, please join the conversation :-)

    I can give the sources of the test-script, just tell me the best way you’d like me to do this.

  • andr3a

    hello Varrah, please contact me if there are the same problems with this updated version of my PHP_Serializer.js file: http://www.devpro.it/code/102.html

    To use experimental multibyte parsing (serialize / unserialize), pass true or 1 to the constructor:
    var php = new PHP_Serializer(true); // enable experimental multibyte conversion

    I’m waiting for your response, thank you for testing and debugging

    P.S. it currently works only with iso utf8 encoded strings and not with all types of charsets

    Regards,
    Andrea Giammarchi

  • Varrah

    Made a new version of the test-script, with the updated version of Andrea’s script: http://neworld.spb.ru/tmp/serialize/serial2.php
    Seems to be ok this time.
    Thanks!

  • andr3a

    Hello Varrah, the multibyte function should work correctly with this version (1.6b), but in any case the utf8_encode PHP function converts only the iso-8859-1 charset.
    There are few cases (for me) where a UTF-8 header is required, so I think that my object should work without this feature.
    Remember that the multibyte version (the one with true on the constructor) is really slower, because “char by char parsing” of each string is a crazy way to quickly parse a few or a lot of vars, and finding the string value again without the correct length is a killer for client-side conversion speed (for every serialized string it tries to reproduce the correct string, checking whether the re-encoded length is the same as the serialized integer value … !!!).
    BTW, I hope this will be the last version of this object.

    Best regards and thank you again for debug.

  • Varrah

    Well, I was thinking about another algorithm that may be faster (actually you’ve already mentioned something like that in e-mail): we can escape the string and then take its length divided by 3 (since escaping uses % and two symbols for each char). Yet escaping alone doesn’t make it correct either: Latin-1 chars are still just chars (with no % and all) after escaping.
    Then, he-he :-) One could count the number of % symbols in the escaped string, call it percentNum, and then calculate the length as:
    utfLen = strEscaped.length - (percentNum * 2)
    Now only a fast method of counting the % symbols is needed :-)
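The percent-counting idea can be sketched as follows. Note one substitution: this sketch uses encodeURIComponent() rather than escape(), because encodeURIComponent() emits one %XX group per UTF-8 byte, which is what the formula needs. The function name utf8LengthByPercent is hypothetical, and the sketch assumes the string contains no unpaired surrogates (encodeURIComponent() throws a URIError on those).

```javascript
// Hypothetical helper: UTF-8 byte length via the formula
// utfLen = strEscaped.length - (percentNum * 2)
function utf8LengthByPercent(str) {
  var strEscaped = encodeURIComponent(str);
  // Count the % symbols with a regexp; match() returns null when there are none.
  var matches = strEscaped.match(/%/g);
  var percentNum = matches ? matches.length : 0;
  // Each %XX group is 3 characters standing for 1 byte, so drop 2 per group.
  return strEscaped.length - (percentNum * 2);
}

var asciiLen = utf8LengthByPercent('abc');     // 3
var cyrillicLen = utf8LengthByPercent('ЯЯЯ');  // 6
var kanjiLen = utf8LengthByPercent('車');      // 3
```

A literal % in the input is itself encoded as %25, so it is still counted as a single byte.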

  • andr3a

    maybe a regexp is the fastest way to do that … but maybe you’re not thinking about big size of serialized and escaped strings !!!

    Well, I’ve created this javascript class for ajax / php “real-time” interaction (search AJSHP Project on Google if you’re interested), then bytes loaded or sent should be less and not more than an XML interaction (then faster than XML) :-)

    However this version, with multibyte conversion enabled, should be usable with 100 or more serialized strings inside an array, so I hope it’s enough for UTF8 encoded interaction: not as fast as iso, but efficient, isn’t it ?

    See you :-)

  • Varrah

    I hope it’s enough for UTF8 encoded interaction: not as fast as iso, but efficient, isn’t it ?

    Well, yes
    Just hoped to find even faster and optimized solution. After all – PHP does that somehow, then why JS can not? :-)

  • andr3a

    PHP does that somehow, then why JS can not?

    because it’s a strange feature of PHP and PHP only: it uses something like sizeof(*string), which counts the bytes used, rather than a strlen(*string) that would count chars.

    Maybe with PHP6 and native Unicode support this problem will disappear? … I hope so :-)

  • andot

    http://www.coolcode.cn/?p=171

    Here is a best PHP serialize/unserialize implementation for javascript.

    It can serialize/unserialize N,b,i,d,s,U,r,R,a,O,C.

    It is included in PHPRPC: http://sourceforge.net/project/showfiles.php?group_id=163368

  • mysticav

    Please can somebody help me with this issue?
    http://www.sitepoint.com/forums/showthread.php?p=2863921#post2863921

  • michelangelot

    I have tried to unserialize the PHP-serialized string with JS, but the special characters of iso-8859-1 are not recognized.
    Help me!!

  • Fordi

    Just so’s you know, since PHP doesn’t use the quotes as delimiters, and in fact, holds strict to the number of bytes given (s:Num:”data”;), you don’t have to escape anything.

    As evidence, the return value of serialize('""') is: 's:2:"""";'
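A small sketch of why the length prefix makes escaping unnecessary: a reader can jump straight over the data using the declared length, so quotes inside it never act as delimiters. The function name readSerializedString is hypothetical, and the sketch assumes ASCII-only data so that characters and bytes coincide.

```javascript
// Hypothetical helper: read one serialized string of the form s:<len>:"<data>";
function readSerializedString(input) {
  var match = /^s:(\d+):"/.exec(input);
  if (!match) {
    throw new Error('not a serialized string');
  }
  var length = parseInt(match[1], 10);
  var start = match[0].length;
  // Take exactly <len> characters, whatever they are, quotes included.
  var data = input.slice(start, start + length);
  // The closing "; must follow immediately after the data.
  if (input.slice(start + length, start + length + 2) !== '";') {
    throw new Error('length does not match data');
  }
  return data;
}

var twoQuotes = readSerializedString('s:2:"""";');  // the two-character string ""
```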

    Anyway, I’ve dealt with serialization a LOT, especially for pseudo-RPC systems. I pass serialized objects to PHP (as JSON support won’t be widespread until PHP5 gains dominance, and an interpreted JSON decoder is too much of a performance hit), and pass JSON objects back to Javascript (as a JSON encoder is a lot less intensive than a JSON decoder). So, I’ve got a nicely stable PHP serializer. I deal with the UTF problem by converting all non-ascii to HTML Entities. Not the best way, but since I RPC strings mainly for text for websites, it’s speedier than trying to figure out how many bytes a UTF symbol occupies.

    Enough talk now. Here’s the code:
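    // Relies on Prototype.js for Object.extend, Object.isElement and Array#each.
    Object.toPHP = function (object) {
      var type = typeof object;
      switch (type) {
        case 'undefined':
        case 'unknown': return 'N;';
      }
      if (object === null) return 'N;';
      if (object.toPHP) return object.toPHP();
      // DOM elements cannot be serialized; null tells the caller to skip them.
      if (Object.isElement(object)) return null;
      var ret = [];
      for (var property in object) {
        var value = Object.toPHP(object[property]);
        // != null skips both undefined and the null returned for DOM elements.
        if (value != null)
          ret.push(property.toString().toPHP() + value);
      }
      return 'a:' + ret.length + ':{' + ret.join('') + '}';
    };
    Date.prototype.toPHP = function () {
      return 'i:' + this.getTime() + ';';
    };
    Object.extend(String.prototype, {
      toPHP: function () {
        var s = this.escapeUTF();
        return 's:' + s.length + ':"' + s + '";';
      },
      escapeUTF: function () {
        var charCode, ret = '';
        for (var i = 0; i < this.length; i++) {
          charCode = this.charCodeAt(i);
          // Printable ASCII passes through; everything else becomes an HTML entity.
          ret += ((charCode < 128) && (charCode >= 32)) ?
            this.charAt(i) :
            ('&#x' + charCode.toString(16).toUpperCase() + ';');
        }
        return ret;
      }
    });
    Array.prototype.toPHP = function () {
      var ret = [];
      this.each(function (v, i) {
        // Object.toPHP handles null/undefined and defers to v.toPHP when present.
        var value = Object.toPHP(v);
        if (value != null)
          ret.push(i.toPHP() + value);
      });
      return 'a:' + ret.length + ':{' + ret.join('') + '}';
    };
    Number.prototype.toPHP = function () {
      return (parseInt(this) == parseFloat(this) ? 'i' : 'd') + ':' + this.toString() + ';';
    };
    Boolean.prototype.toPHP = function () {
      // valueOf() is needed: inside a method, this is a Boolean object and always truthy.
      return 'b:' + (this.valueOf() ? '1' : '0') + ';';
    };
    Function.prototype.toPHP = function () {
      return 'N;';
    };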