#26
K. Wolfe (Columbus, OH)
    OK, little breather from this one

So: serialization is not what you need to worry about; it's the fact that you'll have to be writing to disk (slow).

This will, however, solve your problem of too much memory being consumed. At the same time, you can start caching searches through this, possibly lowering the number of queries being sent out to the remote site.

Your goal with this approach would be to keep as much as you can in the db without assigning it to a variable first. This is where I feel you're breaking down, and I've had another person agree with me. In PHP, 8mb of raw data does not translate to 18 times the memory. You said you have over 190mb of PHP code? That's a lot of places where you could be duplicating these variables unnecessarily.
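A minimal sketch of what I mean by not assigning it to a variable first (connection details and table/column names here are made up for illustration): with an unbuffered query, rows stream out of MySQL one at a time instead of the whole result set being materialized in PHP memory.
PHP Code:
<?php
$pdo = new PDO('mysql:host=localhost;dbname=cache', 'user', 'pass');

// Unbuffered mode: rows are fetched from the server as you iterate,
// not copied into client memory up front.
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$stmt = $pdo->query('SELECT id, payload FROM search_cache');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // work on one row at a time; it can be garbage collected afterwards
    echo $row['id'], "\n";
}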

#27
SitePoint Guru
    PHP Memory

    This is how PHP stores arrays:
    PHP Code:
                             |  64 bit   |  32 bit
---------------------------------------------------
zval                         |  24 bytes | 16 bytes
cyclic GC info               |   8 bytes |  4 bytes
allocation header            |  16 bytes |  8 bytes
===================================================
zval (value) total           |  48 bytes | 28 bytes
===================================================
bucket                       |  72 bytes | 36 bytes
allocation header            |  16 bytes |  8 bytes
pointer                      |   8 bytes |  4 bytes
===================================================
bucket (array element) total |  96 bytes | 48 bytes
===================================================
total                        | 144 bytes | 76 bytes

+ the element size
Most of my items are numeric (INTs, 8 bytes on 64-bit).
So, to store those 8 bytes I need 152 bytes, 19 times the size of the data...
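A quick way to verify those per-element figures on a given build (a rough sketch; exact numbers vary with PHP version and platform, and the table above is for the PHP 5 era):
PHP Code:
<?php
$before = memory_get_usage();

$a = array();
for ($i = 0; $i < 100000; $i++) {
    $a[$i] = $i; // one integer zval + one hashtable bucket per element
}

$after = memory_get_usage();
printf("%.1f bytes per element\n", ($after - $before) / 100000);
// On a typical 64-bit PHP 5 build this prints roughly 150 bytes,
// close to the 152 bytes (144 overhead + 8 data) computed above.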

If I were to use SplFixedArray (no bucket, saving those 96 bytes per element), it would need 64 bytes (56 structure + 8 data), so 8 times the size.
But then the way I access it would be pretty awkward (since the data doesn't have a set length, it's dynamic...).
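For comparison, the same measurement with SplFixedArray (again a sketch; numbers depend on the build):
PHP Code:
<?php
$before = memory_get_usage();

$a = new SplFixedArray(100000); // fixed length: no hashtable buckets
for ($i = 0; $i < 100000; $i++) {
    $a[$i] = $i;
}

$after = memory_get_usage();
printf("%.1f bytes per element\n", ($after - $before) / 100000);

// The catch mentioned above: the length is fixed up front, so dynamic
// data means explicit (and relatively costly) resizes:
$a->setSize(200000);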

Currently I have a structure like this:
    PHP Code:
$r = array(
    'fare' => array(
        'id' => 0,
        'ticketing_deadline' => 0,
        'basis' => array(),
        'airline' => array(),
        'pax_count' => 0,
        'cost' => 0,
        'tax' => 0,
        'cabin_code' => '*',
        'fare_info_xml' => array(),
        'price_validation_md5' => 0,
        'with_land' => false,
        'Adult' => array(
            'pax_count' => 0,
            'cost' => 0,
            'ticketing_deadline' => 0,
            'tax' => 0,
            'taxes' => array(),
            'Outbound' => array(
                'basis' => array(),
                'airline' => array(),
                'date' => null,
                'from' => '',
                'to' => '',
    ...
So when I access them, I foreach over whatever inner structure I need (so the code is clean and fast to develop).
If I flatten this structure, I save some RAM at the cost of development time...
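A rough sketch of what that flattening trade-off would look like (keys here are illustrative): one flat array with composite string keys instead of nested arrays, dropping the per-level array overhead, but every access site has to build keys by hand.
PHP Code:
<?php
$flat = array(
    'fare.id'                  => 0,
    'fare.Adult.cost'          => 0,
    'fare.Adult.Outbound.from' => '',
    'fare.Adult.Outbound.to'   => '',
);

// Nested: a clean foreach over one inner structure.
// Flat: you scan keys by prefix instead, which is the
// development-time cost mentioned above.
foreach ($flat as $key => $value) {
    if (strncmp($key, 'fare.Adult.Outbound.', 20) === 0) {
        echo $key, "\n";
    }
}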

    Serialization

    I ran some tests here.
The fastest way to serialize data was igbinary_serialize/igbinary_unserialize, by far (used to pass data from one level to another).
But even then, it uses 50% of the CPU time needed for a cached search... so I will have to rethink some logic here (maybe there's no need to pass all the data over every time).
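A minimal round-trip comparison of the two serializers (the data here is made up; igbinary is the PECL extension):
PHP Code:
<?php
$data = array('fare' => array('id' => 42, 'cost' => 199.90,
                              'basis' => array('Y26', 'Y26')));

$native = serialize($data);
$binary = igbinary_serialize($data);

// igbinary's binary format is typically more compact than the
// native text format, and was also the fastest in my tests above.
printf("serialize: %d bytes, igbinary: %d bytes\n",
       strlen($native), strlen($binary));

$copy = igbinary_unserialize($binary);
var_dump($copy == $data); // bool(true): identical structure back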

    DB Storage
I ran some numbers, and the DB (NoSQL) will have to be on localhost (or, worst case, in the same rack on a gigabit connection), or else the network becomes the bottleneck.
But unfortunately, my servers are too crappy to handle that, plus whatever else I need to throw at them... so again, I would have to rethink some logic here.
Remember, each search spawns 900+ sub-searches, and those go on the cluster, so the data needs to be transferred to the parent searches for processing, which in turn pass it to their parents, and so on. Since not all these "threads" are on the same machine/rack (and sometimes cluster), if I were to store the data on localhost, I would have the same data stored in multiple places, raising the server requirements...

    Large code base
This app is not that small, and PHP is not the only thing used.
PHP is mainly used for the front-end/admin interface, but even the front-ends need to apply some rules (and sometimes parse some XML, etc.).
So while it's a big code base, that data will not be passed through all of it... just through what it needs to.




I'm still looking for solutions for this, but the easiest one I've found is to use Judy arrays; they don't require that many code/logic changes.
Unfortunately they don't work correctly on Debian stable... not sure why...
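For reference, this is roughly how the PECL Judy extension is used (a sketch following the documented API; behaviour may differ by extension version): integer keys go into a compact trie instead of PHP's hashtable, so the per-element overhead from the table above largely disappears, while access still looks like a plain array.
PHP Code:
<?php
$judy = new Judy(Judy::INT_TO_INT); // integer keys -> integer values

for ($i = 0; $i < 100000; $i++) {
    $judy[$i] = $i * 2; // implements ArrayAccess
}

echo count($judy), " elements\n";
echo $judy->memoryUsage(), " bytes in the Judy structure\n";

$judy->free(); // explicitly release the underlying C memory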

