PHP Memory
This is how PHP stores arrays:
PHP Code:
                             |  64 bit   | 32 bit
---------------------------------------------------
zval                         |  24 bytes | 16 bytes
+ cyclic GC info             |   8 bytes |  4 bytes
+ allocation header          |  16 bytes |  8 bytes
===================================================
zval (value) total           |  48 bytes | 28 bytes
===================================================
bucket                       |  72 bytes | 36 bytes
+ allocation header          |  16 bytes |  8 bytes
+ pointer                    |   8 bytes |  4 bytes
===================================================
bucket (array element) total |  96 bytes | 48 bytes
===================================================
total total                  | 144 bytes | 76 bytes
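(plus the element's own data for strings and nested arrays; an int is stored inside the zval itself, so nothing gets added for it)
Most of my items are numeric (ints, 8 bytes on 64-bit).
So, to store those 8 bytes I need 144 bytes, 18 times the size of the data...
If I were to use SplFixedArray (no bucket, which saves those 96 bytes per element), it would need 56 bytes per element (48 for the zval + 8 for the pointer to it), so 7 times the size.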
But then the way I access it would be pretty awkward (since the data doesn't have a set length, it's dynamic...)
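Per-element figures like these are easy to check on a given box by measuring instead of adding up struct sizes. This is only a rough sketch: bytesPerElement() is a helper name made up for the example, it uses PHP 7+ type hints, and the results depend heavily on the PHP version and build, so it will not necessarily reproduce the table above.
PHP Code:
```php
<?php
// Rough per-element memory cost of $n integers in a plain array
// versus an SplFixedArray. Numbers vary by PHP version and build.

// Hypothetical helper: builds a structure and reports the growth in
// memory_get_usage() divided by the element count.
function bytesPerElement(callable $build, int $n): float
{
    $before = memory_get_usage();
    $data = $build($n);   // hold the result so it is alive when we measure
    $after = memory_get_usage();

    return ($after - $before) / $n;
}

$n = 100000;

$arrayCost = bytesPerElement(function (int $n): array {
    $a = array();
    for ($i = 0; $i < $n; $i++) {
        $a[] = $i;
    }
    return $a;
}, $n);

$fixedCost = bytesPerElement(function (int $n): SplFixedArray {
    $a = new SplFixedArray($n);
    for ($i = 0; $i < $n; $i++) {
        $a[$i] = $i;
    }
    return $a;
}, $n);

printf("array:         %.1f bytes per element\n", $arrayCost);
printf("SplFixedArray: %.1f bytes per element\n", $fixedCost);
```
Running the same script on each target server shows what the real overhead is there before deciding whether a rewrite is worth it.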
Currently I have a structure like this:
PHP Code:
$r = array(
    'fare' => array(
        'id' => 0,
        'ticketing_deadline' => 0,
        'basis' => array(),
        'airline' => array(),
        'pax_count' => 0,
        'cost' => 0,
        'tax' => 0,
        'cabin_code' => '*',
        'fare_info_xml' => array(),
        'price_validation_md5' => 0,
        'with_land' => false,
        'Adult' => array(
            'pax_count' => 0,
            'cost' => 0,
            'ticketing_deadline' => 0,
            'tax' => 0,
            'taxes' => array(),
            'Outbound' => array(
                'basis' => array(),
                'airline' => array(),
                'date' => null,
                'from' => '',
                'to' => '',
...
So when I access them, I foreach whatever inner structure I need (so the code is clean and fast to develop).
If I flatten this structure, I save some RAM at the cost of development time...
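To give an idea of what flattening could look like, here is a minimal sketch that collapses the nested arrays into a single level with dotted keys. The flatten() function and the dotted key naming are made up for the example, and $r here is a cut-down stand-in for the real structure. Whether this actually saves a meaningful amount of RAM would have to be measured.
PHP Code:
```php
<?php
// Collapse nested arrays into one level, joining keys with dots.
// Empty arrays are kept as leaf values.
function flatten(array $nested, string $prefix = ''): array
{
    $flat = array();
    foreach ($nested as $key => $value) {
        $path = $prefix === '' ? (string) $key : $prefix . '.' . $key;
        if (is_array($value) && $value !== array()) {
            $flat += flatten($value, $path);   // paths are unique, so union is safe
        } else {
            $flat[$path] = $value;
        }
    }
    return $flat;
}

// Cut-down stand-in for the real structure (illustration only).
$r = array(
    'fare' => array(
        'id' => 0,
        'cabin_code' => '*',
        'Adult' => array(
            'cost' => 0,
            'Outbound' => array('from' => '', 'to' => ''),
        ),
    ),
);

$flat = flatten($r);
print_r($flat);
```
The cost shows up right away: instead of a foreach over $r['fare']['Adult'], the code has to filter keys by prefix, which is the extra development time mentioned above.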
Serialization
I ran some tests here.
By far the fastest way to serialise the data (which is how it gets passed from one level to another) was igbinary_serialize/igbinary_unserialize.
But even then, serialisation takes 50% of the CPU time needed for a cached search... So I will have to rethink some logic here (maybe there's no need to pass all the data over every time).
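For anyone who wants to repeat this kind of test, a comparison along these lines can be sketched as below. This is not the benchmark I ran: timeRoundTrip() and the sample payload are made up for the example, and igbinary is only included when the extension is actually loaded.
PHP Code:
```php
<?php
// Time $rounds encode/decode round trips of $data, in seconds.
function timeRoundTrip(callable $encode, callable $decode, $data, int $rounds): float
{
    $start = microtime(true);
    for ($i = 0; $i < $rounds; $i++) {
        $copy = $decode($encode($data));
    }
    return microtime(true) - $start;
}

// Hypothetical payload: 1000 small fare-like records.
$data = array();
for ($i = 0; $i < 1000; $i++) {
    $data[] = array('id' => $i, 'cost' => $i * 10, 'tax' => $i, 'cabin_code' => '*');
}

// Built-in serializer is always available; igbinary only if installed.
$candidates = array(
    'serialize' => array('serialize', 'unserialize'),
);
if (function_exists('igbinary_serialize')) {
    $candidates['igbinary'] = array('igbinary_serialize', 'igbinary_unserialize');
}

$results = array();
foreach ($candidates as $name => $pair) {
    $results[$name] = timeRoundTrip($pair[0], $pair[1], $data, 100);
    printf("%-10s %.4f s for 100 round trips\n", $name, $results[$name]);
}
```
The payload should be swapped for real search results, since the ratio between serializers depends a lot on the shape of the data.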
DB Storage
I ran some numbers, and the DB (NoSQL) will have to be set up on localhost (or, worst case, on the same rack over a gigabit connection), else the network will become the bottleneck.
But unfortunately, my servers are too crappy to handle that, plus whatever else I need to throw at them... so again, I would have to rethink some logic here.
Remember, each search spawns 900+ sub searches, and those go on the cluster, so the data needs to be transferred to the parent searches for processing, which in turn pass it to their parents and so on. Since not all these "threads" are on the same machine/rack (and sometimes not even the same cluster), if I were to store the data on localhost, I would have the same data stored in multiple places, raising the server requirements...
Large code base
This app is not that small, and PHP is not the only thing used.
PHP is mainly used for the front-end/admin interface, but even the front-ends need to apply some rules (and sometimes parse some XML, etc).
So while it's a big code base, that data will not be passed through so much code... just through what it needs to.
I'm still looking for solutions for this, but the easiest one I found is to use Judy arrays, since they don't require that many code or logic changes.
Unfortunately they don't work correctly in Debian stable... not sure why...
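The reason the change is small is that a Judy array can be indexed like a normal array. A minimal sketch, assuming the PECL judy extension with its Judy class, the Judy::INT_TO_INT type and ArrayAccess support as I understand them from the docs (it falls back to a plain array when the extension is not loaded):
PHP Code:
```php
<?php
// Use a Judy int-to-int array when the PECL "judy" extension is loaded,
// otherwise fall back to a plain PHP array so the script still runs.
if (class_exists('Judy')) {
    $store = new Judy(Judy::INT_TO_INT);
} else {
    $store = array();
}

// Same write syntax either way.
for ($i = 0; $i < 1000; $i++) {
    $store[$i] = $i * 2;
}

// Same read syntax either way.
$sum = 0;
for ($i = 0; $i < 1000; $i++) {
    $sum += $store[$i];
}

echo "backend: ", is_array($store) ? "array" : "Judy", ", sum: ", $sum, "\n";
```
Only the line that creates the container changes, but anything that relies on array functions (array_keys, sort and so on) would still need to be touched.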