  1. #1
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

Question: Efficient way to pass large objects between scripts?

    Hi

    I have a system set up a bit like a tree, where the trunk is the start and end point.
    Ex:
    Request goes to the controller (1)
    That controller starts up multiple sub-controllers (N)
    Each sub-controller starts some workers (n)
The workers return the data to the sub-controller (which does its magic), which in turn returns the data to the main controller, so it can do its magic before returning it to the user.

These scripts are spread across multiple servers (on the same gigabit network); usually there are about 900 scripts started for every request, and the data passed between scripts is usually under 1MB (multi-dimensional arrays making up objects).

    Right now, the way I pass the data is by json_encode in the worker and json_decode in the parent.
But this is #1 too slow (even though it's about 5x faster than serialize) and #2 takes WAY too much RAM (sometimes it takes 60MB of RAM for 500KB of values, and this is per worker/child).

Of one request that takes about 20 sec, 10 to 15 sec is usually just this json_encode/json_decode part.
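To make the pattern concrete, it's essentially this (a sketch with made-up data; the real payload is much larger nested arrays):

PHP Code:
<?php
// Worker side (sketch): build the standardised result and emit it as JSON.
// The data here is made up; the real payload is much larger nested arrays.
$standardized = array(
    array('airline' => 'XX', 'cost' => 100, 'tax' => 20),
    array('airline' => 'YY', 'cost' => 120, 'tax' => 25),
);
echo json_encode($standardized);

// Parent side (sketch): $raw would be the body read back from one worker
// over the network; json_encode() here just stands in for that response.
// Passing true as the second argument returns associative arrays rather
// than stdClass objects, which is usually a bit lighter on memory.
$raw  = json_encode($standardized);
$data = json_decode($raw, true);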

    So the question is:
- Is there a better way to transfer this data from one script to another? (I need to use all the data in each script, so I can't just pass an ID and select from a global cache/DB.)

    Please reply.

  2. #2
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,026
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
    What are you doing exactly?

  3. #3
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's a system that sells plane tickets, but that's irrelevant to the problem (could be anything)

  4. #4
    SitePoint Zealot 2ndmouse's Avatar
    Join Date
    Jan 2007
    Location
    West London
    Posts
    196
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
Are you sure that the bottleneck occurs during the json_encode/decode process?

    At what point does the system branch out to other servers - sub-controller or worker? - or maybe both
    Data transfer speed over the network might be responsible???

Just a thought!

    I could be miles off target here, but I found this article a while back - might be of interest.

  5. #5
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,182
    Mentioned
    65 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Vali View Post
#2 takes WAY too much RAM (sometimes it takes 60MB of RAM for 500KB of values, and this is per worker/child).
    Hmm. This just doesn't seem right to me....

    $_POST[] allows for multidimensional arrays to be transferred..
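For example, roughly like this (a sketch only; the worker URL and payload are made up):

PHP Code:
<?php
// Sketch: posting a nested array to a worker over HTTP. http_build_query()
// flattens the nested array into form-encoded pairs, and PHP rebuilds it
// as a multidimensional $_POST on the receiving script.
// The endpoint URL and payload are hypothetical.
$payload = array(
    'search' => array(
        'from'    => 'YYZ',
        'to'      => 'NYC',
        'filters' => array('stops' => 0, 'class' => 'economy'),
    ),
);

$ch = curl_init('http://worker.example/standardize.php');
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($payload));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$response = curl_exec($ch);
curl_close($ch);

// On the worker, $_POST['search']['filters']['stops'] would be "0".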

    To me it sounds like your problems could be solved by a proper object oriented design.

  6. #6
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, one of the bottlenecks is the way I pass the actual object from one server/script to another.
The 20 sec test did not max out the network (it went to ~50Mb/sec), but the CPU/RAM of the servers encoding/decoding the PHP object spikes up for a good 5 to 10 sec.

    I also added some logging around the part that just reads/sends the data, and I'm 100% sure that it's one of the bottlenecks that needs to be fixed.
    And since it's literally 2 lines of code (encode/decode...), that's the place I figure I would look for a smarter way to do it.

    I tried the php serialise/unserialise, but that was 4x slower than json encode/decode.

    So basically, I need a better way to transfer an array of arrays (objects) from one server to another.

  7. #7
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
K. Wolfe I post the data to the children and then I need to get it back: they echo json_encoded variables, and I json_decode them so I can use them (that is the slow part...).

  8. #8
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,026
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
There aren't many problems that would require starting up 900+ scripts per request, and airline ticket sales aren't one of them; and I can think of none that need 60MB to process 500K of data. Something is seriously wrong. I sense a ball-of-mud project that has been evolved rather than designed - one that has been nursed along with the old throw-more-hardware-at-it solution and may be fast approaching the end of the line, where it must be replaced because the expense of maintaining it will eclipse the cost of replacing it. I've dealt with those myself - they aren't fun - especially when upper management would rather deny the reality of the situation within the code and keep trying to patch it along.

  9. #9
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,182
    Mentioned
    65 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Vali View Post
K. Wolfe I post the data to the children and then I need to get it back: they echo json_encoded variables, and I json_decode them so I can use them (that is the slow part...).
Right... But why do you have so many "children" on remote servers? If you can get some of these operations onto the same machine, you can avoid network/JSON bottlenecks. Additionally, you can save more resources with a proper object oriented design.

  10. #10
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,182
    Mentioned
    65 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Michael Morris View Post
There aren't many problems that would require starting up 900+ scripts per request, and airline ticket sales aren't one of them; and I can think of none that need 60MB to process 500K of data. Something is seriously wrong. I sense a ball-of-mud project that has been evolved rather than designed - one that has been nursed along with the old throw-more-hardware-at-it solution and may be fast approaching the end of the line, where it must be replaced because the expense of maintaining it will eclipse the cost of replacing it. I've dealt with those myself - they aren't fun - especially when upper management would rather deny the reality of the situation within the code and keep trying to patch it along.
    +1

  11. #11
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    K. Wolfe It's OOP, and I need it on multiple machines since one can't handle it.

    Michael Morris unfortunately, I have to start that many...
    My system loads data from other systems, and as a simplified example, this is why I need to start 900 workers:

    The user wants to fly from YYZ to NYC and back to YYZ, with flexible dates (as in, +-3 days on departure/arrival), with any airline and any seat in economy or business class.

The data source only accepts requests in the form FROM CITY/DATE - TO CITY/DATE - AIRLINE - CLASS
    So, this means:
    [CITY] [-3 to +3] - [CITY] [-3 to +3] - [airline] - [class]
    [7 days]*[7 days]*[10 airlines that fly between those cities]*[2 classes] = 980 requests right there (assuming no cache was hit), all workers that just standardise this data to something the rest of the script can use.

    And this assumes one data source, no need to ask for a "next page", composite tickets and so on...
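To illustrate how those ~980 worker requests get enumerated (the airline codes, cities and dates here are just placeholders):

PHP Code:
<?php
// Illustration only: enumerating the ~980 permutations described above.
// The airline list, cities and base dates are hypothetical placeholders.
$airlines = array('AC', 'UA', 'DL', 'AA', 'WS', 'BA', 'LH', 'AF', 'KL', 'TK'); // 10
$classes  = array('economy', 'business');                                      // 2

$requests = array();
foreach (range(-3, 3) as $depOffset) {        // 7 departure days
    foreach (range(-3, 3) as $retOffset) {    // 7 return days
        foreach ($airlines as $airline) {
            foreach ($classes as $class) {
                $requests[] = array(
                    'from'    => 'YYZ',
                    'depart'  => date('Y-m-d', strtotime("2013-02-15 {$depOffset} days")),
                    'to'      => 'NYC',
                    'return'  => date('Y-m-d', strtotime("2013-02-22 {$retOffset} days")),
                    'airline' => $airline,
                    'class'   => $class,
                );
            }
        }
    }
}

echo count($requests); // 7 * 7 * 10 * 2 = 980 worker requests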

    The bottleneck I'm trying to fix is getting the standardised data from those 980 requests back to the controller script, where I can actually start the real work.
And I have to standardise the data, since each data source has its own format/rules/aliases/etc... (systems made in the 70s that never changed... and with crap on top of crap, as Michael Morris explained).

    Any suggestions? (So I have an array in script A, server A and want to pass it to the script that called it, on server B)

  12. #12
    I solve practical problems. bronze trophy
    Michael Morris's Avatar
    Join Date
    Jan 2008
    Location
    Knoxville TN
    Posts
    2,026
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Vali View Post
    K. Wolfe Any suggestions? (So I have an array in script A, server A and want to pass it to the script that called it, on server B)
Hmm.. The only thing I can think of is to set up a server whose only job is to hold the standardized data, feed it to the front requests and continually negotiate the translation of data. The PHP frontend would talk only to that server, which would have the schedule data for it. The translation side would probably be better off in another language - C++ I would think. There'd be a lag between when the old systems got updated and the new system gets the data right, but this could be worked around as provisional, with some legal text on the front end explaining that the prices displayed are continually in flux. Once the user has made the choice you can then hunt up that exact ticket and send a confirmation.

    Or, you could honor the old price and eat the losses when they occur, but also take the profits when a customer agrees to pay more than what the airline charged in that period of lag. Such issues are policy related - the sort of decisions managers should make.

    Whatever you discover though, Good Luck. I don't envy your predicament.

  13. #13
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
I tried that approach already (have one server with the data, passing the IDs back from the workers), but then I max out the bandwidth to/from that server (since I need multiple requests at the same time), and that server (it was a cluster of memcached servers) needed a ton of RAM for nothing.

And the worst part is, I still need to serialise the data somehow to place it on that one server, and that's what's slow...
(After I parse the standardised data, I cache the results in a similar way, but my issue is before I parse it, when I get it from the 900+ workers; the serialise/unserialise step seems redundant...)

  14. #14
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,182
    Mentioned
    65 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by Vali View Post
The data source only accepts requests in the form FROM CITY/DATE - TO CITY/DATE - AIRLINE - CLASS
    So, this means:
    [CITY] [-3 to +3] - [CITY] [-3 to +3] - [airline] - [class]
    [7 days]*[7 days]*[10 airlines that fly between those cities]*[2 classes] = 980 requests right there (assuming no cache was hit), all workers that just standardise this data to something the rest of the script can use.
    Can you go into detail on this? I really don't understand what you have so far, but right here I'm having a feeling this can be simplified.

Ideally you should have 1-2 application servers (the second is a backup, not a second machine to split duties) and as many data servers as needed to fulfill the requests in a timely fashion. If it's designed correctly and it's still falling behind, start adding in new data servers to share the load. But unless you are over 120 gigs of active working set, you don't even need to think about a second data server.

  15. #15
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
K. Wolfe My system gets its data from other systems (outside of my control).
Each of those systems has its own format, so I need to standardize the data to my own format (I'm getting this data in parallel calls).

I have to spread the load of the workers across multiple servers (~24 for now), since one server cannot handle making all those XML/HTML/JSON/TEXT/SOAP calls to the various data sources and parsing the responses into my standard format.
    Because of that, I need a way to pass that data over to the parent of those scripts, where I can apply my business logic and so on.

If 1 request takes 20 sec, this is how the time is spent:
    ~ 1 sec parent business logic/starting the 900 workers in parallel
    **** This is where work in parallel starts, while the parent waits ****
    ~ 1 to 10 sec workers waiting for data (in parallel per worker, so 900 to 9,000 sec worker time)
    ~ 1 to 3 sec workers parsing the data and formatting it to my standard format. (in parallel per worker, so 900 to 2,700 sec worker time)
----- this is what I want optimised, since it seems redundant -----
~ 1 to 2 sec workers json_encoding/serialising the data (cpu, in parallel per worker, so 900 to 1,800 sec worker time)
    ~ 1 sec transfer the data to the parent (network)
    **** This is where work in parallel ends ****
    ~ 5 to 10 sec parent decoding the data from all workers, as it receives it (cpu)
    ----- up to here -----
    ~ 5 sec applying my business rules/magic
    -----------------------
    Total user time: 15 to 32 sec , where 7 to 12 sec seems redundant & useless (~37% of total time used just to pass data around)

    If I don't use workers and do everything in the parent, I have to wait the 1 to 3 sec parsing the data per worker, so 900 to 2,700 sec (15 to 45min)

    What I'm looking for, is an efficient way to get the standardized data (done by the workers) to the parent/controller that initiated them.

  16. #16
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,182
    Mentioned
    65 Post(s)
    Tagged
    2 Thread(s)
I see. This still comes back to my and Michael's original point. We don't feel that it should be taking that long to parse out JSON unless there's something else extremely goofy going on.

    Just curious, how many different remote systems are you hitting?

    EDIT: BTW My current job has me doing much of this type of thing. All my company deals with is external systems syncing to our own through curl / soap / xml etc.

  17. #17
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I currently have 7 different data sources:
    - 2 HTML websites,
    - 2 stateless soap (xml/custom format),
    - 3 TA based (basically plain text over socket communication).
I take the data from there, and when the user takes an action that needs to be synced there, I send them data.

    Each request gives 100-800Kb of data (but I do 900 of them), and after I parse that data, I end up with arrays of objects like this one:
    Code:
    $fare_tpl = array(
                'id' => 0,
                'airline' => 0,
                'consolidator' => 0,
                'cost' => 0,
                'tax' => 0,
                'adult_cost' => 0,
                'adult_tax' => 0,
                'child_cost' => 0,
                'child_tax' => 0,
                'infant_cost' => 0,
                'infant_tax' => 0,
                'flights' => array(),
                'filters' => array(
                    'outbound_start_date' => 0,
                    'outbound_end_date' => 0,
                    'outbound_duration' => 0,
                    'outbound_stops' => 0,
                    'inbound_start_date' => 0,
                    'inbound_end_date' => 0,
                    'inbound_duration' => 0,
                    'inbound_stops' => 0,
                    'duration' => 0,
                    'stops' => 0,
                    'airline' => '',
                    'price' => 0,
                ),
            );
    // That's just a random object I got at the end of the line.
The one the workers return has about 100 fields per object, and each worker returns about 100 of these objects, each of these objects with about 100 different flight objects, each flight with 1 to 5 legs. (ex: $fare[0]->flights[24]->legs[outbound][1]->departure_time).

So each worker returns about 100KB to 500KB of json/serialized data (it's gzipped content, so there's no problem for the network).

    How long should it take to parse that json (encode/decode)?
    Maybe I'm missing something stupid here...
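A quick way to check just that step in isolation (with a synthetic payload standing in for one worker's result) would be something like:

PHP Code:
<?php
// Sketch: time only the encode/decode round trip on a synthetic payload
// that is roughly the shape/size of one worker's result.
$workerPayload = array_fill(0, 100, array_fill(0, 100, array(
    'departure_time' => '2013-02-15 10:30',
    'price'          => 123.45,
)));

$t = microtime(true);
$json = json_encode($workerPayload);
printf("encode: %.4f sec, %d bytes\n", microtime(true) - $t, strlen($json));

$t = microtime(true);
$back = json_decode($json, true);
printf("decode: %.4f sec, peak memory: %.1f MB\n",
    microtime(true) - $t, memory_get_peak_usage(true) / 1048576);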

  18. #18
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,182
    Mentioned
    65 Post(s)
    Tagged
    2 Thread(s)
    Oh, this is a fun project! I'll dive into this later after work

  19. #19
    Non-Member
    Join Date
    Oct 2007
    Posts
    363
    Mentioned
    11 Post(s)
    Tagged
    0 Thread(s)
This is fascinating reading guys... I've never worked on a project of this kind of magnitude. It's fascinating to read about. Not that I think I can add much to the discussion - but just out of interest, this all happens in real time? That is, as a customer of your website, when I go to search for tickets, these searches to remote sources all happen in real time and are actually triggered by me doing a search?

    I've often thought about how these aggregate websites work, and figured they must cache data and have workers storing the data in the background constantly. I guess that's not so easy for you to do due to the sheer complexity of combinations involved?

    As I say, I don't think I can really add much here, but it's fun to read about, so I'm getting out the popcorn and I'm sitting in the background

  20. #20
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,127
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Okay, so this grabbed my attention and I'm interested, so I did a quick test.

    First, I generated a 783K json file using the following code (granted it doesn't have nested arrays, but I planned on building that in later):

    build.php
    PHP Code:
<?php
define('VALID_CHARS', 'abcdefghijklmnopqrstuvwxyz');

$keys = array();
for ($i = 0; $i < 100; $i++)
{
    $keys[] = randomString(6);
}

$objs = array();
for ($i = 0; $i < 400; $i++)
{
    $obj = array();
    foreach ($keys as $key)
    {
        $obj[$key] = randomString(8);
    }

    $objs[] = $obj;
}

file_put_contents('data/objects.json', json_encode($objs));

function randomString($length)
{
    $str = '';

    $validChars = VALID_CHARS;
    $validCharsLength = strlen($validChars);

    for ($i = 0; $i < $length; $i++)
    {
        $str .= $validChars[mt_rand(1, $validCharsLength) - 1];
    }

    return $str;
}
    Then I created a script that reads the file and performs json_decode:

    read.php
    PHP Code:
<?php
$content = file_get_contents('data/objects.json');
$objs = json_decode($content);
    Both run in a split second on my development machine (granted I've got a quad-core 8 GB RAM machine, but I really don't see this being your hold up).

    Next I profiled the code using xDebug, the time to run build.php was 1,166 ms, 63% of the time to run was in randomString (which you wouldn't have, but you would have something that generates your objects).
    read.php ran in 42 ms. 97% of the time was spent in json_decode (DUH! there were only two lines, what did you expect?).

    So now I obviously need to go bigger, and see if I can start seeing seconds instead of ms.

    2M file, I changed 400 to 1000 in the build.php

    Profiler shows 3,135 ms for build.php, with 57% in randomString, and read.php shows 105 ms with 98% being in json_decode.

So from these numbers I have concluded thus far that as the file size grew (i.e. the number of objects grew), the time to build the json-encoded file scaled roughly in proportion rather than blowing up (granted I didn't make a big leap, but I did go from 700K to 2M in file size). read.php reacted the same as build.php, so reading the larger file didn't adversely affect performance either.

I've attached my profiles here and the code for others to observe as well, but I strongly think you need to set up xDebug and figure out where your bottlenecks are, as I don't believe it is json_encode or json_decode directly.
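(For anyone following along, enabling the Xdebug 2.x profiler is just a couple of php.ini settings, then you open the cachegrind output in a viewer such as KCachegrind or WinCacheGrind; the paths below are examples.)

Code:
; php.ini (Xdebug 2.x); extension path and output dir are examples
zend_extension = /usr/lib/php5/modules/xdebug.so
xdebug.profiler_enable = 1
xdebug.profiler_output_dir = /tmp/xdebug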
Attached Files

  21. #21
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,127
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
As a quick update, I then attempted 10000, but reached my set memory limit in PHP, so I dropped it to 4000 (the largest it would let me before reaching my memory limit), and that took 13,074 ms, 61% in randomString (again on par with the other runs). However, read.php reaches the memory limit trying to read this file using file_get_contents(), so I might try using fopen and reading smaller chunks to see if I can't get around this issue, but this may be a problem... as I'm not sure how I'm going to tackle that issue, as I can't use json_decode on a partial json string...
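One idea for getting around the "can't json_decode a partial string" problem (a sketch, not something I've benchmarked): have the writer emit one JSON object per line, so the reader can decode record by record with fgets() instead of holding the whole payload:

PHP Code:
<?php
// Sketch: newline-delimited JSON. Assumes data/objects.ndjson was written
// with one json_encode()d record per line (a change to build.php, not shown).
$fh = fopen('data/objects.ndjson', 'r');
while (($line = fgets($fh)) !== false) {
    $record = json_decode($line, true);
    if ($record === null) {
        continue; // skip blank or malformed lines
    }
    // ...keep only the fields you need, then let $record go out of scope
}
fclose($fh);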

  22. #22
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    cpradio Thank you for your time, I clearly need to run more tests.
    I have tested with json/serialize/igbinary, and they do make a difference as you add more data/run them in parallel.

    But you bring up a good point, maybe it's slow because of the multi-dimensional arrays, I will have to verify that.

    As a note on the memory, I have mine set to 124MB.
BUT since I get the data in chunks from the workers, I json_decode it (it spikes memory a lot, then releases it), and then only keep the values I really need.
    I also re-arrange the data in a way so I never have anything duplicated, so I end up with needing about 70MB instead of a few GB RAM.

BUT, having said that, there are instances where the worker returns more data than I can receive (it needs too much memory to decode), which is another reason I wanted to find a fix for this.
(But that happens less than 0.23% of the time - about twice per search, or once every 20 or so searches when I factor in caching - and when it happens, I just order by price and truncate the data a bit...)

    aaarrrggh Most sites only give you one or two airlines, they pre-cache the prices (which change, but they update it on the client, or absorb the price change) and check availability as you select your price.
    Or, they load their data from a data source like the one I'm working on (there are many many layers... with sometimes mainframes in the lower levels).

  23. #23
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,127
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
Okay, our memory limits are nearly identical, mine is set at 128M, so that is appropriate. As for your memory statement, can you elaborate on how you decided you only need roughly 70MB?

One thing you can deduce from my read.php test is that since 98% of the processing time is in json_decode, the time you will see CPU and memory spike is during that call. If you think about it, it makes a lot of sense: you are loading all of the json objects into memory. If you are loading all of it and then wiping out portions because you don't need them, you are still taking a hit up front, because you don't have the ability to filter that unnecessary data prior to calling json_decode.

    At this point, I'd like to point out something that could be useful and that is designing your return data to mimic copybooks on the mainframe. A copybook is just a giant string. Each piece of data in the string is a fixed length located at a fixed position.

    Example:
    Code:
    id   price    other
    00001000560.34More Info
The header record could be optional, but for the remainder you would then be able to use fopen() and fread() to read one record at a time (since you know how long each record is). That way you load a single record into memory, grab the data you need and move on, filtering out the data you don't need on a record-by-record basis.
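Something along these lines, for example (the record layout below - a 5-byte id plus 10-byte price, newline-terminated - is invented for illustration):

PHP Code:
<?php
// Sketch: reading fixed-length records one at a time. The layout
// (5-byte id + 10-byte price + "\n") is hypothetical.
$recordLength = 16;
$fh = fopen('data/records.dat', 'r');
while (($record = fread($fh, $recordLength)) !== false && strlen($record) > 0) {
    $id    = (int) substr($record, 0, 5);
    $price = (float) substr($record, 5, 10);
    // Use $id / $price here; only one record is ever held in memory.
}
fclose($fh);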

Similarly, you could use a MongoDB collection, an SQLite table, or any other database for that matter to perform the same concept, and you gain a better pre-made framework for sorting purposes.

One of the biggest issues with json_decode and unserialize is that you have to load the entire dataset before you can process anything. That is harsh when the data is large and it can be costly (as you are finding). 900 processes each loading an entire dataset of 100+ objects (that may be 70M each) means you are trying to use 900 x 70M of RAM. If the data was in a table/copybook style, you could filter out the stuff you don't need in your query and maybe end up using 30-35M each, cutting your memory usage in half.

    A couple years ago, I was tasked with processing 5-10 thousand policies (each about 1-2 MB in size) as quickly as possible, that meant loading them up, validating all of the data, ordering reports from third parties, and then processing business and validation rules against them, then finally providing a quote for each one. The process we built could run 1000-2500 policies an hour based on the thresholds we set. However we only got to that speed because we profiled our code weekly taking the highest costing portion and reworking it until it fell into a group of 3 or more functions that took equal amounts of time. Doing this weekly allowed us to catch any code that seemed to take longer than the rest and drop it so it fell in line with other methods, in the end no method was taking longer than the others, they all performed equally.

    Now if all of your methods are performing badly, that isn't much help, but chances are, they aren't and a similar approach might be useful here. If you profile your code and see that the json calls are indeed where the most time is spent, then deviating from json is important to do and I'd recommend going to either a flat fixed file (like a copybook on the mainframe) or to a database so you can retrieve what you need without loading all of the data into PHP's memory.

  24. #24
    SitePoint Guru
    Join Date
    Jun 2006
    Posts
    638
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
The copybook format is a nightmare to maintain (and I'm actually parsing something like that from 2 of the data sources already).

    The old system was passing data like this, but it's a million and one times slower than the new one (bad code + slow parsing).

The 70MB is the result of the 900 workers (the data I really need), and to verify it, I just get the PHP starting memory and print memory_get_peak_usage and memory_get_usage for an idea (besides server stats and so on).

Also, one important note is that I don't concatenate the 900 json replies; I parse them 1 by 1, so I don't have to hold EVERYTHING in memory, just the reply of one worker at a time.

For the database, I tried; I would have to make way too many inserts. Same with a memcached cluster - it ended up being the bottleneck (send data there, just to send it back to whatever started the worker).

    So now, I return the data directly to whomever started the worker, and the load balancing starts the parent on a server that can handle the bandwidth.

  25. #25
    Hosting Team Leader silver trophybronze trophy
    cpradio's Avatar
    Join Date
    Jun 2002
    Location
    Ohio
    Posts
    5,127
    Mentioned
    152 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Vali View Post
The copybook format is a nightmare to maintain (and I'm actually parsing something like that from 2 of the data sources already).

    The old system was passing data like this, but it's a million and one times slower than the new one (bad code + slow parsing).
Yes, they are a pain to maintain (I can definitely agree with that), but slower? That seems a bit odd, as the file size should definitely have been smaller than the json footprint... At least I know you've already been down this path.

    Quote Originally Posted by Vali View Post
The 70MB is the result of the 900 workers (the data I really need), and to verify it, I just get the PHP starting memory and print memory_get_peak_usage and memory_get_usage for an idea (besides server stats and so on).
Ah, I misread that - 70MB is the resulting memory footprint after getting rid of the data you didn't need.

    Quote Originally Posted by Vali View Post
Also, one important note is that I don't concatenate the 900 json replies; I parse them 1 by 1, so I don't have to hold EVERYTHING in memory, just the reply of one worker at a time.
    My point was if there was anything in the returned json data for each reply that you do not use, you are loading it into memory during json_decode and then ditching it, but maybe that isn't what you are doing here, maybe you are filtering out whole json results from some of the 900 replies. Can you elaborate on that process?

    Quote Originally Posted by Vali View Post
For the database, I tried; I would have to make way too many inserts. Same with a memcached cluster - it ended up being the bottleneck (send data there, just to send it back to whatever started the worker).
    Interesting again, could have been useful to look into bulk inserting, but I'm willing to consider this as "tried and determined it was a bottleneck"
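(For completeness, the kind of bulk insert I mean is a single multi-row INSERT per worker batch rather than one INSERT per fare - a sketch with a made-up SQLite table:)

PHP Code:
<?php
// Sketch: one multi-row INSERT per batch via PDO/SQLite.
// The table name and columns are hypothetical.
$pdo = new PDO('sqlite:/tmp/fares.sqlite');
$pdo->exec('CREATE TABLE IF NOT EXISTS fares (id INTEGER, airline TEXT, cost REAL)');

$rows = array(
    array(1, 'XX', 100.00),
    array(2, 'YY', 120.50),
    array(3, 'ZZ', 99.99),
);

$placeholders = implode(',', array_fill(0, count($rows), '(?,?,?)'));
$stmt = $pdo->prepare("INSERT INTO fares (id, airline, cost) VALUES $placeholders");

$values = array();
foreach ($rows as $row) {
    foreach ($row as $value) {
        $values[] = $value;
    }
}
$stmt->execute($values);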

