404 Not Found nginx/1.12.2 in foreach with file_get_contents

I've tried several approaches and still cannot solve this. I have an array with many URLs, and I use foreach to request each page and extract data with file_get_contents().

However, the error 404 Not Found nginx/1.12.2 appears after approximately 2 minutes, once approximately 280 records have been inserted.

I tried setting max_execution_time and mysql.connect_timeout to 0 or 1200, both in the code and in php.ini. The result is always the same: after 2 minutes the page displays the error 404 Not Found nginx/1.12.2.

In the code I also tried set_time_limit(0); and sleep(1), but neither solves the problem.

ini_set('mysql.connect_timeout','0');   
ini_set('max_execution_time', '0');   

function create_wordpress_post_with_code() {

    if( isset( $_POST['mysubmitbtn'] ) ) {
        set_time_limit(0);

        $myGetCsv = function ($str) {
            return str_getcsv($str, ';');
        };

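        // Read the CSV from the theme directory on disk; get_template_directory_uri() returns a URL, not a file path.
        $lines = array_map($myGetCsv, file(get_template_directory() . '/list.csv', FILE_IGNORE_NEW_LINES));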

        foreach ($lines as list($var1, $var2)) {

            set_time_limit(0);

            // Double quotes are required so that $var1 and $var2 are interpolated into the URL.
            $html = file_get_contents("https://example.com/{$var1}/{$var2}");
            if ($html === false) {
                continue; // skip rows whose page could not be fetched
            }
            $document = new DOMDocument();
            @$document->loadHTML(mb_convert_encoding($html, 'HTML-ENTITIES', 'UTF-8'));
            $domxpath = new DOMXPath($document);

            // more code here

            $post_id = wp_insert_post(
                array(
                    'comment_status'    =>   'open',
                    'ping_status'       =>   'closed',
                    'post_author'       =>   $author_id,
                    'post_name'         =>   $slug,
                    'post_title'        =>   $title,
                    'post_content'      =>   $content,
                    'post_status'       =>   'publish',
                    'post_type'         =>   'post'
                )
            );
            sleep(1);

        }

    }


} 
add_action( 'init', 'create_wordpress_post_with_code' );

What can I do to resolve this error? And what are the likely causes, other than the execution time?

Where does the 404 come from? From your server, or the other one? In both cases you should look at the server that raises the error. If it's not yours, contact the other admin; there may be some kind of DDoS protection or IP blocking.
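One way to find out which server raises the 404 is to record the HTTP status of every request inside the loop. The following is a minimal sketch, not a definitive implementation. The helper names `parse_status_code()` and `fetch_with_status()` and the 15-second timeout are assumptions for illustration and do not appear in the original code. It relies on the `ignore_errors` HTTP context option, so the body is returned even on a 4xx/5xx response, and on `$http_response_header`, which the HTTP wrapper fills in the calling scope.

```php
<?php
// Hypothetical helper: extract the status code from raw response headers.
// Redirects produce several status lines, so the last one wins.
function parse_status_code(array $headers): ?int
{
    $status = null;
    foreach ($headers as $header) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $header, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: fetch a URL and return its status code with the body.
function fetch_with_status(string $url, int $timeoutSeconds = 15): array
{
    $context = stream_context_create([
        'http' => [
            'timeout'       => $timeoutSeconds,
            'ignore_errors' => true, // return the body even on 4xx/5xx
        ],
    ]);
    $body = @file_get_contents($url, false, $context);
    // $http_response_header is set by the HTTP wrapper in this function's scope.
    $headers = isset($http_response_header) ? $http_response_header : [];
    return [
        'status' => parse_status_code($headers),
        'body'   => $body === false ? null : $body,
    ];
}
```

If every remote request logs 200 right up to the failure, that would suggest the 404 page is produced by the nginx in front of your own PHP process rather than by the remote site.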

My whole screen turns white and only this error appears: 404 Not Found nginx/1.12.2

I do not know whether the error comes from my server or the other one. If it is the other one, what do you suggest? I have no contact with the administrator. I am collecting statistics from a government website, and there are many pages.

You should only use file_get_contents() to access files on the local file system. If you are trying to access files outside the local file system, you should be using cURL.
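For reference, a cURL version of a single fetch might look like the sketch below. The helper name `fetch_with_curl()` and the timeout values are assumptions chosen for illustration. It only shows how to set explicit timeouts and read the status code, and it does not by itself explain the 404.

```php
<?php
// Hypothetical helper: fetch one page with cURL and explicit timeouts.
function fetch_with_curl(string $url, int $timeoutSeconds = 15): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects
        CURLOPT_CONNECTTIMEOUT => 5,    // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeoutSeconds, // seconds for the whole request
    ]);
    $body   = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    $error  = $body === false ? curl_error($ch) : null;
    return [
        'status' => $status, // 0 when no HTTP response was received
        'body'   => $body === false ? null : $body,
        'error'  => $error,
    ];
}
```

Logging `status` and `error` for each row would show whether a particular remote request fails before the 404 page appears.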

Thanks, I already tried with cURL too, but the result was the same.

If that's a government website, I would guess they have built-in protection against what you are doing. I would stop doing this anyway: without permission, automated crawling may be illegal, depending on your country and what you do with the data afterwards. Just ask for an API.

The data is public, and the government allows its use as long as it is credited as the author. I believe the government is already prepared for large-scale parsing of its data.

I did another test, using a large CSV list and just file_get_contents('pages on my own site'), and the problem was the same: after two minutes the error 404 Not Found nginx/1.12.2 appeared.
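Since the same error appears when only pages on your own site are fetched, the two-minute cutoff may be a limit on how long the web server waits for a single PHP request, which set_time_limit() does not control. One way to stay under such a limit is to process the list in short batches and resume from a saved offset. This is a minimal sketch under that assumption. `process_batch()`, `$handleRow`, and the 60-second budget are hypothetical names and values, not part of the original code.

```php
<?php
// Hypothetical helper: handle rows starting at $offset until the time budget
// is spent, then return the offset where the next request should resume.
function process_batch(array $rows, int $offset, callable $handleRow, float $budgetSeconds = 60.0): int
{
    $start = microtime(true);
    $count = count($rows);
    while ($offset < $count && (microtime(true) - $start) < $budgetSeconds) {
        $handleRow($rows[$offset]);
        $offset++;
    }
    return $offset; // equals count($rows) once every row has been handled
}

// Usage with stand-in data; in the plugin, $handleRow would fetch one page
// and insert one post.
$rows = [['a', '1'], ['b', '2'], ['c', '3']];
$seen = [];
$next = process_batch($rows, 1, function (array $row) use (&$seen) {
    $seen[] = $row[0];
}, 60.0);
```

The returned offset could be stored between requests (for example with WordPress's update_option()) and the form submitted again until the offset reaches the number of rows.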

Try this:

https://johns-jokes.com/downloads/sp-h/uniqueideman-proxify/index.php?url_to_proxify=https%3A%2F%2FGoogle.com&zspec=on
