Count Files in Directory by Wildcard

Right now we manually tell our database how many images there are for a single item.

Something like

wild.jpg
wild-2.jpg
wild-3.jpg

And we’d say there are “3” in a column with image name “wild” and it would know how to loop through.

But we’d like PHP to check the directory against a wildcard automatically and come up with the sequence of posters available.

Any feedback appreciated.

Cheers!
Ryan

Have you tried glob?
http://php.net/manual/en/function.glob.php

It doesn’t have full regex pattern capability, but it may be enough for what you need.

The preg_match is the issue. I think this is closer to what I need, but I suck at preg_match.

// create an array to hold directory list
$results = array();

// create a handler for the directory
$directory = $_SERVER['DOCUMENT_ROOT'].'/some/path/to/images/';
$handler = opendir($directory);

// open directory and walk through the filenames
while (($file = readdir($handler)) !== false) {

    // if file isn't this directory or its parent, add it to the results
    if ($file != "." && $file != "..") {

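        // check with regex that the file format is what we're expecting and not something else
        // ($wordpress_slug is escaped so it is matched literally; "$" anchors the extension at the end)
        if (preg_match('#^(prefixone|prefixtwo)[^\s]*\.' . preg_quote($wordpress_slug, '#') . '\.[^\s]+(\.(jpg|jpeg|png))$#', $file)) {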

            // add to our file array for later use
            $results[] = $file;
        }
    }
}

The syntax for glob is a bit different, but I think GLOB_BRACE would work well for getting image files.

glob("*.{jpeg,jpg,png}", GLOB_BRACE) 
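
A minimal sketch of listing and counting images this way. The helper name `listImages` is made up for illustration. The fallback branch is there because `GLOB_BRACE` is not defined on some non-GNU systems, so treat this as one possible shape rather than the definitive way to do it.

```php
<?php
// Hypothetical helper: list the jpg/jpeg/png files directly inside $dir.
function listImages(string $dir): array
{
    $dir = rtrim($dir, '/');

    if (defined('GLOB_BRACE')) {
        // One call, using brace expansion for the extensions.
        $files = glob($dir . '/*.{jpeg,jpg,png}', GLOB_BRACE) ?: [];
    } else {
        // Fallback: one glob() call per extension.
        $files = [];
        foreach (['jpeg', 'jpg', 'png'] as $ext) {
            $files = array_merge($files, glob($dir . '/*.' . $ext) ?: []);
        }
    }

    sort($files);
    return $files;
}
```

`count(listImages($directory))` then gives the number of image files in that folder.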

This is actually working okay…

foreach (glob("$wildcard*.jpg") as $filename) {
    echo "$filename size " . filesize($filename) . "\n";
    $results[] = $filename. "\n";
    $count++;
}

But one issue. It matches longer strings too…

the-meg.jpg
the-meg-2.jpg
the-meg-3.jpg

[these should not be included]

the-meg-2018.jpg
the-meg-2018-2.jpg
the-meg-2018-3.jpg

and so on. So the problem is figuring out how to match a dash, a number and a period, without allowing any further dashes.

Cheers!
Ryan
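
One possible way to narrow the glob results is to run each filename through an anchored regex afterwards. This is only a sketch: the function name `findPosters` is made up, and it rests on a loud assumption that sequence numbers never have more than two digits. Without some rule like that, `the-meg-2018.jpg` is indistinguishable from "image number 2018 of the-meg". It also assumes the names are slug-like and contain no glob metacharacters.

```php
<?php
// Hypothetical helper. ASSUMPTION: sequence numbers have at most two digits,
// so "-2018" is treated as part of another item's name, not as image #2018.
function findPosters(string $dir, string $base, string $ext = 'jpg'): array
{
    // base, then an optional "-N" (1 or 2 digits), then the extension, nothing else.
    $pattern = '/^' . preg_quote($base, '/') . '(?:-(\d{1,2}))?\.'
             . preg_quote($ext, '/') . '$/';

    $found = [];
    // glob() does the cheap prefix filtering, the regex does the exact check.
    foreach (glob(rtrim($dir, '/') . '/' . $base . '*.' . $ext) ?: [] as $path) {
        $file = basename($path);
        if (preg_match($pattern, $file, $m)) {
            // The unsuffixed file gets key 0 so that it sorts first.
            $n = isset($m[1]) ? (int) $m[1] : 0;
            $found[$n] = $file;
        }
    }

    ksort($found);
    return array_values($found);
}
```

`count(findPosters($directory, 'the-meg'))` would then replace the manually maintained number in the database column.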

How intensive is the “file_exists” function? My only solution was this, where I run a loop until a file can’t be found.

for ($k = 1 ; $k < 10; $k++){ 
	if($k == 1) {
		$fileexists = file_exists("$wildcard.jpg");
	} else {
		$newwildcard = $wildcard."-$k";
		$fileexists = file_exists("$newwildcard.jpg");
	}
	if($fileexists == 1) {
		if($k == 1) {
			$results[] = $wildcard.".jpg";
		} else {
			$results[] = $newwildcard.".jpg";
		}
	} else {
		break;
	}
	$count++;
}

How difficult would it be to change the character that you use to delimit the wildcard from the image count suffix? It strikes me that if you changed that to a character that you do not allow in the wildcard name, your problem would go away. Or use a separate table to link the images to the item, so you can just use any old random unique filename for the images.

I’m not exactly sure what you mean. Can you give an example?

Cheers!
Ryan

In your example, you mentioned that when you search for images for the-meg, it will also find images for the-meg-2018, and I think that’s because you are using the same delimiter. If you change the delimiter between the name and the suffix, and make sure it can never appear in your name field, then it goes away:

the-meg.jpg
the-meg~2.jpg
the-meg~3.jpg
the-meg-2018.jpg
the-meg-2018~2.jpg

and so on. Just search first for name.jpg, then get everything that matches name~, and it should find the correct images. But don’t allow ~ in the name section.
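
A rough sketch of that lookup, assuming the files had already been renamed to the `~` scheme and that `~` never appears in the name itself. The function name `findImagesTilde` is hypothetical.

```php
<?php
// Hypothetical helper for the "~" naming scheme.
// ASSUMPTION: "~" never appears in the item name itself.
function findImagesTilde(string $dir, string $name, string $ext = 'jpg'): array
{
    $dir = rtrim($dir, '/');
    $results = [];

    // First the plain "name.ext" file, if it exists.
    if (is_file("$dir/$name.$ext")) {
        $results[] = "$name.$ext";
    }

    // Then everything that matches "name~*.ext".
    $extra = array_map('basename', glob("$dir/$name~*.$ext") ?: []);
    natsort($extra); // natural order, so "~10" comes after "~9"

    return array_merge($results, array_values($extra));
}
```

Because the delimiter cannot occur inside a name, no regex is needed: a search for `the-meg` can never pick up `the-meg-2018~2.jpg`.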

Or, have a table that contains the unique id for wherever your name comes from, and assign a unique id to every image that’s uploaded, and maybe use that as the name. The table then links the image to the name without having convoluted rules, so all you have to do is run a query to retrieve all the images for a given name record.

Okay, I have about 150,000 images using this delimiter, so it’s going to be tough to rename them all. But I do understand what you are saying.

That’s quite a lot! AFAIK, there are image apps that can do bulk renames, but unless the renaming could be done as an “every image in a folder” job, regex would be needed.

TBH, I don’t know if it would be more efficient to filter to a smaller set and then use regex, or just use regex from the get go.

In any case, since regex can’t be avoided at some point, all possible naming variations will need to be determined to avoid making errors that could result in an even bigger problem. It is not enough to craft a regex pattern that matches the names it should match; it is equally important that the pattern won’t match names it shouldn’t.
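
One way to check both directions is a tiny harness that runs a candidate pattern against names it must accept and names it must reject. This is just a sketch: `checkPattern` is a made-up name, and the pattern in the usage part assumes sequence numbers are at most two digits, which may not hold for the real data.

```php
<?php
// Hypothetical check: report every name where the pattern disagrees
// with what we expect.
function checkPattern(string $pattern, array $mustMatch, array $mustNotMatch): array
{
    $problems = [];

    foreach ($mustMatch as $name) {
        if (preg_match($pattern, $name) !== 1) {
            $problems[] = "missed: $name";
        }
    }
    foreach ($mustNotMatch as $name) {
        if (preg_match($pattern, $name) === 1) {
            $problems[] = "false positive: $name";
        }
    }

    return $problems;
}

// Usage with the filenames from earlier in the thread.
// ASSUMPTION: a sequence suffix is "-" plus one or two digits.
$problems = checkPattern(
    '/^the-meg(?:-\d{1,2})?\.jpg$/',
    ['the-meg.jpg', 'the-meg-2.jpg', 'the-meg-3.jpg'],
    ['the-meg-2018.jpg', 'the-meg-2018-2.jpg', 'the-meg-2018-3.jpg']
);
print_r($problems);
```

An empty list means the pattern agrees with every example. The more real filenames go into both lists, the more the result can be trusted.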

How are the files organized, one folder, or many?

Many, by studio name.

So,

/content/[studio-name]/[tons of images here]

So about 100 folders.

Having separate “studio” folders might help. You could start with whichever folder has the smallest “ton”, and save a backup first if you’re going to do any renaming. Even if you don’t do any renaming, having a relatively smaller set to work with should be some help.

The file extensions should be easy to put into a regex pattern. What you will need to consider are things like

names always / never / sometimes begin with letters / digits, uppercase / lowercase
how many characters occur in the name strings before there is a non-alphanumeric character
if there are any sequences of characters that can be used.
etc. etc.

Not always so easy I know. The key is to identify as many patterns as you can.

Is it always the names with the 4-digit year you want to exclude? Are they always enclosed by dashes? Does that pattern never occur in names you want to match?

I am currently updating my application folder in a PHP framework because the original PHP files are missing the PHP 7 declare(strict_types=1); declaration, which makes debugging so much easier.

I have written the following script which should be modified to suit your requirements.

I would suggest initially:

  1. creating a temporary folder
  2. populating the folder with some dummy images
  3. doing a dry run, without trying to rename, replace, etc.
  4. once the relevant files are listed to your satisfaction
    a. count the relevant files
    b. when the total matches your requirements, adjust the search criteria

<?php 
declare(strict_types=1);

ini_set('html_errors', 'TRUE');
ini_set('display_errors', 'TRUE');
ini_set('display_startup_errors', 'TRUE');
error_reporting(-1);

# $ffs = glob('application/*.php');
$ffs = rglob('application/the-meg*.*');

$cntHas = 0;
$cntNot = 0;
foreach($ffs as $i2 => $ff):
	$tmp = file_get_contents($ff);
	if( strpos($tmp, 'declare(strict_types=1);') !== false ):
		echo '<br>Already ==> '.$ff;
		$cntHas++;
	else:	
		echo '<br> &nbsp;&nbsp;&nbsp;Trying to insert ==> '.$ff;
		echo '<br>&nbsp;&nbsp;&nbsp;' .replace($ff, 'declare(strict_types=1);');
		$cntNot++;
	endif;	
endforeach;	
echo '<br>$cntHas ==> ' .$cntHas;
echo '<br>$cntNot ==> ' .$cntNot;

# ======================================
function replace($fileName, $insert='declare(strict_types=1);')
:string
{
	$result = 'Problem'; // DEFAULT

	$tmp 		= file_get_contents($fileName);
	$subScr = substr($tmp, 6);  
	echo 'XXX ==> ' .strlen($tmp);

	$txt 		= "<?php \n" 
					.		$insert ."\n" 
					.	$subScr;

	$ok = FALSE;
	try {
		$rsc 	= fopen($fileName, "w"); #  or die("Unable to open file!");
		if($rsc):
			$ok  = fwrite($rsc, $txt);
			fclose($rsc); // close the handle even when the write fails
		endif;	
	} catch (Exception $e) {
    echo 'Caught exception: ==>';
    echo '<pre>';	
    	print_r($e);
    echo '</pre>';	
	}
	if($ok):
		$result = 'Success ==> ' .$fileName;
	endif;	

	return $result;
}

# ======================================
# Does not support flag GLOB_BRACE
# ======================================
function rglob($pattern, $flags = 0)
:array
{
	$files = glob($pattern, $flags); 
	foreach (glob(dirname($pattern).'/*', GLOB_ONLYDIR|GLOB_NOSORT) as $dir) {
	    $files = array_merge($files, rglob($dir.'/'.basename($pattern), $flags));
	}

	return $files;
}

# ====================================
function fred($val=NULL)
:string
{
	echo '<div style="width:88%; margin:1em auto">';
	echo '<pre>';
		print_r($val);
	echo '</pre>';
	echo '</div>';

	return '';
}

Edit:
Spelling is not my forte.

If you start the renaming process using the longest studio names, then your scenario in post #5 should not occur - by the time you get around to finding the images for the-meg, those for the-meg-2018 and the-meg-with-any-other-suffix will already be gone.
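
A dry-run sketch of that "longest names first" idea, converting the `-N` suffix to `~N`. It only builds a list of old and new names and never touches the disk. Everything here is an assumption for illustration: the function name `planRenames`, the `~` target delimiter, and the idea that the list of known item names comes from the database.

```php
<?php
// Hypothetical dry-run planner: maps "name-N.ext" to "name~N.ext".
// $files are bare filenames, $names are the known item names.
function planRenames(array $files, array $names): array
{
    // Longest names first, so "the-meg-2018" claims its files before "the-meg".
    usort($names, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a);
    });

    $plan = [];
    $claimed = [];
    foreach ($names as $name) {
        $pattern = '/^' . preg_quote($name, '/') . '(?:-(\d+))?\.(\w+)$/';
        foreach ($files as $file) {
            if (isset($claimed[$file]) || !preg_match($pattern, $file, $m)) {
                continue;
            }
            $claimed[$file] = true;
            // Only suffixed files need a new name; "name.ext" stays as it is.
            if ($m[1] !== '') {
                $plan[$file] = $name . '~' . $m[1] . '.' . $m[2];
            }
        }
    }

    return $plan;
}
```

Printing the plan and checking it by eye before calling `rename()` on anything is the point of the dry run. Note that a file is only protected if its full item name is in `$names`: if `the-meg-2018` were missing from the list, `the-meg-2018.jpg` would be planned as `the-meg~2018.jpg`.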
