Count Files in Directory by Wildcard

casbboy · June 28, 2018, 8:51pm

Right now we manually tell our database how many images there are for a single item.

Something like:

wild.jpg
wild-2.jpg
wild-3.jpg

We store “3” in a count column next to the image name “wild”, and the code knows how to loop through them.

Instead, we’d like PHP to check the directory with a wildcard and work out the sequence of available posters automatically.
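For reference, here is how the current scheme (a base name plus a stored count) can be turned back into filenames. This is a minimal sketch: the function name `expectedImageNames()` and the `.jpg` default are illustrative assumptions, not the poster's actual code.

```php
<?php
// Hypothetical helper: rebuild the expected filenames from a base name and a
// stored count. Image 1 has no suffix; images 2..N get "-2", "-3", ...
// (as in wild.jpg, wild-2.jpg, wild-3.jpg).
function expectedImageNames(string $base, int $count, string $ext = 'jpg'): array
{
    $names = [];
    for ($i = 1; $i <= $count; $i++) {
        $names[] = ($i === 1) ? "$base.$ext" : "$base-$i.$ext";
    }
    return $names;
}

print_r(expectedImageNames('wild', 3)); // wild.jpg, wild-2.jpg, wild-3.jpg
```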

Any feedback appreciated.

Cheers!
Ryan

Mittineague · June 28, 2018, 8:57pm

Have you tried glob?
http://php.net/manual/en/function.glob.php

It doesn’t have full regex pattern capability, but it may be enough for what you need.
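A quick way to see what glob-style wildcards do is `fnmatch()`, which applies the same kind of shell-style pattern to plain strings: `*` matches any run of characters, dashes included. The helper name `wildcardMatches()` is hypothetical. Note that `wild*.jpg` also catches `wilderness.jpg`, which is where a plain wildcard runs out of precision.

```php
<?php
// Hypothetical helper: return the names that a shell-style pattern
// (the kind glob() uses) matches, tested with fnmatch() on plain strings.
function wildcardMatches(string $pattern, array $names): array
{
    return array_values(array_filter($names, function (string $name) use ($pattern): bool {
        return fnmatch($pattern, $name);
    }));
}

// "*" happily spans dashes and extra letters, so "wilderness.jpg" matches too.
print_r(wildcardMatches('wild*.jpg', ['wild.jpg', 'wild-2.jpg', 'wilderness.jpg', 'wild-2.png']));
```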

casbboy · June 28, 2018, 9:22pm

The preg_match pattern is the issue. I think this is closer to what I need, but I suck at preg_match.

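// create an array to hold directory list
$results = array();

// $slug holds the base name to look for, e.g. "the-meg"
// (a hyphen is not valid in a PHP variable name, so $wordpress-slug won't parse)

// create a handler for the directory
$directory = $_SERVER['DOCUMENT_ROOT'].'/some/path/to/images/';
$handler = opendir($directory);

// "slug.jpg" or "slug-N.jpg" (also .jpeg/.png) and nothing longer; N is limited to
// 1-2 digits (assumes no item has more than 99 images) so "the-meg-2018.jpg"
// is not read as image 2018 of "the-meg"
$pattern = '#^' . preg_quote($slug, '#') . '(-\d{1,2})?\.(jpe?g|png)$#i';

// walk through the filenames (compare with false so a file named "0" can't end the loop)
while (false !== ($file = readdir($handler))) {

    // if file isn't this directory or its parent, check it
    if ($file != "." && $file != "..") {

        // check with regex that the file format is what we're expecting and not something else
        if (preg_match($pattern, $file)) {

            // add to our file array for later use
            $results[] = $file;
        }
    }
}
closedir($handler);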

Mittineague · June 28, 2018, 9:38pm

The syntax for glob is a bit different, but I think GLOB_BRACE would work well for getting image files.

glob("*.{jpeg,jpg,png}", GLOB_BRACE)
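A sketch of that suggestion wrapped in a hypothetical `listImages()` helper. Because `GLOB_BRACE` is not available on every platform (the PHP manual mentions some non-GNU systems), it falls back to one `glob()` call per extension when the constant is undefined. Matching is case-sensitive on most filesystems, so `.JPG` would not be picked up.

```php
<?php
// Hypothetical helper: list image files in $dir with any of the given extensions.
function listImages(string $dir, array $exts = ['jpeg', 'jpg', 'png']): array
{
    $dir = rtrim($dir, '/');
    if (defined('GLOB_BRACE')) {
        // One call: "*.{jpeg,jpg,png}" expands to three patterns.
        $files = glob($dir . '/*.{' . implode(',', $exts) . '}', GLOB_BRACE) ?: [];
    } else {
        // Fallback for platforms without brace expansion.
        $files = [];
        foreach ($exts as $ext) {
            $files = array_merge($files, glob($dir . '/*.' . $ext) ?: []);
        }
    }
    // Results come back grouped per extension, so sort them into one list.
    sort($files);
    return $files;
}
```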

casbboy · June 28, 2018, 9:53pm

This is actually working okay…

$results = [];
$count = 0;
foreach (glob("$wildcard*.jpg") as $filename) {
    echo "$filename size " . filesize($filename) . "\n";
    $results[] = $filename; // store the bare filename, without a trailing "\n"
    $count++;
}

But one issue. It matches longer strings too…

the-meg.jpg
the-meg-2.jpg
the-meg-3.jpg

[these should not be included]

the-meg-2018.jpg
the-meg-2018-2.jpg
the-meg-2018-3.jpg

and so on. So the problem is how to match a dash, a number and a period, without allowing any further dashes.
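One way to express that rule as a regex, sketched under two assumptions: every poster is a `.jpg`, and no item has more than 99 images, so a 4-digit part such as `-2018` is never read as a sequence number. The helper name `posterSequence()` is hypothetical. It keeps `base.jpg` and `base-N.jpg`, rejects anything with further dash-separated parts, and returns the files in numeric order.

```php
<?php
// Hypothetical helper: keep "<base>.jpg" and "<base>-<N>.jpg" where N has one
// or two digits (assumption: at most 99 images per item), in sequence order.
function posterSequence(array $files, string $base): array
{
    $pattern = '/^' . preg_quote($base, '/') . '(?:-(\d{1,2}))?\.jpg$/';
    $byNumber = [];
    foreach ($files as $file) {
        if (preg_match($pattern, basename($file), $m)) {
            // The unsuffixed file is image 1; "-N" makes it image N.
            $n = (isset($m[1]) && $m[1] !== '') ? (int) $m[1] : 1;
            $byNumber[$n] = $file;
        }
    }
    ksort($byNumber); // numeric order: 1, 2, 3, ... 10
    return array_values($byNumber);
}

print_r(posterSequence(
    ['the-meg-3.jpg', 'the-meg.jpg', 'the-meg-2018.jpg', 'the-meg-2.jpg', 'the-meg-2018-2.jpg'],
    'the-meg'
));
```

Usage could look like `count(posterSequence(glob("$dir/the-meg*.jpg") ?: [], 'the-meg'))`. Without the digit limit, `the-meg-2018.jpg` cannot be told apart from image 2018 of `the-meg` by its name alone.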

Cheers!
Ryan

casbboy · June 28, 2018, 10:10pm

How expensive is the file_exists() function? My only solution so far is this, where I run a loop up to a fixed limit and stop as soon as a file can’t be found.

$results = [];
$count = 0;
for ($k = 1; $k < 10; $k++) {
	// image 1 is "name.jpg"; image k (k >= 2) is "name-k.jpg"
	$candidate = ($k == 1) ? "$wildcard.jpg" : "$wildcard-$k.jpg";
	if (!file_exists($candidate)) {
		break; // stop at the first missing number
	}
	$results[] = $candidate;
	$count++;
}
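On the cost question: each `file_exists()` call asks the filesystem about one path, so a handful per item is normally cheap, but across thousands of items it adds up. A sketch of an alternative, assuming many items share the same folder: read the listing once with `scandir()`, then run the same stop-at-the-first-gap walk against an in-memory set. `loadListing()` and `sequenceFromSet()` are hypothetical names.

```php
<?php
// Hypothetical helper: read a folder once and index it by filename.
function loadListing(string $dir): array
{
    $listing = scandir($dir);
    return $listing === false ? [] : array_flip($listing); // filename => position
}

// Hypothetical helper: walk name.jpg, name-2.jpg, ... against the set,
// stopping at the first gap (same rule as the file_exists() loop).
function sequenceFromSet(array $present, string $base, int $max = 99): array
{
    $results = [];
    for ($k = 1; $k <= $max; $k++) {
        $name = ($k === 1) ? "$base.jpg" : "$base-$k.jpg";
        if (!isset($present[$name])) {
            break;
        }
        $results[] = $name;
    }
    return $results;
}
```

The gap rule also keeps `the-meg-2018.jpg` out of the list for `the-meg`, unless images 4 through 2017 happen to exist.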

droopsnoot · June 29, 2018, 9:22am

How difficult would it be to change the character that you use to delimit the wildcard from the image count suffix? It strikes me that if you changed that to a character that you do not allow in the wildcard name, your problem would go away. Or use a separate table to link the images to the item, so you can just use any old random unique filename for the images.

casbboy · June 29, 2018, 7:13pm

I’m not exactly sure what you mean. Can you give an example?

Cheers!
Ryan

droopsnoot · July 1, 2018, 5:56pm

In your example, you mentioned that when you search for images for the-meg, it will also find images for the-meg-2018, and I think that’s because you are using the same delimiter. If you change the delimiter between the name and the suffix, and make sure it can never appear in your name field, then it goes away:

the-meg.jpg
the-meg~2.jpg
the-meg~3.jpg
the-meg-2018.jpg
the-meg-2018~2.jpg

and so on. Just search first for name.jpg, then get everything that matches name~, and it should find the correct images. But don’t allow ~ in the name section.
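A minimal sketch of that lookup, assuming `~` is never allowed in a name and images are `.jpg`. Since `~` is not one of glob's wildcard characters, `the-meg~*.jpg` cannot pick up `the-meg-2018~2.jpg`. The helper name `imagesWithTilde()` is hypothetical.

```php
<?php
// Hypothetical helper: name.jpg first, then name~N.jpg in numeric order.
function imagesWithTilde(string $dir, string $name): array
{
    $dir = rtrim($dir, '/');
    $files = [];
    if (is_file("$dir/$name.jpg")) {
        $files[] = "$dir/$name.jpg";
    }
    $rest = glob("$dir/$name~*.jpg") ?: [];
    natsort($rest); // ~2, ~3, ... ~10 rather than ~10, ~2, ~3
    return array_merge($files, array_values($rest));
}
```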

Or, have a table that contains the unique id for wherever your name comes from, and assign a unique id to every image that’s uploaded, and maybe use that as the name. The table then links the image to the name without having convoluted rules, so all you have to do is run a query to retrieve all the images for a given name record.
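A sketch of the link-table idea using PDO with an in-memory SQLite database, so it assumes the `pdo_sqlite` extension is installed. The table and column names, and the random-looking filenames, are made up for illustration; a real schema would hang off your existing item table.

```php
<?php
// Hypothetical schema: one row per image, linked to its item by id.
function demoImageTable(): PDO
{
    $db = new PDO('sqlite::memory:');
    $db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $db->exec('CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT NOT NULL)');
    $db->exec('CREATE TABLE item_images (
        id INTEGER PRIMARY KEY,
        item_id INTEGER NOT NULL REFERENCES items(id),
        filename TEXT NOT NULL UNIQUE,
        position INTEGER NOT NULL
    )');
    return $db;
}

// All images for one item, in display order: no filename rules involved.
function imagesForItem(PDO $db, int $itemId): array
{
    $stmt = $db->prepare('SELECT filename FROM item_images WHERE item_id = ? ORDER BY position');
    $stmt->execute([$itemId]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// Two items whose names would collide under the dash convention.
$db = demoImageTable();
$db->exec("INSERT INTO items (id, name) VALUES (1, 'the-meg'), (2, 'the-meg-2018')");
$db->exec("INSERT INTO item_images (item_id, filename, position) VALUES
    (1, 'a1f3.jpg', 1), (1, 'b7c2.jpg', 2), (2, 'c9d4.jpg', 1)");
print_r(imagesForItem($db, 1));
```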

casbboy · July 2, 2018, 5:58am

Okay, I have about 150,000 images using this delimiter, so it’s going to be tough to rename them all. But I do understand what you are saying.

Mittineague · July 2, 2018, 6:11am

That’s quite a lot! AFAIK, there are image apps that can do bulk renames, but unless the renaming could be done as an “every image in a folder” job, regex would be needed.

TBH, I don’t know if it would be more efficient to filter to a smaller set and then use regex, or just use regex from the get-go.

In any case, since regex at some point can’t be avoided, all possible naming variations will need to be determined to avoid making errors that could result in an even bigger problem. It is not enough to craft a regex pattern that matches the names it should match; it is equally important that the regex doesn’t match names it shouldn’t.
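A sketch of the "filter first" option: `glob()` does the cheap prefix match in the folder, and `preg_grep()` applies the exact rule only to the survivors. The helper name `narrowThenMatch()` is hypothetical, and the 1-2 digit suffix limit is an assumption that keeps `-2018` from counting as a sequence number.

```php
<?php
// Hypothetical helper: glob narrows by prefix, regex decides exactly.
function narrowThenMatch(string $dir, string $base): array
{
    $candidates = glob(rtrim($dir, '/') . '/' . $base . '*.jpg') ?: [];
    $exact = '/^' . preg_quote($base, '/') . '(?:-\d{1,2})?\.jpg$/';
    // Match on the bare filename so the folder path cannot interfere.
    return array_values(preg_grep($exact, array_map('basename', $candidates)));
}
```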
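To find the names a pattern shouldn't match, a dry-run report can help. Assuming the list of item names can be pulled from the database, the hypothetical `ambiguousFiles()` below lists every file that more than one name's `name(-N)?.jpg` pattern would claim. Under the dash convention, `the-meg-2018.jpg` is claimed by both `the-meg` and `the-meg-2018`.

```php
<?php
// Hypothetical dry-run check: which files could belong to more than one name?
function ambiguousFiles(array $files, array $names): array
{
    $report = [];
    foreach ($files as $file) {
        $owners = [];
        foreach ($names as $name) {
            if (preg_match('/^' . preg_quote($name, '/') . '(?:-\d+)?\.jpg$/', $file)) {
                $owners[] = $name;
            }
        }
        if (count($owners) > 1) {
            $report[$file] = $owners; // needs an explicit rule before renaming
        }
    }
    return $report;
}
```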

How are the files organized, one folder, or many?

casbboy · July 2, 2018, 6:27am

Many, by studio name.

So,

/content/[studio-name]/[tons of images here]

So about 100 folders.

Mittineague · July 2, 2018, 7:06am

Having separate “studio” folders might help. If you’re going to do any renaming, you could start with whichever folder has the smallest “ton” and save a backup of it first. Even if you don’t do any renaming, having a relatively small set to work with should help.

The file extensions should be easy to put into a regex pattern. What you will need to consider are things like:

whether names always / never / sometimes begin with letters or digits, uppercase or lowercase
how many characters occur in the name strings before the first non-alphanumeric character
whether there are any sequences of characters that can be relied on
etc. etc.
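A sketch for finding the smallest “ton”: count the image files in each studio folder and sort ascending. The root path (for example `$_SERVER['DOCUMENT_ROOT'] . '/content'`) and the extension list are assumptions; `imageCountsByFolder()` is a hypothetical name.

```php
<?php
// Hypothetical helper: image count per /content/[studio-name]/ folder.
function imageCountsByFolder(string $root, array $exts = ['jpg', 'jpeg', 'png']): array
{
    $counts = [];
    foreach (glob(rtrim($root, '/') . '/*', GLOB_ONLYDIR) ?: [] as $folder) {
        $n = 0;
        foreach (scandir($folder) ?: [] as $entry) {
            $ext = strtolower(pathinfo($entry, PATHINFO_EXTENSION));
            if (in_array($ext, $exts, true) && is_file("$folder/$entry")) {
                $n++;
            }
        }
        $counts[basename($folder)] = $n;
    }
    asort($counts); // smallest folder first
    return $counts;
}
```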

Not always easy, I know. The key is to identify as many patterns as you can.

Is it always the names with the 4-digit year you want to exclude? Are they always enclosed by dashes? Does that pattern never occur in names you want to match?
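One way to check the answer before writing the rename regex is to count how many names contain such a year. The sketch assumes a year means four digits starting with 19 or 20, preceded by a dash and followed by either a dash or the extension dot; `hasYearSegment()` is a hypothetical name.

```php
<?php
// Hypothetical check: does the filename contain a dash-enclosed 4-digit year?
function hasYearSegment(string $file): bool
{
    // "-", then 19xx or 20xx, then (without consuming it) a "-" or "."
    return preg_match('/-(?:19|20)\d{2}(?=[-.])/', $file) === 1;
}
```

For a whole folder, `count(array_filter(scandir($dir) ?: [], 'hasYearSegment'))` gives the total.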

John_Betong · July 2, 2018, 7:19am

I am currently updating my application folder in a PHP framework because the original PHP files are missing the PHP 7 declare(strict_types=1); declaration, which makes debugging so much easier.

I have written the following script which should be modified to suit your requirements.

I would suggest initially:

1. create a temporary folder
2. populate the folder with some dummy images
3. do a dry run: list the files without trying to rename, replace, etc.
4. once the relevant files are listed to your satisfaction:
   a. count the relevant files
   b. when the total matches your requirements, adjust the search criteria

<?php 
declare(strict_types=1);

ini_set('html_errors', 'TRUE');
ini_set('display_errors', 'TRUE');
ini_set('display_startup_errors', 'TRUE');
error_reporting(-1);

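# $ffs = glob('application/*.php');
# NOTE: replace() rewrites file contents, so when pointing this at images
# only list and count the matches; never call replace() on them
$ffs = rglob('application/the-meg*.*');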

$cntHas = 0;
$cntNot = 0;
foreach($ffs as $i2 => $ff):
	$tmp = file_get_contents($ff);
	if( strpos($tmp, 'declare(strict_types=1);') !== FALSE ):
		echo '<br>Already ==> '.$ff;
		$cntHas++;
	else:	
		echo '<br> &nbsp;&nbsp;&nbsp;Trying to insert ==> '.$ff;
		echo '<br>&nbsp;&nbsp;&nbsp;' .replace($ff, 'declare(strict_types=1);');
		$cntNot++;
	endif;	
endforeach;	
echo '<br>$cntHas ==> ' .$cntHas;
echo '<br>$cntNot ==> ' .$cntNot;

# ======================================
function replace($fileName, $insert='declare(strict_types=1);')
:string
{
	$result = 'Problem'; // DEFAULT

	$tmp 		= file_get_contents($fileName);
	// skip the opening "<?php" and the one whitespace character after it
	// (6 bytes); assumes every file starts exactly that way
	$subScr = substr($tmp, 6);  

	$txt 		= "<?php \n" 
					.		$insert ."\n" 
					.	$subScr;

	// file_put_contents() opens, writes and closes the file in one call;
	// it returns the number of bytes written, or FALSE on failure
	if( file_put_contents($fileName, $txt) !== FALSE ):
		$result = 'Success ==> ' .$fileName;
	endif;	

	return $result;
}

# ======================================
# Does not support flag GLOB_BRACE
# ======================================
function rglob($pattern, $flags = 0)
:array
{
	$files = glob($pattern, $flags) ?: []; // glob() returns FALSE on error
	foreach (glob(dirname($pattern).'/*', GLOB_ONLYDIR|GLOB_NOSORT) ?: [] as $dir) {
	    $files = array_merge($files, rglob($dir.'/'.basename($pattern), $flags));
	}

	return $files;
}

# ====================================
function fred($val=NULL)
:string
{
	echo '<div style="width:88%; margin:1em auto">';
	echo '<pre>';
		print_r($val);
	echo '</pre>';
	echo '</div>';

	return '';
}

Edit:
Spelling not my forty
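The dry-run steps John_Betong suggests can be sketched like this: a throwaway folder of zero-byte dummy files, then a pass that only lists and counts what a pattern matches. The folder and file names are made up, and nothing is renamed or rewritten.

```php
<?php
// Hypothetical helper: a temporary folder filled with empty placeholder files.
function makeDummyFolder(array $names): string
{
    $dir = sys_get_temp_dir() . '/dry-run-' . bin2hex(random_bytes(4));
    mkdir($dir);
    foreach ($names as $name) {
        touch("$dir/$name"); // zero-byte placeholder, not a real image
    }
    return $dir;
}

// Hypothetical helper: list what a glob pattern matches and return the count.
function dryRun(string $dir, string $globPattern): int
{
    $matches = glob("$dir/$globPattern") ?: [];
    foreach ($matches as $path) {
        echo 'Would process ==> ' . basename($path) . "\n";
    }
    return count($matches);
}

$dir = makeDummyFolder(['the-meg.jpg', 'the-meg-2.jpg', 'the-meg-2018.jpg', 'wild.jpg']);
echo 'Total: ' . dryRun($dir, 'the-meg*.jpg') . "\n";
```

Here the total comes out as 3, because `the-meg*.jpg` also matches `the-meg-2018.jpg`: exactly the kind of surprise the counting step is meant to catch.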

droopsnoot · July 2, 2018, 9:34am

If you start the renaming process using the longest studio names, then your scenario in post #5 should not occur - by the time you get around to finding the images for the-meg, those for the-meg-2018 and the-meg-with-any-other-suffix will already be gone.
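A sketch of that ordering as a dry-run rename plan, assuming the full list of item names is available (for example from the database). Longer names claim their files first, so `the-meg-2018-2.jpg` is already taken by `the-meg-2018` before `the-meg` is considered. The helper name `renamePlan()` is hypothetical, and it only returns an old-to-new map; no `rename()` calls are made.

```php
<?php
// Hypothetical dry run: map old filenames to the "~" convention, longest names first.
function renamePlan(array $files, array $names): array
{
    usort($names, function (string $a, string $b): int {
        return strlen($b) <=> strlen($a); // longest name first
    });
    $pool = array_flip($files); // files not yet claimed by any name
    $plan = [];
    foreach ($names as $name) {
        $pattern = '/^' . preg_quote($name, '/') . '(?:-(\d+))?\.jpg$/';
        foreach (array_keys($pool) as $file) {
            $file = (string) $file;
            if (preg_match($pattern, $file, $m)) {
                // The base image keeps its name; "-N" becomes "~N".
                $plan[$file] = (isset($m[1]) && $m[1] !== '') ? "$name~{$m[1]}.jpg" : $file;
                unset($pool[$file]);
            }
        }
    }
    return $plan;
}
```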

system · October 1, 2018, 4:34pm

This topic was automatically closed 91 days after the last reply. New replies are no longer allowed.
