The preg_match is the issue. I think this is closer to what I need, but I'm not good with preg_match patterns.
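// create an array to hold directory list
$results = array();
// build the directory path and open a handle on it
$directory = $_SERVER['DOCUMENT_ROOT'].'/some/path/to/images/';
$handler = opendir($directory);
// escape the slug so characters like "-" or "." are matched literally
// ($wordpress_slug is assumed to be set earlier; "$wordpress-slug" is not a valid variable name)
$slug = preg_quote($wordpress_slug, '#');
// walk through the filenames; compare against false so a file named "0" doesn't end the loop early
while (($file = readdir($handler)) !== false) {
    // if file isn't this directory or its parent, add it to the results
    if ($file != "." && $file != "..") {
        // check with regex that the file format is what we're expecting and not something else
        // (the trailing $ stops names that merely contain a valid extension from matching)
        if (preg_match('#^(prefixone|prefixtwo)[^\s]*\.'.$slug.'\.[^\s]+(\.(jpg|jpeg|png))$#', $file)) {
            // add to our file array for later use
            $results[] = $file;
        }
    }
}
closedir($handler);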
How difficult would it be to change the character that you use to delimit the wildcard from the image count suffix? It strikes me that if you changed that to a character that you do not allow in the wildcard name, your problem would go away. Or use a separate table to link the images to the item, so you can just use any old random unique filename for the images.
In your example, you mentioned that when you search for images for the-meg, it will also find images for the-meg-2018, and I think that’s because you are using the same delimiter inside the name and before the suffix. If you change the delimiter between the name and the suffix, and make sure it can never appear in your name field, then the problem goes away.
Using ~ as the delimiter, for example, just search first for name.jpg, then get everything that matches name~, and it should find the correct images. But don’t allow ~ in the name section.
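Here is a rough sketch of the delimiter idea, in case it helps. The file names, the ~N numbering and the imagesForName() helper are all made up for illustration; I'm assuming the count is a run of digits after the ~.

```php
<?php
// Hypothetical file names, assuming "~" separates the name from the image count.
$files = [
    'the-meg.jpg',
    'the-meg~1.jpg',
    'the-meg~2.png',
    'the-meg-2018.jpg',
    'the-meg-2018~1.jpg',
];

/**
 * Return the files that belong to exactly $name: "name.ext" or "name~N.ext".
 * (Hypothetical helper, not part of the original script.)
 */
function imagesForName(array $files, string $name): array
{
    // preg_quote() escapes any regex metacharacters in the name.
    $pattern = '#^' . preg_quote($name, '#') . '(~\d+)?\.(jpg|jpeg|png)$#i';

    // preg_grep() keeps the original keys, so reindex the result.
    return array_values(preg_grep($pattern, $files));
}

print_r(imagesForName($files, 'the-meg'));
```

Because the-meg-2018 continues with a dash rather than ~ or a dot, its images are not picked up when you ask for the-meg.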
Or, have a table that contains the unique id for wherever your name comes from, and assign a unique id to every image that’s uploaded, and maybe use that as the name. The table then links the image to the name without having convoluted rules, so all you have to do is run a query to retrieve all the images for a given name record.
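To give an idea of the table approach, here is a minimal sketch using an in-memory SQLite database through PDO (so it needs the pdo_sqlite extension). The table name, column names, ids and file names are all invented for the example; your real schema would depend on where the names come from.

```php
<?php
// In-memory SQLite keeps the sketch self-contained; requires the pdo_sqlite extension.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Hypothetical schema: each image row points at the item it belongs to.
$db->exec('CREATE TABLE item_images (
    image_id INTEGER PRIMARY KEY,
    item_id  INTEGER NOT NULL,
    filename TEXT NOT NULL UNIQUE
)');

// Items 1 and 2 are made-up ids; the file names are random and carry no meaning.
$insert = $db->prepare('INSERT INTO item_images (item_id, filename) VALUES (?, ?)');
$insert->execute([1, 'a3f9c2.jpg']);
$insert->execute([1, '7be410.png']);
$insert->execute([2, 'd01e55.jpg']);

// One query retrieves every image for a given item, no filename rules needed.
$select = $db->prepare('SELECT filename FROM item_images WHERE item_id = ? ORDER BY image_id');
$select->execute([1]);
$images = $select->fetchAll(PDO::FETCH_COLUMN);

print_r($images);
```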
That's quite a lot! AFAIK, there are image apps that can do bulk renames, but unless the renaming could be done as an “every image in a folder” job, regex would be needed.
TBH, I don’t know if it would be more efficient to filter to a smaller set and then use regex, or just use regex from the get go.
In any case, since regex can’t be avoided at some point, all possible naming variations will need to be determined to avoid making errors that could result in an even bigger problem. It is not enough to craft a regex pattern that matches the names it should match; it is equally important that the pattern doesn't match the names it shouldn’t.
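One way to check both sides is to keep two lists, names that must match and names that must not, and run the pattern against each. This is only a sketch: the slug, the file names and the pattern itself (a dash followed by at most two digits as the image count) are assumptions, so swap in your real naming rules.

```php
<?php
// Hypothetical slug and pattern; the "-N" count of one or two digits is an assumption.
$slug = 'the-meg';
$pattern = '#^' . preg_quote($slug, '#') . '(-\d{1,2})?\.(jpg|jpeg|png)$#i';

// Names the pattern should accept, and names it should reject.
$shouldMatch = ['the-meg.jpg', 'the-meg-1.jpg', 'the-meg-12.png'];
$shouldNotMatch = ['the-meg-2018.jpg', 'the-meg-2018-1.jpg', 'the-megalodon.jpg', 'the-meg.txt'];

$failures = [];
foreach ($shouldMatch as $name) {
    if (preg_match($pattern, $name) !== 1) {
        $failures[] = "missed: $name";
    }
}
foreach ($shouldNotMatch as $name) {
    if (preg_match($pattern, $name) === 1) {
        $failures[] = "wrongly matched: $name";
    }
}

// Report every name the pattern got wrong, or confirm that there were none.
if ($failures) {
    echo implode(PHP_EOL, $failures) . PHP_EOL;
} else {
    echo 'pattern behaves as expected' . PHP_EOL;
}
```

Every time you discover a new naming variation, add it to one of the two lists and rerun before touching any real files.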
Having separate “studio” folders might help. Starting with whichever folder has the smallest “ton” of images, you could save a backup before doing any renaming. Even if you don’t rename anything, having a relatively smaller set to work with should be some help.
The file extensions should be easy to put into a regex pattern. What you will need to consider are things like
names always / never / sometimes begin with letters / digits, uppercase / lowercase
how many characters occur in the name strings before there is a non-alphanumeric character
if there are any sequences of characters that can be used.
etc. etc.
Not always so easy I know. The key is to identify as many patterns as you can.
Is it always the names with the 4-digit year you want to exclude? Are they always enclosed by dashes? Does that pattern never occur in names you want to match?
I am currently updating the application folder of a PHP framework because the original PHP files are missing the PHP 7 declare(strict_types=1); declaration, which makes debugging so much easier.
I have written the following script which should be modified to suit your requirements.
I would suggest initially:
1. creating a temporary folder
2. populating the folder with some dummy images
3. doing a dry run that only lists the files, without trying to rename, replace, etc.
4. once the relevant files are listing to your satisfaction:
a. count the relevant files
b. when the total matches your requirements adjust the search criteria
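The steps above could be sketched roughly like this. The folder name, the dummy file names and the search pattern are all placeholders I made up; nothing here calls rename(), it only reports what would be touched.

```php
<?php
// Create a temporary folder (removed again at the end of the script).
$dir = sys_get_temp_dir() . '/rename-dry-run-' . uniqid();
mkdir($dir);

// Populate it with empty dummy files; the names are hypothetical.
$dummies = ['the-meg.jpg', 'the-meg-1.jpg', 'the-meg-2018.jpg', 'notes.txt'];
foreach ($dummies as $name) {
    touch($dir . '/' . $name);
}

// Dry run: only list what WOULD be renamed, never call rename().
$pattern = '#^the-meg(-\d{1,2})?\.(jpg|jpeg|png)$#i'; // assumed search criteria
$matches = [];
foreach (scandir($dir) as $file) {
    if (is_file($dir . '/' . $file) && preg_match($pattern, $file) === 1) {
        $matches[] = $file;
        echo "would rename: $file" . PHP_EOL;
    }
}

// Count the relevant files and compare against the total you expect.
echo count($matches) . ' file(s) matched' . PHP_EOL;

// Clean up the dummy files and the temporary folder.
array_map('unlink', glob($dir . '/*'));
rmdir($dir);
```

Only once the listing and the count look right would I replace the echo with the real rename, and even then on a backup copy first.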
If you start the renaming process using the longest studio names, then your scenario in post #5 should not occur - by the time you get around to finding the images for the-meg, those for the-meg-2018 and the-meg-with-any-other-suffix will already be gone.
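A quick sketch of the longest-first idea, with invented names and file names. It uses a plain prefix comparison for brevity, which is an assumption on my part; a real version would want a stricter pattern so that the-meg doesn't also claim something like the-megalodon.

```php
<?php
// Hypothetical names and files; in practice these come from your own data.
$names = ['the-meg', 'the-meg-2018', 'alien', 'the-meg-2018-remastered'];
$files = [
    'the-meg-1.jpg',
    'the-meg-2018-1.jpg',
    'the-meg-2018-remastered-1.jpg',
    'alien-1.jpg',
    'the-meg.jpg',
];

// Longest names first, so specific names claim their files before shorter prefixes do.
usort($names, function (string $a, string $b): int {
    return strlen($b) <=> strlen($a);
});

$claimed = [];
foreach ($names as $name) {
    // foreach iterates over a copy of $files, so unsetting entries here is safe.
    foreach ($files as $i => $file) {
        // Simple prefix test (assumption); strncmp() works on PHP 7 as well as 8.
        if (strncmp($file, $name, strlen($name)) === 0) {
            $claimed[$name][] = $file;
            unset($files[$i]); // "gone" for all later, shorter names
        }
    }
}

print_r($claimed);
```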