that's because classes do not exist in XML, and DOM targets XML as well as HTML.
Yeah, I will only need to loop through HTML. Given that I still need to pull about 9 more elements, do you still recommend looping?
[quote="Dormilich, post:18, topic:115635"]
(didn't the XPath work?)
[/quote]Couldn't get it working due to my noobness. How does your XPath factor into the code I have now? How can I rewrite it to remove the unneeded loops and be optimal?
I cannot guarantee there being only one such wrapper. The user uses a WYSIWYG editor, and who knows...
If you have Windows, even if you don't use the SDK, the CHM in it is a valuable reference for things XML.
vvv download page vvv
https://www.microsoft.com/en-us/download/details.aspx?id=3988
it depends on which elements you need; up till now only the h1 was mentioned.
I figured once I get one element, I can sort of use the same logic to get the other elements.
The number of elements can vary depending on how many paragraph tags there are. I'll need at least the h1, the img, and a handful of paragraph tags. Also 1 span.
You see, in my current code I use if() to determine whether the header matches my criteria. I was going to just add conditions to match what I need. Maybe switches.
@Mittineague, I'm afraid I'm very restricted here at work, so that will not be of help to me. It looks like I have to download that to use it.
Update: I have this so far, which finds the first h1 and sets it into a variable.
I do have a question though: how can I use getElementsByTagName (or something similar) as part of an IF condition? I need to find out whether the element I'm on in my loop has a certain tag name, and then add it to a variable.
<?php
error_reporting(E_ALL);
if(true)
{
    if(!isset($_POST['submit']))
    {
        // first visit: show the empty form
?>
<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post">
<label for="url">Enter the URL of the article:</label> <input id="url" name="URL" type="text" />
<label for="email">Enter the E-mail of the person this will go to:</label> <input id="email" name="email" type="text" />
<input id="submit" class="button" name="submit" type="submit" /></form>
<?php
    }
    else if(filter_var($_POST['URL'], FILTER_VALIDATE_URL) === false || filter_var($_POST['email'], FILTER_VALIDATE_EMAIL) === false)
    {
        // invalid input: show the error and the form again
?>
<div class="error"><p>Error: The URL or e-mail address you entered was invalid. Please try again.</p></div>
<form action="<?php echo htmlspecialchars($_SERVER["PHP_SELF"]); ?>" method="post">
<label for="url">Enter the URL of the article:</label> <input id="url" name="URL" type="text" />
<label for="email">Enter the E-mail of the person this will go to:</label> <input id="email" name="email" type="text" />
<input id="submit" class="button" name="submit" type="submit" /></form>
<?php
    }
    else
    {
        // valid input: fetch the page and pull out the pieces
        $url=filter_var($_POST['URL'], FILTER_SANITIZE_URL);
        $email=filter_var($_POST['email'], FILTER_SANITIZE_EMAIL);
        $masthead="";
        $title="";
        $datetime="";
        $leftImage="";
        $article="";
        $footer="";
        $doc = new DOMDocument;
        $doc->preserveWhiteSpace = FALSE;
        $doc->loadHTMLFile($url);
        $xpath=new DomXPath($doc);
        $results = $xpath->query("//div[contains(@class, 'page-main-content')]");
        foreach($results as $cr)
        {
            // text of the first h1 inside this wrapper
            $title=$cr->getElementsByTagName('h1')->item(0)->textContent;
        }
        echo $title;
    }
}
?>
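On the question of testing the tag name inside a loop: one option is to compare each node's nodeName rather than calling getElementsByTagName in the condition. The following is only a rough sketch; the sample markup and the $fields array are invented for illustration and are not part of the script above.

```php
<?php
// Sketch only: the sample markup and the $fields array are invented for illustration.
$html = '<div class="page-main-content"><h1>Title</h1>'
      . '<span>Jan 1</span><p>First</p><p>Second</p></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$fields = ['title' => '', 'datetime' => '', 'article' => ''];

// "/*" selects the element children of the wrapper, in document order
$children = $xpath->query("//div[contains(@class, 'page-main-content')]/*");
foreach ($children as $node) {
    // nodeName holds the tag name; strtolower() is just a precaution
    switch (strtolower($node->nodeName)) {
        case 'h1':
            if ($fields['title'] === '') {      // keep only the first h1
                $fields['title'] = $node->textContent;
            }
            break;
        case 'span':
            if ($fields['datetime'] === '') {   // keep only the first span
                $fields['datetime'] = $node->textContent;
            }
            break;
        case 'p':
            $fields['article'] .= $node->textContent . "<br>";
            break;
    }
}
print_r($fields);
```

A switch like this keeps one loop over the wrapper's children and lets each tag decide which variable it feeds, which may be easier to extend than a chain of if() conditions.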
So far, everything is figured out. I'm beginning to transfer my content over to the e-mail template. I don't think I'll run into any more issues, but I'll let you know if I do! Thanks for everything.
you are aware that the code as given only uses the last found h1?
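To illustrate the point: if several wrappers match, the foreach keeps overwriting $title, so the value from the last wrapper wins. A possible way around it, sketched here with made-up sample markup, is to let XPath pick the first h1 directly so no loop is needed.

```php
<?php
// Sketch only: two wrappers on purpose, to show which h1 is picked.
$html = '<div class="page-main-content"><h1>First</h1><p>a</p></div>'
      . '<div class="page-main-content"><h1>Second</h1></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The parentheses make [1] apply to the whole result set,
// i.e. "the first matching h1 in the document"
$h1 = $xpath->query("(//div[contains(@class, 'page-main-content')]//h1)[1]")->item(0);

// item(0) is null when nothing matched, so guard before reading textContent
$title = $h1 !== null ? trim($h1->textContent) : '';
echo $title, PHP_EOL; // prints "First"
```

The null check also covers pages that have no h1 at all, where calling ->textContent on item(0) would otherwise fail.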
Yes: I should note that my code has changed dramatically since the last post. I almost entirely redid it.
So I have this now:
$doc = new DOMDocument;
$doc->preserveWhiteSpace = FALSE;
$doc->loadHTMLFile($url);
$xpath = new DomXPath($doc);
$results = $xpath->query("//div[contains(@class, 'page-main-content')]");
foreach($results as $cr)
{
    //find first h1, add it to $title
    $title = $cr->getElementsByTagName('h1')->item(0)->textContent;
    //find first img, add its src to $leftImage
    $img = $cr->getElementsByTagName('img')->item(0);
    $leftImage = $img->getAttribute('src');
    $url_info = parse_url($leftImage);
    if (!isset($url_info['host']))
    {
        //relative URL: prefix it with $dir (assumed to be set earlier in the script)
        $path = $url_info['path'];
        if (substr($path,0,1) !== '/') $path = '/'.$path;
        $leftImage = $dir.$path;
    }
    $leftImageAlt = $img->getAttribute('alt');
    //find all paragraph tags, add them to $article
    $paragraphs = $cr->getElementsByTagName('p');
    for($i=0; $i<$paragraphs->length; $i++)
    {
        $article .= $paragraphs->item($i)->textContent."<br>";
    }
}
I have the h1 and the image being grabbed. Now, how can I set it up so that, besides the elements I'm grabbing (first image, first h1), it will grab ALL OTHER elements and add them to the $article variable? Right now it's just grabbing all paragraphs, but realistically I need all other nodes/elements added to it, aside from the few elements I'm grabbing.
I want the equivalent of me copy/pasting the other page's HTML and putting it all inside $article (aside from a few select elements).
all other elements of what?
if you can live without line breaks: $cr->textContent, otherwise you need a clear picture of what elements should be displayed how.
[quote="Dormilich, post:34, topic:115635"]
if you can live without line breaks: $cr->textContent
[/quote]I'm using that but concatenating a <br> into it.
Let's say I have this sort of structure (pseudo code):
div.page-main-content
--h1
--span of date time etc
--p
--p
--p
--img
--ul
----li
--/ul
--p
--p
/div
Now, I want EVERY element there to be added to $article, except for the img and the h1 (first occurrence of each, and ignore all others).
Any sort of tag can hold the text, not just paragraphs and ULs. It could be other spans or anything.
Basically this entire div holds an article. I want the whole article as the value for $article except for a few key elements.
http://www.codefundamentals.com/test2.php
See the header "Middle East Forum Presents: Dr. Robert Rubinstein, 'Culture, Interagency Dynamics, and Health in the Middle East'"?
From there, until the ending "The Regional Forum Lectures are sponsored by the Class of 1993."... all that needs to be in my $article variable, minus a few elements.
If you go to codefundamentals.com/test.php
Enter the test2.php URL and then a random email (it does nothing so far) and you'll see what it's outputting now.
It works great, except I need to exclude the last 2 paragraphs, and I also don't account for any spans, ul/li, or any other tags that might appear. Right now it'd be easy to miss chunks of text or lists with my method.
I just went through all the other articles as a baseline and they only use paragraphs... so I think I'll be fine.
If grabbing EVERY element and then sorting out from there to exclude the first h1/img etc is too much work, then perhaps we can just move on.
That being said, I believe my script is finished unless you can optimize it.
in your for() loop, use length - 2 as break condition.
Yes, I realized that shortly after I posted. I had to make it 3 actually, but nevertheless.
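For anyone following along, the shortened loop could look roughly like this. It is a sketch with invented sample markup; $skipLast is a hypothetical name for the number of trailing paragraphs to drop (2 here, 3 on the real page).

```php
<?php
// Sketch only: sample markup with two trailing paragraphs that should be left out.
$html = '<div class="page-main-content"><p>One</p><p>Two</p><p>Three</p>'
      . '<p>Footer A</p><p>Footer B</p></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$article  = '';
$skipLast = 2; // hypothetical setting: how many trailing paragraphs to ignore

$wrapper = $xpath->query("//div[contains(@class, 'page-main-content')]")->item(0);
if ($wrapper !== null) {
    // fetch the list once instead of on every iteration
    $paragraphs = $wrapper->getElementsByTagName('p');
    $limit = $paragraphs->length - $skipLast; // negative limit simply skips the loop
    for ($i = 0; $i < $limit; $i++) {
        $article .= $paragraphs->item($i)->textContent . "<br>";
    }
}
echo $article, PHP_EOL;
```

Keeping the count in one variable means only that number changes if the page layout gains or loses a footer paragraph.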
Do you have any suggestions for the more specific grabbing of elements as noted in post #37? I'm fine if you don't.
grabbing all elements gives you a plain (one-dimensional) list of items. the term first then may lose its necessary context. if that doesn't matter, get that list, find the desired items and remove them (removeChild() returns the removed element to you, so you can still grab its content).
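A rough sketch of that remove-then-collect idea, under a few assumptions: the sample markup is invented, takeFirst() is a hypothetical helper name, and saveHTML() is used to keep the remaining markup rather than just the text.

```php
<?php
// Sketch only: invented sample markup standing in for the article wrapper.
$html = '<div class="page-main-content"><h1>Title</h1><p>Intro</p>'
      . '<img src="/a.png" alt="A"><ul><li>Item</li></ul><p>End</p></div>';

libxml_use_internal_errors(true); // keep parser warnings out of the output
$doc = new DOMDocument;
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Hypothetical helper: detach the first descendant with the given tag and hand it back
function takeFirst(DOMXPath $xpath, DOMNode $context, string $tag): ?DOMNode
{
    $node = $xpath->query("(.//{$tag})[1]", $context)->item(0);
    // removeChild() returns the removed node, so its content stays readable
    return $node !== null ? $node->parentNode->removeChild($node) : null;
}

$title = $leftImage = $leftImageAlt = $article = '';

$wrapper = $xpath->query("//div[contains(@class, 'page-main-content')]")->item(0);
if ($wrapper !== null) {
    $h1  = takeFirst($xpath, $wrapper, 'h1');
    $img = takeFirst($xpath, $wrapper, 'img');

    if ($h1 !== null) {
        $title = $h1->textContent;
    }
    if ($img instanceof DOMElement) {
        $leftImage    = $img->getAttribute('src');
        $leftImageAlt = $img->getAttribute('alt');
    }

    // Everything still inside the wrapper is serialized back to HTML, tags included
    foreach ($wrapper->childNodes as $child) {
        $article .= $doc->saveHTML($child);
    }
}
echo $article, PHP_EOL;
```

Because the h1 and img are detached before serializing, lists, spans and any other tags end up in $article without being named individually. Dropping trailing paragraphs would still need a separate step.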