Read MS Word file from PHP script

Dear all,

I have this PHP code which reads an MS Word file. The problem is that a .doc file is not plain text: the script reads the .doc file, but it also picks up many special characters that I don't need. I need only the Word content. I will also run this script on a Linux server, so I cannot use the COM object.

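$userDoc = "rev.doc";

function parseWord($userDoc)
{
    // Open in binary mode: a .doc file is not plain text.
    $fileHandle = fopen($userDoc, "rb");
    $line = @fread($fileHandle, filesize($userDoc));
    fclose($fileHandle);

    // Split the raw bytes on carriage returns (0x0D).
    $lines = explode(chr(0x0D), $line);
    $outtext = "";
    foreach ($lines as $thisline) {
        // Skip empty chunks and chunks that contain NUL bytes.
        $pos = strpos($thisline, chr(0x00));
        if (($pos !== FALSE) || (strlen($thisline) == 0)) {
            continue;
        }
        $outtext .= $thisline . " ";
    }

    // Strip every character outside the allowed set.
    $outtext = preg_replace('/[^a-zA-Z0-9\s,.@\/_()]/', "", $outtext);
    return $outtext;
}

$text = parseWord($userDoc);
echo $text;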

It gets the content, but after that it prints bad data like the following. I need only the Word data.

Word.Picture.8 DDSr2D6qFXt1tFDxJih8FDDpnFODWZgu3l4XJQDd F (EZmt)1WlLrOoGNmED2xq2Dcm9W_jEDnnIauElLv43Qo/M4I4I52DU2tqRMlnSDDNIkx@v22DS4DDnMuQxfoWfMP4tf .Xyd ) XDL.sq nREfr gU o v nPDTcnrc7pyPrK)UnbfRImNIKbikBff@neEn wk_sSpRXg6dvd_v_/czqnWS/lzHZ2OAZYvEZNEd6mhwc6QVipe1ec 5c9FAWUZk5n M7eLRsSiFYolJ MyCfoUaY3D5fP/dn9v ivWs qXkt9)7nM GthIvFv.A A)xOe/17 yzo0tCnoVO wsNubtRqMo6BmK ,.AGt jcc 9p.OWnRd 8jDrIPsjW1I.Qbq VA7U3lu.

Greetings!
MS Word files contain not only text but also objects, images, equations, etc., so you cannot read them by scanning line by line this way. Because you want your script to run on a Linux server, COM is not a solution. You may want to search the Internet for functions or classes that suit your needs.
I found this class, but it is used to create .doc files, not read from them. Anyway, it may be helpful to you:
phpclasses dot o r g slash browse slash package slash 2631 dot html

PS: The forum doesn’t allow members with fewer than 10 posts (like me) to post links, so I used that format instead. :)
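
To illustrate the point above that a .doc file is a binary container rather than text, here is a minimal sketch that checks whether a file begins with the 8-byte signature of the OLE2 compound-file format used by legacy .doc files. The function name looksLikeOleDoc is hypothetical, and the check only identifies the container; it does not extract any text.

```php
<?php
// Hypothetical helper: true if the file starts with the OLE2
// compound-file signature (D0 CF 11 E0 A1 B1 1A E1).
function looksLikeOleDoc(string $path): bool
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }
    // Only the first 8 bytes are needed for the signature check.
    $header = fread($handle, 8);
    fclose($handle);

    return $header === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
}
```

If this returns true for rev.doc, the garbage in the output is explained: the script is printing bytes from embedded objects and internal tables that happen to fall inside the allowed character set.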

Hi

If you save the Word file as an RTF file, you can do a search and replace for words and phrases.

I have a letter generation script that takes the words to replace from an array.


$filename = "/path/to/file/name.rtf";

// Function to do the replacing: each array key is replaced by its value
function textreplace($needle, $haystack)
{
    foreach ($needle as $search => $value) {
        $haystack = str_replace($search, $value, $haystack);
        //echo $haystack."<br>";
    }
    return $haystack;
}

// Read the RTF template
$handle = fopen($filename, "r");
$contents = fread($handle, filesize($filename));
fclose($handle);

// Assign the variables to the array
$mailmerge = array();

// Replacement info
$mailmerge['phrase_1'] = "text for phrase 1";
$mailmerge['phrase_2'] = "text for phrase 2";

$haystack = textreplace($mailmerge, $contents);

header('Content-Type: application/rtf');
header("Content-Disposition: inline; filename=Filename.rtf");
echo $haystack;

And that will output your file as a download

Hope that helps

Keith
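
One possible variation on the replacement loop above, offered as a sketch rather than as Keith's method: PHP's strtr() accepts an array of search => replace pairs and substitutes them in a single pass, so a replacement value that happens to contain another placeholder is not replaced again. The function name mergePlaceholders and the sample strings are made up for illustration.

```php
<?php
// Hypothetical alternative to textreplace(): one pass over the template.
function mergePlaceholders(array $replacements, string $template): string
{
    // strtr() with an array tries the longest keys first and does not
    // re-scan text it has already substituted.
    return strtr($template, $replacements);
}

$merged = mergePlaceholders(
    ['phrase_1' => 'see phrase_2', 'phrase_2' => 'second value'],
    'First: phrase_1. Second: phrase_2.'
);
echo $merged; // First: see phrase_2. Second: second value.
```

With the str_replace() loop, the same input would give "First: see second value. Second: second value.", because the second iteration also rewrites the text inserted by the first.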

Even an RTF file contains special formatting codes as well as content. You’d need to convert it to plain text if the text is all you want.
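
As a rough sketch of what that conversion could look like for very simple RTF, the function below turns \par and \line into newlines, drops the remaining control words, and removes the group braces. The name rtfToPlainText is hypothetical, and this is not a real RTF parser: it assumes there are no font tables, embedded pictures, hex escapes, or escaped braces, all of which would leave stray text behind.

```php
<?php
// Hypothetical helper: crude RTF-to-text reduction for simple documents only.
function rtfToPlainText(string $rtf): string
{
    // Paragraph and line breaks become newlines (\pard is left for the next step).
    $text = preg_replace('/\\\\(par|line)\b ?/', "\n", $rtf);

    // Drop remaining control words such as \rtf1, \ansi, \b, \fs24,
    // together with the single space that may terminate them.
    $text = preg_replace('/\\\\[a-zA-Z]+-?\d* ?/', '', $text);

    // Remove the braces that delimit RTF groups.
    $text = str_replace(['{', '}'], '', $text);

    return trim($text);
}

echo rtfToPlainText('{\rtf1\ansi Hello \b world\b0\par Second line}');
```

For real documents a dedicated converter is the safer route; this only shows why the control codes have to be stripped before the text is usable.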