How To Filter Content Before Loading On Screen?

My Fellow Php Folks!

I am now doing a peeping Tom into Php-Proxy script to gain work experience in building a web proxy as I am struggling to build one from scratch.

The script has 2 major PHP files.
index.php
config.php

I am including the code for both below.
I need your help adding comments to the lines, so that by reading your comments I can learn what each line does. So, who will be the good Samaritan?
In the past, the following have helped. Ranked according to who started helping first (according to my memory):

SpaceShipTrooper
droopsnoot
mlukac89
SamA74
John_Betong
CpRadio

If I’ve forgotten anybody then please excuse.
Let’s not forget the mighty TechnoBear.

Actually, let’s play a little game and make this forum a little fun.
The first person who reads this, make your comments on the first few lines. The 2nd person who reads this can do for the next few lines and so on.
That way, everyone contributes a little.
Ok, the index.php, I believe is the actual script and even though it has comments they are very brief and not so friendly to a newbie. Therefore, any volunteering to make them in depth would be appreciated by all newbies.
Once I finish learning from your comments, then it would be easy for me to complete my other learning project where I try building the web proxy from scratch.

Thanks :blush:

PS - In the index.php, on which line do I add the content filtering code ?
This is where I will keep a list of banned words; if any of them are found on the page the user is trying to load via the web proxy, the page would not load. I’ll write code to alert the user that the page won’t load because it has banned words in its content/title/meta keywords/meta descriptions/file names/img names/link anchors/etc.

index.php

<?php

define('PROXY_START', microtime(true));

require("vendor/autoload.php");

use Proxy\Http\Request;
use Proxy\Http\Response;
use Proxy\Plugin\AbstractPlugin;
use Proxy\Event\FilterEvent;
use Proxy\Config;
use Proxy\Proxy;

// start the session
session_start();

// load config...
Config::load('./config.php');

// custom config file to be written to by a bash script or something
Config::load('./custom_config.php');

if(!Config::get('app_key')){
	die("app_key inside config.php cannot be empty!");
}

if(!function_exists('curl_version')){
	die("cURL extension is not loaded!");
}

// how should our URLs be generated from this point? this must be set here so the proxify_url function below can make use of it
if(Config::get('url_mode') == 2){
	Config::set('encryption_key', md5(Config::get('app_key').$_SERVER['REMOTE_ADDR']));
} else if(Config::get('url_mode') == 3){
	Config::set('encryption_key', md5(Config::get('app_key').session_id()));
}

// very important!!! otherwise requests are queued while waiting for session file to be unlocked
session_write_close();

// form submit in progress...
if(isset($_POST['url'])){
	
	$url = $_POST['url'];
	$url = add_http($url);
	
	header("HTTP/1.1 302 Found");
	header('Location: '.proxify_url($url));
	exit;
	
} else if(!isset($_GET['q'])){

	// must be at homepage - should we redirect somewhere else?
	if(Config::get('index_redirect')){
		
		// redirect to...
		header("HTTP/1.1 302 Found"); 
		header("Location: ".Config::get('index_redirect'));
		
	} else {
		echo render_template("./templates/main.php", array('version' => Proxy::VERSION));
	}

	exit;
}

// decode q parameter to get the real URL
$url = url_decrypt($_GET['q']);

$proxy = new Proxy();

// load plugins
foreach(Config::get('plugins', array()) as $plugin){

	$plugin_class = $plugin.'Plugin';
	
	if(file_exists('./plugins/'.$plugin_class.'.php')){
	
		// use user plugin from /plugins/
		require_once('./plugins/'.$plugin_class.'.php');
		
	} else if(class_exists('\\Proxy\\Plugin\\'.$plugin_class)){
	
		// does the native plugin from php-proxy package with such name exist?
		$plugin_class = '\\Proxy\\Plugin\\'.$plugin_class;
	}
	
	// otherwise plugin_class better be loaded already through composer.json and match namespace exactly \\Vendor\\Plugin\\SuperPlugin
	$proxy->getEventDispatcher()->addSubscriber(new $plugin_class());
}

try {

	// request sent to index.php
	$request = Request::createFromGlobals();
	
	// remove all GET parameters such as ?q=
	$request->get->clear();
	
	// forward it to some other URL
	$response = $proxy->forward($request, $url);
	
	// if that was a streaming response, then everything was already sent and script will be killed before it even reaches this line
	$response->send();
	
} catch (Exception $ex){

	// if the site is on server2.proxy.com then you may wish to redirect it back to proxy.com
	if(Config::get("error_redirect")){
	
		$url = render_string(Config::get("error_redirect"), array(
			'error_msg' => rawurlencode($ex->getMessage())
		));
		
		// Cannot modify header information - headers already sent
		header("HTTP/1.1 302 Found");
		header("Location: {$url}");
		
	} else {
	
		echo render_template("./templates/main.php", array(
			'url' => $url,
			'error_msg' => $ex->getMessage(),
			'version' => Proxy::VERSION
		));
		
	}
}

?>
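
On the PS question of where the filter goes: going by the index.php above, the fetched page only exists as `$response` between the `$proxy->forward($request, $url)` call and `$response->send()`, so that gap looks like the natural spot. Below is a minimal sketch of that idea, not php-proxy's own method. It assumes the response object exposes `getContent()` and `setContent()`, which the posted code does not confirm, so `StandInResponse`, `find_banned_word()` and `block_if_banned()` are all hypothetical names used for illustration.

```php
<?php
// Hypothetical stand-in for the proxy's response object. The real
// Proxy\Http\Response is ASSUMED (not confirmed by the posted code) to
// offer similar getContent()/setContent() accessors.
class StandInResponse
{
    private $content;

    public function __construct($content)
    {
        $this->content = $content;
    }

    public function getContent()
    {
        return $this->content;
    }

    public function setContent($content)
    {
        $this->content = $content;
    }
}

// Return the first banned word found in $text (case-insensitive), or null.
function find_banned_word($text, array $bannedWords)
{
    foreach ($bannedWords as $word) {
        if ($word !== '' && stripos($text, $word) !== false) {
            return $word;
        }
    }
    return null;
}

// Swap the page body for a notice when a banned word is present.
// Returns true if the page was blocked, false if it was left alone.
function block_if_banned($response, array $bannedWords)
{
    $hit = find_banned_word($response->getContent(), $bannedWords);
    if ($hit === null) {
        return false;
    }
    $response->setContent(
        '<h1>Page blocked</h1><p>This page contains a banned word: '
        . htmlspecialchars($hit) . '</p>'
    );
    return true;
}

// Where it might sit in index.php (assumption, untested against php-proxy):
//   $response = $proxy->forward($request, $url);
//   block_if_banned($response, $bannedWords);
//   $response->send();
```

Two caveats: the comment in index.php says streaming responses are sent before `send()` is reached, so they would bypass a check placed here; and a plain substring match will also flag harmless words that happen to contain a banned one.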

config.php

<?php

// all possible options will be stored
$config = array();

// a unique key that identifies this application - DO NOT LEAVE THIS EMPTY!
$config['app_key'] = '04e8155d1ddc8d00c578a7ffc0018692';

// a secret key to be used during encryption
$config['encryption_key'] = '';

/*
how unique is each URL that is generated by this proxy app?
0 - no encoding of any sort. People can link to proxy pages directly: ?q=http://www.yahoo.com
1 - Base64 encoding only, people can hotlink to your proxy
2 - unique to the IP address that generated it. A person that generated that URL can bookmark it and visit it at any point
3 - unique to that session and IP address - URL no longer valid anywhere when that browser session that generated it ends
*/

$config['url_mode'] = 2;

// plugins to load - plugins will be loaded in this exact order as in array
$config['plugins'] = array(
	'HeaderRewrite',
	'Stream',
	// ^^ do not disable any of the plugins above
	'Cookie',
	'Proxify',
	'UrlForm',
	// site specific plugins below
	'Youtube',
	'DailyMotion',
	'RedTube',
	'XHamster',
	'XVideos',
	'Twitter'
);

// additional curl options to go with each request
$config['curl'] = array(
	// CURLOPT_PROXY => '',
	// CURLOPT_CONNECTTIMEOUT => 5
);

//$config['replace_title'] = 'Google Search';

//$config['error_redirect'] = "https://unblockvideos.com/#error={error_msg}";
//$config['index_redirect'] = 'https://unblockvideos.com/';

// $config['replace_icon'] = 'icon_url';

// this better be here, otherwise Config::load fails
return $config;

?>

You have chosen a difficult learner’s task which requires an in depth knowledge of php classes.

I would be tempted to try the following free solution and concentrate on your main goal.

Speaking of classes, I would suggest to wrap that functionality in a proper service which would allow injecting a proxy mock for your unit tests…

<?php

use Proxy\Proxy;

/**
 * Service to proxy content requests
 */
class ProxyService {

  /**
   * Proxy instance
   *
   * @var Proxy
   */
  private $proxy;

  /**
   * Constructor function
   *
   * @param Proxy|null $proxy
   */
  public function __construct($proxy = null) {
    $this->proxy = $proxy ?: new Proxy();
  }
}

Sorry John. But I don’t quite understand what you mean.
Are you saying that building a web proxy, or at least the web proxy I have chosen (php-proxy), involves code that uses PHP classes, that this is a complicated subject, and that you think I should try another approach? The Disqus approach?
I really don’t understand the Disqus approach though.
Are you hinting that I open a thread there, as my topic is broad and I will gain a lot of replies, which in turn will earn me revenue? Kill 2 birds with 1 stone:

  1. Get my questions answered and;
  2. Earn money on the side.

??

I had a look at the Disqus site just now and, if I’m correct, they are a sort of forum/Yahoo Answers type host where we can launch our own forum/Yahoo Answers-like threads, and when people engage in them we earn CPM rates each time our threads load. Do correct me if I am wrong…
Anyway, what discussions have you started so far (or ever) at Disqus or anywhere where you are profiting from the user engagements ?
And, are you aware of a simpler web proxy php script that I can do a Peeping Tom on on the source code for my learning purpose ?

I’m not seeing it either. AFAIK, Disqus is a comment system.

What you posted looks to be more about how to work with a code library.

Working with that code means that you’ll not only need to visit the PHP documentation, but also the library’s documentation. If the library does not have documentation you’ll need to be able to “read” it. If you need to “hack” any of the library’s code, you’ll need to thoroughly understand it or be prepared to have bugs or worse.

Maybe, John knows something we don’t know ? No offense, but probably has better eyes and sense than us all put together when it comes to spotting a money making system or a money making ‘chance’. Let us see what he replies. I think he is onto something. Big. Very, very big!

And I thought that, I was the best around here when it comes to coming-up with your own money making concepts or spotting a money maker here and there.

Php Folks,

In what way would you build your content filter for your web proxy ?
Imagine you gave your children a web proxy, installed on their computers so they can surf the web privately while you track their movements to make sure they are behaving.
Ok, child-safe filters exist, but I’m talking about building my own on top of an existing web proxy (Php-Proxy, which I did not write) so I can customize the features to my needs. Plus, I get to learn more php this way. It is all about learning, really. :wink:
Now, imagine you don’t want them viewing or downloading from bad sites like software pirate sites, porn sites, pirate music sites, etc.
Now, imagine you are using Php-Proxy and find-out it does not have a content filter or banned words filter and you decide to write-up your own code. Let us call this “your mini script” which is a chunk of php code (a few lines) which you would add onto your chosen web proxy (eg. Php-Proxy). How would you write it ?
Here are a few methods I am guessing you could use but which one would you use and why that one over the others and which ones you would stay away from and why from them ? What are flaws in their methods ?

Q1. Would you build a mini script that would:

1). Check the meta content of the page they load on their screen ?
2). Check the img file names and ALT tags ?
3). Check the content on the page as a general check and that would be enough as it would also check all the things mentioned above ?
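
One point that may help with Q1: meta tags, ALT text and image file names live in attributes, not in the visible text, so a check of the page text alone would not see them (searching the raw HTML string would, but it would also match inside scripts and URLs). A rough sketch of pulling each piece out separately with PHP's bundled DOM extension follows. The function names `extract_checkable_text()` and `find_banned_by_section()` are made up for this example, and it is only one possible approach.

```php
<?php
// Collect the text a banned-words check might look at, grouped by where
// it came from. Uses PHP's bundled DOM extension (DOMDocument).
function extract_checkable_text($html)
{
    $parts = array(
        'title'  => array(),
        'meta'   => array(),
        'images' => array(),
        'links'  => array(),
        'body'   => '',
    );
    if (trim($html) === '') {
        return $parts; // loadHTML() rejects empty input on PHP 8
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('title') as $node) {
        $parts['title'][] = trim($node->textContent);
    }
    foreach ($doc->getElementsByTagName('meta') as $node) {
        $name = strtolower($node->getAttribute('name'));
        if ($name === 'keywords' || $name === 'description') {
            $parts['meta'][] = $node->getAttribute('content');
        }
    }
    foreach ($doc->getElementsByTagName('img') as $node) {
        $parts['images'][] = $node->getAttribute('alt');
        // File name only, e.g. "cat.jpg" from "/img/cat.jpg".
        $path = parse_url($node->getAttribute('src'), PHP_URL_PATH);
        $parts['images'][] = basename((string) $path);
    }
    foreach ($doc->getElementsByTagName('a') as $node) {
        $parts['links'][] = trim($node->textContent);
    }
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body !== null) {
        $parts['body'] = trim($body->textContent);
    }
    return $parts;
}

// Report which sections contain which banned words (case-insensitive).
function find_banned_by_section(array $parts, array $bannedWords)
{
    $hits = array();
    foreach ($parts as $section => $texts) {
        $haystack = implode(' ', (array) $texts);
        foreach ($bannedWords as $word) {
            if ($word !== '' && stripos($haystack, $word) !== false) {
                $hits[$section][] = $word;
            }
        }
    }
    return $hits;
}
```

Knowing which section matched also makes it possible to tell the user why the page was blocked, for example "banned word in meta keywords".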

Q2. How would you prevent downloads such as video downloads, img downloads, software (.exe) downloads ? Got to prevent the downloads to prevent them downloading trashy imgs and trashy clips and trashy .exe (that might be malware).
And so, what method would you use to prevent these downloads ?
I’m guessing you would get your mini script to check the links for their extension types. Right? Yes or no? If so, then how would you get your mini script to deal with it so the downloads don’t download? This is how I might do it, and I need your advice on whether the method is sound or not.
I’d get my mini script to replace (str_replace/preg_replace) the file extensions of links on the proxified pages. Only those links that download something. Not those links (.html, .shtml, .php, .jpeg, .gif, .pdf, etc.) that take you to another page. That way, the download links would become useless. The browser won’t understand that it is a link that downloads something. If you deem this method ok, then tell me: how do I know which link leads to another page and which link leads to a download? Ok, I can check for .zip and .rar (archive files), but is there any other extension or anything else my mini script should check for to spot a “downloading link”?
Is there a php function that checks for download links ?
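
As far as I know there is no built-in PHP function that says "this link is a download": whether a URL downloads is decided by the server's response headers (`Content-Disposition`, `Content-Type`), not by the URL itself. The extension is only a guess, and rewriting it breaks the link rather than blocking it. A hedged sketch of both checks is below; `looks_like_download()`, `headers_indicate_download()` and the `BLOCKED_EXTENSIONS` list are illustrative names and values, not part of any library.

```php
<?php
// Extensions treated as downloads in this sketch. The list is
// illustrative, not exhaustive.
const BLOCKED_EXTENSIONS = array('exe', 'msi', 'zip', 'rar', '7z', 'mp3', 'mp4', 'avi', 'mkv');

// Guess from the URL path alone whether a link points at a blocked type.
function looks_like_download($url, array $blocked = BLOCKED_EXTENSIONS)
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return false; // no path, or URL could not be parsed
    }
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return $ext !== '' && in_array($ext, $blocked, true);
}

// Stronger signal once response headers are known: the server marks the
// body as an attachment, or the content type is on a blocked list.
// $headers is an associative array of header name => value.
function headers_indicate_download(array $headers)
{
    $h = array_change_key_case($headers, CASE_LOWER);

    $disposition = isset($h['content-disposition']) ? trim($h['content-disposition']) : '';
    if (stripos($disposition, 'attachment') === 0) {
        return true;
    }

    $type = isset($h['content-type']) ? $h['content-type'] : '';
    $type = strtolower(trim(explode(';', $type)[0])); // drop "; charset=..."
    $blockedTypes = array(
        'application/octet-stream',
        'application/zip',
        'application/x-msdownload',
    );
    return in_array($type, $blockedTypes, true);
}
```

The URL check can run while links are being rewritten; the header check has to run after the proxy has received the response headers, at which point the proxy could refuse to pass the body on.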

Q3. Is it possible to load a webpage in the background then get your mini script to check the content and if the filter gets the page not flagged then load the page on screen ? That way, the users don’t view pages containing banned content ? I managed to do this on my .exe tool (free tool which I may upload to this forum for you guys to check it out) but I don’t have enough experience with php and so need your advice and tips.
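
For Q3, a proxy already works this way in principle: the server fetches the page into a string first, so it can be inspected before anything is echoed to the browser. Here is a minimal sketch under that assumption. `fetch_with_curl()`, `banned_words_in()` and `filtered_page()` are hypothetical helper names; the cURL options used are standard ones, but the fetcher is not exercised here because it needs network access.

```php
<?php
// Fetch a page into a string instead of sending it straight to the
// browser. Needs the cURL extension and network access.
function fetch_with_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return body as string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // seconds
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}

// Match whole words only, so "classic" does not trip a ban on "ass".
function banned_words_in($text, array $bannedWords)
{
    $found = array();
    foreach ($bannedWords as $word) {
        if ($word === '') {
            continue;
        }
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        if (preg_match($pattern, $text) === 1) {
            $found[] = $word;
        }
    }
    return $found;
}

// Fetch in the background, inspect, and only then decide what to show.
// $fetch is any callable that takes a URL and returns HTML or null.
function filtered_page($url, callable $fetch, array $bannedWords)
{
    $html = $fetch($url);
    if ($html === null) {
        return '<p>The page could not be loaded.</p>';
    }
    // Drop tags so markup such as class names is not matched.
    $found = banned_words_in(strip_tags($html), $bannedWords);
    if ($found) {
        return '<p>Page blocked. Banned words found: '
            . htmlspecialchars(implode(', ', $found)) . '</p>';
    }
    return $html;
}
```

Usage would be along the lines of `echo filtered_page($url, 'fetch_with_curl', $bannedWords);`. Passing the fetcher in as a callable keeps the filter testable without touching the network.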

Q4. How would you prevent viewing streaming sound or video files ? How to detect streaming ?
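
For Q4, one common signal is the `Content-Type` response header: audio and video bodies are normally served as `audio/...` or `video/...`, and streaming playlists have their own types. The sketch below only classifies a header value; where to hook it into Php-Proxy is an open question, since the comment in index.php says streaming responses are sent before `$response->send()` is reached. `is_media_content_type()` is a made-up name and the playlist list is illustrative.

```php
<?php
// Decide from a Content-Type header value whether a response is
// audio/video or a streaming playlist.
function is_media_content_type($contentType)
{
    // Keep only the media type, e.g. "video/mp4" from "video/mp4; codecs=...".
    $type = strtolower(trim(explode(';', (string) $contentType)[0]));

    if (strpos($type, 'audio/') === 0 || strpos($type, 'video/') === 0) {
        return true;
    }

    $playlists = array(
        'application/vnd.apple.mpegurl', // HLS playlist (.m3u8)
        'application/x-mpegurl',         // older HLS playlist type
        'application/dash+xml',          // MPEG-DASH manifest (.mpd)
    );
    return in_array($type, $playlists, true);
}
```

A server can send a wrong or generic content type, so this would catch well-behaved sites rather than every stream.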

Q5. Reading the 4 questions above no doubt has given you some ideas to which php functions I should be using and so which ones you have in mind that would do me the job ?

So, what do you think ? What are your answers for all my 4 questions ?

And no. I can’t be creating a whitelist to only allow these or those sites. Will become too restrictive. Can’t create a blacklist either as there are too many sites to blacklist and we never know them all. So, we are back to square one: Content and File Types Filtering.

Doing a search now for:
banned words filtering in proxies

And, checking this out to see if the source code would be available for me to get my code snippets:
http://www.ipcop.org/2.0.0/en/admin/html/services-urlfilter.html
(I’m not affiliated with them. Checking the site out for the first time. Just mentioning the link so you understand what kind of things I want the script to do.)

Looks like I was mistaken and did not understand the original post.

I got the idea that an online web-proxy web-page was to be created and a comments system to be incorporated. The comments were to be checked before being added.

As mentioned, Disqus has a robust commenting application, once registered filtered comments for that particular web-page can be included with a single line.

How-about getting answers to my questions (see original post) so I can move along with this project ?
Or, we’ll be stuck here forever. It has been 4 days now. :frowning:

Can you supply the above file contents?

Edit:
Why have declare, error_reporting, etc not been included?

I supplied an online web-proxy solution with source code that displays web page contents.

The script also renders your str_replace “proxified” strings.

If the final result is not what is required I suggest modifying your “proxified” strings.

Frankly, I don’t have any experience with php Classes.

I have not touched the Php-Proxy script yet. Hence the error reporting code has not been added by me yet. If the error reporting code is missing then the author did not include it.

Web Proxy Details

Download Link:
https://www.php-proxy.com/download/php-proxy.zip

John, let me tell you what I want to do. You will no doubt find it interesting.
I want to:

  1. Add Content Filter
    (The code is still in progress at my other thread: How To Count Banned Words On Thread).

  2. Add Link Click Tracker
    (That way, the tracker would log all pages proxified. Meaning whatever proxified webpages the users browse would be logged into my mysql db like this:

                           on-screen/271236/1               

2 | UI Man | https://www.sitepoint.com/community/t/how-to-count-banned-words-on-page/271432/38

That is just a simple example. Of course, there would be other columns such as ip, country, time, searchengine, keywords searched, etc.

At this moment, I don’t need help to add the tracker as I know how to do it. But if I get stuck then you’re welcome to help, of course.

Thanks for your willingness to help.

Does anybody know the answers to my 4 questions asked in my original post ?
Or, were my questions too much ?

I am confused: in one post in this topic you mention being stuck for four days, and later you state that you "have not touched the Php-Proxy script yet".

Users here freely spend their time trying to help others who have made an effort in solving a problem.

Perhaps your ideas would be more appreciated by raising a Crowd Funding Scheme.

John_Betong,

I have LIKED your post!
I just read my previous post to see what you are replying to. I actually made that post when I was about to go to sleep. Looking at it now (while wide awake), it seems that this part probably did not come across as polite:

“Or, were my questions too much ?”

I actually did not ask that in a sarcastic way. I wanted to know if I asked too many questions in one post, because it is obvious that if there seem to be too many questions then nobody’s gonna bother scrolling up and down to read each of my questions while replying to them one by one. That may be the reason why I am not getting any responses. (So I thought, and still do.) Other times, when I ask very few questions in my threads and posts, I always get more replies than expected. Therefore, I have now come to the conclusion that I should ask very few questions per post.

Anyway, thanks for reminding me about the Crowd Funding site. I came across it a few times when reading lists of top 10 this or that. Then, forgot about it.

Well, like I said, my projects are for 3 reasons:

  1. Gain work experience in php;
  2. Earn money for the users of web 2.0 (those who use the services run by the php scripts I build);
  3. Make a variety of php project codes available in this forum so other newbies get attracted to this forum via google and team-up with us. The more engagements in a forum with like-minded people the more fun. Else, the forum seems dead.

If I was really ONLY into building a php script/project to earn money (run a business) then I would get someone experienced like you from freelancers.com. But, doing things that way does not help me learn anything about php, now does it ?
I’ll learn more this way, squeezing the juice out of every forums’ senior php developers I can.
When you learn things in class room from a book and your teacher, the learning is raw in your mind. You won’t find work due to not having gained work experience. You’d have to go and work as an apprentice with some web developing company to gain work experience. Go out in the field. On the job training.
My training fields are the forums. :wink:
By the time I finish learning php, I won’t need to go out on work experience as an apprentice. I’m gaining the required (from the forums) while still learning. Saving time. :wink:

I forgot to answer you in my previous post.
Yes, I have not touched the code side of Php-Proxy (third-party script). I saw it has no Admin Account and so no filtering of sites will be available.
Therefore, I figured I need to write my own filter with php and so I opened this thread and the other one and been working on this filter writing for 5 days now. I hope I am clear now.

Thanks!

Off Topic

@uniqueideaman: please stop quoting entire posts unnecessarily.

It makes the topic very long and cumbersome, and it is difficult for those on small screens, or who have problems scrolling. Just quote excerpts where required.

Thank you.
