This article explains, step by step, how to create a few simple PHP scripts, backed by a database management system such as MySQL, in order to parse, cache and serve dynamic RSS feeds.
But before we go into those details, let us see what RSS is.
What is RSS?
According to its Wikipedia entry, RSS (which, in its most recent format, stands for "Really Simple Syndication") is a family of web feed formats used to publish frequently updated content such as blog entries, news headlines or podcasts. An RSS document, which is called a "feed", "web feed", or "channel", contains either a summary of content from an associated web site or the full text. RSS makes it possible for people to keep up with their favorite web sites in an automated manner that's easier than checking them manually.
RSS content can be read using software called a "feed reader" or an "aggregator." The user subscribes to a feed by entering the feed's link into the reader or by clicking an RSS icon in a browser that initiates the subscription process. The reader checks the user's subscribed feeds regularly for new content, downloading any updates that it finds.
There are various popular RSS feed readers available, such as Google Reader or RSS reader. Some require you to download software to your computer, while others let you read the feeds inside a web browser.
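To make the reader's job concrete, here is a minimal PHP sketch of what an aggregator does on each check: parse the feed XML and keep only the items it has not seen before. The sample feed, the example.com URLs and the function name new_feed_items are made up for illustration; a real reader would download the XML from the subscribed URL and remember the seen links between runs.

```php
<?php
// Hypothetical helper: return the items whose <link> is not in $seenLinks.
function new_feed_items(string $rssXml, array $seenLinks): array
{
    $feed = simplexml_load_string($rssXml);
    if ($feed === false || !isset($feed->channel)) {
        return []; // not parseable as an RSS 2.0 document
    }
    $fresh = [];
    foreach ($feed->channel->item as $item) {
        $link = (string) $item->link;
        if (!in_array($link, $seenLinks, true)) {
            $fresh[] = ['title' => (string) $item->title, 'link' => $link];
        }
    }
    return $fresh;
}

// Made-up sample feed with two items.
$sample = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <link>http://example.com/</link>
    <description>Sample</description>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>
XML;

// The reader has already seen the first item, so only the second is listed.
foreach (new_feed_items($sample, ['http://example.com/1']) as $item) {
    echo $item['title'], ' -> ', $item['link'], "\n";
}
```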
The initials "RSS" are used to refer to the following formats:
* Really Simple Syndication (RSS 2.0)
* RDF Site Summary (RSS 1.0 and RSS 0.90)
* Rich Site Summary (RSS 0.91)
RSS formats are specified using XML, a generic specification for the creation of data formats.
The RSS 2.0 specification can be found here
RSS feeds can be used for various purposes:
Why You Would Use RSS:
1. News: Get the freshest news on your favorite celebrity, the country you are about to visit, or your favorite sports team.
e.g. BBC World News has an RSS-enabled website that lets you easily syndicate the latest news from around the world in your browser.
2. Hobby interests: If you are a food lover, a technocrat, a pottery enthusiast, or perhaps a dog trainer, hundreds of conversations and bits of hobby advice can be fed directly to your screen.
e.g. The MyKoreanKitchen web journal has an RSS feed that brings you Sue's advice on how to dish out the tastiest Korean dishes.
3. Cartoon strips: If you love a cartoon strip like Dilbert (one of the most popular) and would like to syndicate it daily (along with, say, the past ten Dilbert strips), you can subscribe to this feed to enjoy a daily dose of the cartoon. (Note: This is an unofficial feed for Dilbert and is completely unassociated with the official Dilbert site, which does not have an RSS feed but does send a Dilbert strip a day to your inbox.)
I have created one such RSS feed for Chintoo, which is a popular Marathi cartoon. For more information look here.
Among all kinds of feeds, this last type is arguably the most interesting: to cartoon lovers for obvious reasons, and to folks who like to play with PHP, MySQL and all things web design.
The location of the cartoon strip usually changes every day, so parsing, caching and serving it is more challenging than with other types of feeds.
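To illustrate the "changing location" problem, the sketch below builds the archive page URL for a given day. It assumes the date-stamped URL pattern that this article uses later in Step 2; the function name strip_page_url is hypothetical.

```php
<?php
// Hypothetical helper: build the archive page URL for a given day,
// following the date-stamped pattern used in Step 2 of this article.
function strip_page_url(DateTimeInterface $day): string
{
    return 'http://www.dilbert.com/comics/dilbert/archive/dilbert-'
        . $day->format('Ymd') . '.html';
}

echo strip_page_url(new DateTimeImmutable('2008-01-15')), "\n";
// prints http://www.dilbert.com/comics/dilbert/archive/dilbert-20080115.html
```

Because the URL depends on the date, a feed script has to recompute it on every run and cannot rely on a fixed address.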
How to make an RSS feed
This article assumes that the reader is familiar with the elements of an RSS 2.0 feed and has a rudimentary knowledge of PHP and MySQL.
It also assumes that a strip will usually be some kind of image: .jpeg, .gif or something similar.
Step 1: Creating the MySQL database and user account
We will need a database to cache the feed URLs and serve them as and when requested by the client browser.
Create a table with following columns:
1. image_id
2. image_date
3. image_r_date
4. image_url
5. image_size
Log into your MySQL database and, at the mysql prompt, enter:
mysql> CREATE TABLE rss_cache (
-> image_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
-> image_date CHAR(15),
-> image_r_date CHAR(35),
-> image_url CHAR(255),
-> image_size INT DEFAULT 0,
-> UNIQUE KEY (image_url)
-> );
Note that image_id is an auto-incrementing primary key. This is necessary in order to allow us to fetch the last 10 cartoon strip URLs from the database.
The index on image_url can be either non-unique (shown as MUL by DESCRIBE), meaning the same URL may occur in several rows, or unique (shown as UNI), meaning each URL may occur only once. The definition above uses a unique index so that the insert in Step 2 can detect an attempt to store a duplicate URL.
The image_size is zero by default.
Create a user for this database with very restricted access rights. This is a basic safety precaution against attacks on your database.
Note: For this you need to have the CREATE USER privilege yourself, and a database named mydatabse.
shell> mysql --user=root mysql
mysql> GRANT SELECT,INSERT
-> ON mydatabse.*
-> TO 'rss_user'@'localhost'
-> IDENTIFIED BY 'somepassword';
OK. Now that you have set up a database, let us get down to some PHP scripting for parsing and caching the feed URL.
Step 2: Parse and cache the dynamically varying URL
With your favorite editor, create a PHP file, say configuration.php.
This file will parse and cache the feed URL.
We will create another file which will actually serve the feed.
Keeping the two apart is a simple yet effective way to shield the parsing script from casual attacks.
For this very reason, first add a simple access check that prohibits direct access.
<?php
if (realpath($_SERVER['SCRIPT_FILENAME']) == realpath(__FILE__))
{
    // Prevent direct access
    die('No donut for you...Now Move along');
}
Then set up the various flavors of the date and the link URL where the comic strip is found. You will also need a little regex matching to extract the actual strip image URL from the page behind the link URL.
$date_var = date('d F Y');
$date_holder = date('Ymd');
$another_date = date('Ym');
$date_r_var = date('r');
$link = "http://www.dilbert.com/comics/dilbert/archive/dilbert-$date_holder.html";
// Defaults in case the page cannot be read or no strip is found
$blogurl = '';
$size = 0;
// Read the page into an array of lines (false on failure)
$url = @file($link);
$match = "/dilbert{$another_date}\S+?\.gif/";
// Parse through the array for Dilbert's GIF image.
if ($url !== false)
{
    foreach ($url as $line_num => $line)
    {
        if (preg_match($match, $line, $matched_url))
        {
            // Stored without the scheme; "http://" is prepended when the feed is rendered
            $blogurl = 'www.dilbert.com/comics/dilbert/archive/images/' . $matched_url[0];
            $size = remote_filesize($blogurl);
            break;
        }
    }
}
/*
* Courtesy: http://snipplr.com/view/29/get-remote-filesize/
* (mixed) remote_filesize($uri, $user='', $pw='')
* returns the size of a remote stream in bytes or
* the string 'unknown'. Also takes user and pw
* in case the site requires authentication to access
* the uri
*/
function remote_filesize($uri,$user='',$pw='')
{
// start output buffering
ob_start();
// initialize curl with given uri
$ch = curl_init($uri);
// make sure we get the header
curl_setopt($ch, CURLOPT_HEADER, 1);
// make it a http HEAD request
curl_setopt($ch, CURLOPT_NOBODY, 1);
// if auth is needed, do it here
if (!empty($user) && !empty($pw))
{
$headers = array('Authorization: Basic ' . base64_encode($user.':'.$pw));
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
}
$okay = curl_exec($ch);
curl_close($ch);
// get the output buffer
$head = ob_get_contents();
// clean the output buffer and return to previous
// buffer settings
ob_end_clean();
// gets you the numeric value from the Content-Length
// field in the http header (header names are case-insensitive)
$regex = '/Content-Length:\s*([0-9]+)/i';
$count = preg_match($regex, $head, $matches);
// if there was a Content-Length field, its value
// will now be in $matches[1]
if (isset($matches[1]))
{
$size = $matches[1];
}
else
{
$size = 'unknown';
}
return $size;
}
Now that you have parsed the URL for the cartoon strip, store it in the database so that it can be fetched later as required.
// Only a dedicated user with restricted privileges is allowed to connect
$description = 'RSS 2.0 Feed for Dilbert cartoon strip';
$username = "rss_user";
$password = "somepassword";
$database = "mydatabse";
// Decide the maximum number of strips that you would like to display
$max_image_count = 10;
// Note: the mysql_* functions were removed in PHP 7.0; on newer versions use mysqli or PDO
mysql_connect('localhost', $username, $password) or die('Unable to connect to the database server');
@mysql_select_db($database) or die("Unable to select database $database");
// A fetch by a client in another time zone before the URL is updated can yield an empty result.
if (strlen($blogurl) > 1)
{
    // Store into the database, escaping the values first
    $query = sprintf(
        "INSERT INTO rss_cache (image_date, image_r_date, image_url, image_size) VALUES ('%s', '%s', '%s', %d)",
        mysql_real_escape_string($date_var),
        mysql_real_escape_string($date_r_var),
        mysql_real_escape_string($blogurl),
        (int) $size
    );
    $result = mysql_query($query);
    if (!$result)
    {
        if (mysql_errno() == 1062)
        {
            // Do nothing in case of an attempted duplicate insert.
        }
        else
        {
            die('Error: Query for insert failed');
        }
    }
}
?>
Now close the configuration.php file and create another file which will perform the main function of serving the feed.
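Before moving on, the extraction logic from Step 2 can be tried in isolation, without fetching the live page. The sketch below wraps the regular expression in a small function and runs it against made-up markup; the function name find_strip_image and the sample HTML (including the image file name) are assumptions for illustration only.

```php
<?php
// Hypothetical helper: find the first "dilbert<YYYYMM>...gif" file name
// in an array of HTML lines; returns null when nothing matches.
function find_strip_image(array $lines, string $yearMonth): ?string
{
    $pattern = '/dilbert' . preg_quote($yearMonth, '/') . '\S+?\.gif/';
    foreach ($lines as $line) {
        if (preg_match($pattern, $line, $m)) {
            return $m[0];
        }
    }
    return null;
}

// Made-up markup standing in for the downloaded archive page.
$html = [
    '<html><body>',
    '<img src="/comics/dilbert/archive/images/dilbert2008011234567.gif" alt="strip">',
    '</body></html>',
];

$name = find_strip_image($html, '200801');
echo $name === null ? "no strip found\n" : "found $name\n";
```

Returning null for "no match" makes the empty case explicit, which is the situation the strlen($blogurl) check above guards against.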
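Step 3: Serving the feed as valid XML
There are several substeps for this.
Step 3-a: Define headers and render the channel elements
<?php
// Tell the client that XML follows and that it should not be cached
header("Content-Type: application/xml; charset=UTF-8");
header("Pragma: no-cache");
// configuration.php refuses direct access, so it is pulled in from here
require('configuration.php');
Next, output the channel elements such as title, description and lastBuildDate.
// Output the channel, title etc.
echo '<?xml version="1.0" encoding="UTF-8"?>', "\n\n";
echo '<rss version="2.0">', "\n";
echo '<channel>', "\n";
echo '<title> Dilbert Cartoon </title>', "\n";
echo '<link>', htmlspecialchars($link), '</link>', "\n";
echo '<description>', htmlspecialchars($description), '</description>', "\n";
echo '<lastBuildDate>', date('r'), '</lastBuildDate>', "\n";
echo '<language>en-us</language>', "\n\n";
Step 3-b: Fetch the cached URLs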
After connecting to the database, fetch the last "n" (in this example, 10) cartoon strips.
Note how the URLs are fetched in descending order (latest first), sorted by the image_id field.
Because image_id is the primary key, MySQL can walk that index backwards and stop as soon as the requisite count is reached, which keeps the query efficient.
// Connect and select the database
mysql_connect('localhost', $username, $password);
@mysql_select_db($database) or die("Unable to select database $database");
// Fetch up to $max_image_count records in descending order.
$query = "SELECT * FROM rss_cache ORDER BY image_id DESC LIMIT $max_image_count;";
$result = mysql_query($query);
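The caching behaviour described so far (store each URL once, then read back the newest "n" rows) can be tried without a MySQL server. The sketch below uses PDO with an in-memory SQLite database as a stand-in, which is an assumption made purely so that it runs anywhere the pdo_sqlite extension is available. The helper names cache_strip and latest_strips and the example.com URLs are hypothetical, and SQLite's INSERT OR IGNORE plays the role that the error 1062 check plays in the MySQL version.

```php
<?php
// In-memory SQLite stands in for the MySQL rss_cache table (an assumption
// made so that the sketch runs without a database server).
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE rss_cache (
    image_id INTEGER PRIMARY KEY AUTOINCREMENT,
    image_date TEXT,
    image_url TEXT UNIQUE
)');

// Hypothetical helper: cache a URL once; returns false for a duplicate.
function cache_strip(PDO $db, string $date, string $url): bool
{
    $stmt = $db->prepare(
        'INSERT OR IGNORE INTO rss_cache (image_date, image_url) VALUES (?, ?)'
    );
    $stmt->execute([$date, $url]);
    return $stmt->rowCount() === 1;
}

// Hypothetical helper: the newest $limit URLs, latest first.
function latest_strips(PDO $db, int $limit): array
{
    $stmt = $db->prepare(
        'SELECT image_url FROM rss_cache ORDER BY image_id DESC LIMIT ?'
    );
    $stmt->bindValue(1, $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

cache_strip($db, '01 January 2008', 'example.com/a.gif');
cache_strip($db, '02 January 2008', 'example.com/b.gif');
cache_strip($db, '02 January 2008', 'example.com/b.gif'); // duplicate, ignored
cache_strip($db, '03 January 2008', 'example.com/c.gif');

print_r(latest_strips($db, 2)); // the two most recently cached URLs
```

Prepared statements with bound parameters also remove the need to escape values by hand, which is one reason to prefer PDO or mysqli on current PHP versions.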
Step 3-c: Rendering item elements
Loop over the "n" fetched records and output formatted XML for each one. This is the core of rendering the XML, where you render the elements of each item.
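while ($array_dump = mysql_fetch_array($result))
{
    // Escape the URL once so that characters such as "&" stay valid XML
    $image_url = htmlspecialchars('http://' . $array_dump['image_url']);
    echo '<item>', "\n";
    echo '<title>', 'Dilbert for ', htmlspecialchars($array_dump['image_date']), '</title>', "\n\n";
    echo '<link>', $image_url, '</link>', "\n";
    echo '<description>', htmlspecialchars('<img src="' . $image_url . '" border="0" />'), "\n";
    echo '</description>', "\n";
    echo '<guid>', $image_url, '</guid>', "\n";
    echo '<pubDate>', $array_dump['image_r_date'], '</pubDate>', "\n";
    echo '</item>', "\n";
}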
Close all the tags, free the MySQL memory and close the database.
echo '</channel>', "\n";
echo '</rss>', "\n";
// Free the MySQL memory and close the DB.
mysql_free_result($result);
mysql_close();
?>
And voila! You now have a PHP and MySQL feed collector.
You may want to validate the feed with a third-party feed validator such as Feed Validator.
This is essentially a generic feed renderer (save for the regular expression matching, which is specific to the URL from which the feed is parsed), in that it can be suitably modified to render any feed as desired, cartoon strip or otherwise.
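As one way of making the renderer generic, the sketch below builds the same RSS 2.0 structure from plain arrays using PHP's XMLWriter extension instead of echo statements. This is an alternative technique, not the method used above; the function name render_feed and the sample channel and item data are made up for illustration.

```php
<?php
// Hypothetical helper: render an RSS 2.0 document from plain arrays.
// XMLWriter escapes text content, so URLs containing "&" stay valid XML.
function render_feed(array $channel, array $items): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('rss');
    $xml->writeAttribute('version', '2.0');
    $xml->startElement('channel');
    foreach ($channel as $name => $value) {
        $xml->writeElement($name, $value);
    }
    foreach ($items as $item) {
        $xml->startElement('item');
        foreach ($item as $name => $value) {
            $xml->writeElement($name, $value);
        }
        $xml->endElement(); // item
    }
    $xml->endElement(); // channel
    $xml->endElement(); // rss
    $xml->endDocument();
    return $xml->outputMemory();
}

// Made-up channel and item data.
echo render_feed(
    ['title' => 'Example strips', 'link' => 'http://example.com/',
     'description' => 'Made-up channel'],
    [['title' => 'Strip for 01 January 2008',
      'link' => 'http://example.com/a.gif?x=1&y=2']]
);
```

With this shape, only the code that produces the two arrays (the regular expression and the database query) needs to change from one feed to the next.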
Did you find this useful? Do you think this can be made even better? Then please do not hesitate to leave a comment.
I look forward to your inputs!
Cheers!
