Digital Colony!

Creating an HttpHandler to Build a Search Engine Site Map

My previous post Build a Search Engine SiteMap in C# covered how to create a sitemap.xml file using the File System. It also provided guidelines on how to go about validating the sitemap as well as submitting it to the major search engines. If you came here for background on search engine sitemaps, go read that post first. If all you care about is the HttpHandler, you may proceed.

An Overview of the HttpHandler

Scenario: There is a page you wish to create that will be generated dynamically with code, but you want to use a file extension that isn't dynamic, such as .xml. RSS feeds and sitemaps are two examples of XML files that would be ideal for an HttpHandler. There are other uses for HttpHandlers, but in this post we are only interested in creating a dynamic sitemap.

Here is a brief overview of how the HttpHandler will work. A request will be made for the sitemap.xml, probably by a search engine like Google. Instead of looking for it on the file system, your web application will intercept that request and pass it off to a class which will generate and deliver an XML document.

Step 1: Create SitemapHandler.cs

Inside the App_Code folder create SitemapHandler.cs. This class will implement the IHttpHandler interface. Before we deliver an XML document, let's create a simple HTML test to make sure the HttpHandler is working.
using System.Web;

namespace HttpExtensions
{
    public class SitemapHandler : IHttpHandler
    {
        public SitemapHandler()
        { }

        #region IHttpHandler Members

        public bool IsReusable
        {
            get { return true; }
        }

        public void ProcessRequest(HttpContext context)
        {
            HttpResponse response = context.Response;
            response.Write("<html><body><h1>HTTP Handler is Working!</h1></body></html>");
        }
        
        #endregion
    }
}

Update the Web.Config

Add the following section inside system.web. The sitemap.aspx line is for debugging purposes only and will be removed once everything is working.
<httpHandlers>
<add verb="*" path="sitemap.xml" type="HttpExtensions.SitemapHandler"/>
<add verb="*" path="sitemap.aspx" type="HttpExtensions.SitemapHandler"/>
</httpHandlers>
From Visual Studio 2005, test the site. Now type in sitemap.xml in the path of the URL. You should see your HTTP Handler is Working! message. And if you type in sitemap.aspx, you should also see the message. As long as you view your site through Visual Studio 2005, your handler will work fine for both cases. Once you hand that job back to IIS, you'll need to do one more step.

Map *.XML to the aspnet_isapi.dll

If you run your code without doing this step, you will see your HTTP Handler is Working! message only on the sitemap.aspx request. Any request to sitemap.xml will return a 404 Page Not Found error. Instead of the HTTP Handler intercepting the request, IIS sees that the file extension is not a .NET file extension so it takes command. It looks on the file server and doesn't see a sitemap.xml and returns the 404.

The solution is to map the *.xml file extension to the ASP.NET DLL (aspnet_isapi.dll). Once this is done and the server is restarted, the handler should work. For more information on how to do the IIS mapping go to Protecting Files with ASP.NET and scroll down to Protecting .mdb Files. Substitute .xml for .mdb when following those directions.

Back to the SitemapHandler

Now that we have a working HttpHandler, we can go back and replace the HTTP Handler is Working! message with a real sitemap XML file. I've commented out the loop to add pages. Here is where you would add the code to pull the URLs from some data store, be it a component or database.
public void ProcessRequest(HttpContext context)
{
   HttpResponse response = context.Response;
   response.ContentType = "text/xml";       
   using (TextWriter textWriter = new StreamWriter(response.OutputStream, System.Text.Encoding.UTF8))
   {
       XmlTextWriter writer = new XmlTextWriter(textWriter);
       writer.Formatting = Formatting.Indented;
       writer.WriteStartDocument();
       writer.WriteStartElement("urlset");
       writer.WriteAttributeString("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
       writer.WriteAttributeString("xsi:schemaLocation", "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd");
       writer.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");

       // Add Home Page
       writer.WriteStartElement("url");
       writer.WriteElementString("loc", "http://example.com");
       writer.WriteElementString("changefreq", "daily");
       writer.WriteEndElement(); // url

       // Add code Loop here for page nodes
       /*
       {
           writer.WriteStartElement("url");
           writer.WriteElementString("loc", url);
           writer.WriteElementString("changefreq", "monthly");
           writer.WriteEndElement(); // url
       }
       */
       writer.WriteEndElement(); // urlset
   }                      
}
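
The commented-out loop could be filled in along these lines. This is only a minimal sketch: the SitemapEntry class, the WriteUrlNodes helper and the idea of passing the pages in as a list are hypothetical stand-ins for whatever your own data store returns.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical container for one page pulled from your data store.
public class SitemapEntry
{
    public string Url;
    public string ChangeFreq;

    public SitemapEntry(string url, string changeFreq)
    {
        Url = url;
        ChangeFreq = changeFreq;
    }
}

public static class SitemapNodes
{
    // Writes one <url> element per entry. Call this after the home page
    // node and before the final WriteEndElement() that closes urlset.
    public static void WriteUrlNodes(XmlWriter writer, IEnumerable<SitemapEntry> entries)
    {
        foreach (SitemapEntry entry in entries)
        {
            writer.WriteStartElement("url");
            writer.WriteElementString("loc", entry.Url);
            writer.WriteElementString("changefreq", entry.ChangeFreq);
            writer.WriteEndElement(); // url
        }
    }
}
```

Since XmlTextWriter derives from XmlWriter, the writer created in ProcessRequest can be passed straight in.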

Validate and Submit

Details on how to validate and submit your sitemap can be found on Build a Search Engine SiteMap in C#. Don't forget to remove the sitemap.aspx directive inside the web.config file. That was just for debugging.

A Word of Warning

After writing this and patting myself on the back for being so clever, I discovered that other XML files on my site were not displaying. They were throwing errors. Whereas IIS can natively display XML files, the ASP.NET DLL can't.

This leaves you with 2 possibilities. ONE: Use a different file extension such as .MAP instead of .XML. TWO: Write a second HTTP Handler to catch all other XML file requests. That handler would open the XML and stream it back to the browser with a contentType of "text/xml".
<add verb="*" path="sitemap.xml" type="HttpExtensions.SitemapHandler"/>
<add verb="*" path="*.xml" type="HttpExtensions.XMLHandler"/>
public void ProcessRequest(HttpContext context)
{
  HttpResponse response = context.Response;
  // Request.Path excludes the query string, so MapPath gets a clean virtual path
  string thisXMLFile = context.Server.MapPath(context.Request.Path);

  response.ContentType = "text/xml";
  // Dispose the reader so the file handle is released after each request
  using (StreamReader xmlStream = File.OpenText(thisXMLFile))
  {
    response.Write(xmlStream.ReadToEnd());
  }
}


 

Build a Search Engine SiteMap in C#

Sitemaps are XML files that web masters can create to let search engines know what pages to index and how frequently to check for changes on each page. The XML format of the sitemap file is detailed on sitemaps.org. Here is a sample of a sitemap with a single URL.
<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 
http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd" 
xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com</loc>
    <changefreq>daily</changefreq>
  </url>
</urlset>
My sample differs from the one on sitemaps.org. The more defined namespace in my example will validate on Google, Yahoo! and with Ask.com. At the time of this writing, theirs doesn't. Only loc is strictly required by the protocol; changefreq, priority and lastmod are optional. The samples here include loc and changefreq and skip the rest.

My advice with the search engines is treat them like a passport agent. Say only what is required and nothing else or you could find yourself sent to the end of the line. Once you've determined what URLs will be on the sitemap, the only decision is defining the changefreq of each page. One strategy might be to set the home page to daily, section pages to weekly and content pages to monthly. If your home page has stock tickers or sports scores, you could set the changefreq to always or hourly.
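
The daily / weekly / monthly strategy can be kept in one place in code. A minimal sketch, assuming your pages fall into three categories; the PageKind enum and ChangeFreqStrategy class are hypothetical names, not part of any library.

```csharp
using System;

// Hypothetical page categories; rename to match your own site structure.
public enum PageKind { Home, Section, Content }

public static class ChangeFreqStrategy
{
    // Maps a page category to a changefreq value, following the
    // daily / weekly / monthly split suggested above.
    public static string For(PageKind kind)
    {
        switch (kind)
        {
            case PageKind.Home: return "daily";
            case PageKind.Section: return "weekly";
            case PageKind.Content: return "monthly";
            default: throw new ArgumentOutOfRangeException("kind");
        }
    }
}
```

A home page with stock tickers or sports scores would simply return "hourly" or "always" from the Home case instead.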

Google prefers the sitemap file to be in the root folder and every example on their site names the file sitemap.xml. What Google wants, Google gets.

Additional Namespaces

using System.Data;
using System.Data.SqlClient;
using System.IO;
using System.Xml;

Sample Code

You will need write access to the sitemap.xml file. Since you don't want to give your entire root folder write access, my advice is to create a dummy sitemap.xml, place it into the root folder and then set write access to that file. Adding a try...catch to the code below will alert you if that write access is not there.
string SiteMapFile = @"~/sitemap.xml";
string xmlFile = Server.MapPath(SiteMapFile);

XmlTextWriter writer = new XmlTextWriter(xmlFile, System.Text.Encoding.UTF8);
writer.Formatting = Formatting.Indented;
writer.WriteStartDocument();
writer.WriteStartElement("urlset");
writer.WriteAttributeString("xmlns:xsi", "http://www.w3.org/2001/XMLSchema-instance");
writer.WriteAttributeString("xsi:schemaLocation", "http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd");
writer.WriteAttributeString("xmlns", "http://www.sitemaps.org/schemas/sitemap/0.9");

// Add Home Page
writer.WriteStartElement("url");
writer.WriteElementString("loc", "http://example.com");
writer.WriteElementString("changefreq", "daily");
writer.WriteEndElement(); // url

// Add Sections and Articles
SqlConnection con = new SqlConnection(connectionString);
string sql = @"SELECT url, 'weekly' as changefreq FROM Section 
UNION SELECT url, 'monthly' as changefreq FROM Articles ";
SqlCommand cmd = new SqlCommand(sql, con);
cmd.CommandType = CommandType.Text;

try
{
    con.Open();
    SqlDataReader reader = cmd.ExecuteReader();
    while (reader.Read())
    {
        string loc = "http://example.com" + reader["URL"].ToString();
        string changefreq = reader["changefreq"].ToString();
        writer.WriteStartElement("url");
        writer.WriteElementString("loc", loc);
        writer.WriteElementString("changefreq", changefreq);
        writer.WriteEndElement(); // url
    }
    reader.Close();
}
catch (SqlException err)
{
    throw new ApplicationException("Data Error (Sections):" + err.Message);
}
finally
{
    con.Close();
}

writer.WriteEndElement();// urlset        
writer.Close();
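
The try...catch mentioned before the sample might look something like this. It is a sketch under assumptions: the SitemapFile class and TryOpen method are hypothetical names, and how you report the error (label, log, email) is up to you.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class SitemapFile
{
    // Tries to open the sitemap file for writing. Returns null and fills
    // 'error' when the file cannot be opened for writing.
    public static XmlTextWriter TryOpen(string xmlFile, out string error)
    {
        error = null;
        try
        {
            return new XmlTextWriter(xmlFile, Encoding.UTF8);
        }
        catch (UnauthorizedAccessException ex)
        {
            // Typical result when the account lacks write permission
            error = "No write access to " + xmlFile + ": " + ex.Message;
        }
        catch (IOException ex)
        {
            // Missing folder, locked file and similar I/O problems
            error = "Could not open " + xmlFile + ": " + ex.Message;
        }
        return null;
    }
}
```

In the sample above you would call SitemapFile.TryOpen(xmlFile, out error) in place of the XmlTextWriter constructor and stop early when it returns null.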

Validate Your Sitemap

Once you've confirmed you have a good looking sitemap.xml file in the root folder of your web site and it contains all the pages you want indexed by the search engines, it is time to validate it. XML-Sitemaps.com has a sitemap validator that you can use to test your new sitemap. Once it validates, move to the next step.

Update Your robots.txt File

Add the location of your sitemap in your robots.txt file.

Sitemap: http://example.com/sitemap.xml

Google Webmaster

Google has a suite of tools for managing the relationship between your web sites and them. They call this suite Google Webmaster Central. It is here that you will register your web sites with validation files. Once they've established you are the webmaster of your site, they will present you with a screen to submit your sitemap. You will need a Google Account for this process. If you don't have one, follow the link to Create a Google Account.

Yahoo! Site Explorer

Yahoo! has a similar setup which is called Yahoo! Site Explorer. Using your Yahoo! ID, you will go through the same process of registering your web sites. And once that process has been completed, you can then submit your sitemap.xml file. Don't have a Yahoo ID? Get one.

Ask.com

Ask.com doesn't require any accounts or site validation. Just ping their server with the location of your sitemap.xml file modeled after the URL example below. Of course replace example.com with your domain name.

http://submissions.ask.com/ping?sitemap=http%3A//example.com/sitemap.xml
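
If you want to build that ping URL in code, a minimal sketch follows. The SitemapPing class and BuildAskPingUrl method are hypothetical names. Note one difference from the example above: Uri.EscapeDataString escapes the slashes as well as the colon, which I am assuming the endpoint decodes the same way as any other query string value.

```csharp
using System;

public static class SitemapPing
{
    // Builds the ping URL by escaping the sitemap address and appending
    // it to the endpoint shown above. It does not send the request.
    public static string BuildAskPingUrl(string sitemapUrl)
    {
        return "http://submissions.ask.com/ping?sitemap=" + Uri.EscapeDataString(sitemapUrl);
    }
}
```

Calling SitemapPing.BuildAskPingUrl("http://example.com/sitemap.xml") returns the string you would then request from a browser or with a WebClient.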

Monitoring the Sitemap Crawl

Both the Yahoo! Site Explorer and the Google Webmaster Tools have reports that provide updated status on the success and failure of the sitemap crawl. Ask.com to my knowledge doesn't have any such tools. And I couldn't locate a sitemap submission tool at all for MSN.com.

Using an HTTP Handler

This example uses the File System. Another option is to use an HTTP Handler to deliver the sitemap.xml.

Final Word

A friend of mine with a low traffic site saw his page views double after adding a sitemap. If one page of code can potentially double your page views, then it is worth pursuing.


 

ASP Photo Gallery v2

This article was written in 2001. It has a good example of how Classic ASP can use an XML file as a data source. However, I wouldn't use it today to create a photo gallery. There are much better options out there. With that said, this code is no longer supported.

In the article Creating an ASP Photo Gallery, I showed you how to create a very basic photo gallery using just a few lines of ASP code. It was quick and dirty, however, it had limitations.

The first limitation was that all the photos had to be the same size. You could remove the width and height tags from the image, but that is clumsy coding and browsers always render non-sized images funky. The second problem was that we had to rename the images to a number and then maintain a sequence. If you have 5 images that isn't a problem, if you have 100 it becomes tedious. And the last thing missing was the ability to quickly add titles, descriptions, and alt tags for each image.

Choosing a Weapon

OK, we need to store and maintain data about the picture. We have 3 options. Option 1 - We build data arrays on the actual ASP page and populate them there. This is not a clean design. Ideally we want to keep the data and code separated in order to support code reuse. Option 2 - Create a database. Building a database in Access or MySQL is easy enough, but it's overkill for a single photo gallery. Do we really want to set up connection strings and deal with versioning and licensing issues for such a small dataset? No. Option 3 - XML. With XML we can keep the data apart from the code and we don't need a database.

The XML Image File

The XML image file will capture a sequence number, the file name, image width, image height, short caption, long description, and an image ALT tag.
<?xml version="1.0"?> 
<images> 
  <image num="1" file="me.jpg" width="400" height="302" short="Me" long="Me at  Graceland." alt="photo"/> 
</images>

What Can We Automate?

I don't know about you, but I'm not going to fire up an editor and hack out the XML image file. The code should generate most of the body of the file with the exception of the short, long, and alt descriptions. We can use the FileSystemObject to detect images. From this point, we can use the Microsoft.XMLDOM object to build the XML for the photo gallery. Once that is created, wouldn't it be nice to go through the gallery and assign those values in a form? As for the image width and height, we can use client-side JavaScript to calculate those values the first time that the gallery is viewed.

Security

Since we don't want other users modifying our image captions, we will assign a password to the gallery. The password is passed in the querystring under the parameter k and will open the Admin FORM. Example: gallery.asp?k=password&pic=1. This will also give you the power to make text changes from any browser.

Other Admin Features

Besides saving image attributes, the Admin form will allow you to remove an image. The image is removed from the XML file, not the server. And should you upload additional images or accidentally remove an image you need, there is a Detect Images button which will append any image file found in the image folder that doesn't appear in the XML image file.

Setting Up The Photo Gallery Page

My first goal was to make this code as easy as possible for a non-techie to use. The best way to pull that off is to hide all the gritty file detection and XML manipulation in a separate file. All coding logic is kept in an include file called galleryCode.asp. The 2nd file is the ASP file used to house the image gallery. At the top of this file we need to define the following:

key - This is the password you'll use to access the gallery as an administrator.

galleryURL - The URL to the photo gallery from the root (the virtual path).

galleryASP - The name of the ASP page of the actual photo gallery.

galleryImageURL - The URL (virtual path) to the photo gallery images.

xmlName - The name of the XML file.

xmlPath - This is the full file path of the folder holding the XML file you will be using. This is the only tricky line. Some web hosts don't give read/write permission on directories accessible via the web server. This variable allows you to define a path to a folder that will permit writing and modifying files. Without such a folder this code won't work, so ask your web host for a read/write folder should this code fail.
<% 
' EXAMPLE PHOTO GALLERY SETUP 
key = "secret" ' page password to access ADMIN 
galleryURL = "/pix/springbreak/" 
galleryASP = "default.asp" ' this page 
galleryImageURL = "/pix/springbreak/images/" 
xmlName = "springbreak.xml" 
xmlPath = "c:\readwritefolder\" 
%>

Building the Photo Gallery

In addition to your choice of colors and layout, your photo gallery will make simple calls to functions in the galleryCode.asp file. These include:

backLink, nextLink - This will draw the HREF link to the previous and next image. The user can use a text or image link after the call. Be sure to close out the link with a </a>.

thisShort, thisLong - thisShort returns the assigned short title and thisLong returns the long description.

drawImage(border) - Make a call to this subroutine where you want the image placed. The parameter is the pixel width of the border.

drawSequence(perLine) - This returns a sequential list of all the images linked to a number. This allows the user to bypass using the Next and Previous navigation and jump directly to a particular image. The perLine parameter restricts the number of links per line.

drawAdmin - This call is where you want the Admin form to be placed on the page.

Overview

1 - Define filePaths and password on the top few lines of the gallery page.

2 - Upload the gallery ASP page and the galleryCode.asp file.

3 - Upload images to server. The directory should be unique to that photo gallery.

4 - View the page.

5 - If this is the first time the page has been viewed, the images XML file will be created.

6 - Place the password in the querystring and then a FORM will display for each image.

7 - Update the title, description, and alt tag. Go to the next image; repeat this sequence until you've finished. NOTE: Verbiage is optional. However, you'll still want to go through each image so the image height and width are saved to the XML file.

Download

Download Photos2.zip (includes gallery.asp and galleryCode.asp) if you wish to run the ASP Photo Gallery.


 

COM Informant for Classic ASP

I currently do business with 4 different web hosts. Each of them has a different subset of 3rd-party ASP components installed on their server. Sometimes they are open with which components are installed, sometimes they aren't. Whenever I wanted to test to see if a particular COM object was available, I'd write a quick script. The script would try to create the object via the Server.CreateObject method and then I'd go to the page to see if it returned an error code. No error code meant it was installed and I could start coding my application around that knowledge.

Tedious and Repetitive

After about the 10th script I wrote, it hit me that there probably is a better way to do this. What was needed was a script that tested the most common ASP components and allowed the user to quickly add new ones to the list. Having a bunch of test scripts lying on the file server wasn't optimal. And the last thing you would want is to hard-code all the test cases inside your ASP code. My solution was to have a single page handle the creation, modification, and display of component test cases. The data source would be a single XML file.

Sample comlist.xml file

<?xml version="1.0"?> 
<comlist> 
<com company="Microsoft" name="CDONTS" id="CDONTS.NewMail"/> 
<com company="Persits Software Inc." name="ASPUpload" id="Persits.Upload"/> 
</comlist>

Developers Tool

COM Informant is a handy tool to have if your development team creates custom components and then deploys them across multiple servers. What better way to test if a component is installed than viewing a single web page? The top portion of the tool allows the user to add any component name to the test list.


The Download

COM Informant For Classic ASP consists of sniff.asp, comlist.xml, com.css, and blue.gif. This tool will be ready to run after download. You need to be running IIS in order to run COM Informant. Also, if you want to add or delete components from the list, then the comlist.xml file must reside in a directory that has read/write permission. You can modify the location of that file within the first 3 lines of code. The code performs some XML file manipulation, which is outside the scope of this article.

This article was first written in 2001


 

Displaying a SmugMug Gallery with ASP.NET

Back in the day I used to host all my own image galleries on my site. It's a tedious process and you can quickly use up your allocated disk space with today's multi-megapixel cameras. Fortunately we have companies like SmugMug and Flickr that will host, manage and back up all our images.

The problem with not hosting photo galleries is you send your audience away from your site over to their server. And your photo galleries develop their own audience which knows nothing about the parent site.

I discovered that using the XmlDataSource and DataList ASP.NET controls you can build a photo gallery on your site while the images stay over on SmugMug using a simple RSS feed.

The XmlDataSource

ASP.NET 2.0 introduced the asp:XmlDataSource control which we will use to connect to the SmugMug RSS feed. The DataFile parameter is the path to the RSS file for that photo gallery. In the snippet below, I hard-coded that value. In the example and lab that value is populated in the code behind. Also important is the XPath parameter. This is the address inside the XML Document that holds the information about each photo.
<asp:XmlDataSource ID="xmlDS" runat="server" XPath="rss/channel/item" 
    DataFile= "http://www.smugmug.com/hack/feed.mg?Type=gallery&Data=1838622&format=rss200" />

The RSS Feed (XML Document)

Here is a snippet of how a single photo is represented inside the XML Document. I've removed the portion which deals with the gallery name, as it is not used in this example. In this example the 2 values that are used when rendering the gallery inside the DataList will be link and guid.
<item>
    <title>Image Title</title>
    <link>http://criticalmas.smugmug.com/gallery/1838622/1/92083732</link>
    <description>Image description</description>
    <category>Vacation</category>
    <comments>http://criticalmas.smugmug.com/comment.mg...</comments>
    <exif:DateTimeOriginal>2006-08-24 19:07:19</exif:DateTimeOriginal>
    <pubDate>Thu, 31 Aug 2006 19:02:05 -0700</pubDate>
    <author>feeds-nobody@smugmug.com (criticalmas)</author>
    <guid isPermaLink="true">http://criticalmas.smugmug.com/photos/92083732-Th.jpg</guid>
    <enclosure url="http://criticalmas.smugmug.com/photos/92083732-Th.jpg" length="7741" type="image/jpeg"/>
</item>
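
To check what the rss/channel/item XPath will hand to the DataList before wiring up the page, you could run the same expression against the feed in plain C#. This is a sketch only: the FeedPreview class and ReadItems method are hypothetical names, and it expects the RSS already loaded into a string.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class FeedPreview
{
    // Returns link/guid pairs for every node matched by rss/channel/item,
    // the same XPath given to the XmlDataSource.
    public static List<KeyValuePair<string, string>> ReadItems(string rssXml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(rssXml);

        List<KeyValuePair<string, string>> items = new List<KeyValuePair<string, string>>();
        foreach (XmlNode item in doc.SelectNodes("rss/channel/item"))
        {
            XmlNode link = item.SelectSingleNode("link");
            XmlNode guid = item.SelectSingleNode("guid");
            if (link == null || guid == null) continue; // skip incomplete items
            items.Add(new KeyValuePair<string, string>(link.InnerText, guid.InnerText));
        }
        return items;
    }
}
```

Each pair holds the link (the full-sized image page) as the key and the guid (the thumbnail URL) as the value, the two values the gallery uses.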

The DataList Control

For this gallery, I set the RepeatColumns to 6 and just displayed the thumbnail image with a link to the full-sized image back on the SmugMug web site.
<asp:DataList ID="dlPhotos" runat="server" RepeatColumns="6">
<ItemTemplate>
  <a href="<%# XPath("link").ToString() %>">
     <asp:Image ID="img" ImageUrl='<%# XPath("guid") %>' runat="server"  /></a>
</ItemTemplate>
</asp:DataList>

Sample ASPX Using a Gallery Dropdown

The GalleryID is pulled from the URL of the photo gallery over on SmugMug. This is covered in detail in the sample lab.
<asp:XmlDataSource ID="xmlDS" runat="server" XPath="rss/channel/item" />
<h4>Select Gallery</h4>
<asp:DropDownList ID="ddlGallery" runat="server" AutoPostBack="true">
    <asp:ListItem Text="1838622: Uruguay - Colonia" Value="1838622" />
    <asp:ListItem Text="2473610: Lower Hellhole Canyon Desert Hike" Value="2473610" />
    <asp:ListItem Text="1887671: New Zealand - Whakarewarewa Thermal Village" Value="1887671" />
</asp:DropDownList>
<br /><br />
<asp:DataList ID="dlPhotos" runat="server" RepeatColumns="6">
<ItemTemplate>
    <a href="<%# XPath("link").ToString() %>">
        <asp:Image ID="img" ImageUrl='<%# XPath("guid") %>' runat="server" /></a>
</ItemTemplate>
</asp:DataList>
<asp:Label ID="lblError" Visible="false" runat="server" />

And the Code Behind (C#)

protected void Page_Load(object sender, EventArgs e)
{
   string rssURL;
   string galleryID;
   galleryID = ddlGallery.SelectedValue.ToString();
   rssURL = "http://www.smugmug.com/hack/feed.mg?Type=gallery&Data=" + galleryID + "&format=rss200";

   try
   {
       xmlDS.DataFile = rssURL;
       dlPhotos.DataSource = xmlDS;
       dlPhotos.DataBind();
   }
   catch (XmlException err)
   {
       lblError.Text = "Oops, looks like an error occurred with this gallery: " + err.Message;
       lblError.Visible = true;            
   }   
}

Lab Demo

SmugMug Image Gallery in ASP.NET

The gallery uses a simple 6 column layout. Once you have the RSS feed, you can be as creative as you like in designing the layout for your photo gallery.

Other Gallery options

This demo is for the standard galleries. SmugMug has other Feed options described here.

My SmugMug referral code is: IzodUqeQndZYc
It will save you $5 on any new account.


 

Digital Colony Copyright © 1999-2008
This site uses Blogger, which is not 100% XHTML compliant.
Try...Catch Disclaimer: For brevity many examples do not include error handling. That is your responsibility.