Simple XML Parser in C# using XmlDocument
In this article we will look at how to read a website's Sitemap.xml with C# and parse its contents using a simple XML parser.

We will load an XML sitemap into an XmlDocument object and use it to build a simple XML parser. This example reads the contents of the XML sitemap and prints them to the screen, but you can do anything with the data: create a crawler, store it in a database, or merge XML documents.
An XML sitemap is a specially structured XML file that gives search engine crawlers structural information about a website for indexing purposes. The basic sitemap structure looks like this:
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
    <url>
        <loc>https://lonewolfonline.net/</loc>
        <priority>1.0</priority>
        <lastmod>2010-09-14</lastmod>
        <changefreq>daily</changefreq>
    </url>
    <url>
        <loc>https://lonewolfonline.net/simple-xml-parser/</loc>
        <priority>0.5</priority>
        <lastmod>2009-09-14</lastmod>
        <changefreq>monthly</changefreq>
    </url>
</urlset>
Individual <url> tags are wrapped inside the containing <urlset> node. Each <url> represents a page on the site. Inside each <url> node in this example are four child nodes.
The <loc> node represents the page URL.
The <priority> node represents the webmaster-defined priority of the page relative to other pages on the same site, from 0.0 to 1.0.
The <lastmod> node represents the date on which the page was last modified.
The <changefreq> node indicates how often the page is updated and suggests to the search engine how often to crawl it again.
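To make the four child nodes more concrete, here is a minimal sketch of a type that could hold one <url> entry. The SitemapEntry class, its property names and the FromText helper are assumptions made for illustration; they are not part of the sitemap protocol or of the sample project. Everything except <loc> is treated as optional, since the sitemap protocol only requires <loc>.

```csharp
using System;
using System.Globalization;

// Hypothetical container for one <url> entry; name and shape are illustrative only.
public class SitemapEntry
{
    public string Loc { get; set; }          // page URL (required by the protocol)
    public double? Priority { get; set; }    // 0.0 to 1.0, null when the node is absent
    public DateTime? LastMod { get; set; }   // null when the node is absent
    public string ChangeFreq { get; set; }   // e.g. "daily" or "monthly", null when absent

    // Builds an entry from the raw text of each node; pass null for a missing node.
    public static SitemapEntry FromText(string loc, string priority, string lastmod, string changefreq)
    {
        return new SitemapEntry
        {
            Loc = loc,
            // Invariant culture so "0.5" parses the same way regardless of the OS locale.
            Priority = priority == null
                ? (double?)null
                : double.Parse(priority, CultureInfo.InvariantCulture),
            LastMod = lastmod == null
                ? (DateTime?)null
                : DateTime.Parse(lastmod, CultureInfo.InvariantCulture),
            ChangeFreq = changefreq
        };
    }
}
```

For the first entry in the sitemap above, this could be called as SitemapEntry.FromText("https://lonewolfonline.net/", "1.0", "2010-09-14", "daily").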

Writing a Simple XML Parser in C#
For this example, I am creating a small console application and outputting the results to the screen. I am also reading the sitemap from a file, but you can just as easily download files from a website instead.
You can also download a sample project from GitHub.
using System;
using System.Xml;

namespace SitemapXMLParser
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the sitemap from a file in the working directory.
            XmlDocument urldoc = new XmlDocument();
            urldoc.Load("Sitemap.xml");

            // Each <url> element describes one page on the site.
            XmlNodeList xnList = urldoc.GetElementsByTagName("url");
            foreach (XmlNode node in xnList)
            {
                // A child node may be missing from an entry, so ?. is used to
                // avoid a NullReferenceException when the indexer returns null.
                Console.WriteLine("url " + node["loc"]?.InnerText);
                Console.WriteLine("priority " + node["priority"]?.InnerText);
                Console.WriteLine("last modified " + node["lastmod"]?.InnerText);
                Console.WriteLine("change frequency " + node["changefreq"]?.InnerText);
                Console.WriteLine(Environment.NewLine);
            }
        }
    }
}
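The example above loads Sitemap.xml from disk. As a rough sketch of the "download from a website" variation mentioned earlier, the XML could be fetched with HttpClient and handed to XmlDocument.LoadXml. The SitemapReader class, the ReadLocs and DownloadLocsAsync helpers, and any URL passed in are placeholders introduced here for illustration, not part of the sample project.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper class; names are illustrative only.
public static class SitemapReader
{
    // Parses sitemap XML held in a string and returns the text of every <loc> node.
    public static List<string> ReadLocs(string xml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xml);

        var locs = new List<string>();
        foreach (XmlNode node in doc.GetElementsByTagName("url"))
        {
            XmlElement loc = node["loc"];
            if (loc != null)
            {
                locs.Add(loc.InnerText.Trim());
            }
        }
        return locs;
    }

    // Downloads the sitemap over HTTP, then parses it with ReadLocs.
    public static async Task<List<string>> DownloadLocsAsync(string sitemapUrl)
    {
        using (var client = new HttpClient())
        {
            string xml = await client.GetStringAsync(sitemapUrl);
            return ReadLocs(xml);
        }
    }
}
```

Keeping the parsing in ReadLocs, separate from the download, means the same code can be run against a local file's contents or a string in a unit test without touching the network.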