Simple XML Parser in C# using XmlDocument
In this article we will look at how to read a website's Sitemap.xml with C# and parse its contents using a simple XML parser.
An XML sitemap is a specially structured XML file that lists the pages of a website, together with some metadata about each one, so that search engine crawlers can index the site. The basic sitemap structure looks like this:
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://lonewolfonline.net/</loc>
    <priority>1.0</priority>
    <lastmod>2010-09-14</lastmod>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://lonewolfonline.net/simple-xml-parser/</loc>
    <priority>0.5</priority>
    <lastmod>2009-09-14</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>
Individual <url> nodes are wrapped inside the containing <urlset> node. Each <url> represents a page on the site. Inside each <url> node are four child nodes.
The <loc> node contains the page URL.
The <priority> node contains the webmaster-defined priority of the page relative to other pages on the same site, from 0.0 to 1.0.
The <lastmod> node contains the date on which the page was last modified.
The <changefreq> node indicates how often the page is updated and suggests to the search engine how often to crawl it again.
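To make the four fields concrete, here is a minimal sketch of how a single <url> entry could be modelled in C#. The SitemapEntry class and its FromXml helper are hypothetical names made up for this illustration; they are not part of .NET or of the sitemap protocol. The sketch also assumes that <lastmod> holds a plain yyyy-MM-dd date, as in the sample above.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical container for one <url> entry; not a .NET or sitemap-protocol type.
public class SitemapEntry
{
    public string Loc { get; set; }
    public double? Priority { get; set; }
    public DateTime? LastMod { get; set; }
    public string ChangeFreq { get; set; }

    // Builds an entry from the raw XML of a single <url> element.
    public static SitemapEntry FromXml(string urlXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(urlXml);
        XmlElement url = doc.DocumentElement;

        var entry = new SitemapEntry
        {
            Loc = url["loc"]?.InnerText,
            ChangeFreq = url["changefreq"]?.InnerText
        };

        // Parse with the invariant culture so "1.0" is read the same on every machine.
        if (double.TryParse(url["priority"]?.InnerText, NumberStyles.Float,
                CultureInfo.InvariantCulture, out double priority))
        {
            entry.Priority = priority;
        }

        // Assumption: <lastmod> is a plain yyyy-MM-dd date, as in the sample sitemap.
        if (DateTime.TryParseExact(url["lastmod"]?.InnerText, "yyyy-MM-dd",
                CultureInfo.InvariantCulture, DateTimeStyles.None, out DateTime lastMod))
        {
            entry.LastMod = lastMod;
        }

        return entry;
    }
}

public static class Program
{
    public static void Main()
    {
        // One entry taken from the sample sitemap above.
        SitemapEntry entry = SitemapEntry.FromXml(
            "<url><loc>https://lonewolfonline.net/simple-xml-parser/</loc>" +
            "<priority>0.5</priority><lastmod>2009-09-14</lastmod>" +
            "<changefreq>monthly</changefreq></url>");

        Console.WriteLine(entry.Loc);
        Console.WriteLine(entry.ChangeFreq);
    }
}
```

Priority and LastMod are nullable here so that a missing or unparseable value stays distinguishable from a real one; whether that matters depends on what you do with the data afterwards.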
Writing a Simple XML Parser in C#
For this example, I am creating a small console application and outputting the results to the screen. I am also reading the sitemap from a local file, but you can just as easily download the sitemap from a website instead.
You can also download a sample project from GitHub.
using System;
using System.Xml;

namespace SitemapXMLParser
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the sitemap from a file in the working directory.
            XmlDocument urldoc = new XmlDocument();
            urldoc.Load("Sitemap.xml");

            // GetElementsByTagName matches on the element name as written,
            // so the default sitemap namespace needs no special handling.
            XmlNodeList xnList = urldoc.GetElementsByTagName("url");

            foreach (XmlNode node in xnList)
            {
                // Only <loc> is required by the sitemap protocol, so use ?.
                // to avoid a NullReferenceException when a child is missing.
                Console.WriteLine("url " + node["loc"]?.InnerText);
                Console.WriteLine("priority " + node["priority"]?.InnerText);
                Console.WriteLine("last modified " + node["lastmod"]?.InnerText);
                Console.WriteLine("change frequency " + node["changefreq"]?.InnerText);
                Console.WriteLine(Environment.NewLine);
            }
        }
    }
}
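The example above reads Sitemap.xml from disk. As a rough sketch of the download variant mentioned earlier, the code below fetches the XML as a string with HttpClient and passes it to XmlDocument.LoadXml. The SitemapReader class, its method names, and the URL in Main are placeholders chosen for illustration; they are not taken from the sample project.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper; the class and method names are illustrative only.
public static class SitemapReader
{
    // Returns the text of every <loc> element found under a <url> node.
    public static List<string> ExtractLocs(string sitemapXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(sitemapXml);

        var locs = new List<string>();
        foreach (XmlNode node in doc.GetElementsByTagName("url"))
        {
            // Skip malformed entries that have no usable <loc> child.
            string loc = node["loc"]?.InnerText;
            if (!string.IsNullOrWhiteSpace(loc))
            {
                locs.Add(loc.Trim());
            }
        }
        return locs;
    }

    // Downloads the sitemap as a string and extracts its page URLs.
    public static async Task<List<string>> DownloadLocsAsync(string sitemapUrl)
    {
        using (var client = new HttpClient())
        {
            string xml = await client.GetStringAsync(sitemapUrl);
            return ExtractLocs(xml);
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Placeholder URL: replace it with the sitemap you want to read.
        List<string> locs = await SitemapReader.DownloadLocsAsync("https://example.com/sitemap.xml");
        foreach (string loc in locs)
        {
            Console.WriteLine(loc);
        }
    }
}
```

Keeping the parsing in ExtractLocs, separate from the download, means the same method works on XML obtained from a file, a string, or the network. For a one-off read, XmlDocument.Load also accepts a URL directly in place of a file path.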