Simple XML Parser in C# using XmlDocument

In this article we will look at how to read a website's Sitemap.xml with C# and parse its contents using a simple XML parser.

By Tim Trott • C# ASP.Net MVC • September 14, 2009
This article shows how to load an XML Sitemap into an XmlDocument object and use it as a simple XML parser. The example reads the contents of the XML sitemap and shows them on the screen, but you can do anything with the data: create a crawler, store it in a database, or merge XML documents.

An XML Sitemap is a specially structured XML file which provides important structural information about a website to search engine crawlers for indexing purposes. The basic sitemap structure looks like this.

xml
<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
  <url>
    <loc>https://lonewolfonline.net/</loc>
    <priority>1.0</priority>
    <lastmod>2010-09-14</lastmod>
    <changefreq>daily</changefreq>
  </url>
  <url>
    <loc>https://lonewolfonline.net/simple-xml-parser/</loc>
    <priority>0.5</priority>
    <lastmod>2009-09-14</lastmod>
    <changefreq>monthly</changefreq>
  </url>
</urlset>

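Individual <url> nodes are wrapped inside the containing <urlset> node. Each <url> represents a page on the site. In this example each <url> node contains four child nodes; only <loc> is required by the sitemap protocol, and the other three are optional.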

The <loc> node represents the page URL.

The <priority> node represents the webmaster-defined priority of the page relative to other pages on the same site, from 0.0 to 1.0.

The <lastmod> node represents the date on which the page was last modified.

The <changefreq> node indicates how often the page is updated and suggests to the search engine how often to crawl it again.
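The four nodes described above map naturally onto a small typed object. The following is only a sketch: the `SitemapEntry` class and its `FromXml` method are hypothetical names introduced here for illustration and are not part of the sample project. It assumes that a missing optional node should simply leave the matching property empty.

```csharp
using System;
using System.Globalization;
using System.Xml;

// Hypothetical type for illustration; the name is an assumption, not the article's.
public sealed class SitemapEntry
{
    public string Loc { get; set; } = "";
    public double? Priority { get; set; }
    public DateTime? LastModified { get; set; }
    public string ChangeFrequency { get; set; } = "";

    // Builds an entry from a single <url> node.
    public static SitemapEntry FromXml(XmlNode urlNode)
    {
        var entry = new SitemapEntry
        {
            // The indexer returns null when the child node is absent.
            Loc = urlNode["loc"]?.InnerText.Trim() ?? "",
            ChangeFrequency = urlNode["changefreq"]?.InnerText.Trim() ?? ""
        };

        // Parse with the invariant culture so "0.5" is read the same on every machine.
        if (double.TryParse(urlNode["priority"]?.InnerText, NumberStyles.Float,
                CultureInfo.InvariantCulture, out double priority))
        {
            entry.Priority = priority;
        }

        // <lastmod> holds a date such as 2009-09-14, optionally with a time part.
        if (DateTime.TryParse(urlNode["lastmod"]?.InnerText, CultureInfo.InvariantCulture,
                DateTimeStyles.RoundtripKind, out DateTime lastModified))
        {
            entry.LastModified = lastModified;
        }

        return entry;
    }
}
```

Keeping the priority and the date as nullable values makes it possible to tell "not present in the sitemap" apart from a real value, which is useful if the entries are later stored in a database.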

Writing a Simple XML Parser in C#

For this example, I am creating a small console application and outputting the results to the screen. I am also reading the sitemap from a file, but you can just as easily download the sitemap from a website instead.

You can also download a sample project from GitHub.

C#
using System;
using System.Xml;

namespace SitemapXMLParser
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the sitemap from a file in the working directory.
            XmlDocument urldoc = new XmlDocument();
            urldoc.Load("Sitemap.xml");

            // Each <url> node describes one page of the site.
            XmlNodeList xnList = urldoc.GetElementsByTagName("url");

            foreach (XmlNode node in xnList)
            {
                // Only <loc> is required in a sitemap, so use ?. to avoid a
                // NullReferenceException when an optional node is missing.
                Console.WriteLine("url " + node["loc"]?.InnerText);
                Console.WriteLine("priority " + node["priority"]?.InnerText);
                Console.WriteLine("last modified " + node["lastmod"]?.InnerText);
                Console.WriteLine("change frequency " + node["changefreq"]?.InnerText);

                // Blank line between entries.
                Console.WriteLine();
            }
        }
    }
}
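The example above reads the sitemap from a file. If you would rather fetch it from a website, one possible approach is to download it with HttpClient and pass the response stream to XmlDocument.Load. This is a minimal sketch, not the code from the sample project: the `SitemapDownloader` class and its method names are assumptions made for illustration, and it does not handle gzip-compressed sitemaps or sitemap index files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;

// Hypothetical helper for illustration; the names are assumptions.
public static class SitemapDownloader
{
    // A single shared HttpClient is reused for all requests.
    private static readonly HttpClient Http = new HttpClient();

    // Reads every <loc> value from a sitemap supplied as a stream.
    public static List<string> ReadLocations(Stream sitemap)
    {
        var doc = new XmlDocument();
        doc.Load(sitemap);

        var locations = new List<string>();
        foreach (XmlNode node in doc.GetElementsByTagName("url"))
        {
            // Skip any <url> node that has no usable <loc> child.
            string loc = node["loc"]?.InnerText.Trim() ?? "";
            if (loc.Length > 0)
            {
                locations.Add(loc);
            }
        }
        return locations;
    }

    // Downloads the sitemap at the given address and parses it.
    public static async Task<List<string>> DownloadLocationsAsync(string sitemapUrl)
    {
        using (Stream stream = await Http.GetStreamAsync(sitemapUrl))
        {
            return ReadLocations(stream);
        }
    }
}
```

From an async method you could then call `await SitemapDownloader.DownloadLocationsAsync("https://example.com/sitemap.xml")`, where the address is a placeholder for your own sitemap. Loading from the stream rather than from a downloaded string lets the XML reader work out the document encoding itself.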
