Creating Sitemaps in Django

A sitemap is a list of the links on a website that you want search engines to crawl and index. A sitemap can also tell search engines the following about each page:

  1. How frequently the page changes.
  2. Last modification date of the page.
  3. Priority of the URL in relation to other URLs.

Types of Sitemap

Sitemaps are of two types:

  1. HTML Sitemaps.
  2. XML Sitemaps.

HTML Sitemaps

HTML sitemaps are designed to help users navigate the site. We can easily create an HTML sitemap by building a list with the <ol> or <ul> tag. For example:

<h2>The Great Django Blog Sitemap</h2>

<ul>
    <li><a href="http://example.com">Home</a></li>
    <li><a href="http://example.com/blog">Blog</a></li>
    <li><a href="http://example.com/contact">Contact</a></li>
    <li><a href="http://example.com/careers">Careers</a></li>
    <li><a href="http://example.com/eula">EULA</a></li>
</ul>

Remember that HTML sitemaps are for human consumption; they are not meant for search engines. For that reason, Google Webmaster Tools and similar services don't even allow you to submit an HTML sitemap.
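As a rough illustration, the list above could also be generated from data instead of being typed by hand. The sketch below uses only the Python standard library; the function name render_html_sitemap and the links list are assumptions made for this example, and in a Django project you would more likely render such a list from a template.

```python
from html import escape


def render_html_sitemap(title, links):
    """Render a heading and a <ul> of links from (url, label) pairs."""
    items = "\n".join(
        # escape() guards against characters such as & or < in the data.
        '    <li><a href="{}">{}</a></li>'.format(escape(url), escape(label))
        for url, label in links
    )
    return "<h2>{}</h2>\n\n<ul>\n{}\n</ul>".format(escape(title), items)


links = [
    ("http://example.com", "Home"),
    ("http://example.com/blog", "Blog"),
    ("http://example.com/contact", "Contact"),
]
html_sitemap = render_html_sitemap("The Great Django Blog Sitemap", links)
print(html_sitemap)
```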

XML Sitemaps

XML is the preferred format for sitemaps today, and the webmaster tools provided by major search engines accept XML sitemaps. Here is an example of an XML sitemap:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/home</loc>
    <lastmod>2017-05-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>http://www.example.com/blog/</loc>
    <lastmod>2017-05-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
  </url>
  <url>
    <loc>http://www.example.com/contact/</loc>
    <lastmod>2017-05-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.6</priority>
  </url>
</urlset>
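To make the structure of this file concrete, here is a minimal sketch that builds the same kind of document by hand with Python's standard xml.etree.ElementTree module. The build_sitemap function and the pages list are assumptions made for illustration; this is not how Django generates sitemaps, only a way to see which element goes where.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return sitemap XML (as a str) for a list of entry dicts.

    Each entry needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional, mirroring the elements in the example above.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


pages = [
    {"loc": "http://www.example.com/home", "lastmod": "2017-05-10",
     "changefreq": "monthly", "priority": 0.9},
    {"loc": "http://www.example.com/blog/", "priority": 0.6},
]
sitemap_xml = build_sitemap(pages)
print(sitemap_xml)
```

Only <loc> is written for every entry here; the other three elements appear only when the entry supplies them.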

Django provides a sitemap framework (django.contrib.sitemaps) which automates the process of creating sitemaps.

Installing Sitemap Framework

To use the sitemap framework, you must first install it in your Django project. Follow these steps:

1) Add 'django.contrib.sitemaps' to the INSTALLED_APPS list in the settings.py file. The sitemap framework also uses the sites framework (django.contrib.sites); as we already added the sites framework to INSTALLED_APPS in the previous chapter, we don't need to add it again.

2) In the TEMPLATES setting, make sure BACKEND is set to 'django.template.backends.django.DjangoTemplates' and APP_DIRS is set to True.

At this point, the INSTALLED_APPS and TEMPLATES settings should look like this:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',    
    'django.contrib.sites',
    'django.contrib.sitemaps',
    'blog',
    'cadmin',
]

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [ os.path.join(BASE_DIR, 'templates'), ],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages',
            ],
        },
    },
]
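If you want to double-check the two requirements from the steps above, a small helper along the following lines could do it. This is only a sketch: check_sitemap_settings is a hypothetical function written for this example (it is not part of Django), and the sample values are trimmed-down stand-ins for the real settings.

```python
DJANGO_BACKEND = "django.template.backends.django.DjangoTemplates"


def check_sitemap_settings(installed_apps, templates):
    """Return a list of problems found in the given settings values."""
    problems = []
    # The two apps this chapter relies on.
    for app in ("django.contrib.sites", "django.contrib.sitemaps"):
        if app not in installed_apps:
            problems.append("missing app: " + app)
    # At least one DjangoTemplates backend must have APP_DIRS enabled.
    has_backend = any(
        t.get("BACKEND") == DJANGO_BACKEND and t.get("APP_DIRS") is True
        for t in templates
    )
    if not has_backend:
        problems.append("no DjangoTemplates backend with APP_DIRS=True")
    return problems


# Trimmed-down sample values, assumed for the demonstration.
sample_apps = ["django.contrib.sites", "django.contrib.sitemaps", "blog"]
sample_templates = [{"BACKEND": DJANGO_BACKEND, "APP_DIRS": True}]
print(check_sitemap_settings(sample_apps, sample_templates))
```

In a real project you could pass settings.INSTALLED_APPS and settings.TEMPLATES from django.conf.settings instead of the sample values.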

The 'django.contrib.sitemaps' framework does not require any additional tables, so this time you don't actually need to run the migrate command. We can verify this by running migrate as follows:

(env) C:\Users\C\TGDB\django_project>python manage.py migrate
C:\Users\C\TGDB-V2\django_project
Operations to perform:
  Apply all migrations: admin, auth, blog, contenttypes, flatpages, sessions, sites
Running migrations:
  No migrations to apply.

(env) C:\Users\C\TGDB\django_project>

Notice the output "No migrations to apply.". This tells us that the sitemap framework doesn't create any additional tables.

We are now ready to create sitemaps.

Creating Sitemap Class

To create sitemaps we use the Sitemap class from django.contrib.sitemaps. A Sitemap class represents a section of entries in the sitemap. For example, one Sitemap class could represent all the blog posts, while another could represent all the categories of blog posts, and so on.

In our case, we want the sitemap to contain links to all blog posts and all flatpages. As a result, we will create two new sitemap classes, PostSitemap and FlatPageSitemap, which extend the Sitemap class.

Create a new file named sitemaps.py in the blog app and add the following code to it.

from django.contrib.sitemaps import Sitemap
from .models import Post

class PostSitemap(Sitemap):    
    changefreq = "monthly"
    priority = 0.9

    def items(self):
        return Post.objects.all()

    def lastmod(self, obj):
        return obj.pub_date

Here is how it works:

In lines 1-2, we import the Sitemap class and the Post model.

In lines 5-6, we set the changefreq and priority attributes. These are optional class attributes which indicate how frequently the pages change and the priority of the URLs in relation to other URLs, respectively.

The possible values for the changefreq attribute are:

  • 'always'
  • 'hourly'
  • 'daily'
  • 'weekly'
  • 'monthly'
  • 'yearly'
  • 'never'

Similarly, the priority attribute can only contain a value from 0.0 to 1.0.

The changefreq and priority class attributes correspond to the <changefreq> and <priority> XML elements. In other words, the sitemap framework uses the values of changefreq and priority to create the <changefreq> and <priority> elements.
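The allowed values described above can be expressed as a small check. The helper below is hypothetical (validate_sitemap_attrs is a name made up for this sketch and is not part of Django); it simply encodes the list of changefreq values and the 0.0 to 1.0 range for priority.

```python
VALID_CHANGEFREQ = {
    "always", "hourly", "daily", "weekly", "monthly", "yearly", "never",
}


def validate_sitemap_attrs(changefreq=None, priority=None):
    """Raise ValueError if either attribute is outside the allowed values.

    Both attributes are optional, so None is accepted for each.
    """
    if changefreq is not None and changefreq not in VALID_CHANGEFREQ:
        raise ValueError("invalid changefreq: {!r}".format(changefreq))
    if priority is not None and not 0.0 <= float(priority) <= 1.0:
        raise ValueError("priority must be between 0.0 and 1.0")
    return True


# The values used by PostSitemap pass the check.
print(validate_sitemap_attrs(changefreq="monthly", priority=0.9))
```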

In lines 8-9, we define the items() method. items() is a special method whose job is to return all the objects whose URLs we want in the sitemap. Post.objects.all() returns a QuerySet of all the Post objects. By default, the sitemap framework calls get_absolute_url() on each object to retrieve its URL.

In lines 11-12, we define the optional lastmod() method, which tells when an object (a Post object in this case) was last modified. The lastmod() method receives each Post object one by one and returns the time it was last modified. Notice that pub_date in obj.pub_date comes from the Post model; we wouldn't be able to write this if we had not defined the pub_date field in the Post model. The lastmod() method corresponds to the <lastmod> XML element.
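To see how items(), get_absolute_url() and lastmod() fit together, here is a plain-Python imitation that runs without Django or a database. Everything in it is an assumption made for illustration: FakePost stands in for the Post model, the /blog/<slug>/ URL scheme is invented, and MiniSitemap is a heavily simplified imitation of the real Sitemap class, which does considerably more.

```python
from datetime import date


class FakePost:
    """Stand-in for the Post model (hypothetical, no database needed)."""

    def __init__(self, slug, pub_date):
        self.slug = slug
        self.pub_date = pub_date

    def get_absolute_url(self):
        # Invented URL scheme, for the example only.
        return "/blog/{}/".format(self.slug)


class MiniSitemap:
    """Simplified imitation of a sitemap class, for illustration only."""

    changefreq = None
    priority = None

    def items(self):
        return []

    def lastmod(self, obj):
        return None

    def entries(self):
        # One dict per object: the URL comes from get_absolute_url(),
        # the date from lastmod(), the rest from the class attributes.
        return [
            {
                "location": obj.get_absolute_url(),
                "lastmod": self.lastmod(obj),
                "changefreq": self.changefreq,
                "priority": self.priority,
            }
            for obj in self.items()
        ]


class FakePostSitemap(MiniSitemap):
    changefreq = "monthly"
    priority = 0.9

    def items(self):
        return [
            FakePost("hello-world", date(2017, 5, 10)),
            FakePost("second-post", date(2017, 6, 1)),
        ]

    def lastmod(self, obj):
        return obj.pub_date


entries = FakePostSitemap().entries()
for entry in entries:
    print(entry)
```

Each printed dictionary corresponds to one <url> element in the final XML.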

Our sitemap class is ready. We just need to create a URL pattern for it.

The sitemap framework (django.contrib.sitemaps) provides a view called sitemap() which generates the sitemap from sitemap classes. The sitemap() view accepts a required argument called sitemaps, a dictionary mapping short section labels to their sitemap classes.

Open urls.py and add the following URL pattern just above lousy_login.

from django.contrib.sitemaps.views import sitemap
from .sitemaps import PostSitemap
...

urlpatterns = [
    ...
    url(r'^sitemap\.xml/$', sitemap, {'sitemaps': sitemaps}, name='sitemap'),
    ...
]

Now the only thing that remains is to define the sitemaps variable. Just above the urlpatterns list, define it as follows:

sitemaps = {
    'posts': PostSitemap
}

As already discussed, sitemaps is a dictionary which maps a short label (posts) to its Sitemap class (PostSitemap).
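The idea of a label-to-class dictionary can be sketched in plain Python. The code below is an imitation under stated assumptions, not Django's implementation: PostSection and PageSection are made-up stand-ins whose items() return ready-made paths, and collect_urls is a hypothetical helper showing how a view could walk such a dictionary, either for all sections or for one label.

```python
class PostSection:
    """Made-up stand-in for a sitemap class covering blog posts."""

    def items(self):
        return ["/blog/first/", "/blog/second/"]


class PageSection:
    """Made-up stand-in for a sitemap class covering static pages."""

    def items(self):
        return ["/about/"]


# Values may be a class or an already created instance.
demo_sitemaps = {"posts": PostSection, "pages": PageSection()}


def collect_urls(sitemaps, section=None):
    """Gather paths from every section, or only from the named one."""
    if section is not None:
        selected = {section: sitemaps[section]}  # KeyError if unknown
    else:
        selected = sitemaps
    urls = []
    for label, site in selected.items():
        if isinstance(site, type):  # a class was registered: instantiate it
            site = site()
        urls.extend(site.items())
    return urls


print(collect_urls(demo_sitemaps))
print(collect_urls(demo_sitemaps, section="pages"))
```

Keeping each group of URLs under its own label makes it easy to add another section later by adding one more key to the dictionary.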

Our Django project is ready to serve sitemaps. Visit http://127.0.0.1:8000/sitemap.xml/ and you should see the generated XML sitemap, with one <url> entry for each post.
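To inspect the sitemap outside the browser, the XML can be parsed with Python's standard library. The sketch below works on a hard-coded sample so that it runs without a server; the two post URLs in the sample are invented, and extract_locs is a helper name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol; needed to find the elements.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locs(xml_bytes):
    """Return the text of every <loc> element in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text for loc in root.findall("sm:url/sm:loc", NS)]


# Sample response with invented post URLs, standing in for the real page.
sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/blog/hello-world/</loc>
    <lastmod>2017-05-10</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.9</priority>
  </url>
  <url>
    <loc>http://example.com/blog/second-post/</loc>
    <changefreq>monthly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>
"""

locs = extract_locs(sample)
print(locs)
```

With the development server running, the same function could be fed the bytes returned by urllib.request.urlopen("http://127.0.0.1:8000/sitemap.xml/").read() instead of the sample.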

Our sitemap is working as expected, but notice that the host portion of each URL is example.com. This domain comes from the Django sites framework (django.contrib.sites). To change it, log in to the Django admin by visiting http://127.0.0.1:8000/admin/, then navigate to http://127.0.0.1:8000/admin/sites/site/.

Click on example.com to edit it, and change both Domain name and Display name to 127.0.0.1:8000. Click Save, then revisit the sitemap page (http://127.0.0.1:8000/sitemap.xml/) or hit refresh. At this point, your sitemap should generate URLs using 127.0.0.1:8000 instead of example.com. You will need to update this setting once more at the time of deployment.
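Why the host changes can be shown with a tiny sketch: each sitemap URL is the site's domain joined with the path returned for the object. The function below is a simplified illustration, not Django's code; build_absolute_url and the sample path are assumptions made for this example.

```python
def build_absolute_url(domain, location, protocol="http"):
    """Join protocol, site domain and an object's path into a full URL."""
    return "{}://{}{}".format(protocol, domain, location)


# Hypothetical path, as a post's get_absolute_url() might return it.
post_path = "/blog/hello-world/"

before = build_absolute_url("example.com", post_path)
after = build_absolute_url("127.0.0.1:8000", post_path)
print(before)
print(after)
```

Only the domain argument differs between the two calls, which mirrors what happens when the Domain name is edited in the admin.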