Django SEO: A Sitemap Tutorial For Humans

How to generate a sitemap in Django
Created: 22 February 2024
Last updated: 22 February 2024

Introduction: So You Want to SEO Optimize Your Django Website?

This post will show you how to use Django, Python’s excellent web development framework, to build a sitemap correctly. A sitemap is an XML file that acts as an index of all the pages on a website. It matters for search engine optimization (SEO) for several reasons:

  • Facilitates Crawling: Sitemaps make it easier for search engines to crawl your website. By providing a clear structure of all your pages, search engines can discover and index your content more efficiently.
  • Content Discovery: New or updated content is highlighted in sitemaps, which can lead to faster indexing by search engines.
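  • User Experience: For large websites, a sitemap page can also act as a navigation aid, helping users find the content they’re looking for more easily (this applies to human-readable HTML sitemaps rather than the XML file built in this tutorial).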
  • Content Organization: Sitemaps help in organizing content by categories or themes, making the website more user-friendly.
  • Deep Linking: Websites with deep architecture might have pages that are not easily discovered by crawling alone. Sitemaps ensure these pages are visible to search engines.
  • Comprehensive Indexing: They help in getting a more comprehensive indexing of the website, as all pages are listed in one place.

In short, if you want your Django website to have a chance of being found via search engines, you need a sitemap. Because Django is awesome and “batteries included”, it comes with a sitemap framework that generates your sitemap entirely on the backend (no JavaScript required). However, in a rare miss for the Django docs, it is hard to work out exactly how the sitemap framework should be used. It took me some time, so I wrote it up.

Note that this tutorial assumes you understand the basics of Django. If you don’t, the documentation is excellent, and ChatGPT has loads of Django examples, which is another reason Django is a great choice (just make sure you check the Django version in the code it gives you).

Step 1: Install the Sitemap Framework

Create your Django project and make sure the sites (django.contrib.sites) and sitemaps (django.contrib.sitemaps) apps are in your INSTALLED_APPS list (both are built in).

You should then set the SITE_ID setting in your project’s settings.py. If you have just one site, the value will be SITE_ID = 1.
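As a rough sketch, the relevant part of settings.py might look like the fragment below. The first six entries are the defaults that a new Django project typically ships with, and "core" is an assumption (it is the app name used later in this tutorial); adjust both to match your project.

```python
# settings.py (fragment) - a minimal sketch, not a complete settings file
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "django.contrib.sites",     # provides the Site model used to look up your domain
    "django.contrib.sitemaps",  # the sitemap framework itself
    "core",                     # assumption: your own app, named "core" in this tutorial
]

# Primary key of the Site record that represents this project
SITE_ID = 1
```

After adding django.contrib.sites, run python manage.py migrate so that the table backing the Site model is created.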

You’ll also want to define a DOMAIN value in settings.py and include that domain in your ALLOWED_HOSTS. Note that DOMAIN and SITE_NAME are custom project settings rather than built-in Django ones:


DOMAIN = 'example.com'
SITE_NAME = 'Example Site'

ALLOWED_HOSTS = ['localhost', '127.0.0.1', 'example.com']  # tweak this for production

Step 2: Create Your Initial Routes & Views

For the purposes of this tutorial, we’ll use the example of a podcast summary web app, which has pages for:

  • The Django admin
  • A list of podcasts
  • A list of episodes for that podcast
  • An episode detail page

Here’s what our urls.py might look like, assuming your Python Django app is called “core”:

from django.contrib import admin
from django.urls import path

from core.views import episode_list_view, episode_detail_view, podcast_list

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', podcast_list, name='podcast-list'),
    path('<int:podcast_id>/<slug:podcast_name>-insights', episode_list_view, name='episode-list'),
    path('<int:podcast_id>/<slug:podcast_name>/<int:episode_number>/<slug:episode_guest>-<slug:episode_company>-insights/', episode_detail_view, name='episode-detail'),
]

And here are the corresponding views (we’ll look at the models shown here next):

from core.models import Podcast, Episode

from django.shortcuts import get_object_or_404, render

def podcast_list(request):
    podcasts = Podcast.objects.all()
    genre_query = request.GET.get('genre', '')
    if genre_query:
        podcasts = podcasts.filter(genre=genre_query)

    context = {
        'podcasts': podcasts,
    }

    return render(request, "home.html", context)

def episode_list_view(request, podcast_id: int, podcast_name: str):
    podcast = get_object_or_404(Podcast, pk=podcast_id)
    
    # For large numbers, you would use Django's Paginator class
    episodes_list = Episode.objects.filter(podcast=podcast).order_by('-episode_number')

    context = {
        'episodes': episodes_list,
        # Use the podcast fetched above so it is available even when there are no episodes
        'podcast': podcast,
    }

    return render(request, 'episodes.html', context)


def episode_detail_view(request, podcast_id: int, podcast_name: str, episode_number: int, episode_guest: str, episode_company: str):
    # Find the podcast by its primary key (the slugs in the URL are there for readability and SEO)
    podcast = get_object_or_404(Podcast, pk=podcast_id)

    # Find the episode by its episode number under the found podcast
    episode = get_object_or_404(Episode, podcast=podcast, episode_number=episode_number)

    return render(request, 'episode_detail.html', {'episode': episode})

Step 3: Update Your Data Models

Assuming we’re using the Django ORM, we’ll create the following data models:

models.py

from django.db import models
from django.urls import reverse
from django.utils.text import slugify


class PodcastGenre(models.TextChoices):
    INDIE_HACKING = 'indie_hacking', 'Indie Hacking'
    AI = 'ai', 'AI'

class Podcast(models.Model):
    name = models.CharField(max_length=255, unique=True)
    description = models.TextField(null=True)  # Description can be null
    hosts = models.TextField()
    genre = models.CharField(max_length=80, choices=PodcastGenre.choices, default=PodcastGenre.INDIE_HACKING)

    def get_absolute_url(self):
        # Ensure this value is URL-friendly
        podcast_name = slugify(self.name)
        return reverse('episode-list', kwargs={'podcast_id': self.id, 'podcast_name': podcast_name})

    def __str__(self):
        return self.name

The Episode and Transcript models:

from django.db import models
from django.urls import reverse
from django.utils.text import slugify


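class Episode(models.Model):
    podcast = models.ForeignKey(Podcast, on_delete=models.CASCADE, related_name='episodes')
    title = models.CharField(max_length=255)
    episode_number = models.IntegerField()
    date_published = models.DateTimeField()
    summary = models.TextField()
    # Assumed fields (not in the original model): the 'episode-detail' route
    # expects a guest slug and a company slug, so they have to come from somewhere
    guest = models.CharField(max_length=255)
    company = models.CharField(max_length=255)

    def get_absolute_url(self):
        # reverse() raises NoReverseMatch unless every URL parameter is supplied,
        # and each text value must be URL-friendly, hence slugify
        return reverse('episode-detail', kwargs={
            'podcast_id': self.podcast_id,
            'podcast_name': slugify(self.podcast.name),
            'episode_number': self.episode_number,
            'episode_guest': slugify(self.guest),
            'episode_company': slugify(self.company),
        })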

    
class Transcript(models.Model):
    episode = models.OneToOneField(Episode, on_delete=models.CASCADE)
    content = models.TextField()

Looking at these models, why does the Transcript model not need a get_absolute_url Python method?

The get_absolute_url method in Django provides a way to get the absolute URL to an object. This method is particularly useful when you want to refer directly to a specific instance of a model from templates or views, especially in situations where the URL structure relies on the details of the model instance, such as its ID, slug, or other unique attributes.

In our example, the Episode model is a central part of the web application, with individual pages dedicated to each episode. These pages would display detailed information about the episode, such as a summary, a transcript, and related resources.

By implementing get_absolute_url in the Episode model, you enable a straightforward way to generate links to these detail pages. This Python method dynamically creates URLs based on the episode’s attributes, ensuring that links remain consistent and correct throughout the application, even if the URL patterns change. The method utilizes the reverse function to generate URLs, which is based on the URL configuration’s name (‘episode-detail’) and expected keyword arguments. This approach decouples the model’s logic from the URL configuration, making the codebase more maintainable and flexible.

On the other hand, the Transcript model is designed with a one-to-one relationship to the Episode model, implying that each episode has exactly one transcript associated with it. The primary purpose of the Transcript model is to store the textual content of an episode’s transcript. Since the transcript is not the primary entity users navigate to but rather a component of an episode’s detail, it does not necessitate a standalone page or URL. Users would access the transcript as part of the episode’s detail view, not through a separate URL dedicated to the transcript alone.

In summary, any model whose instances have their own page (a dynamic URL) needs a get_absolute_url method, because Django’s Sitemap class calls it by default to work out each item’s location.

Step 4: Create Sitemap Classes

The most basic form of Django sitemap is the GenericSitemap, but I’ve found that most web apps with dynamic endpoints quickly require the use of a custom sitemap class, which must inherit from Django’s Sitemap class.

Here’s an example:

from django.contrib.sitemaps import Sitemap
from django.urls import reverse

from core.models import Episode, PodcastGenre

class PodcastListSitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.7

    def items(self):
        # Return a list of genres to create a URL for each genre filter
        return PodcastGenre.choices

    def location(self, item):
        genre = item[0]  # item is a tuple (value, verbose_name)
        return reverse('podcast-list') + f'?genre={genre}'

    
class EpisodeSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.6

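    def items(self):
        # All episodes, in a stable order; select_related avoids one extra
        # query per episode when get_absolute_url reads self.podcast
        return Episode.objects.select_related('podcast').order_by('id')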

    def lastmod(self, obj):
        # Assuming you have a date field for last modification
        return obj.date_published


# useful example: https://github.com/jclgoodwin/bustimes.org/blob/main/busstops/urls.py#L18
# Imported by your main urls.py in the next step
sitemaps = {
    'podcasts': PodcastListSitemap,
    'episodes': EpisodeSitemap,
}

These sitemap classes are defined to help search engines efficiently index the site’s content by providing structured data about the pages (URLs) available on the site. Each class corresponds to a different part of the website: one for a list of podcasts filtered by genre and another for individual episodes. Here’s an explanation of why each Python method is needed:

  • changefreq and priority: Indicate how often the content at these URLs is expected to change (weekly for the genre pages, daily for episodes) and their importance relative to other URLs on the site (0.7 and 0.6 respectively), as hints to search engine crawlers.
  • items: Returns a list of genres (PodcastGenre.choices), which the sitemap uses to generate a URL for each genre filter. This method is essential to automate the creation of the sitemap entries based on the available podcast genres.
  • location: Customizes the URL for each genre returned by items. By appending a query parameter for the genre to the base URL, it specifies how to construct the URL for each sitemap entry. This method is necessary to ensure that the sitemap reflects the correct URLs for filtering the podcast list by genre.
  • lastmod: Provides the last modification date for each episode object, based on the date_published field. This helps search engines judge the freshness of the content, which can affect how often they re-crawl the page.
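The location method above builds its query string with an f-string, which is fine for the fixed genre values used here. If a filter value could ever contain spaces or ampersands, a sketch along these lines, using the standard library’s urlencode, keeps the URL valid. Note that genre_filter_url is a hypothetical helper name, not part of Django.

```python
from urllib.parse import urlencode


def genre_filter_url(base_path: str, genre: str) -> str:
    """Append a safely encoded ?genre=... query string to a path."""
    return f"{base_path}?{urlencode({'genre': genre})}"


# Simple values pass through unchanged
print(genre_filter_url("/", "indie_hacking"))
# Spaces and ampersands are percent-encoded instead of breaking the URL
print(genre_filter_url("/", "true crime & more"))
```

Inside the sitemap class, you could then return genre_filter_url(reverse('podcast-list'), genre) from location.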

Finally, we create a sitemaps dictionary which we will import into our urls.py file to register these sitemap classes with the application. This registration tells Django to generate sitemap XML files for these classes, which can then be submitted to search engines to improve the site’s SEO by ensuring all relevant content is easily discoverable and correctly indexed.
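To make those fields concrete, here is a sketch of the kind of XML entry that ends up in the sitemap. It uses only the Python standard library and is not how Django renders the file internally; the URL and date are made-up sample values, and build_urlset is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_urlset(entries):
    """Build a <urlset> element from dicts with loc, lastmod, changefreq and priority."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = entry["loc"]
        ET.SubElement(url, "lastmod").text = entry["lastmod"].isoformat()
        ET.SubElement(url, "changefreq").text = entry["changefreq"]
        ET.SubElement(url, "priority").text = str(entry["priority"])
    return urlset


# Sample values only, mirroring what EpisodeSitemap describes for one episode
entries = [
    {
        "loc": "https://example.com/1/my-podcast/3/jane-acme-insights/",
        "lastmod": date(2024, 2, 22),
        "changefreq": "daily",
        "priority": 0.6,
    }
]

xml_text = ET.tostring(build_urlset(entries), encoding="unicode")
print(xml_text)
```

Each item returned by a sitemap class’s items method becomes one url element like this, with loc coming from location (or get_absolute_url) and the other fields coming from the class attributes and methods described above.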

Step 5: Update URLs with sitemaps

Here’s what our updated urls.py looks like now:

from django.contrib.sitemaps.views import sitemap
from django.contrib import admin
from django.urls import path

from core.sitemaps import sitemaps  # assumes the Step 4 code lives in core/sitemaps.py
from core.views import episode_list_view, episode_detail_view, podcast_list


urlpatterns = [
    path('admin/', admin.site.urls),
    path('', podcast_list, name='podcast-list'),
    path('<int:podcast_id>/<slug:podcast_name>-insights', episode_list_view, name='episode-list'),
    path('<int:podcast_id>/<slug:podcast_name>/<int:episode_number>/<slug:episode_guest>-<slug:episode_company>-insights/', episode_detail_view, name='episode-detail'),
    path('sitemap.xml', sitemap, {'sitemaps': sitemaps}, name='django.contrib.sitemaps.views.sitemap'),  # added
]

If we had static pages (like “about”, “contact” etc.) then we could use a StaticViewSitemap like so:

class StaticViewSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.7

    def items(self):
        # List of static URL names
        return ['about', 'contact']  # Example: 'about' corresponds to the name of your about page URL

    def location(self, item):
        return reverse(item)

And we’d then add this to our sitemaps dictionary.

Step 6: Inspect Your Sitemap.xml

Now that everything is set up, run your Django server and visit the /sitemap.xml route on localhost. You should see all your dynamically generated application URLs listed as XML.

Congrats - your sitemap is now being generated automatically by Django. Don’t forget to update your Sitemap classes if you add more dynamic routes.
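If you’d rather check the output from a script than by eye, a small sketch like this lists every URL in a sitemap using only the standard library. The sample XML and the extract_locations helper are illustrative assumptions, not something Django provides.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemap protocol, mapped to a short prefix for lookups
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_locations(xml_data):
    """Return the text of every <loc> element in a sitemap (str or bytes input)."""
    root = ET.fromstring(xml_data)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sample in the same shape as a generated sitemap
SAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://example.com/?genre=ai</loc>"
    "<changefreq>weekly</changefreq><priority>0.7</priority></url>"
    "<url><loc>http://example.com/1/my-podcast/3/jane-acme-insights/</loc>"
    "<lastmod>2024-02-22</lastmod></url>"
    "</urlset>"
)

for location in extract_locations(SAMPLE_SITEMAP):
    print(location)
```

To run it against your development server, you could pass in the bytes returned by urllib.request.urlopen("http://127.0.0.1:8000/sitemap.xml").read(), assuming the server is listening on the default port.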

Step 7: Submit Your Site to Google

Finally, deploy your site and then submit it via Google Search Console.

Your Django site now has a much better chance of ranking - although obviously the SEO game is a complex one. Good luck.

p.s. Email me if you’d like a similar tutorial for robots.txt generation
