Django SEO: A Sitemap Tutorial For Humans
How to generate a sitemap in Django
Table of Contents
- Introduction
- Step 1: Install the Sitemap Framework
- Step 2: Create Your Initial Routes & Views
- Step 3: Update Your Data Models
- Step 4: Create Sitemap Classes
- Step 5: Update URLs with Sitemaps
- Step 6: Inspect Your Sitemap.xml
- Step 7: Submit Your Site to Google
Introduction: So You Want to SEO Optimize Your Django Website?
This post will show you how to use Python’s excellent Django web development framework to build a sitemap. A sitemap is an XML file that, in the context of web development, acts as a blueprint or index of all the pages on a website. A sitemap is important for search engine optimization (SEO) for several reasons:
- Facilitates Crawling: Sitemaps make it easier for search engines to crawl your website. By providing a clear structure of all your pages, search engines can discover and index your content more efficiently.
- Content Discovery: New or updated content is highlighted in sitemaps, which can lead to faster indexing by search engines.
- User Experience: For large websites, an HTML version of the sitemap can also act as a navigation aid, helping users find the content they’re looking for more easily.
- Content Organization: Sitemaps help in organizing content by categories or themes, making the website more user-friendly.
- Deep Linking: Websites with deep architecture might have pages that are not easily discovered by crawling alone. Sitemaps make these pages visible to search engines.
- Comprehensive Indexing: Because all pages are listed in one place, search engines get a more complete picture of the website (a sitemap is a hint, though, not a guarantee that every page will be indexed).
In short, if you want your Django website to have a chance of being found via search engines, then you need a sitemap. Because Django is awesome and “batteries included”, it comes with a sitemap framework that generates your sitemap entirely on the backend (no JavaScript required). However, in a rare miss for the Django docs, it’s hard to grasp exactly how the sitemap framework should be used. It took me some time to figure out, so I wrote it up.
Note that this tutorial assumes you understand the basics of Django. If you don’t, the documentation is excellent, and ChatGPT has seen loads of Django examples - another reason why Django is a great choice (just make sure you check which Django version the code it gives you targets).
Step 1: Install the Sitemap Framework
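Create your Django project and make sure the built-in sites (django.contrib.sites) and sitemaps (django.contrib.sitemaps) apps are in your INSTALLED_APPS list. The sitemap framework renders its XML with templates that ship with the sitemaps app. Your TEMPLATES setting therefore also needs a DjangoTemplates backend with APP_DIRS set to True, which is the default in a new project.
You should then set the SITE_ID setting in your project’s settings.py. If you just have one site, the value will be SITE_ID = 1. Afterwards, run python manage.py migrate so the sites framework can create its table and its default Site record.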
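Here is a minimal sketch of the settings.py entries this step describes. The apps other than sites, sitemaps and core are Django’s default startproject list, shown only for context. Adjust the list to match your own project.

```python
# settings.py (excerpt): only the settings this step touches.

INSTALLED_APPS = [
    # Django's default apps (as generated by startproject)
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    # Built-in apps needed for sitemaps
    "django.contrib.sites",     # stores the domain used in sitemap URLs
    "django.contrib.sitemaps",  # the sitemap framework and its templates
    # The tutorial's own app
    "core",
]

# Primary key of the Site record (in the django_site table) this project serves.
SITE_ID = 1
```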
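You’ll also want to include your domain in ALLOWED_HOSTS. The DOMAIN and SITE_NAME values below are custom settings for your own code to use. Django itself doesn’t read them.

The domain that appears in your sitemap comes from a different place: the Site record whose ID matches SITE_ID. The sites framework creates that record with the placeholder domain example.com. Change it to your real domain, either in the Django admin (under Sites) or with a data migration.
```python
DOMAIN = 'example.com'
SITE_NAME = 'Example Site'
ALLOWED_HOSTS = ['localhost', '127.0.0.1', 'example.com']  # tweak this for production
```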
Step 2: Create Your Initial Routes & Views
For the purposes of this tutorial, we’ll use the example of a podcast summary web app, which has pages for:
- The Django admin
- A list of podcasts
- A list of episodes for that podcast
- An episode detail page
Here’s what our urls.py might look like, assuming your Django app is called “core”:
```python
from django.contrib import admin
from django.urls import path

from core.views import episode_detail_view, episode_list_view, podcast_list

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', podcast_list, name='podcast-list'),
    path('<int:podcast_id>/<slug:podcast_name>-insights', episode_list_view, name='episode-list'),
    path('<int:podcast_id>/<slug:podcast_name>/<int:episode_number>/<slug:episode_guest>-<slug:episode_company>-insights/', episode_detail_view, name='episode-detail'),
]
```
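Django’s slug path converter matches letters, digits, underscores and hyphens. That makes the hyphens between <slug:episode_guest>, <slug:episode_company> and -insights ambiguous. The sketch below hand-translates the episode-detail route into a roughly equivalent regular expression to show how a path gets split. Django builds its own regex internally, so treat this only as an illustration. parse_episode_path is a hypothetical helper.

```python
import re

# Regexes used by Django's built-in path converters:
# <int:...> -> [0-9]+ and <slug:...> -> [-a-zA-Z0-9_]+ (hyphens allowed).
INT = r"[0-9]+"
SLUG = r"[-a-zA-Z0-9_]+"

# Hand-written equivalent of the 'episode-detail' route (without the leading slash).
EPISODE_DETAIL = re.compile(
    rf"(?P<podcast_id>{INT})/(?P<podcast_name>{SLUG})/(?P<episode_number>{INT})/"
    rf"(?P<episode_guest>{SLUG})-(?P<episode_company>{SLUG})-insights/"
)


def parse_episode_path(path):
    """Return the captured URL parameters as strings, or None if the path doesn't match."""
    match = EPISODE_DETAIL.fullmatch(path)
    return match.groupdict() if match else None


parts = parse_episode_path("12/indie-hackers/42/jane-doe-acme-corp-insights/")
# The greedy guest slug swallows part of a hyphenated company name:
# parts["episode_guest"] == "jane-doe-acme", parts["episode_company"] == "corp"
```

This is harmless here, because the episode detail view looks episodes up by podcast_id and episode_number only. Just don’t rely on the guest or company values captured from the URL.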
And here are the corresponding views (we’ll look at the models shown here next):
```python
from django.shortcuts import get_object_or_404, render

from core.models import Episode, Podcast


def podcast_list(request):
    podcasts = Podcast.objects.all()
    # Optional genre filter, e.g. /?genre=ai
    genre_query = request.GET.get('genre', '')
    if genre_query:
        podcasts = podcasts.filter(genre=genre_query)
    context = {
        'podcasts': podcasts,
    }
    return render(request, "home.html", context)


def episode_list_view(request, podcast_id: int, podcast_name: str):
    # The lookup uses the ID; podcast_name only makes the URL readable
    podcast = get_object_or_404(Podcast, pk=podcast_id)
    # For large numbers, you would use Django's Paginator class
    episodes_list = Episode.objects.filter(podcast=podcast).order_by('-episode_number')
    context = {
        'episodes': episodes_list,
        'podcast': podcast,  # still available when the podcast has no episodes yet
    }
    return render(request, 'episodes.html', context)


def episode_detail_view(request, podcast_id: int, podcast_name: str, episode_number: int, episode_guest: str, episode_company: str):
    # Find the podcast by its ID (the slug segments only make the URL readable)
    podcast = get_object_or_404(Podcast, pk=podcast_id)
    # Find the episode by its episode number under the found podcast
    episode = get_object_or_404(Episode, podcast=podcast, episode_number=episode_number)
    return render(request, 'episode_detail.html', {'episode': episode})
```
Step 3: Update Your Data Models
Assuming we’re using the Django ORM, we’ll create the following data models in models.py:
```python
from django.db import models
from django.urls import reverse
from django.utils.text import slugify


class PodcastGenre(models.TextChoices):
    INDIE_HACKING = 'indie_hacking', 'Indie Hacking'
    AI = 'ai', 'AI'


class Podcast(models.Model):
    name = models.CharField(max_length=255, unique=True)
    description = models.TextField(null=True)  # Description can be null
    hosts = models.TextField()
    genre = models.CharField(max_length=80, choices=PodcastGenre.choices, default=PodcastGenre.INDIE_HACKING)

    def get_absolute_url(self):
        # Ensure this value is URL-friendly
        podcast_name = slugify(self.name)
        return reverse('episode-list', kwargs={'podcast_id': self.id, 'podcast_name': podcast_name})

    def __str__(self):
        return self.name
```
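The Episode and Transcript models go in the same models.py, below Podcast, so the imports above cover them. The episode-detail route expects guest and company slugs, so this version of Episode stores them in two assumed fields, guest and company. Without them, reverse() can’t build the URL and raises NoReverseMatch.
```python
class Episode(models.Model):
    podcast = models.ForeignKey(Podcast, on_delete=models.CASCADE, related_name='episodes')
    title = models.CharField(max_length=255)
    episode_number = models.IntegerField()
    date_published = models.DateTimeField()
    summary = models.TextField()
    # Assumed fields: the 'episode-detail' route needs guest and company slugs
    guest = models.CharField(max_length=255)
    company = models.CharField(max_length=255)

    def get_absolute_url(self):
        # Ensure these values are URL-friendly
        return reverse('episode-detail', kwargs={
            'podcast_id': self.podcast_id,
            'podcast_name': slugify(self.podcast.name),
            'episode_number': self.episode_number,
            'episode_guest': slugify(self.guest),
            'episode_company': slugify(self.company),
        })


class Transcript(models.Model):
    episode = models.OneToOneField(Episode, on_delete=models.CASCADE)
    content = models.TextField()
```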
Looking at these models, why does the Transcript model not need a get_absolute_url method?
The get_absolute_url method in Django provides a way to get the absolute URL to an object. This method is particularly useful when you want to refer directly to a specific instance of a model from templates or views, especially in situations where the URL structure relies on the details of the model instance, such as its ID, slug, or other unique attributes.
In our example, the Episode model is a central part of the web application, with individual pages dedicated to each episode. These pages display detailed information about the episode, such as a summary, a transcript, and related resources. By implementing get_absolute_url in the Episode model, you get a straightforward way to generate links to these detail pages. The method builds URLs from the episode’s attributes, so links stay consistent and correct throughout the application, even if the URL patterns change.

The method uses the reverse function to generate URLs from the URL pattern’s name (‘episode-detail’) and the keyword arguments that pattern expects. This approach decouples the model’s logic from the URL configuration, making the codebase more maintainable and flexible.
On the other hand, the Transcript model has a one-to-one relationship with the Episode model, so each episode has at most one transcript. The Transcript model exists to store the text of an episode’s transcript. The transcript isn’t a page users navigate to; it’s part of an episode’s detail page. So it doesn’t need a standalone page or URL. Users access the transcript through the episode’s detail view, not through a separate URL dedicated to the transcript alone.
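In summary: if instances of a model appear in your sitemap, the model needs a get_absolute_url method. Django’s Sitemap class builds each entry’s URL by calling item.get_absolute_url(), unless you override its location method.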
Step 4: Create Sitemap Classes
The most basic form of Django sitemap is the GenericSitemap, but I’ve found that most web apps with dynamic endpoints quickly require a custom sitemap class, which must inherit from Django’s Sitemap class.
Here’s an example, which lives in core/sitemaps.py:
```python
from django.contrib.sitemaps import Sitemap
from django.urls import reverse

from core.models import Episode, PodcastGenre


class PodcastListSitemap(Sitemap):
    changefreq = "weekly"
    priority = 0.7

    def items(self):
        # Return a list of genres to create a URL for each genre filter
        return PodcastGenre.choices

    def location(self, item):
        genre = item[0]  # item is a tuple (value, verbose_name)
        return reverse('podcast-list') + f'?genre={genre}'


class EpisodeSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.6

    def items(self):
        # select_related avoids an extra query per episode when get_absolute_url()
        # reads self.podcast.name; order_by keeps the sitemap's pagination stable
        return Episode.objects.select_related('podcast').order_by('pk')

    def lastmod(self, obj):
        # date_published stands in for a "last modified" date
        return obj.date_published


# useful example: https://github.com/jclgoodwin/bustimes.org/blob/main/busstops/urls.py#L18
# Imported by urls.py and passed to the sitemap view
sitemaps = {
    'podcasts': PodcastListSitemap,
    'episodes': EpisodeSitemap,
}
```
These sitemap classes help search engines index the site’s content efficiently by providing structured data about the pages (URLs) available on the site. Each class corresponds to a different part of the website: one for the list of podcasts filtered by genre and another for individual episodes. Here’s what each attribute and method does:
- changefreq and priority: indicate how often the content at these URLs is expected to change (weekly) and their importance relative to other URLs on the site (0.7). These are only hints: Google has stated that it ignores both values, though other search engines may use them.
- items: returns the list of genres (PodcastGenre.choices), which the sitemap uses to generate a URL for each genre filter. This automates the creation of sitemap entries based on the available podcast genres.
- location: customizes the URL for each genre returned by items by appending a genre query parameter to the base URL, so the sitemap lists the correct URL for each filtered podcast list. EpisodeSitemap doesn’t define location, so Django falls back to each episode’s get_absolute_url.
- lastmod: provides the last modification date for each episode object, based on the date_published field. This helps search engines judge the freshness of the content, which can affect how often they re-crawl the page.
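To make the mapping from attributes to XML concrete, here’s a stdlib-only sketch that renders entries in the sitemaps.org <urlset> format. Django produces this XML from its own sitemap.xml template, so this shows the shape of the output, not Django’s code. render_urlset is a hypothetical helper, and the URLs are placeholders.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Serialize the sitemap namespace as the default xmlns instead of an "ns0:" prefix.
ET.register_namespace("", SITEMAP_NS)


def render_urlset(entries):
    """Render entries as a sitemaps.org <urlset> string.

    Each entry is a dict with a 'location' key and optional 'lastmod' (a date),
    'changefreq' and 'priority' keys; missing or None values are left out.
    """
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["location"]
        if entry.get("lastmod") is not None:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = entry["lastmod"].isoformat()
        for tag in ("changefreq", "priority"):
            if entry.get(tag) is not None:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    return ET.tostring(urlset, encoding="unicode")


document = render_urlset([
    # PodcastListSitemap-style entry: that class defines no lastmod
    {"location": "https://example.com/?genre=ai", "changefreq": "weekly", "priority": 0.7},
    # EpisodeSitemap-style entry
    {
        "location": "https://example.com/1/indie-hackers/42/jane-doe-acme-insights/",
        "lastmod": date(2024, 1, 5),
        "changefreq": "daily",
        "priority": 0.6,
    },
])
```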
Finally, we create a sitemaps dictionary, which we import into our urls.py file to register these sitemap classes with the application. Each key names one section of the sitemap. This registration tells Django to generate sitemap XML for these classes, which you can then submit to search engines so that all relevant content is easy to discover and index.
Step 5: Update URLs with Sitemaps
Here’s what our updated urls.py looks like now, with the sitemaps dictionary imported from core/sitemaps.py:
```python
from django.contrib import admin
from django.contrib.sitemaps.views import sitemap
from django.urls import path

from core.sitemaps import sitemaps
from core.views import episode_detail_view, episode_list_view, podcast_list

urlpatterns = [
    path('admin/', admin.site.urls),
    path('', podcast_list, name='podcast-list'),
    path('<int:podcast_id>/<slug:podcast_name>-insights', episode_list_view, name='episode-list'),
    path('<int:podcast_id>/<slug:podcast_name>/<int:episode_number>/<slug:episode_guest>-<slug:episode_company>-insights/', episode_detail_view, name='episode-detail'),
    path('sitemap.xml', sitemap, {'sitemaps': sitemaps}, name='django.contrib.sitemaps.views.sitemap'),  # added
]
```
If we had static pages (like “about”, “contact” etc.), then we could list them with a StaticViewSitemap like so:
```python
from django.contrib.sitemaps import Sitemap
from django.urls import reverse


class StaticViewSitemap(Sitemap):
    changefreq = "daily"
    priority = 0.7

    def items(self):
        # List of static URL names
        return ['about', 'contact']  # Example: 'about' corresponds to the name of your about page URL

    def location(self, item):
        # Turn each URL name into its path
        return reverse(item)
```
We’d then add it to our sitemaps dictionary, for example as 'static': StaticViewSitemap.
Step 6: Inspect Your Sitemap.xml
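Now that everything is set up, run your Django server and visit the /sitemap.xml route on localhost (http://127.0.0.1:8000/sitemap.xml with the default runserver address). You should see all your dynamically generated application URLs, each in its own <url> entry with the full address in a <loc> element. If every address starts with http://example.com, your Site record still has the placeholder domain the sites framework created, so update it before deploying.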
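To check the output from a script rather than the browser, here’s a small stdlib sketch. It assumes the dev server is running at the default http://127.0.0.1:8000. sitemap_locations and fetch_sitemap_locations are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locations(xml_bytes):
    """Return the <loc> URL of every <url> entry in a sitemap document."""
    root = ET.fromstring(xml_bytes)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


def fetch_sitemap_locations(url="http://127.0.0.1:8000/sitemap.xml"):
    """Download a sitemap (by default from the local dev server) and list its URLs."""
    with urlopen(url) as response:
        return sitemap_locations(response.read())
```

With the server running, call fetch_sitemap_locations() and print each URL it returns to get a quick list of everything Django is advertising.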
Congrats - your sitemap is now being generated automatically by Django. Don’t forget to update your Sitemap classes if you add more dynamic routes.
Step 7: Submit Your Site to Google
Finally, deploy your site, then submit your sitemap’s URL (for example https://example.com/sitemap.xml) in the Sitemaps report of Google Search Console.
Your Django site now has a much better chance of ranking - although obviously the SEO game is a complex one. Good luck.
p.s. Email me if you’d like a similar tutorial for robots.txt generation