Migrating to Pelican from Drupal

What is Pelican?

Pelican is a Python-powered static website generator which comes with a rather decent feature set. It allows you to write your blog entries in reStructuredText, Markdown, or AsciiDoc using any editor that you desire.

It also supports:

  • Atom/RSS feeds.
  • Code syntax highlighting.
  • Integration with various services such as Disqus and Twitter.
  • Optional PDF generation of your posts.

Its capabilities can also be extended with plugins.

Installing Pelican and getting an initial website set up is as simple as following the Getting Started guide and answering the questions that come up when running pelican-quickstart.

Why did I switch?

I want to preface this by saying that I still really like Drupal. I have been running it on various websites for almost a decade now, so I will not be abandoning it completely by any means. I'll eventually find another project which is much better suited to what it can do than my relatively humble blog.

Why would I switch from a powerful and highly customizable content management system which I have used for years to something that simply generates a static website? It is really interesting how things happen.

To be perfectly honest, switching to a different blogging system wasn't something that I had planned on doing at all initially as I was pretty content with how things were. I discovered Pelican by sheer chance a few days ago and it intrigued me.

The more I read the documentation and various posts about it online the more I liked it and wanted to try it out. In fact, I had originally decided to do this conversion from Drupal as a fun little side project to satisfy an itch. This itch of course branched into a myriad of reasons for why I wanted to give Pelican a real shot and let it run my actual website.

One reason is that it allows me to quickly create a post, format it properly, and automatically add it to a category on the website by simply moving or saving the file into a directory with the category's name.
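
As a rough illustration of that workflow, the Python sketch below builds the text of a Markdown post and the path it would be saved under. The `content` directory, the `Articles` category, and the helper names are made up for the example; `Title`, `Date`, and `Slug` are the metadata keys Pelican reads from the top of a Markdown file.

```python
from datetime import date
from pathlib import Path


def post_path(content_dir, category, slug):
    """Return where the post lives: the folder name doubles as the category."""
    return Path(content_dir) / category / f"{slug}.md"


def build_post(title, published, slug, body):
    """Return the full text of a Markdown post with a Pelican metadata header."""
    header = (
        f"Title: {title}\n"
        f"Date: {published.isoformat()}\n"
        f"Slug: {slug}\n"
        "\n"  # a blank line separates the metadata from the body
    )
    return header + body + "\n"


if __name__ == "__main__":
    # Hypothetical post; nothing is written to disk here.
    print(post_path("content", "Articles", "hello-world"))
    print(build_post("Hello World", date(2013, 1, 5), "hello-world", "First post."))
```

Moving the resulting file into a different folder under the content directory is all it takes to change its category.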

In short, I found Pelican to be an elegant solution that is also very simple to maintain and update.

Starting the Conversion

Pelican comes with an importer which can import posts from Atom and RSS feeds. This was my ticket to getting all of my posts out of Drupal quickly and efficiently.

I slightly modified the Atom feed module for Drupal, which normally has a cap of 100 posts, so that it would output all published nodes in the database.

The importer allows you to convert the feed into a few formats. I personally chose to go with Markdown because that is the format I plan on using for all my future posts and I like to have consistency.

pelican-import --feed -o ~/http/content -m markdown "http://moparx.com/atom.xml"

The conversion went relatively well, but a few things needed fixing: leftover HTML and a bunch of extra newlines in each file.

This is where sed comes into play and makes the problem extremely easy to fix.

# Remove all HTML tags (the \+ and the bare -i are GNU sed syntax)
sed -i 's/<[^>]\+>//g' *.md

# Collapse runs of blank lines into a single blank line
sed -i '/./,/^$/!d' *.md

I also decided that I wanted to switch my large image thumbnails to a smaller counterpart that was available so that they looked better on Pelican's default theme.

sed -i 's/styles\/large/styles\/medium/g' *.md
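
For anyone who would rather not rely on GNU sed, here is a minimal Python sketch of roughly the same three clean-up steps. The function name is my own invention for this example. Like the sed version, the tag pattern is blunt: it removes anything between angle brackets, including angle-bracket autolinks and tags inside code samples.

```python
import re

TAG_RE = re.compile(r"<[^>]+>")        # anything that looks like an HTML tag
BLANK_RUN_RE = re.compile(r"\n{3,}")   # two or more consecutive empty lines


def clean_markdown(text):
    """Strip HTML tags, switch image styles, and collapse blank-line runs."""
    text = TAG_RE.sub("", text)
    text = text.replace("styles/large", "styles/medium")
    text = BLANK_RUN_RE.sub("\n\n", text)
    # The sed command also drops blank lines at the very top of the file.
    return text.lstrip("\n")


if __name__ == "__main__":
    sample = "<p>Hello</p>\n\n\n\n![shot](/files/styles/large/shot.png)\n"
    print(clean_markdown(sample))
```

Running it over every converted file is then a matter of looping over the .md files, reading each one, and writing the cleaned text back.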

Finally, I began going through each post to add tags where relevant and to ensure that the slug would match the original post's URL. As I was going through the posts containing screenshots and my photos I discovered that images linked via a thumbnail were not converted properly.

After converting image links to the following format I was ready to go.

[![Alt Text](Thumbnail Image)](URL to large image)
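
To make that format concrete, here is a tiny Python helper that assembles such a link. The helper name and the example paths are hypothetical and are not taken from my actual posts.

```python
def linked_thumbnail(alt_text, thumbnail_url, full_url):
    """Return Markdown for a thumbnail image that links to the large image."""
    return f"[![{alt_text}]({thumbnail_url})]({full_url})"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(linked_thumbnail("Desktop", "/images/desktop-thumb.png", "/images/desktop.png"))
```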

Configuring Pelican

Pelican reads the configuration file each time it is run to regenerate the website.

My configuration file was created with Clean URLs in mind because I think they simply look better and, even more importantly, the URLs would match the original posts' locations.

I had really wanted to use the sitemap plugin as well, but I quickly discovered it simply does not work if the post's slug does not have a file extension appended to it.

#!/usr/bin/env python
# -*- coding: utf-8 -*- #

AUTHOR = u'Moparx'
SITENAME = u'Moparx'
SITESUBTITLE = u'$ cat >> /dev/blog'
SITEURL = 'http://moparx.com'
TIMEZONE = 'America/New_York'

DEFAULT_LANG = u'en'
DEFAULT_PAGINATION = 10

MENUITEMS = (('Home', 'http://moparx.com'),)

DEFAULT_CATEGORY = 'Articles'
FILES_TO_COPY = (('extra/.htaccess', '.htaccess'),
                 ('extra/robots.txt', 'robots.txt'),)
STATIC_PATHS = ['css', 'images', 'js']

MD_EXTENSIONS = ['codehilite','extra']
MARKUP = ('rst', 'md')

THEME = 'themes/moparx'

#PLUGINS = ['pelican.plugins.sitemap',]


# ----------------------------------------------------------------------
# EXTERNAL LINKS
# ----------------------------------------------------------------------

# Recommended Sites
LINKS =  (('DuckDuckGo', 'https://duckduckgo.com/'),
          ('EFF.org', 'https://www.eff.org/'),
          ('Project Honeypot', 'https://www.projecthoneypot.org/'),)

# Social widget
SOCIAL = (('identi.ca', 'https://identi.ca/moparx'),
          ('libre.fm', 'http://libre.fm/user-profile.php?user=moparx'),
          ('reddit', 'http://reddit.com/user/Moparx/'),
          ('minecraft', 'http://mintyfreshcreepers.net'),)


# ----------------------------------------------------------------------
# URL PATHS
# ----------------------------------------------------------------------

# Configure Pelican to output Clean URLs
ARTICLE_URL = '{slug}'
AUTHOR_URL = 'author/{slug}'
CATEGORY_URL = 'category/{slug}'
PAGE_URL =  '{slug}'
TAG_URL = 'tags/{slug}'

# Adjust save location to match previous files.
PAGE_SAVE_AS = '{slug}.html'
TAG_SAVE_AS = 'tags/{slug}.html'

# Atom and RSS feeds
CATEGORY_FEED_ATOM = ''
FEED_ALL_ATOM = 'atom.xml'
FEED_ALL_RSS = 'rss.xml'
FEED_MAX_ITEMS = 15


# ----------------------------------------------------------------------
# PLUGINS
# ----------------------------------------------------------------------

SITEMAP = {
    'format': 'xml',
    'priorities': {
        'articles': 0.5,
        'indexes': 0.5,
        'pages': 0.5
    },
    'changefreqs': {
        'articles': 'monthly',
        'indexes': 'daily',
        'pages': 'monthly'
    }
}
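
To show how the URL settings in this configuration fit together, here is a small Python sketch of how a `{slug}` pattern expands into the link that gets published and the file that gets written. The helper name and the example slug are made up, and `ARTICLE_SAVE_AS` is assumed to be left at Pelican's default of `{slug}.html` since the configuration does not set it.

```python
ARTICLE_URL = "{slug}"            # link written into the generated pages
ARTICLE_SAVE_AS = "{slug}.html"   # assumed default; not set in the config


def article_locations(slug):
    """Return (public URL path, output file name) for a given slug."""
    return ARTICLE_URL.format(slug=slug), ARTICLE_SAVE_AS.format(slug=slug)


if __name__ == "__main__":
    # Hypothetical slug, for illustration only.
    url, filename = article_locations("migrating-to-pelican")
    print(url, "->", filename)
```

The links no longer carry an extension while the files on disk still do, which is why the web server has to map one onto the other.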

Additionally, I am using a slightly modified version of the default theme as I removed the ability for it to automatically add all available categories to the main menu.

I'll probably be making my own theme soon anyway as there are possible licensing issues with Pelican's default theme from what I have been hearing.

Finalizing the setup

Since I modified all the slugs in the configuration file so that Pelican outputs Clean URLs, it was very important that I configured the server properly so that it knew what to do with them as well.

This is the .htaccess file I have created to ensure things run smoothly.

The important part is everything after the aptly titled "Clean URLs" comment, as it allows the Clean URLs to work as intended. If a visitor requests a post with the .html extension attached, the server strips the extension and redirects the visitor to the 'proper' URL.

# Don't show directory listings.
Options -Indexes

# Follow symbolic links.
Options +FollowSymLinks

# Disable the server signature.
ServerSignature Off

# Custom 404 error page.
ErrorDocument 404 /404.html

# Clean URLs
RewriteEngine on
RewriteBase /

# Remove any trailing slashes from URL    
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_URI} (.*)$
RewriteRule ^(.+)/$ http://moparx.com/$1 [R=301,L]

# Redirect index.html to /    
RewriteCond %{THE_REQUEST} ^[^/]*/index\.html [NC]
RewriteRule . / [R=301,L]

# Redirect pages with .html to the Clean URL 
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{THE_REQUEST} ^GET\ /[^?\s]+\.html
RewriteRule (.*)\.html$ /$1 [L,R=301]

RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html

# Redirect page queries to their new location    
RewriteCond %{QUERY_STRING} ^page=([0-9]+)$
RewriteRule ^(.*)$ index%1.html
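
The three redirects in these rules can be sketched in Python as follows. This is only a simplified model for reasoning about them: the function name is my own, and it ignores the checks the real rules make against files and directories on disk, as well as the internal rewrites that are not redirects.

```python
def redirect_target(path):
    """Return the path a request would be redirected to, or None if it stays."""
    # Remove a trailing slash, leaving the site root alone.
    if len(path) > 1 and path.endswith("/"):
        return path[:-1]
    # Send the root index.html to /.
    if path.lower() == "/index.html":
        return "/"
    # Strip the .html extension to reach the Clean URL.
    if path.endswith(".html"):
        return path[: -len(".html")]
    return None


if __name__ == "__main__":
    for request in ("/my-post.html", "/index.html", "/category/linux/", "/my-post"):
        print(request, "->", redirect_target(request))
```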

I also created a robots.txt which tells web crawlers that they are not allowed to access the author listing or the theme directory. This file is completely optional, in case you are wondering whether you need it.

User-agent: *
Disallow: /author/
Disallow: /theme/
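
One way to double-check rules like these is Python's standard `urllib.robotparser` module. The sketch below feeds it the same three lines and asks whether a few example URLs may be crawled; the helper name and the URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /author/
Disallow: /theme/
"""


def is_allowed(url, user_agent="*"):
    """Return True if the rules above let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, for illustration only.
    for url in ("http://moparx.com/some-post",
                "http://moparx.com/author/moparx",
                "http://moparx.com/theme/css/main.css"):
        print(url, is_allowed(url))
```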
