Website and notetaking system
While I was traveling in Taiwan in March 2020 as the world was descending into Covid-19 chaos, I had some time to ponder improvements to my blogging and notetaking processes. At the time, my website was happily powered by the mostly static generator Movable Type, and my notetaking was strewn across text files, Google Docs and physical paper. I wanted to try to merge them as much as possible, so it seemed to make sense to do both in the same place: text files on my computer. I resisted moving away from MT for a long time, because I love its super intuitive templating system and ease of logging in and blogging from anywhere. MT has served me well for almost 20 years (wow!), and if this site had to support more than one author, I wouldn’t want to change.
So, I thought it would be worth finding out what a simpler text file based system could look like. I ended up mostly reimplementing my theme as Jinja templates in Pelican before I got home. Sadly, life events and the pandemic got in the way, and it took me almost 3 years to sit back down and smooth out the rough edges. In that time I basically haven’t blogged anyway, though I did start keeping a simple bullet-like journal in Google Docs to at least keep organized.
That all said, what follows is the brainstorming document I used for this process, which turned into implementation notes and a checklist for Pelican. It’s probably not useful or interesting to anyone else, but it might be worth reviewing if you feel you are on a similar journey.
Note organization concerns:
- Dotfiles contain notes in:
  - docs that are a combination of:
    - general notes I could pull out.
    - notes that are specific to my configuration and make sense to be alongside the dotfiles.
  - notes that are line items for my Rofi quick lookup hack solution.
- Old notes Git repository is probably the start of “public” notes.
- Need space for nonpublic notes, probably what I’m using Vimwiki for now.
- Would like public blog automatically published from a subset of the Vimwiki.
Probably this means Vimwiki contains a directory that is public, or there’s a way to tag pages such that they become published. Possibly the docs section of my dotfiles is hardlinked into this structure or duplicated (yuck).
Content types
I’m trying to decide a structure for my content. My existing blog has URLs like YYYY/MM/basename.html. I could have a similar file structure for my markdown files for things I would have published as a blog post.
For notes and other non-date-based content, I could treat them like “pages” on my website with a different URL structure simply based on the page name and possibly an arbitrary category.
There seems to be a grey area where I would want date created and modified on the pages, but not have the date in the URL. But over many years, this might become a mess. So maybe I just want everything under the same date based URL scheme, like everything is a blog entry, and the date corresponds to date created.
Relatedly, on my existing blog, for the regular blog entries, I have some categories where I tried to capture different classes of content: healthlog, reeflog (regarding my aquarium), notes (on videos or books), records, projects, essays, conversations (chat logs), reports, schoolwork, personal and writing.
Possible structure:
- YYYY/MM/basename (blogs)
- notes/books/basename
- notes/videos/basename
- notes/articles/basename
- notes/topics/basename (this seems awkward)
Yeah I’m not sure I like separating out “notes”, at least the “one time” notes
on a book or talk or article seem like they could go in the date based structure
just fine. The thing I’m struggling with is where I’d put long running notes
such as “Python”, etc. Those could still go into the date based structure, and
I could use fuzzy find to locate them when I need, but it seems strange for my
Python notes to be located at something like /2009/11/python and be updated
for decades to come. (Though now that I have been editing this very blog entry
for almost 3 years, I may just need to give up trying to make this perfect!)
Rethinking this, my content might actually be:
- diaries
- documentation (errors, emails, interesting things to remember) possibly with commentary or solutions
- one time notes on some fixed piece of content
- long running notes on a topic based on many sources
It feels like some content is “wiki-like”, which would not have a date based URL, and other content is “publication-like” (could be journal or an article that’s not intended to be revised much). The latter makes more sense to have a posting date, and thus a date based URL. The date a wiki page for some topic first was created is possibly interesting but I would not expect that date to affect the URL to the page.
There’s also the consideration that my local file structure need not match the URL structure. Possibly I want to organize my files into the non-date based hierarchy but still have the URLs be date based. Hmm. Maybe I’ll start by organizing my filesystem until it feels right, and worry about URLs later. I don’t even have a publishing flow set up yet, anyway.
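The point that file layout and URL layout can be decoupled maps onto Pelican’s URL settings. Below is a minimal sketch of what that could look like in pelicanconf.py, assuming date-based URLs for blog entries and a dateless prefix for wiki-like pages. The notes/ prefix and the preview_url helper are hypothetical and only there for illustration; the setting names are Pelican’s.

```python
from datetime import datetime

# Date-based URLs for blog entries. The pattern is filled from article
# metadata, not from where the source file lives on disk.
ARTICLE_URL = "{date:%Y}/{date:%m}/{slug}.html"
ARTICLE_SAVE_AS = "{date:%Y}/{date:%m}/{slug}.html"

# Wiki-like pages get a URL without a date.
# Assumption: the "notes/" prefix is just an example, not an existing layout.
PAGE_URL = "notes/{slug}.html"
PAGE_SAVE_AS = "notes/{slug}.html"


def preview_url(pattern: str, date: datetime, slug: str) -> str:
    """Hypothetical helper: expand a pattern with plain str.format."""
    return pattern.format(date=date, slug=slug)


if __name__ == "__main__":
    print(preview_url(ARTICLE_URL, datetime(2020, 3, 15), "website-and-notetaking-system"))
    print(preview_url(PAGE_URL, datetime(2009, 11, 1), "python"))
```

With something like this, the source files could sit in any hierarchy that feels right, and the date in the URL would come from the Date metadata of each entry.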
Platforms research
10 best static site generators | Creative Bloq
01. Jekyll: Ruby, Liquid templating engine, Sass, earliest and most popular.
04. Hugo: Go, up and coming most popular, faster.
07. Pelican: Python, Jinja2 templating.
What are people using for their personal blogs? | Lobsters
- zimbatm/wiki: Personal vimwiki + GitHub Pages
- Pelican because it has a good design and staticsite because it does not force you to follow a rigid directory layout.
- I write in Markdown and use pandoc to convert it to HTML. I made a couple of scripts (“panpipe” and “panhandle”) which allow code embedded in the source to be executed during rendering. I generally write about programming stuff, so this is useful to ensure that code snippets actually work (execute them during rendering and abort if they’re broken) and that example output is accurate (generate the example output from the actual code during rendering).
Trying out Pelican
mkdir -p ~/GDrive/Main/site
cd ~/GDrive/Main/site
ln -s ../Notes content
virtualenv venv
source venv/bin/activate
pip install --upgrade pip
python -m pip install git+https://github.com/getpelican/pelican.git@master
pip install "pelican[Markdown]" typogrify beautifulsoup4 pillow
# Error with advthumbnailer, see https://github.com/AlexJF/pelican-advthumbnailer/pull/9
# pip install pelican[Markdown] typogrify beautifulsoup4 pillow pelican-advthumbnailer
# testing following for foursquare
pip install pelican-data-files
wget https://github.com/fle/pelican-simplegrey/archive/257e30c7e0091df2198b2c778754cd6f23112068.zip
# unzip pelican-simplegrey-257e30c7e0091df2198b2c778754cd6f23112068.zip
unzip 257e30c7e0091df2198b2c778754cd6f23112068
mv pelican-simplegrey-257e30c7e0091df2198b2c778754cd6f23112068 pelican-simplegrey
pelican-themes --install pelican-simplegrey
# summary plugin
git clone --recursive https://github.com/getpelican/pelican-plugins
printf "PLUGIN_PATHS = ['pelican-plugins']\nPLUGINS = ['summary']\n" >> pelicanconf.py
When resuming testing:
(cd ~/GDrive/Main/site && source venv/bin/activate && rm -rf output/* && pelican --listen --ignore-cache --autoreload --debug)
To get entries from the server to local:
cd ~/GDrive/Main/site
rsync -azi charlie:www/blog/pelican/ content/blog
# DONE: rsync -azi charlie:www/issues/pelican/ content/pages/issues
# DONE: rsync -azi charlie:www/pelican/ content/pages
OK now I’m going to set up a more permanent development environment following
https://docs.getpelican.com/en/latest/contribute.html . I had been putting my
virtual environment folders inside the source folders for other small projects,
but I’ll go with their suggested location in ~/virtualenvs for now since I’ll
be dealing with multiple repositories for plugins and Pelican itself.
git clone git@github.com:CNG/pelican.git ~/projects/pelican
cd ~/projects/pelican
git remote add upstream https://github.com/getpelican/pelican.git
mkdir -p ~/virtualenvs && cd ~/virtualenvs
python3 -m venv pelican
source ~/virtualenvs/pelican/*/activate
python -m pip install --upgrade pip
python -m pip install invoke
cd ~/projects/website/pelican
rm poetry.lock
invoke setup
# If you forget to cd to `pelican` you'll get: Can't find any collection named 'tasks'!
# `invoke setup` seems to install `pelican`, I assumed from PyPI or something
# due to the instructions at
# https://docs.getpelican.com/en/latest/contribute.html#setting-up-the-development-environment
# saying to run the following command next... but testing if this is necessary
# because I suspect something is getting screwed up later when I Pip install two
# addons.
python -m pip install -e ~/projects/website/pelican
# Later after development:
# invoke tests
# invoke lint
# invoke docserve # Then localhost:8000
git clone git@github.com:CNG/pelican-plugins.git ~/projects/pelican-plugins
cd ~/projects/pelican-plugins
# Forgot to --recursive, so:
git submodule update --init --recursive
git remote add upstream https://github.com/getpelican/pelican-plugins.git
# Running through this again 2022-11-29, my `make html` started failing after
# running these two commands:
python -m pip install pillow
python -m pip install pelican-data-files pelican-advthumbnailer
# Ran once, which also sets path in $VIRTUAL_ENV/.project
# pelican-quickstart --path ~/projects/pelican-site
For now I’m going with these folders inside my existing Projects folder:
- pelican-site: Pelican configuration, my theme, and the output folder, which I may move.
- pelican-plugins: Git clone so I can edit plugins and such.
- pelican: Git cloned source of Pelican itself.
- pelican-content: The original Markdown files and images.
So the new command for resuming testing:
(cd ~/projects/pelican-site && source ~/virtualenvs/pelican/*/activate && make clean && make html)
Update 2023-06-08
I really need to invest enough time to get this system solidified so it doesn’t break each time I try to write a blog post! Now I came back and wanted to write about our new dog Ruffie and struggled to get the site working based on what I wrote above. I was getting output like this:
$ pelican --version
Traceback (most recent call last):
File "/home/cgorichanaz/virtualenvs/pelican/bin/pelican", line 5, in <module>
from pelican.__main__ import main
File "/mnt/docs/GDrive/Main/Projects/website/pelican/pelican/__main__.py", line 5, in <module>
from . import main
ImportError: cannot import name 'main' from 'pelican' (unknown location)
The problem turned out to be that pelican worked right after running invoke setup the first time, but then stopped working after I ran python -m pip install -e ~/projects/website/pelican. Reading pypa/setuptools/issues/3548 clued me in to a fix. I added this to my pelican-site/Makefile:
export PYTHONPATH = /home/cgorichanaz/projects/website/pelican
Having PYTHONPATH pointing at my local Pelican source directory fixed the issue. I also have PELICAN?=/home/cgorichanaz/virtualenvs/pelican/bin/pelican in there, so I don’t need to activate the virtualenv manually either. My future testing command should be:
(cd ~/projects/pelican-site && make clean && make html)
# OR
cd ~/projects/pelican-site && make devserver
Update 2025-03-05
I published a bunch of posts last year, as recently as September, which was 5 months ago now. How time flies! Anyway, I’m trying to make a post now and my script is giving me:
Traceback (most recent call last):
File "/home/cgorichanaz/virtualenvs/pelican/bin/pelican", line 5, in <module>
from pelican.__main__ import main
File "/home/cgorichanaz/projects/website/pelican/pelican/__init__.py", line 19, in <module>
from pelican.log import console
File "/home/cgorichanaz/projects/website/pelican/pelican/log.py", line 4, in <module>
from rich.console import Console
ModuleNotFoundError: No module named 'rich'
make: *** [Makefile:72: publish] Error 1
I did a bit of poking around and found, when I activated the virtualenv, my reported Python version was 3.13.2, despite the folders in the virtualenv having names like lib/python3.12 and the contents of pyvenv.cfg being like:
version = 3.12.3
executable = /usr/bin/python3.12
I also saw I did not have that file in /usr/bin anymore. So I manually changed 3.12 to 3.13 in the names of the two folders in lib and include, and in that config file. It seemed to work after that… I should go through and try updating my Pelican to the latest version one of these days though. It looks like between my current 4.8.0 and the latest 4.11.0 they changed the build system from Poetry to something called PDM.
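A mismatch like this can be detected before it turns into a confusing ModuleNotFoundError. The following is a minimal sketch, assuming the pyvenv.cfg format shown above (a version = X.Y.Z line, as written by python -m venv). The function names are hypothetical, and the script only reports the mismatch; it does not repair anything.

```python
import sys
from pathlib import Path


def read_pyvenv_cfg(text: str) -> dict[str, str]:
    """Parse the simple 'key = value' lines used by pyvenv.cfg."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def venv_matches(cfg: dict[str, str], running: tuple[int, int]) -> bool:
    """True when the recorded major.minor version equals the running one."""
    parts = cfg.get("version", "").split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        return False
    return (int(parts[0]), int(parts[1])) == tuple(running)


if __name__ == "__main__":
    # Inside a virtualenv, sys.prefix points at the environment directory.
    cfg_path = Path(sys.prefix) / "pyvenv.cfg"
    if cfg_path.exists():
        cfg = read_pyvenv_cfg(cfg_path.read_text())
        if venv_matches(cfg, sys.version_info[:2]):
            print("virtualenv matches the running interpreter")
        else:
            print("virtualenv was built for Python", cfg.get("version", "?"))
    else:
        print("not running inside a virtualenv")
```

When the versions differ, recreating the environment with the new interpreter is probably a cleaner fix than renaming folders, though the rename evidently worked here.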
Code blocks
There are two ways to specify the identifier:
:::python
print("The triple-colon syntax will *not* show line numbers.")
To display line numbers, use a path-less shebang instead of colons:
#!python
print("The path-less shebang syntax *will* show line numbers.")
Need to check if I can use the backticks style code blocks with Pelican or make whatever Vim plugin I have formatting my *.md files learn about space indented code blocks. Because right now in this file everything is italic after a code block that has an asterisk. Edit: Pelican seems to support both ways.
Implementation issues
Note that even when using cached content, all output is always written, so the modification times of the generated *.html files will always change. Therefore, rsync-based uploading may benefit from the --checksum option.
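To illustrate why comparing by checksum helps here, the sketch below decides whether a file needs uploading by hashing its content instead of looking at its modification time. It is only a stand-in for the idea behind rsync’s --checksum option, not how rsync itself is implemented, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file content, independent of timestamps."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def needs_upload(local: Path, remote_copy: Path) -> bool:
    """Compare by content only; a newer mtime alone does not count."""
    return not remote_copy.exists() or file_digest(local) != file_digest(remote_copy)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        built = Path(tmp) / "index.html"
        uploaded = Path(tmp) / "uploaded.html"
        built.write_text("<p>same</p>")
        uploaded.write_text("<p>same</p>")
        os.utime(uploaded, (0, 0))  # pretend the uploaded copy is much older
        print(needs_upload(built, uploaded))
```

A regenerated but unchanged page has a new modification time and the same digest, so a content comparison skips it.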
Summaries
I’m using the summary plugin to allow for inserting a delimiter into the entry body marking where the summary should end so I don’t have to repeat content in the metadata. My existing website places an anchor at the end of the summary in the full entry page (and sometimes an ad), but there’s no way by default in Pelican to output the full entry minus the summary.
I couldn’t get a basic replace function to work, and also tried implementing a regex_replace custom filter to no avail. That’s when I looked closer at the Summary plugin implementation and found it was actually mutating the summary in summary.py#L81, and I’m not sure why it’s needed:
summary = str(BeautifulSoup(summary, 'html.parser'))
This results, for example, in a &nbsp; changing to a space and one of my image tags getting attributes rearranged, changing from:
<img alt="FCNG6780.jpg" height="413" src="https://lh3.googleusercontent.com/-7LyWV465q6k/V-IWou7euLI/AAAAAAAA82g/Mq0kxENtIWQLGwvKKYoRa4e9zkukYrb7QCHM/w992/FCNG6780.jpg" title="Saturday Sunset" width="620"/>
to:
<img src="https://lh3.googleusercontent.com/-7LyWV465q6k/V-IWou7euLI/AAAAAAAA82g/Mq0kxENtIWQLGwvKKYoRa4e9zkukYrb7QCHM/w992/FCNG6780.jpg" width="620" height="413" alt="FCNG6780.jpg" title="Saturday Sunset">
Maybe I need another custom filter to run the whole entry through BeautifulSoup… hmm. Though this would be better implemented by extending the summary plugin to provide an article.more or something analogous to Movable Type’s EntryMore tag.
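For reference, a custom filter of the kind mentioned above can be registered through Pelican’s JINJA_FILTERS setting. This is a minimal sketch and not the code that was actually tried; the `<!-- more -->` marker in the usage example is an assumption chosen for illustration.

```python
import re


def regex_replace(value: str, pattern: str, replacement: str = "") -> str:
    """Replace regex matches in a template value; '.' also spans newlines."""
    return re.sub(pattern, replacement, value, flags=re.DOTALL)


# In pelicanconf.py: expose the function to templates under this name.
JINJA_FILTERS = {"regex_replace": regex_replace}


if __name__ == "__main__":
    # Hypothetical marker: drop everything up to and including it.
    body = "<p>Summary.</p>\n<!-- more -->\n<p>The rest.</p>"
    print(regex_replace(body, r"^.*?<!-- more -->\n"))
```

A filter like this only helps when the text being matched appears byte for byte in the content. Once the summary has been round-tripped through BeautifulSoup it no longer does, which is consistent with the replace attempts failing.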
OK for now I patched the summary plugin. TODO: Look into upstreaming / better solution:
summary = content[begin_summary:end_summary]
pre = content[:begin_summary]
more = content[end_summary:]
if remove_markers:
    # remove the markers from the content
    if begin_summary:
        pre = pre.replace(begin_marker, '', 1)
        content = content.replace(begin_marker, '', 1)
    if end_summary:
        more = more.replace(end_marker, '', 1)
        content = content.replace(end_marker, '', 1)
summary = str(BeautifulSoup(summary, 'html.parser'))
pre = str(BeautifulSoup(pre, 'html.parser'))
more = str(BeautifulSoup(more, 'html.parser'))
instance._content = content
# default_status was added to Pelican Content objects after 3.7.1.
# Its use here is strictly to decide on how to set the summary.
# There's probably a better way to do this but I couldn't find it.
if hasattr(instance, 'default_status'):
    instance.metadata['summary'] = summary
else:
    instance._summary = summary
instance.has_summary = True
instance.pre = pre
instance.more = more
Dates
OK, I realize now Movable Type actually has three entry dates:
- EntryDate: the “publish” timestamp manually set in the entry editing screen, which defaults to the time the entry editing page was first opened by hitting Create Entry.
- EntryModifiedDate: the last save date. Not user editable.
- EntryCreatedDate: the timestamp the entry was first saved. Not user editable.
The minimal implementation in Pelican would have modified date detected by the mtime from the filesystem. There’s no support for a “created date” based on the filesystem, which is not consistently implemented in filesystems anyway. So, I think I’ll need to specify all three types as metadata in the top of my file. If there’s no “Modified”, then we can fall back to mtime. If there’s no “Created”, we can fall back to “Date”?
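The fallback rules described above can be sketched as follows. This is a standalone illustration, not Pelican’s own logic: resolve_dates is a hypothetical helper, and it assumes the metadata has already been parsed into timezone-aware datetime objects under the keys date, created and modified.

```python
from datetime import datetime, timezone
from pathlib import Path


def resolve_dates(metadata: dict, source: Path) -> dict:
    """Apply the fallbacks: Created -> Date, Modified -> file mtime."""
    date = metadata["date"]  # the publish date is required
    created = metadata.get("created", date)
    modified = metadata.get("modified")
    if modified is None:
        # Last resort: the filesystem modification time, as UTC.
        modified = datetime.fromtimestamp(source.stat().st_mtime, tz=timezone.utc)
    return {"date": date, "created": created, "modified": modified}
```

Given how easily the filesystem timestamps get reset, the mtime branch is best seen as a safety net rather than something to rely on.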
Wow, it’s even worse than I thought. I’ll definitely need to specify all three dates manually because I just saved this file after a small update, and now apparently my filesystem only has the current date associated with the file instead of the 2020-03 date I started working on this (“Birth” updated to match “Modify”):
$ stat Notes/blog/2020/03/Website\ and\ notetaking\ system.md
File: Notes/blog/2020/03/Website and notetaking system.md
Size: 12822 Blocks: 32 IO Block: 4096 regular file
Device: 9,127 Inode: 166789121 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 1000/cgorichanaz) Gid: ( 1000/cgorichanaz)
Access: 2022-01-30 16:26:34.819133202 -0800
Modify: 2022-01-30 16:24:33.785325347 -0800
Change: 2022-01-30 16:24:33.791992040 -0800
Birth: 2022-01-30 16:24:33.785325347 -0800
It would be great if I could have a source control based “modified” date, and better still if I could show the list of modification dates and what changed, like a wiki.
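One way to get a source control based history is to ask Git for the commit dates of a file. This is a minimal sketch that assumes the content lives in a Git repository and that git is on the PATH. The function names are hypothetical, and nothing here is wired into Pelican.

```python
import subprocess
from datetime import datetime


def parse_git_dates(output: str) -> list[datetime]:
    """Turn one ISO 8601 timestamp per line into datetime objects."""
    dates = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Some Git versions print "Z" for UTC, which older Pythons reject.
        if line.endswith("Z"):
            line = line[:-1] + "+00:00"
        dates.append(datetime.fromisoformat(line))
    return dates


def git_modification_dates(path: str, repo: str = ".") -> list[datetime]:
    """Author dates of every commit touching path, newest first."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--format=%aI", "--", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_git_dates(result.stdout)
```

The first element would serve as the “modified” date and the last as a “created” date, and the full list is the wiki-style revision history.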
Wow, and now I realize from looking over Writing content that there is apparently no created metadata item in Pelican. I started using it by accident because I was implementing my Movable Type templates, and my code in pelican-site/votecharlie/templates/_blog_meta.html referencing article.created works just fine because I added Created: ... metadata to my entries when exporting from Movable Type. I guess I’ll keep it for now. It seems you might only need the post date and the modified date, but sometimes I liked to indicate generally when I wrote a post about a past event. Sometimes I would write my journals about a trip months or a year afterward, once I got around to sorting the photos. Having a modified date could help, but if I modified the entry years later, I would lose that “I wrote this about a year after the event in question” notion.
Categories
Vanilla Pelican allows only one category per article.
I’m using the more-categories plugin to support multiple and nested categories:
Category: foo/bar/bazz, bing, bell
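To make the metadata line above concrete, here is a rough sketch of how such a value breaks down into separate categories and their parent chain. It illustrates the idea only and is not the plugin’s implementation; both function names are hypothetical.

```python
def parse_categories(value: str) -> list[str]:
    """Split a comma separated Category value into individual categories."""
    return [part.strip() for part in value.split(",") if part.strip()]


def ancestors(category: str) -> list[str]:
    """Expand a nested category into itself plus every parent, top down."""
    parts = category.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


if __name__ == "__main__":
    for category in parse_categories("foo/bar/bazz, bing, bell"):
        print(category, "->", ancestors(category))
```

The parent chain is what decides whether an entry filed under a nested category also shows up on the listings of its parent categories.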
TODO
Blockers
- Unpublished / draft blog entries
- Private tags with @ not captured by mt:EntryTags
Nonblockers
- Implement Pelican Search instead of the Google Custom Search box that’s full of ads.
- Remove all the Google ads, they weren’t worth dealing with.
- Move the signup off each entry to a static page. Let’s trim down…
- Consider removing the Amazon referral links, are they even working?
- Review https://docs.getpelican.com/en/latest/content.html#ref-linking-to-internal-content
- Review https://docs.getpelican.com/en/latest/settings.html#reading-only-modified-content
- Feeds
- Some pages have many URL permutations? Like /category/adventures/argentina and /category/adventures/argentina/index.html.
  - I think I fixed this with the CATEGORY_SAVE_AS or similar.
- Subcategories: On MT, entries did not appear on the listings for parent categories of the categories actually selected, whereas in Pelican they do (see “Adventures”). Relatedly, the categories listing sidebar does not show parent categories as links even when there are entries for them.
- Foursquare integration. Look into this maybe? https://github.com/getpelican/pelican/pull/2922
- Implement thumbnailer for use with Foursquare images, etc.
- Using advthumbnailer.
- templates/_blog_age_notice.html
- templates/_base.html is a mess with regard to meta tag images. I can’t believe how complicated my Movable Type template logic was there. I probably should move images to a custom metadata instead of trying to intelligently extract it…
- Some entries have <pre> that should be converted to normal code block, like /blog/2017/06/mint-batch-transaction-import-hack.html. Plus one of the images has HTML in alt and is getting messed up.
- Will need to migrate to YouTube, I think. Or pay for Vimeo forever even if I don’t upload many videos. See https://ymcinema.com/2019/04/03/vimeo-is-deleting-your-videos-when-you-switch-to-basic-account/ ; I already have videos that are “hidden” and maybe have deleted ones, too.