Vote Charlie!

Website and notetaking system

Posted at age 31.

While I was traveling in Taiwan in March 2020 as the world was descending into Covid-19 chaos, I had some time to ponder improvements to my blogging and notetaking processes. At the time, my website was happily powered by the mostly static generator Movable Type, and my notetaking was strewn across text files, Google Docs and physical paper. I wanted to try to merge them as much as possible, so it seemed to make sense to do both in the same place: text files on my computer. I resisted moving away from MT for a long time, because I love its super intuitive templating system and ease of logging in and blogging from anywhere. MT has served me well for almost 20 years (wow!), and if this site had to support more than one author, I wouldn’t want to change.

So, I thought it would be worth finding out what a simpler text-file-based system could look like. I ended up mostly reimplementing my theme as Jinja templates in Pelican before I got home. Sadly, life events and the pandemic got in the way, and it took me almost 3 years to sit back down and smooth out the rough edges. In that time I basically haven’t blogged anyway, though I did start keeping a simple bullet-like journal in Google Docs to at least keep organized.

That all said, what follows is the brainstorming document I used for this process, which turned into implementation notes and a checklist for Pelican. It’s probably not useful or interesting to anyone else, but it might be worth reviewing if you feel you are on a similar journey.

Note organization concerns:

  • Dotfiles contain notes in:
    • docs that are a combination of:
      • general notes I could pull out.
      • notes that are specific to my configuration and make sense to be alongside the dotfiles.
    • notes that are line items for my Rofi quick lookup hack solution.
  • Old notes Git repository is probably the start of “public” notes.
  • Need space for nonpublic notes, probably what I’m using Vimwiki for now.
  • Would like public blog automatically published from a subset of the Vimwiki.

Probably this means Vimwiki contains a directory that is public, or there’s a way to tag pages such that they become published. Possibly the docs section of my dotfiles is hardlinked into this structure or duplicated (yuck).
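
The tagging idea could be sketched roughly as follows: walk the notes directory and keep only the files whose metadata header opts in to publishing. The `Public: yes` key and both helper names are hypothetical; neither Vimwiki nor Pelican defines them.

```python
from pathlib import Path
from typing import List


def is_public(text: str) -> bool:
    """Return True if the header (the lines before the first blank line)
    contains the hypothetical marker 'Public: yes'."""
    for line in text.splitlines():
        if not line.strip():
            break  # the metadata header ends at the first blank line
        key, _, value = line.partition(":")
        if key.strip().lower() == "public" and value.strip().lower() == "yes":
            return True
    return False


def find_public_notes(root: Path) -> List[Path]:
    """Collect the Markdown files under root that opt in to publishing."""
    return sorted(
        path
        for path in root.rglob("*.md")
        if is_public(path.read_text(encoding="utf-8"))
    )
```

A publish step could then copy or symlink only the returned files into the site's content folder, which would avoid keeping a separate public directory in sync by hand.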

Content types

I’m trying to decide on a structure for my content. My existing blog has URLs like YYYY/MM/basename.html. I could have a similar file structure for my markdown files for things I would have published as a blog post.

For notes and other non-date-based content, I could treat them like “pages” on my website with a different URL structure simply based on the page name and possibly an arbitrary category.

There seems to be a grey area where I would want date created and modified on the pages, but not have the date in the URL. But over many years, this might become a mess. So maybe I just want everything under the same date based URL scheme, like everything is a blog entry, and the date corresponds to date created.

Relatedly, on my existing blog, for the regular blog entries, I have some categories where I tried to capture different classes of content: healthlog, reeflog (regarding my aquarium), notes (on videos or books), records, projects, essays, conversations (chat logs), reports, schoolwork, personal and writing.

Possible structure:

  • YYYY/MM/basename (blogs)
  • notes/books/basename
  • notes/videos/basename
  • notes/articles/basename
  • notes/topics/basename (this seems awkward)

Yeah I’m not sure I like separating out “notes”, at least the “one time” notes on a book or talk or article seem like they could go in the date based structure just fine. The thing I’m struggling with is where I’d put long running notes such as “Python”, etc. Those could still go into the date based structure, and I could use fuzzy find to locate them when I need, but it seems strange for my Python notes to be located at something like /2009/11/python and be updated for decades to come. (Though now that I have been editing this very blog entry for almost 3 years, I may just need to give up trying to make this perfect!)

Rethinking this, my content might actually be:

  • diaries
  • documentation (errors, emails, interesting things to remember) possibly with commentary or solutions
  • one time notes on some fixed piece of content
  • long running notes on a topic based on many sources

It feels like some content is “wiki-like”, which would not have a date based URL, and other content is “publication-like” (could be journal or an article that’s not intended to be revised much). The latter makes more sense to have a posting date, and thus a date based URL. The date a wiki page for some topic first was created is possibly interesting but I would not expect that date to affect the URL to the page.

There’s also the consideration that my local file structure need not match the URL structure. Possibly I want to organize my files into the non-date based hierarchy but still have the URLs be date based. Hmm. Maybe I’ll start by organizing my filesystem until it feels right, and worry about URLs later. I don’t even have a publishing flow set up yet, anyway.
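
In Pelican, the generator this post ends up using, output URLs come from settings rather than from where the source files live, so the two structures can differ. A minimal sketch of a pelicanconf.py fragment, assuming date-based URLs for articles and flat URLs for pages; the patterns are illustrative, not the ones this site uses. The last lines only show how such a pattern expands.

```python
from datetime import datetime

# Illustrative patterns (assumptions). Pelican fills in the {date:...} and
# {slug} fields using Python's str.format-style placeholders.
ARTICLE_URL = "{date:%Y}/{date:%m}/{slug}.html"
ARTICLE_SAVE_AS = "{date:%Y}/{date:%m}/{slug}.html"
PAGE_URL = "notes/{slug}.html"
PAGE_SAVE_AS = "notes/{slug}.html"

# How the article pattern expands for a post dated March 2020,
# regardless of which folder its Markdown file sits in.
example = ARTICLE_URL.format(
    date=datetime(2020, 3, 15), slug="website-and-notetaking-system"
)
print(example)  # 2020/03/website-and-notetaking-system.html
```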

Platforms research

10 best static site generators | Creative Bloq

01. Jekyll: Ruby, Liquid templating engine, Sass, earliest and most popular.
04. Hugo: Go, up and coming most popular, faster.
07. Pelican: Python, Jinja2 templating.

What are people using for their personal blogs? | Lobsters

  • zimbatm/wiki: Personal vimwiki + GitHub Pages
  • Pelican because it has a good design and staticsite because it does not force you to follow a rigid directory layout.
  • I write in Markdown and use pandoc to convert it to HTML. I made a couple of scripts (“panpipe” and “panhandle”) which allow code embedded in the source to be executed during rendering. I generally write about programming stuff, so this is useful to ensure that code snippets actually work (execute them during rendering and abort if they’re broken) and that example output is accurate (generate the example output from the actual code during rendering).

Trying out Pelican

mkdir -p ~/GDrive/Main/site
cd ~/GDrive/Main/site
ln -s ../Notes content
virtualenv venv
source venv/bin/activate
pip install --upgrade pip
python -m pip install git+https://github.com/getpelican/pelican.git@master
pip install "pelican[Markdown]" typogrify beautifulsoup4 pillow
# Error with advthumbnailer, see https://github.com/AlexJF/pelican-advthumbnailer/pull/9
# pip install pelican[Markdown] typogrify beautifulsoup4 pillow pelican-advthumbnailer
# testing following for foursquare
pip install pelican-data-files
wget https://github.com/fle/pelican-simplegrey/archive/257e30c7e0091df2198b2c778754cd6f23112068.zip
# unzip pelican-simplegrey-257e30c7e0091df2198b2c778754cd6f23112068.zip
unzip 257e30c7e0091df2198b2c778754cd6f23112068
mv pelican-simplegrey-257e30c7e0091df2198b2c778754cd6f23112068 pelican-simplegrey
pelican-themes --install pelican-simplegrey
# summary plugin
git clone --recursive https://github.com/getpelican/pelican-plugins
printf "PLUGIN_PATHS = ['pelican-plugins']\nPLUGINS = ['summary']\n" >> pelicanconf.py

When resuming testing:

(cd ~/GDrive/Main/site && source venv/bin/activate && rm -rf output/* && pelican --listen --ignore-cache --autoreload --debug)

To get entries from the server to local:

cd ~/GDrive/Main/site
rsync -azi charlie:www/blog/pelican/ content/blog
# DONE: rsync -azi charlie:www/issues/pelican/ content/pages/issues
# DONE: rsync -azi charlie:www/pelican/ content/pages

OK now I’m going to set up a more permanent development environment following https://docs.getpelican.com/en/latest/contribute.html . I had been putting my virtual environment folders inside the source folders for other small projects, but I’ll go with their suggested location in ~/virtualenvs for now since I’ll be dealing with multiple repositories for plugins and Pelican itself.

git clone git@github.com:CNG/pelican.git ~/projects/pelican
cd ~/projects/pelican
git remote add upstream https://github.com/getpelican/pelican.git

mkdir -p ~/virtualenvs && cd ~/virtualenvs
python3 -m venv pelican
source ~/virtualenvs/pelican/*/activate
python -m pip install --upgrade pip
python -m pip install invoke
cd ~/projects/website/pelican
rm poetry.lock
invoke setup
# If you forget to cd to `pelican` you'll get: Can't find any collection named 'tasks'!

# `invoke setup` seems to install `pelican`, I assumed from PyPI or something
# due to the instructions at
# https://docs.getpelican.com/en/latest/contribute.html#setting-up-the-development-environment
# saying to run the following command next... but testing if this is necessary
# because I suspect something is getting screwed up later when I Pip install two
# addons.
python -m pip install -e ~/projects/website/pelican

# Later after development:
# invoke tests
# invoke lint
# invoke docserve # Then localhost:8000

git clone git@github.com:CNG/pelican-plugins.git ~/projects/pelican-plugins
cd ~/projects/pelican-plugins
# Forgot to --recursive, so:
git submodule update --init --recursive
git remote add upstream https://github.com/getpelican/pelican-plugins.git

# Running through this again 2022-11-29, my `make html` started failing after
# running these two commands:
python -m pip install pillow
python -m pip install pelican-data-files pelican-advthumbnailer

# Ran once, which also sets path in $VIRTUAL_ENV/.project
# pelican-quickstart --path ~/projects/pelican-site

For now I’m going with these folders inside my existing Projects folder:

  • pelican-site: Pelican configuration, my theme, and the output folder, which I may move.
  • pelican-plugins: Git clone so I can edit plugins and such.
  • pelican: Git cloned source of Pelican itself.
  • pelican-content: The original Markdown files and images.

So the new command for resuming testing:

(cd ~/projects/pelican-site && source ~/virtualenvs/pelican/*/activate && make clean && make html)

Update 2023-06-08

I really need to invest enough time to get this system solidified so it doesn’t break each time I try to write a blog post! Now I came back and wanted to write about our new dog Ruffie and struggled to get the site working based on what I wrote above. I was getting output like this:

$ pelican --version
Traceback (most recent call last):
  File "/home/cgorichanaz/virtualenvs/pelican/bin/pelican", line 5, in <module>
    from pelican.__main__ import main
  File "/mnt/docs/GDrive/Main/Projects/website/pelican/pelican/__main__.py", line 5, in <module>
    from . import main
ImportError: cannot import name 'main' from 'pelican' (unknown location)

The problem turned out to be that pelican worked right after running invoke setup the first time, but then stopped working after I ran python -m pip install -e ~/projects/website/pelican. Reading pypa/setuptools/issues/3548 clued me in to a fix. I added this to my pelican-site/Makefile:

export PYTHONPATH = /home/cgorichanaz/projects/website/pelican

Having PYTHONPATH pointing at my local Pelican source directory fixed the issue. I also have PELICAN?=/home/cgorichanaz/virtualenvs/pelican/bin/pelican in there, so I don’t need to activate the virtualenv manually either. My future testing command should be:

(cd ~/projects/pelican-site && make clean && make html)
# OR
cd ~/projects/pelican-site && make devserver
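
When an import error like the one above reports an “unknown location”, it can help to ask Python where it thinks the package lives. The following is a small diagnostic sketch using only the standard library; `module_origin` is a hypothetical helper name, and it should be run with the same interpreter the virtualenv uses.

```python
import importlib.util


def module_origin(name: str):
    """Return (origin, search_locations) for a top-level module,
    or None if Python cannot find it at all."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    locations = list(spec.submodule_search_locations or [])
    # For a regular package, origin is the path to its __init__.py.
    # For a namespace package (a directory without __init__.py), origin is
    # typically None on recent Python versions.
    return spec.origin, locations


if __name__ == "__main__":
    print(module_origin("pelican"))
```

If the printed origin is missing or points somewhere unexpected, that suggests the import is resolving to a different directory than the source checkout, which is the kind of situation setting PYTHONPATH works around.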

Code blocks

There are two ways to specify the identifier:

:::python
print("The triple-colon syntax will *not* show line numbers.")

To display line numbers, use a path-less shebang instead of colons:

#!python
print("The path-less shebang syntax *will* show line numbers.")

Need to check if I can use backtick-style code blocks with Pelican, or else teach whatever Vim plugin is formatting my *.md files about space-indented code blocks, because right now in this file everything is italic after a code block that contains an asterisk. Edit: Pelican seems to support both ways.

Implementation issues

Note that even when using cached content, all output is always written, so the modification times of the generated *.html files will always change. Therefore, rsync-based uploading may benefit from the --checksum option.
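
Why --checksum helps can be seen in a small sketch: two files with identical bytes but different modification times look changed to a time-based comparison (rsync’s default quick check uses size and modification time) and unchanged to a content hash. The file names are made up for the demonstration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file's contents, ignoring metadata such as mtime."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old.html"
    new = Path(tmp) / "new.html"
    old.write_text("<p>same output</p>", encoding="utf-8")
    new.write_text("<p>same output</p>", encoding="utf-8")

    # Pretend the first file was generated an hour earlier.
    stat = old.stat()
    os.utime(old, (stat.st_atime, stat.st_mtime - 3600))

    mtimes_differ = old.stat().st_mtime != new.stat().st_mtime
    contents_match = sha256_of(old) == sha256_of(new)

print(mtimes_differ, contents_match)  # True True
```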

Summaries

I’m using the summary plugin to allow for inserting a delimiter into the entry body marking where the summary should end so I don’t have to repeat content in the metadata. My existing website places an anchor at the end of the summary in the full entry page (and sometimes an ad), but there’s no way by default in Pelican to output the full entry minus the summary.

I couldn’t get a basic replace function to work, and also tried implementing a regex_replace custom filter to no avail. That’s when I looked closer at the Summary plugin implementation and found it was actually mutating the summary in summary.py#L81, and I’m not sure why it’s needed:

summary = str(BeautifulSoup(summary, 'html.parser'))

This results, for example, in a &nbsp; changing to a space and one of my image tags getting attributes rearranged, changing from:

<img alt="FCNG6780.jpg" height="413" src="https://lh3.googleusercontent.com/-7LyWV465q6k/V-IWou7euLI/AAAAAAAA82g/Mq0kxENtIWQLGwvKKYoRa4e9zkukYrb7QCHM/w992/FCNG6780.jpg" title="Saturday Sunset" width="620"/>

to:

<img src="https://lh3.googleusercontent.com/-7LyWV465q6k/V-IWou7euLI/AAAAAAAA82g/Mq0kxENtIWQLGwvKKYoRa4e9zkukYrb7QCHM/w992/FCNG6780.jpg" width="620" height="413" alt="FCNG6780.jpg" title="Saturday Sunset">

Maybe I need another custom filter to run the whole entry through BeautifulSoup… hmm. Though this would be better implemented by extending the summary plugin to provide an article.more or something analogous to Movable Type’s EntryMore tag.

OK for now I patched the summary plugin. TODO: Look into upstreaming / better solution:

summary = content[begin_summary:end_summary]
pre = content[:begin_summary]
more = content[end_summary:]

if remove_markers:
    # remove the markers from the content
    if begin_summary:
        pre = pre.replace(begin_marker, '', 1)
        content = content.replace(begin_marker, '', 1)
    if end_summary:
        more = more.replace(end_marker, '', 1)
        content = content.replace(end_marker, '', 1)

summary = str(BeautifulSoup(summary, 'html.parser'))
pre = str(BeautifulSoup(pre, 'html.parser'))
more = str(BeautifulSoup(more, 'html.parser'))

instance._content = content
# default_status was added to Pelican Content objects after 3.7.1.
# Its use here is strictly to decide on how to set the summary.
# There's probably a better way to do this but I couldn't find it.
if hasattr(instance, 'default_status'):
    instance.metadata['summary'] = summary
else:
    instance._summary = summary
instance.has_summary = True
instance.pre = pre
instance.more = more
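
Stripped of the Pelican plumbing, the idea behind the patch is just slicing the rendered HTML at the two markers. The standalone sketch below is not the plugin’s code: `split_on_markers` is a hypothetical helper, and the marker strings are passed in as arguments rather than read from settings. It returns an empty `more` when no end marker is present.

```python
def split_on_markers(content: str, begin_marker: str, end_marker: str):
    """Split content into (pre, summary, more) around the optional markers."""
    begin = content.find(begin_marker) if begin_marker else -1
    end = content.find(end_marker) if end_marker else -1

    if begin == -1:
        # No begin marker: the summary starts at the top of the entry.
        pre, rest_start = "", 0
    else:
        pre, rest_start = content[:begin], begin + len(begin_marker)

    if end == -1:
        # No end marker: everything after the begin marker is the summary.
        summary, more = content[rest_start:], ""
    else:
        summary = content[rest_start:end]
        more = content[end + len(end_marker):]

    return pre, summary, more


html = "<p>Intro.</p><!-- END --><p>The rest.</p>"
pre, summary, more = split_on_markers(html, "<!-- BEGIN -->", "<!-- END -->")
print(repr(pre), repr(summary), repr(more))
```

A template could then output `more` on the full entry page after the summary, which is roughly what Movable Type’s EntryMore tag provided.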

Dates

OK, I realize now Movable Type actually has three entry dates:

  • EntryDate: the “publish” timestamp manually set in the entry editing screen, which defaults to the time the entry editing page was first opened by hitting Create Entry.
  • EntryModifiedDate: the last save date. Not user editable.
  • EntryCreatedDate: the timestamp the entry was first saved. Not user editable.

The minimal implementation in Pelican would have modified date detected by the mtime from the filesystem. There’s no support for a “created date” based on the filesystem, which is not consistently implemented in filesystems anyway. So, I think I’ll need to specify all three types as metadata in the top of my file. If there’s no “Modified”, then we can fall back to mtime. If there’s no “Created”, we can fall back to “Date”?
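
The fallback rules just described could be sketched like this. It is only an illustration of the logic, not how Pelican resolves dates internally; `resolve_dates` is a hypothetical helper, the metadata keys are assumed to be lowercased, and the values are assumed to be ISO-style strings.

```python
from datetime import datetime
from pathlib import Path


def parse_date(value: str) -> datetime:
    """Parse an ISO-style metadata value such as '2020-03-15 10:30'."""
    return datetime.fromisoformat(value)


def resolve_dates(metadata: dict, source: Path) -> dict:
    """Apply the fallbacks: Modified falls back to the file mtime,
    Created falls back to Date."""
    date = parse_date(metadata["date"])
    if "modified" in metadata:
        modified = parse_date(metadata["modified"])
    else:
        modified = datetime.fromtimestamp(source.stat().st_mtime)
    if "created" in metadata:
        created = parse_date(metadata["created"])
    else:
        created = date
    return {"date": date, "modified": modified, "created": created}
```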

Wow, it’s even worse than I thought. I’ll definitely need to specify all three dates manually because I just saved this file after a small update, and now apparently my filesystem only has the current date associated with the file instead of the 2020-03 date I started working on this (“Birth” updated to match “Modify”):

$ stat Notes/blog/2020/03/Website\ and\ notetaking\ system.md
  File: Notes/blog/2020/03/Website and notetaking system.md
  Size: 12822           Blocks: 32         IO Block: 4096   regular file
Device: 9,127   Inode: 166789121   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/cgorichanaz)   Gid: ( 1000/cgorichanaz)
Access: 2022-01-30 16:26:34.819133202 -0800
Modify: 2022-01-30 16:24:33.785325347 -0800
Change: 2022-01-30 16:24:33.791992040 -0800
 Birth: 2022-01-30 16:24:33.785325347 -0800

It would be great if I could have a source control based “modified” date, and better still if I could show the list of modification dates and what changed, like a wiki.
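
If the content lived in a Git repository, the commit history could supply those dates. A rough sketch, assuming git is on the PATH and the file is tracked; the helper names are hypothetical, and nothing here is wired into Pelican.

```python
import subprocess
from datetime import datetime
from typing import List


def parse_commit_dates(log_output: str) -> List[datetime]:
    """Turn `git log --format=%cI` output (one ISO 8601 date per line)
    into datetimes."""
    dates = []
    for line in log_output.splitlines():
        line = line.strip()
        if line:
            # Some git versions may print "Z" for UTC, which older Python
            # versions of fromisoformat do not accept.
            dates.append(datetime.fromisoformat(line.replace("Z", "+00:00")))
    return dates


def commit_dates(path: str, repo: str = ".") -> List[datetime]:
    """Return the dates of commits touching path, newest first."""
    result = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--format=%cI", "--", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_commit_dates(result.stdout)
```

The first entry would serve as the “modified” date and the last as a stand-in for “created”, while the full list could back a wiki-style revision history.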

Wow, and now I realize from looking over the “Writing content” documentation that there is apparently no created metadata item in Pelican. I started using it by accident because I was implementing my Movable Type templates, and my code in my pelican-site/votecharlie/templates/_blog_meta.html referencing article.created works just fine because I added Created: ... metadata to my entries when exporting from Movable Type. I guess I’ll keep it for now. It seems you might only need the post date and the modified date, but sometimes I liked to indicate generally when I wrote a post about a past event. Sometimes I would write my journals about a trip months or a year afterward, once I got around to sorting the photos. Having a modified date could help, but if I modified the entry again years later, I would lose that “I wrote this about a year after the event in question” notion.

Categories

Vanilla Pelican supports only one category per article.

I’m using the more-categories plugin to support multiple and nested categories:

Category: foo/bar/bazz, bing, bell
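
To make “multiple and nested” concrete, here is a sketch of how such a line could be expanded into the full set of categories an entry belongs to, ancestors included. This is not the plugin’s implementation; `expand_categories` is a hypothetical helper that only illustrates the idea.

```python
from typing import List


def expand_categories(value: str) -> List[str]:
    """Split a comma-separated Category value and add every ancestor
    of each nested entry, keeping first-seen order."""
    result: List[str] = []
    for raw in value.split(","):
        parts = [p.strip() for p in raw.split("/") if p.strip()]
        for depth in range(1, len(parts) + 1):
            path = "/".join(parts[:depth])
            if path not in result:
                result.append(path)
    return result


print(expand_categories("foo/bar/bazz, bing, bell"))
# ['foo', 'foo/bar', 'foo/bar/bazz', 'bing', 'bell']
```

Including the ancestors is what makes an entry show up on the listings for its parent categories.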

TODO

Blockers

  • Unpublished / draft blog entries
  • Private tags with @ not captured by mt:EntryTags

Nonblockers

  • Implement Pelican Search instead of the Google Custom Search box that’s full of ads.
  • Remove all the Google ads, they weren’t worth dealing with.
  • Move the signup off each entry to a static page. Let’s trim down…
  • Consider removing the Amazon referral links, are they even working?
  • Review https://docs.getpelican.com/en/latest/content.html#ref-linking-to-internal-content
  • Review https://docs.getpelican.com/en/latest/settings.html#reading-only-modified-content
  • Feeds
  • Some pages have many URL permutations? Like /category/adventures/argentina and /category/adventures/argentina/index.html.
    • I think I fixed this with the CATEGORY_SAVE_AS or similar.
  • Subcategories: On MT, entries did not appear on the listings of parent categories of the category actually selected, whereas in Pelican they do (see “Adventures”). Relatedly, the category listing in the sidebar does not show parent categories as links even when there are entries for them.
  • Foursquare integration. Look into this maybe? https://github.com/getpelican/pelican/pull/2922
  • Implement thumbnailer for use with Foursquare images, etc.
    • Using advthumbnailer.
  • templates/_blog_age_notice.html
  • templates/_base.html is a mess with regard to meta tag images. I can’t believe how complicated my Movable Type template logic was there. I probably should move images to a custom metadata field instead of trying to intelligently extract them…
  • Some entries have <pre> that should be converted to normal code block, like /blog/2017/06/mint-batch-transaction-import-hack.html. Plus one of the images has HTML in alt and is getting messed up.
  • Will need to migrate to YouTube, I think. Or pay for Vimeo forever even if I don’t upload many videos. See https://ymcinema.com/2019/04/03/vimeo-is-deleting-your-videos-when-you-switch-to-basic-account/ ; I already have videos that are “hidden” and maybe have deleted ones, too.