Website and notetaking system

Posted 25 March 2020 at 00:00 at age 31.
Edited 8 June 2023 at 00:00.

Technology

While I was traveling in Taiwan in March 2020 as the world was descending into Covid-19 chaos, I had some time to ponder improvements to my blogging and notetaking processes. At the time, my website was happily powered by the mostly static generator Movable Type, and my notetaking was strewn across text files, Google Docs and physical paper. I wanted to try to merge them as much as possible, so it seemed to make sense to do both in the same place: text files on my computer. I resisted moving away from MT for a long time, because I love its super intuitive templating system and ease of logging in and blogging from anywhere. MT has served me well for almost 20 years (wow!), and if this site had to support more than one author, I wouldn’t want to change.

So, I thought it would be worth seeing what a simpler text file based system could look like. I ended up mostly reimplementing my theme as Jinja templates in Pelican before I got home. Sadly, life events and the pandemic got in the way, and it took me almost 3 years to sit back down and smooth out the rough edges. In that time I basically haven’t blogged anyway, though I did start keeping a simple bullet-like journal in Google Docs to at least keep organized.

That all said, what follows is the brainstorming document I used for this process, which turned into implementation notes and a checklist for Pelican. It’s probably not useful or interesting to anyone else, but it might be worth reviewing if you feel you are on a similar journey.

Note organization concerns:

Dotfiles contain notes in:
- docs that are a combination of:
  - general notes I could pull out.
  - notes that are specific to my configuration and make sense to be alongside the dotfiles.
- notes that are line items for my Rofi quick lookup hack solution.
Old notes Git repository is probably the start of “public” notes.
Need space for nonpublic notes, probably what I’m using Vimwiki for now.
Would like public blog automatically published from a subset of the Vimwiki.

Probably this means Vimwiki contains a directory that is public, or there’s a way to tag pages such that they become published. Possibly the docs section of my dotfiles is hardlinked into this structure or duplicated (yuck).

Content types

I’m trying to decide a structure for my content. My existing blog has URLs like YYYY/MM/basename.html. I could have a similar file structure for my markdown files for things I would have published as a blog post.

For notes and other non-date-based content, I could treat them like “pages” on my website with a different URL structure simply based on the page name and possibly an arbitrary category.

There seems to be a grey area where I would want date created and modified on the pages, but not have the date in the URL. But over many years, this might become a mess. So maybe I just want everything under the same date based URL scheme, like everything is a blog entry, and the date corresponds to date created.

Relatedly, on my existing blog, for the regular blog entries, I have some categories where I tried to capture different classes of content: healthlog, reeflog (regarding my aquarium), notes (on videos or books), records, projects, essays, conversations (chat logs), reports, schoolwork, personal and writing.

Possible structure:

YYYY/MM/basename (blogs)
notes/books/basename
notes/videos/basename
notes/articles/basename
notes/topics/basename (this seems awkward)

Yeah, I’m not sure I like separating out “notes”. At least the “one time” notes on a book or talk or article seem like they could go in the date based structure just fine. The thing I’m struggling with is where I’d put long running notes such as “Python”, etc. Those could still go into the date based structure, and I could use fuzzy find to locate them when I need, but it seems strange for my Python notes to be located at something like /2009/11/python and be updated for decades to come. (Though now that I have been editing this very blog entry for almost 3 years, I may just need to give up trying to make this perfect!)

Rethinking this, my content might actually be:

diaries
documentation (errors, emails, interesting things to remember) possibly with commentary or solutions
one time notes on some fixed piece of content
long running notes on a topic based on many sources

It feels like some content is “wiki-like”, which would not have a date based URL, and other content is “publication-like” (could be a journal entry or an article that’s not intended to be revised much). The latter makes more sense to have a posting date, and thus a date based URL. The date a wiki page for some topic was first created is possibly interesting, but I would not expect that date to affect the URL to the page.

There’s also the consideration that my local file structure need not match the URL structure. Possibly I want to organize my files into the non-date based hierarchy but still have the URLs be date based. Hmm. Maybe I’ll start by organizing my filesystem until it feels right, and worry about URLs later. I don’t even have a publishing flow set up yet, anyway.
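As a rough illustration of decoupling the two, here is a minimal Python sketch that derives a date based URL from a file stored anywhere in a topic based hierarchy. The function name `dated_url` and the example path are hypothetical, and the creation date is assumed to come from metadata rather than from the file's location.

```python
from datetime import date
from pathlib import PurePosixPath


def dated_url(source_path: str, created: date) -> str:
    """Build a YYYY/MM/basename.html URL, ignoring where the file lives on disk."""
    basename = PurePosixPath(source_path).stem  # "notes/topics/python.md" -> "python"
    return f"{created:%Y}/{created:%m}/{basename}.html"


# The file sits in a non-date hierarchy, but the URL is still date based.
print(dated_url("notes/topics/python.md", date(2009, 11, 3)))  # 2009/11/python.html
```

The point of the sketch is only that the URL depends on the basename and a stored date, so the folder layout can change without breaking links.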

Platforms research

10 best static site generators | Creative Bloq

01. Jekyll: Ruby, Liquid templating engine, Sass, earliest and most popular.
04. Hugo: Go, up and coming most popular, faster.
07. Pelican: Python, Jinja2 templating.

What are people using for their personal blogs? | Lobsters

zimbatm/wiki: Personal vimwiki + GitHub Pages
Pelican because it has a good design and staticsite because it does not force you to follow a rigid directory layout.
I write in Markdown and use pandoc to convert it to HTML. I made a couple of scripts (“panpipe” and “panhandle”) which allow code embedded in the source to be executed during rendering. I generally write about programming stuff, so this is useful to ensure that code snippets actually work (execute them during rendering and abort if they’re broken) and that example output is accurate (generate the example output from the actual code during rendering).

Trying out Pelican

mkdir -p ~/GDrive/Main/site
cd ~/GDrive/Main/site
ln -s ../Notes content
virtualenv venv
source venv/bin/activate
pip install --upgrade pip
python -m pip install git+https://github.com/getpelican/pelican.git@master
pip install pelican[Markdown] typogrify beautifulsoup4 pillow
# Error with advthumbnailer, see https://github.com/AlexJF/pelican-advthumbnailer/pull/9
# pip install pelican[Markdown] typogrify beautifulsoup4 pillow pelican-advthumbnailer
# testing following for foursquare
pip install pelican-data-files
wget https://github.com/fle/pelican-simplegrey/archive/257e30c7e0091df2198b2c778754cd6f23112068.zip
# unzip pelican-simplegrey-257e30c7e0091df2198b2c778754cd6f23112068.zip
unzip 257e30c7e0091df2198b2c778754cd6f23112068
mv pelican-simplegrey-257e30c7e0091df2198b2c778754cd6f23112068 pelican-simplegrey
pelican-themes --install pelican-simplegrey
# summary plugin
git clone --recursive https://github.com/getpelican/pelican-plugins
# printf interprets \n portably; plain echo in Bash would write a literal \n
printf "PLUGIN_PATHS = ['pelican-plugins']\nPLUGINS = ['summary']\n" >> pelicanconf.py

When resuming testing:

(cd ~/GDrive/Main/site && source venv/bin/activate && rm -rf output/* && pelican --listen --ignore-cache --autoreload --debug)

To get entries from the server to local:

cd ~/GDrive/Main/site
rsync -azi charlie:www/blog/pelican/ content/blog
# DONE: rsync -azi charlie:www/issues/pelican/ content/pages/issues
# DONE: rsync -azi charlie:www/pelican/ content/pages

OK now I’m going to set up a more permanent development environment following https://docs.getpelican.com/en/latest/contribute.html . I had been putting my virtual environment folders inside the source folders for other small projects, but I’ll go with their suggested location in ~/virtualenvs for now since I’ll be dealing with multiple repositories for plugins and Pelican itself.

git clone git@github.com:CNG/pelican.git ~/projects/pelican
cd ~/projects/pelican
git remote add upstream https://github.com/getpelican/pelican.git

mkdir ~/virtualenvs && cd ~/virtualenvs
python3 -m venv pelican
source ~/virtualenvs/pelican/*/activate
python -m pip install --upgrade pip
python -m pip install invoke
cd ~/projects/website/pelican
rm poetry.lock
invoke setup
# If you forget to cd to `pelican` you'll get: Can't find any collection named 'tasks'!

# `invoke setup` seems to install `pelican`, I assumed from PyPI or something
# due to the instructions at
# https://docs.getpelican.com/en/latest/contribute.html#setting-up-the-development-environment
# saying to run the following command next... but testing if this is necessary
# because I suspect something is getting screwed up later when I Pip install two
# addons.
python -m pip install -e ~/projects/website/pelican

# Later after development:
# invoke tests
# invoke lint
# invoke docserve # Then localhost:8000

git clone git@github.com:CNG/pelican-plugins.git ~/projects/pelican-plugins
cd ~/projects/pelican-plugins
# Forgot to --recursive, so:
git submodule update --init --recursive
git remote add upstream https://github.com/getpelican/pelican-plugins.git

# Running through this again 2022-11-29, my `make html` started failing after
# running these two commands:
python -m pip install pillow
python -m pip install pelican-data-files pelican-advthumbnailer

# Ran once, which also sets path in $VIRTUAL_ENV/.project
# pelican-quickstart --path ~/projects/pelican-site

For now I’m going with these folders inside my existing Projects folder:

pelican-site: Pelican configuration, my theme, and the output folder, which I may move.
pelican-plugins: Git clone so I can edit plugins and such.
pelican: Git cloned source of Pelican itself.
pelican-content: The original Markdown files and images.

So the new command for resuming testing:

(cd ~/projects/pelican-site && source ~/virtualenvs/pelican/*/activate && make clean && make html)

Update 2023-06-08

I really need to invest enough time to get this system solidified so it doesn’t break each time I try to write a blog post! Now I came back and wanted to write about our new dog Ruffie and struggled to get the site working based on what I wrote above. I was getting output like this:

$ pelican --version
Traceback (most recent call last):
  File "/home/cgorichanaz/virtualenvs/pelican/bin/pelican", line 5, in <module>
    from pelican.__main__ import main
  File "/mnt/docs/GDrive/Main/Projects/website/pelican/pelican/__main__.py", line 5, in <module>
    from . import main
ImportError: cannot import name 'main' from 'pelican' (unknown location)

The problem turned out to be that pelican worked right after running invoke setup the first time, but then stopped working after I ran python -m pip install -e ~/projects/website/pelican. Reading pypa/setuptools/issues/3548 clued me into a fix. I added to my pelican-site/Makefile this:

export PYTHONPATH = /home/cgorichanaz/projects/website/pelican

Having PYTHONPATH pointing at my local Pelican source directory fixed the issue. I also have PELICAN?=/home/cgorichanaz/virtualenvs/pelican/bin/pelican in there, so I don’t need to activate the virtualenv manually either. My future testing command should be:

(cd ~/projects/pelican-site && make clean && make html)
# OR
cd ~/projects/pelican-site && make devserver
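When an import resolves to an unexpected place like this, one quick check is to ask Python where it thinks a package lives. The following is a small diagnostic sketch using only the standard library; `module_origin` is a hypothetical helper name, and my reading of the "(unknown location)" message as a package found without a regular `__init__.py` origin is an assumption, not something I verified against this setup.

```python
import importlib.util
from typing import Optional


def module_origin(name: str) -> Optional[str]:
    """Return the file a top-level module would be loaded from, if any.

    None means either the module was not found, or it was found without a
    concrete file (for example, a namespace package).
    """
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# Run this with the virtualenv's interpreter, with and without PYTHONPATH set.
print(module_origin("pelican"))
```

If the printed path is not inside the local Pelican source directory, the interpreter is picking up something else first.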

Update 2025-03-05

I published a bunch of posts last year, as recently as September, which was 5 months ago now. How time flies! But anyway, trying to make a post now and my script is giving me:

Traceback (most recent call last):
  File "/home/cgorichanaz/virtualenvs/pelican/bin/pelican", line 5, in <module>
    from pelican.__main__ import main
  File "/home/cgorichanaz/projects/website/pelican/pelican/__init__.py", line 19, in <module>
    from pelican.log import console
  File "/home/cgorichanaz/projects/website/pelican/pelican/log.py", line 4, in <module>
    from rich.console import Console
ModuleNotFoundError: No module named 'rich'
make: *** [Makefile:72: publish] Error 1

I did a bit of poking around and found, when I activated the virtualenv, my reported Python version was 3.13.2, despite the folders in the virtualenv having names like lib/python3.12 and the contents of pyvenv.cfg being like:

version = 3.12.3
executable = /usr/bin/python3.12

I also saw I did not have that file in /usr/bin anymore. So I manually renamed the two folders in lib and include from 3.12 to 3.13 and made the same change in that config file. It seemed to work after that… I should go through and try updating my Pelican to the latest version one of these days though. Looks like between my current 4.8.0 and the latest 4.11.0 they changed the build system from Poetry to something called PDM.
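To catch this kind of breakage earlier, a check along these lines could run before a build. It is a minimal sketch that assumes pyvenv.cfg only contains simple `key = value` lines like the ones shown above; the function names are hypothetical.

```python
from pathlib import Path


def read_pyvenv_cfg(text: str) -> dict:
    """Parse the simple `key = value` lines of a pyvenv.cfg file."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip lines without an equals sign
            settings[key.strip()] = value.strip()
    return settings


def interpreter_missing(settings: dict) -> bool:
    """True if the recorded interpreter path no longer exists on disk."""
    exe = settings.get("executable")
    return exe is not None and not Path(exe).exists()


sample = "version = 3.12.3\nexecutable = /usr/bin/python3.12\n"
cfg = read_pyvenv_cfg(sample)
print(cfg["version"], interpreter_missing(cfg))
```

Recreating the virtualenv is probably the cleaner fix when the check fails, compared with renaming folders by hand.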

Code blocks

There are two ways to specify the identifier:

:::python
print("The triple-colon syntax will *not* show line numbers.")

To display line numbers, use a path-less shebang instead of colons:

#!python
print("The path-less shebang syntax *will* show line numbers.")

Need to check if I can use the backticks style code blocks with Pelican or make whatever Vim plugin I have formatting my *.md files learn about space indented code blocks. Because right now in this file everything is italic after a code block that has an asterisk. Edit: Pelican seems to support both ways.

Implementation issues

Note that even when using cached content, all output is always written, so the modification times of the generated *.html files will always change. Therefore, rsync-based uploading may benefit from the --checksum option.
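The difference between comparing modification times and comparing content can be seen in a small Python sketch. It simulates a rebuild that rewrites a file with identical bytes and a newer timestamp; the file name and the ten second offset are arbitrary choices for the demonstration.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file's bytes, which is roughly what a checksum comparison relies on."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "index.html"
    page.write_text("<p>same content</p>")
    before = (page.stat().st_mtime_ns, sha256_of(page))

    # Simulate a rebuild: identical bytes, modification time pushed 10 seconds ahead.
    page.write_text("<p>same content</p>")
    newer = before[0] + 10 * 10**9
    os.utime(page, ns=(newer, newer))
    after = (page.stat().st_mtime_ns, sha256_of(page))

mtime_changed = before[0] != after[0]
content_changed = before[1] != after[1]
print(mtime_changed, content_changed)
```

A comparison based on modification time would treat the file as changed, while a content comparison would skip it.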

Summaries

I’m using the summary plugin to allow for inserting a delimiter into the entry body marking where the summary should end so I don’t have to repeat content in the metadata. My existing website places an anchor at the end of the summary in the full entry page (and sometimes an ad), but there’s no way by default in Pelican to output the full entry minus the summary.

I couldn’t get a basic replace function to work, and also tried implementing a regex_replace custom filter to no avail. That’s when I looked closer at the Summary plugin implementation and found it was actually mutating the summary in summary.py#L81, and I’m not sure why it’s needed:

summary = str(BeautifulSoup(summary, 'html.parser'))

This results, for example, in a &nbsp; changing to a space and one of my image tags getting attributes rearranged, changing from:

<img alt="FCNG6780.jpg" height="413" src="https://lh3.googleusercontent.com/-7LyWV465q6k/V-IWou7euLI/AAAAAAAA82g/Mq0kxENtIWQLGwvKKYoRa4e9zkukYrb7QCHM/w992/FCNG6780.jpg" title="Saturday Sunset" width="620"/>

to:

<img src="https://lh3.googleusercontent.com/-7LyWV465q6k/V-IWou7euLI/AAAAAAAA82g/Mq0kxENtIWQLGwvKKYoRa4e9zkukYrb7QCHM/w992/FCNG6780.jpg" width="620" height="413" alt="FCNG6780.jpg" title="Saturday Sunset">

Maybe I need another custom filter to run the whole entry through BeautifulSoup… hmm. Though this would be better implemented by extending the summary plugin to provide an article.more or something analogous to Movable Type’s EntryMore tag.

OK for now I patched the summary plugin. TODO: Look into upstreaming / better solution:

summary = content[begin_summary:end_summary]
pre = content[:begin_summary]
more = content[end_summary:]

if remove_markers:
    # remove the markers from the content
    if begin_summary:
        pre = pre.replace(begin_marker, '', 1)
        content = content.replace(begin_marker, '', 1)
    if end_summary:
        more = more.replace(end_marker, '', 1)
        content = content.replace(end_marker, '', 1)

summary = str(BeautifulSoup(summary, 'html.parser'))
pre = str(BeautifulSoup(pre, 'html.parser'))
more = str(BeautifulSoup(more, 'html.parser'))

instance._content = content
# default_status was added to Pelican Content objects after 3.7.1.
# Its use here is strictly to decide on how to set the summary.
# There's probably a better way to do this but I couldn't find it.
if hasattr(instance, 'default_status'):
    instance.metadata['summary'] = summary
else:
    instance._summary = summary
instance.has_summary = True
instance.pre = pre
instance.more = more
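The splitting idea in that patch can be tried in isolation. This is a simplified standalone sketch, not the plugin's code: the marker strings are my assumption of the plugin defaults, `split_entry` is a hypothetical name, and unlike the plugin it treats the whole body as the summary when no end marker is present.

```python
BEGIN = "<!-- PELICAN_BEGIN_SUMMARY -->"  # assumed default begin marker
END = "<!-- PELICAN_END_SUMMARY -->"  # assumed default end marker


def split_entry(content, begin_marker=BEGIN, end_marker=END):
    """Return (pre, summary, more) with the markers themselves removed."""
    begin = content.find(begin_marker)
    if begin == -1:
        pre, rest = "", content
    else:
        pre, rest = content[:begin], content[begin + len(begin_marker):]
    end = rest.find(end_marker)
    if end == -1:
        summary, more = rest, ""
    else:
        summary, more = rest[:end], rest[end + len(end_marker):]
    return pre, summary, more


body = "<p>Intro.</p><!-- PELICAN_END_SUMMARY --><p>The rest.</p>"
print(split_entry(body))  # ('', '<p>Intro.</p>', '<p>The rest.</p>')
```

With the three pieces available separately, a template can place an anchor between the summary and the remainder of the entry.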

Dates

OK, I realize now Movable Type actually has three entry dates:

EntryDate: the “publish” timestamp manually set in the entry editing screen, which defaults to the time the entry editing page was first opened by hitting Create Entry.
EntryModifiedDate: the last save date. Not user editable.
EntryCreatedDate: the timestamp the entry was first saved. Not user editable.

The minimal implementation in Pelican would have modified date detected by the mtime from the filesystem. There’s no support for a “created date” based on the filesystem, which is not consistently implemented in filesystems anyway. So, I think I’ll need to specify all three types as metadata in the top of my file. If there’s no “Modified”, then we can fall back to mtime. If there’s no “Created”, we can fall back to “Date”?
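The fallback rules I have in mind could be sketched like this in Python. The function name `resolve_dates` and the lowercase metadata keys are assumptions for illustration, not Pelican's own API.

```python
from datetime import datetime


def resolve_dates(metadata: dict, mtime: float) -> dict:
    """Fill in missing dates: Modified falls back to mtime, Created to Date."""
    date = metadata["date"]  # the publish date is required
    modified = metadata.get("modified") or datetime.fromtimestamp(mtime)
    created = metadata.get("created") or date
    return {"date": date, "modified": modified, "created": created}


entry = {"date": datetime(2020, 3, 25)}
print(resolve_dates(entry, mtime=1643588673.0))
```

Here only Date is given, so Created copies it and Modified comes from the file's modification time.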

Wow, it’s even worse than I thought. I’ll definitely need to specify all three dates manually because I just saved this file after a small update, and now apparently my filesystem only has the current date associated with the file instead of the 2020-03 date I started working on this (“Birth” updated to match “Modify”):

$ stat Notes/blog/2020/03/Website\ and\ notetaking\ system.md
  File: Notes/blog/2020/03/Website and notetaking system.md
  Size: 12822           Blocks: 32         IO Block: 4096   regular file
Device: 9,127   Inode: 166789121   Links: 1
Access: (0644/-rw-r--r--)  Uid: ( 1000/cgorichanaz)   Gid: ( 1000/cgorichanaz)
Access: 2022-01-30 16:26:34.819133202 -0800
Modify: 2022-01-30 16:24:33.785325347 -0800
Change: 2022-01-30 16:24:33.791992040 -0800
 Birth: 2022-01-30 16:24:33.785325347 -0800

It would be great if I could have a source control based “modified” date, and better still if I could show the list of modification dates and what changed, like a wiki.
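If the content lived in Git, the commit history could supply those dates. Below is a minimal sketch that assumes the file is tracked in a Git repository and that `git` is on the PATH; the function names are hypothetical. It asks `git log` for the author date of each commit touching the file, newest first.

```python
import subprocess
from datetime import datetime


def parse_commit_dates(git_output: str) -> list:
    """Turn one ISO 8601 timestamp per line into datetime objects."""
    return [datetime.fromisoformat(line.strip())
            for line in git_output.splitlines() if line.strip()]


def commit_dates(path: str, repo: str = ".") -> list:
    """Author dates of every commit touching `path`, newest first."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--follow", "--format=%aI", "--", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_commit_dates(out)


# Parsing alone, using made-up sample output:
dates = parse_commit_dates("2022-01-30T16:24:33-08:00\n2020-03-25T08:40:00+08:00\n")
print(dates[0].date(), dates[-1].date())  # 2022-01-30 2020-03-25
```

The first entry would serve as the modified date and the last as a created date, and the full list is the wiki-like revision history.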

Wow, and now I realize by looking over the Writing content documentation that there is apparently no created metadata item in Pelican. I started using it by accident because I was implementing my Movable Type templates, and my code in my pelican-site/votecharlie/templates/_blog_meta.html referencing article.created works just fine because I added Created: ... metadata to my entries when exporting from Movable Type. I guess I’ll keep it for now. It seems you might only need the post date and the modified date, but sometimes I liked to indicate generally when I wrote a post about a past event. Sometimes I would write my journals about a trip months or a year afterward, once I got around to sorting the photos. Having a modified date could help, but if I modified the entry again years later, I would lose that “I wrote this about a year after the event in question” notion.

TODO

Blockers

Unpublished / draft blog entries
Private tags with @ not captured by mt:EntryTags

Nonblockers

Implement Pelican Search instead of the Google Custom Search box that’s full of ads.
Remove all the Google ads, they weren’t worth dealing with.
Move the signup off each entry to a static page. Let’s trim down…
Consider removing the Amazon referral links, are they even working?
Review https://docs.getpelican.com/en/latest/content.html#ref-linking-to-internal-content
Review https://docs.getpelican.com/en/latest/settings.html#reading-only-modified-content
Feeds
Some pages have many URL permutations? Like /category/adventures/argentina and /category/adventures/argentina/index.html.
- I think I fixed this with the CATEGORY_SAVE_AS or similar.
Subcategories: On MT, entries did not appear on category listings for parent categories of the actually selected categories, whereas in Pelican they do (see “Adventures”). Relatedly, the categories listing sidebar does not show parent categories as links even when there are entries for them.
Foursquare integration. Look into this maybe? https://github.com/getpelican/pelican/pull/2922
Implement thumbnailer for use with Foursquare images, etc.
- Using advthumbnailer.
templates/_blog_age_notice.html
templates/_base.html is a mess with regard to meta tag images. I can’t believe how complicated my Movable Type template logic was there. I probably should move images to a custom metadata instead of trying to intelligently extract it…
Some entries have <pre> that should be converted to a normal code block, like /blog/2017/06/mint-batch-transaction-import-hack.html. Plus one of the images has HTML in alt and is getting messed up.
Will need to migrate to YouTube, I think. Or pay for Vimeo forever even if I don’t upload many videos. See https://ymcinema.com/2019/04/03/vimeo-is-deleting-your-videos-when-you-switch-to-basic-account/ ; I already have videos that are “hidden” and maybe have deleted ones, too.
