Backup chronicles: Google Drive client constantly crashing
Since Amazon canceled the unlimited storage I planned to use to replace BackBlaze, I have been weighing my options. I decided to pay $100 to upgrade my Google Drive to 1TB for one year and try syncing my documents (~900GB) there, but after three days I have not managed to upload much due to the Mac client crashing frequently. Not promising!
My 17TB of data
I keep my data on a 20TB volume made up of two 10TB Seagate hard drives, “Seagate Enterprise Capacity 3.5 HDD 10TB (Helium) 7200RPM SATA 6Gb/s 256 MB Cache Internal Bare Drive (ST10000NM0016)”. The whole volume is lost if either disk fails, but I justified this risk by my use of Backblaze to nearly constantly back everything up to the cloud. I also periodically sync to my Drobo 5D, which has five 6TB hard drives providing a volume of about 18TB that can supposedly survive two simultaneous hard drive failures. The only problem there is that the enclosure has been getting very hot and making more noise than I would like in my bedroom, so I keep it off except when I remember to do a backup. Not ideal.
My current data spread is as follows:
➜ Striped sudo du -sh ./*
869G ./Backups
285G ./Bundle
825G ./Documents
172G ./Music
611G ./Pictures
236G ./Reading
58G ./Software
13T ./Videos
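As a sanity check on the "17TB" figure, here is a rough sketch that totals the listing above. It assumes `du -h` reports binary units (1G = 1024^3 bytes), which I believe is the case on macOS, and it inherits du's rounding, so treat the result as approximate. The helper names are my own.

```python
# Total the sizes from the `du -sh` listing above.
# Assumption: du -h uses binary units (powers of 1024).
DU_OUTPUT = """\
869G ./Backups
285G ./Bundle
825G ./Documents
172G ./Music
611G ./Pictures
236G ./Reading
58G ./Software
13T ./Videos
"""

UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert a du-style size such as '869G' into bytes."""
    number, unit = text[:-1], text[-1]
    return float(number) * UNITS[unit]


def total_bytes(du_output):
    """Sum the first column of every non-empty line."""
    return sum(
        parse_size(line.split()[0])
        for line in du_output.splitlines()
        if line.strip()
    )


total = total_bytes(DU_OUTPUT)
print(f"{total / 1024**4:.2f} TiB, or {total / 1e12:.2f} TB in decimal units")
```

That comes out near 16 TiB, which is roughly 17.6 TB in the decimal units drive makers use.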
I had been trying to get everything backed up except for most of the videos folder, but the furthest I ever got with Backblaze was about 4TB before I needed to start over due to a system change on my end. So for now, I am just trying to get the documents and pictures folders backed up at the very least.
Google Drive versus Amazon Drive
It looks like Google Drive costs $100 for 1TB for one year, but I would need at least 2TB. The 20% discount for paying yearly apparently is only for 1TB, so I would need to pay $20 per month.
Amazon Prime comes with unlimited photo storage, so theoretically I could back up most of that 600GB of photos there for no extra cost. (I would of course not be surprised if this unlimited option disappeared once I was well on the way to uploading all my photos!) It is unclear whether the unlimited option includes any videos, but at the very least it does not include videos larger than 2GB. Note this email I received:
Welcome to Prime Photos
Unlimited online photo storage, plus 5 GB for videos and files. …
A few years ago, I switched from storing my videos in a separate folder structure to storing them alongside my photos, as often they are directly related. (I wish Facebook allowed organizing videos alongside photos in albums!) It seems Amazon will count RAW photos as photos at least. So, probably, if I used the unlimited photos option, I would end up paying for 1TB of Amazon Drive space as well, which would cost $60 per year.
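To put the prices quoted above side by side, here is a quick yearly-cost comparison. It only uses the figures mentioned in this post and leaves out the cost of Amazon Prime itself; the plan labels are just my own shorthand.

```python
# Yearly cost of each option, using only the prices quoted in this post.
MONTHS_PER_YEAR = 12

yearly_cost = {
    "Google Drive 1TB (paid yearly)": 100,
    "Google Drive 2TB (paid monthly)": 20 * MONTHS_PER_YEAR,
    "Amazon Drive 1TB (paid yearly)": 60,
}

# Print the options from cheapest to most expensive.
for plan, dollars in sorted(yearly_cost.items(), key=lambda item: item[1]):
    print(f"{plan}: ${dollars} per year")
```

So the 2TB Google plan would run $240 per year, four times the Amazon Drive price for 1TB.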
I would be tempted to store everything on Amazon Drive since it’s cheaper than Google Drive, but the client I would use, Arq, only does encrypted backups. This is probably a smart move in general, but I decided the ability to browse online and share specific files was going to likely be more convenient and valuable, and the Google Drive interface for that made the most sense since it would integrate well with my Google Documents.
Google Documents backup
I am currently using the app Cloudpull to back up my Google Apps data to a local backups folder. That includes my email, calendar, contacts and documents. Since I would now be syncing all my local documents to my Google Drive, it would be silly and wasteful to have Cloudpull redownload all that data that originated on my local computer. I could disable “docs” for my main Google account, but then I would not get backups of my Google Docs, Google Sheets, etc., files.
With the Google Drive client, those files are apparently not actually synced, but rather the client creates files that simply link to the documents online. This is fine if you trust Google to always be available and never lose a file. I also have a bridge to sell you.
Cloudpull manages to convert all the proprietary Google Docs files to Microsoft Office variants in the backup. I want to maintain this behavior, but don’t want to segregate all the Google format files into special folders to track separately. Cloudpull has an option under Preferences > Google Accounts > Just Google Docs, Sheets, and Slides. This seems to be what I want, but when I tried it, the backup was clearly larger than it should have been were it solely backing up the Google format files. I will need to revisit that. For now I will let it back up everything it’s backing up, erring on the side of safety.
On a related note, Google provides an option to edit Google Docs files offline, which apparently must be handled through a magical back end in the Chrome browser using the Google Docs offline Chrome extension. It’s not clear to me which files will be stored locally and in what format. I would guess just the Google format documents are downloaded in some format, but I would also not be surprised if I ended up with Chrome using terabytes of space as a result. Since I am mainly concerned with having an editable backup in the event Google gets wiped off the face of the earth, I am not going to bother investigating the offline editing capability now, either.
Google Drive client problems
For my initial setup, I installed the Google Drive Mac client and saw it created a Google Drive folder under my user folder. My data is all on a separate volume from my main system volume, and I wanted to change the name. I was able to delete the Google Drive folder, quit and reopen the Google Drive app, and then select from the prompt a new folder to use. I picked my Documents folder on my data volume, and I moved a few other folders into Documents for simplicity. (I struggle to decide the most logical folder structure, but since Google Drive must sync one root folder, my mind was made up on those folders still hanging out next to Documents!)
The client started downloading my existing Google Drive files to the Documents folder. I changed the settings to exclude two larger folders: the TitaniumBackup folder my Nexus 6P backs up to, and another folder I use to store files I shared with others, most of which are redundant with what I would soon be uploading, and which I looked forward to streamlining. The client also indexed all the files to be uploaded, of which there are apparently about 500,000. At a rate of about 50GB per day, I figured my documents would be uploaded in about two weeks.
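For what it's worth, the back-of-the-envelope estimate looks like this. The 50GB per day rate is my own rough observation, and I run it for both the 825G that du reports for Documents and the ~900GB figure I have been using loosely. The function name is just for illustration.

```python
# Rough upload-time estimate at a constant daily rate.
def days_to_upload(total_gb, gb_per_day):
    """Days needed to upload total_gb at a steady gb_per_day."""
    return total_gb / gb_per_day


RATE_GB_PER_DAY = 50  # rough observed rate, not a measured average

for label, size_gb in [("Documents per du", 825), ("loose ~900GB figure", 900)]:
    days = days_to_upload(size_gb, RATE_GB_PER_DAY)
    print(f"{label}: {days:.1f} days ({days / 7:.1f} weeks)")
```

That works out to between 16 and 18 days, so "about two weeks" was a little optimistic even before any crashes.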
The first morning after, I was disappointed to see the client was not running. I restarted it, and it initially reported being completely synced, even though I knew that was impossible. For a minute I feared all my documents had been deleted (I had made sure to sync to my Drobo first), but they were still there. After about 10 minutes, the Google Drive client reported it was uploading files. I guess it takes some time to switch from thinking it is done to scanning for files. It would be nice if it reported that it was scanning instead of claiming completion.
About a day later, I found this message on the screen:
Sorry, Google Drive needs to quit.
An unknown issue occurred and Google Drive needs to quit. Error: 6248
The “Learn more” link was unhelpful.
I restarted the client and gave it another chance. The same thing happened after some number of hours, several times. Several days later, I think I managed to upload only a gigabyte or two of files.
Moving virtual machine images to sparse bundles
I suspected the Google Drive client might be somehow choking on my Windows 7 virtual machine, which is stored in a roughly 40GB Parallels image. I was not sure if that was a single large file or already distributed among smaller files. A quick investigation showed my Windows 7.pvm file was in fact a directory of smaller files, but the bulk of the 40GB was a single file representing the Windows filesystem. For what it’s worth, Google’s “Files you can store in Google Drive” page says it supports “All other files: Up to 5 TB.”
It occurred to me I should be able to move the entire PVM file to a sparse bundle backed volume, which would turn that huge file into many 8MB files and hopefully be easier for the Google Drive client to manage. I had no idea what performance implications that might have, so I did a quick search to see if this strategy was standard or at least done before. I found the blog post “Efficient backups: storing VMs in a sparse bundle” that describes exactly what I had in mind, so that was good enough for me.
I’ll see if the Google Drive client works better now.
Update
Before I finished writing this post, the client already crashed. So apparently it was not the large virtual machine file it was choking on.
I checked the Console, and there are many items under System Reports with names like Google Drive_2017-06-18+195436_CharlieDesktop.cpu_resource.diag, all containing similar content, notably a line similar to:
90s cpu time over 112 seconds (80% cpu average), exceeding limit of 50% cpu over 180 seconds
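My reading of that line, which I have not confirmed against any Apple documentation, is that the limit is a CPU budget: 50% of a 180-second window is 90 seconds of CPU time, and the client burned through those 90 seconds in only 112 seconds. Here is a small sketch that pulls the numbers out of the line and redoes the arithmetic. The regular expression is written for this one message format, and the function name is mine.

```python
import re

# The line as it appears in the .diag report.
LINE = ("90s cpu time over 112 seconds (80% cpu average), "
        "exceeding limit of 50% cpu over 180 seconds")

# Pattern written for this specific message; other reports may differ.
PATTERN = re.compile(
    r"(\d+)s cpu time over (\d+) seconds \((\d+)% cpu average\), "
    r"exceeding limit of (\d+)% cpu over (\d+) seconds"
)


def parse_cpu_event(line):
    """Extract the CPU figures and compute the implied budget."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("line does not look like a cpu_resource event")
    used, elapsed, _reported_avg, limit_pct, window = map(int, match.groups())
    return {
        "cpu_seconds": used,
        "elapsed_seconds": elapsed,
        "average_pct": 100 * used / elapsed,
        # Assumed interpretation: the limit is limit_pct of the window.
        "budget_seconds": limit_pct / 100 * window,
    }


info = parse_cpu_event(LINE)
print(f"average: {info['average_pct']:.0f}% CPU")
print(f"budget:  {info['budget_seconds']:.0f}s of CPU per window")
print("budget used up:", info["cpu_seconds"] >= info["budget_seconds"])
```

If that interpretation is right, these reports record the client pegging a CPU core for long stretches, which may or may not be the same problem as the "needs to quit" error.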
For the record, here is more of the error report:
[SPOILER=Google Drive_2017-06-18+195436_CharlieDesktop.cpu_resource.diag]
Date/Time: 2017-06-18 19:52:43.366294 -0700
OS Version: Mac OS X 10.12.5 (Build 16F73)
Architecture: x86_64
Report Version: 19
Command: Google Drive
Path: /Applications/Google Drive.app/Contents/MacOS/Google Drive
Version: 2.34 (2.34.5075.1619)
Parent: launchd [1]
PID: 94697
Event: cpu usage
CPU: 90s cpu time over 112 seconds (80% cpu average), exceeding limit of 50% cpu over 180 seconds
Duration: 112.14s
Steps: 80
Hardware model: iMac14,2
Active cpus: 8
Fan speed: 836 rpm
Powerstats for: Google Drive [94697]
UUID: FFD751CA-37B6-3FAC-9CF0-413E01DAA81D
Start time: 2017-06-18 19:52:44 -0700
End time: 2017-06-18 19:54:34 -0700
Microstackshots: 80 samples (100%)
Primary state: 56 samples Non-Frontmost App, User mode, Effective Thread QoS Default, Requested Thread QoS Default, Override Thread QoS Unspecified
User Activity: 0 samples Idle, 80 samples Active
Power Source: 0 samples on Battery, 80 samples on AC
74 thread_start + 13 (libsystem_pthread.dylib) [0x7fffa972a08d]
74 _pthread_start + 286 (libsystem_pthread.dylib) [0x7fffa972a887]
74 _pthread_body + 180 (libsystem_pthread.dylib) [0x7fffa972a93b]
74 ??? (Python + 966710) [0x1005ec036]
74 PyEval_CallObjectWithKeywords + 93 (Python) [0x1005b1f6d]
74 PyObject_Call + 99 (Python) [0x10050e713]
74 ??? (Python + 113079) [0x10051b9b7]
74 PyObject_Call + 99 (Python) [0x10050e713]
...dozens of lines like the above...
Binary Images:
0x100000000 - 0x100009fff com.google.GoogleDrive 2.34 (2.34.5075.1619) <FFD751CA-37B6-3FAC-9CF0-413E01DAA81D> /Applications/Google Drive.app/Contents/MacOS/Google Drive
0x100500000 - 0x100649ff7 org.python.python 2.7.10, (c) 2001-2015 Python Software Foundation. (2.7.10) <69F40DB2-A2CA-3C19-B715-6FE0F9FEFD3E> /Applications/Google Drive.app/Contents/Frameworks/Python.framework/Versions/2.7/Python
0x1007e1000 - 0x1007e4fff zlib.so <A49D7D7B-D86A-3872-8A88-A55A07B9C5EC> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/zlib.so
0x1007f5000 - 0x1007f7fff time.so <F0B87F3A-295E-393E-B800-7CD7E2212792> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/time.so
0x103a80000 - 0x103a93ff7 _ctypes.so <7D9B7F39-A77A-3781-BEA5-3CA8BE600496> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/_ctypes.so
0x103a9c000 - 0x103aa2fff itertools.so <E7F28918-B9A6-38CC-BDFC-52F9016385B0> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/itertools.so
0x103c76000 - 0x103cc7fff _objc.so <A68C8EF8-F430-06D1-A94F-19B52B5F21F3> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/objc/_objc.so
0x105ff4000 - 0x105ff4ff7 _appmain.so <5E69F45B-8EF7-D5C2-0ED5-3D7D65899213> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/AppKit/_appmain.so
0x108481000 - 0x108638ff7 _core_.so <3CD51016-2552-F3A8-EB67-F4F0654461DF> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/wx/_core_.so
0x108694000 - 0x109541fef libwx_osx_cocoau-3.0.0.1.0.dylib (2) <F29A92D9-4D9F-15B4-7EA2-353F00C38CCE> /Applications/Google Drive.app/Contents/Frameworks/libwx_osx_cocoau-3.0.0.1.0.dylib
0x109b30000 - 0x109b33ff7 select.so <E267C7CE-92B0-3959-B23A-36316C90E548> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/select.so
0x10a460000 - 0x10a580ff7 _misc_.so <93A81DF3-D4F4-D294-97E3-3521368467B0> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/wx/_misc_.so
0x10a664000 - 0x10a79aff7 _sqlite.so <3F51BB0B-8756-3752-80CE-EC52CEBFBCF4> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/pysqlite2/_sqlite.so
0x10aae7000 - 0x10aaf1fff _ssl.so <DB62D1F5-CE41-3995-857C-60767C0974F6> /Applications/Google Drive.app/Contents/Resources/lib/python2.7/lib-dynload/_ssl.so
0x7fff9184b000 - 0x7fff92624ff3 com.apple.AppKit 6.9 (1504.83.101) <EC7BD195-F9E1-3E43-820A-5FDD0B2B0B67> /System/Library/Frameworks/AppKit.framework/Versions/C/AppKit
0x7fff932c8000 - 0x7fff935d1fff com.apple.HIToolbox 2.1.1 <CAB143FE-AEAF-3EDE-AD7B-C04E1B7C5615> /System/Library/Frameworks/Carbon.framework/Versions/A/Frameworks/HIToolbox.framework/Versions/A/HIToolbox
0x7fff93d10000 - 0x7fff941a9ff7 com.apple.CoreFoundation 6.9 (1349.8) <09ED473E-5DE8-307F-B55C-16F6419236D5> /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
0x7fffa8209000 - 0x7fffa82eeff7 libcrypto.0.9.8.dylib (64.50.6) <D34E16A7-990A-37A9-933A-DFAA46554EAA> /usr/lib/libcrypto.0.9.8.dylib
0x7fffa921b000 - 0x7fffa9253ff3 libssl.0.9.8.dylib (64.50.6) <9A9C9D9A-7948-3412-ABE4-01FCC4A72CD2> /usr/lib/libssl.0.9.8.dylib
0x7fffa9626000 - 0x7fffa9648ff7 libsystem_kernel.dylib (3789.60.24) <6E9E485F-91F6-36B7-A125-AE91DC978BCC> /usr/lib/system/libsystem_kernel.dylib
0x7fffa9691000 - 0x7fffa96afff7 libsystem_malloc.dylib (116.50.8) <A3D15F17-99A6-3367-8C7E-4280E8619C95> /usr/lib/system/libsystem_malloc.dylib
0x7fffa971e000 - 0x7fffa9726fe7 libsystem_platform.dylib (126.50.8) <897462FD-B318-321B-A554-E61982630F7E> /usr/lib/system/libsystem_platform.dylib
0x7fffa9727000 - 0x7fffa9731ff7 libsystem_pthread.dylib (218.60.3) <B8FB5E20-3295-39E2-B5EB-B464D1D4B104> /usr/lib/system/libsystem_pthread.dylib
[/SPOILER]
Amazon Drive client
I was disappointed I apparently could not do something similar to what I did for the Google Drive client to get it to make my Documents folder its root. The Amazon Drive client can select an arbitrary location to store the synced folder, which is more user friendly than the Google Drive client’s assumption you want a Google Drive folder directly inside your user folder. But, no matter the folder you select in the Amazon client, it creates a new subfolder called Amazon Drive. So it seems if I am to use the unlimited photo storage option, I will need to move my photos folder inside that Amazon Drive folder.
I have not yet dealt with that, as I want to get my documents strategy straightened out first, but I suspect I will be able to maintain my current directory tree (and my Adobe Lightroom setup) by either making /Volumes/Striped/Pictures a hard link to /Volumes/Striped/Amazon Drive/Pictures or more likely by making the former a mount point of the latter.
Strategies for folder structure:
- Soft link to master folder within synced folder
  Google Drive client will not copy a folder that is a symlink, so you cannot back up an external folder by creating a symlink in the Google Drive folder. One workaround is doing the opposite. (I have not checked Amazon Drive’s handling of symlinks.)
- Mount master folder within synced folder to external mount point
  User ultra points out:
  Another workaround is mounting your directory to the Google Drive folder. You can use the bind option in /etc/fstab
  /your/directory /home/user/Google\040Drive/directory none bind 0 0
  (\040 is the space character in fstab)
- Hard linked folders
  A hard linked folder is a pointer to the same disk location as another folder. They appear to be two separate folders, but the contents are in fact automatically synchronized since they are physically the same folders. Hard links are difficult to make on Macs since Apple crippled the ln utility. It can apparently be done (hardlink-osx), but there are problems with deleting files and more.
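The "doing the opposite" symlink workaround, where the real folder lives inside the synced folder and a symlink sits at the old location, could be sketched roughly like this. I have not run it against my actual folders; the function name and the demo paths are made up for illustration, and the demo only touches a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def relocate_with_symlink(master, synced_root):
    """Move `master` inside `synced_root`, leaving a symlink at the old path."""
    target = synced_root / master.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # shutil.move renames on the same volume, or copies then deletes across volumes.
    shutil.move(str(master), str(target))
    # The old location becomes a symlink to the real folder in the synced tree.
    master.symlink_to(target, target_is_directory=True)
    return target


# Demo in a throwaway directory (hypothetical folder names).
workspace = Path(tempfile.mkdtemp())
old_path = workspace / "Pictures"
synced = workspace / "Google Drive"
old_path.mkdir()
synced.mkdir()
(old_path / "photo.txt").write_text("hello")

new_path = relocate_with_symlink(old_path, synced)
print(old_path, "->", new_path)
print((old_path / "photo.txt").read_text())  # still readable via the old path
```

The sync client then sees a real folder inside its root, while anything that expects the old path follows the symlink.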
Once I figure out my documents, I intend to try the folder mounting method to deal with my photos, and then I will see about how Amazon Drive counts video files and decide what to do with the rest.
All the other data
For the remaining 13+ terabytes of video and backups, I intend to keep syncing to my Drobo and to at least one offsite backup that I will handle myself instead of in the cloud. I am thinking I’ll clone my hard drives and send them to my mom to put in a NAS attached to her router. Then I can simply rsync to it or perhaps use Arq and do those backups in an encrypted fashion (though if I do that, I should first do it locally on my network of course).
Life is wonderful!