bin/backup script: sensible defaults around bin/repozo
Project description
Easy Zope backup/restore recipe for buildout
Introduction
This recipe is mostly a wrapper around the bin/repozo script in your Zope buildout. It requires that this script is already made available. If this is not the case, you will get an error like this when you run one of the scripts: bin/repozo: No such file or directory. You should be fine when you are on Plone 3 or when you are on Plone 4 and are using plone.recipe.zeoserver. If this is not the case, the easiest way of getting a bin/repozo script is to add a new section in your buildout.cfg (do not forget to add it in the parts directive):
[repozo] recipe = zc.recipe.egg eggs = ZODB3 scripts = repozo
bin/repozo is a Zope script to make backups of your Data.fs. Looking up the settings can be a chore. And you have to pick a directory where to put the backups. This recipe provides sensible defaults for your common backup tasks. Making backups a piece of cake is important!
bin/backup makes an incremental backup.
bin/fullbackup always makes a full backup.
bin/restore restores the latest backup.
bin/snapshotbackup makes a full backup, separate from the regular backups. Handy for copying the current production database to your laptop or right before a big change in the site.
Development
Code repository: https://github.com/collective/collective.recipe.backup
Small fixes are fine on master, for larger changes or if you are unsure, please create a branch or a pull request.
The code comes with a buildout.cfg. Please bootstrap the buildout and run the created bin/test to see if the tests still pass. Please try to add tests if you add code.
The long description of this package (as shown on PyPI), used to contain a big file with lots of test code that showed how to use the recipe. This grew too large, so we left it out. It is probably still good reading if you are wondering about the effect some options have. See src/collective/recipe/backup/README.txt.
Questions and comments to the Plone product-developers list or to mailto:maurits@vanrees.org and mailto:reinout@vanrees.org.
Example usage
The simplest way to use this recipe is to add a part in buildout.cfg like this:
[buildout] parts = backup [backup] recipe = collective.recipe.backup
You can set lots of extra options, but the recipe authors like to think they have created sane defaults, so this single line stating the recipe name should be enough in most cases.
Running the buildout adds the backup, fullbackup, snapshotbackup, restore and snapshotrestore scripts to the bin/ directory of the buildout and, by default, it creates the var/backups and var/snapshotbackups directories in that same buildout.
Backed up data
Which data does this recipe backup?
The Zope Object DataBase (ZODB) filestorage, by default located at var/filestorage/Data.fs.
Possibly additional filestorages, see the additional_filestorages option.
The blobstorage (since version 2.0) if your buildout uses it, by default located at var/blobstorage.
Data that is not backed up
Which data does this recipe not backup? Everything else of course, but specifically:
Data stored in RelStorage will not be backed up. (You could still use this recipe to back up the filesystem blobstorage, possibly with the only_blobs option.)
Other data stored in SQL, perhaps via SQLAlchemy, will not be backed up.
It does not create a backup of your entire buildout directory.
Is your backup backed up?
Note that the backups are by default created in the var directory of the buildout, so if you accidentally remove the entire buildout, you also lose your backups. It should be standard practice to use the location option to specify a backup location in for example the home directory of the user. You should also arrange to copy that backup to a different machine/country/continent/planet.
Backup
Calling bin/backup results in a normal incremental repozo backup that creates a backup of the Data.fs in var/backups. When you have a blob storage it is by default backed up to var/blobstoragebackups.
Calling bin/fullbackup results in a normal FULL repozo backup that creates a backup of the Data.fs in var/backups. When you have a blob storage it is by default backed up to var/blobstoragebackups. This script is provided so that you can set different cron jobs for full and incremental backups. You may want to have incrementals done daily, with full backups done weekly. Now you can!
You should normally do a bin/zeopack regularly, say once a week, to remove unused objects from your Zope Data.fs. The next time bin/backup is called, a complete fresh backup is made, because an incremental backup is not possible anymore. This is standard bin/repozo behaviour.
Snapshots
For quickly grabbing the current state of a production database so you can download it to your development laptop, you want a full backup. But you shouldn’t interfere with the regular backup regime. Likewise, a quick backup just before updating the production server is a good idea. For that, the bin/snapshotbackup is great. It places a full backup in, by default, var/snapshotbackups.
Restore
Calling bin/restore restores the very latest normal incremental repozo backup and restores the blobstorage if you have that.
You can restore the very latest snapshotbackup with bin/snapshotrestore.
You can also restore the backup as of a certain date. Just pass a date argument. According to repozo: specify UTC (not local) time. The format is yyyy-mm-dd[-hh[-mm[-ss]]]. So as a simple example:
bin/restore 1972-12-25
Since version 2.3 this also works for restoring blobs. We simply restore the directory from the first backup after the specified date.
Since version 2.0, the restore scripts ask for confirmation before starting the restore, as this is a potentially dangerous command. (“Oops, I have restored the live site but I meant to restore the test site.”) You need to explicitly type ‘yes’:
This will replace the filestorage (Data.fs). This will replace the blobstorage. Are you sure? (yes/No)?
Names of created scripts
A backup part will normally be called [backup], leading to a bin/backup and bin/snapshotbackup. Should you name your part something else, the script names will also be different, as will the created var/ directories (since version 1.2):
[buildout] parts = plonebackup [plonebackup] recipe = collective.recipe.backup
That buildout snippet will create these directories:
var/plonebackups var/plonebackup-snapshots
and these scripts:
bin/plonebackup bin/plonebackup-full bin/plonebackup-snapshot bin/plonebackup-restore bin/plonebackup-snapshotrestore
Supported options
The recipe supports the following options, none of which are needed by default. The most common ones to change are location and blobbackuplocation, as those allow you to place your backups in some system-wide directory like /var/zopebackups/instancename/ and /var/zopebackups/instancename-blobs/.
- location
Location where backups are stored. Defaults to var/backups inside the buildout directory.
- blobbackuplocation
Directory where the blob storage will be backed up to. Defaults to var/blobstoragebackups inside the buildout directory.
- keep
Number of full backups to keep. Defaults to 2, which means that the current and the previous full backup are kept. Older backups are removed, including their incremental backups. Set it to 0 to keep all backups.
- keep_blob_days
Number of days of blob backups to keep. Defaults to 14, so two weeks. This is only used for partial (full=False) backups, so this is what gets used normally when you do a bin/backup. This option has been added in 2.2. For full backups (snapshots) we just use the keep option. Recommended is to keep these values in sync with how often you do a zeopack on the Data.fs, according to the formula keep * days_between_zeopacks = keep_blob_days. The default matches one zeopack per seven days (2*7=14).
- datafs
In case the Data.fs isn’t in the default var/filestorage/Data.fs location, this option can overwrite it.
- full
By default, incremental backups are made. If this option is set to true, bin/backup will always make a full backup. This option is (obviously) the default when using the fullbackup script.
- debug
In rare cases when you want to know exactly what’s going on, set debug to true to get debug level logging of the recipe itself. repozo is also run with --verbose if this option is enabled.
- snapshotlocation
Location where snapshot backups of the filestorage are stored. Defaults to var/snapshotbackups inside the buildout directory.
- gzip
Use repozo’s zipping functionality. true by default. Set it to false and repozo will not gzip its files. Note that gzipped databases are called *.fsz, not *.fs.gz. Changed in 0.8: the default used to be false, but it so totally makes sense to gzip your backups that we changed the default.
- additional_filestorages
Advanced option, only needed when you have split for instance a catalog.fs out of the regular Data.fs. Use it to specify the extra filestorages. (See explanation further on).
- enable_snapshotrestore
Having a snapshotrestore script is very useful in development environments, but can be harmful in a production buildout. The script restores the latest snapshot directly to your filestorage and it used to do this without asking any questions whatsoever (this has been changed to require an explicit yes as answer). If you don’t want a snapshotrestore script, set this option to false.
- blob_storage
Location of the directory where the blobs (binary large objects) are stored. This is used in Plone 4 and higher, or on Plone 3 if you use plone.app.blob. This option is ignored if backup_blobs is false. The location is not set by default. When there is a part using plone.recipe.zeoserver, plone.recipe.zope2instance or plone.recipe.zope2zeoserver, we check if that has a blob-storage option and use that as default. Note that we pick the first one that has this option and we do not care about shared-blob settings, so there are probably corner cases where we do not make the best decision here. Use this option to override it in that case.
- blob-storage
Alternative spelling for the preferred blob_storage, as plone.recipe.zope2instance spells it as blob-storage and we are using underscores in all the other options. Pick one.
- backup_blobs
Backup the blob storage. This requires the blob_storage location to be set. If no blob_storage location has been set and we cannot find one by looking in the other buildout parts, we default to False, otherwise to True.
- blobsnapshotlocation
Directory where the blob storage snapshots will be created. Defaults to var/blobstoragesnapshots inside the buildout directory.
- only_blobs
Only backup the blobstorage, not the Data.fs filestorage. False by default. May be a useful option if for example you want to create one bin/filestoragebackup script and one bin/blobstoragebackup script, using only_blobs in one and backup_blobs in the other.
- use_rsync
Use rsync with hard links for backing up the blobs. Default is true. rsync is probably not available on all machines though, and I guess hard links will not work on Windows. When you set this to false, we fall back to a simple copy (shutil.copytree from Python in fact).
- gzip_blob
Use tar archiving functionality. false by default. Set it to true and backup/restore will be done with tar command. Note that tar commmand must be available on machine if this option is set to true. This option also works with snapshot backup/restore commands. As this counts as a full backup keep_blob_days is ignored.
- pre_command
Command to execute before starting the backup. One use case would be to mount a remote file system using NFS or sshfs and put the backup there. Any output will be printed. If you do not like that, you can always redirect output somewhere else (mycommand > /dev/null on Unix). Refer to your local Unix guru for more information. If the command fails, the backup script quits with an error. You can specify multiple commands.
- post_command
Command to execute after the backup has finished. One use case would be to unmount the remote file system that you mounted earlier using the pre_command. See that pre_command above for more info.
An example buildout snippet using most options, except the blob options, would look like this:
[backup] recipe = collective.recipe.backup location = ${buildout:directory}/myproject keep = 2 datafs = subfolder/myproject.fs full = true debug = true snapshotlocation = snap/my gzip = false enable_snapshotrestore = true pre_command = echo 'Can I have a backup?' post_command = echo 'Thanks a lot for the backup.' echo 'We are done.'
Paths in directories or files can use relative (../) paths, and ~ (home dir) and $BACKUP-style environment variables are expanded.
Cron job integration
bin/backup is of course ideal to put in your cronjob instead of a whole bin/repozo .... line. But you don’t want the “INFO” level logging that you get, as you’ll get that in your mailbox. In your cronjob, just add -q or --quiet and bin/backup will shut up unless there’s a problem.
Speaking of cron jobs? Take a look at zc.recipe.usercrontab if you want to handle cronjobs from within your buildout. For example:
[backupcronjob] recipe = z3c.recipe.usercrontab times = 0 12 * * * command = ${buildout:directory}/bin/backup
Advanced usage: multiple Data.fs files
Sometimes, a filestorage is split into several files. Most common reason is to have a regular Data.fs and a catalog.fs which contains the portal_catalog. This is supported with the additional_filestorages option:
[backup] recipe = collective.recipe.backup additional_filestorages = catalog another
This means that, with the standard Data.fs, the bin/backup script will now backup three filestorages:
var/filestorage/Data.fs var/filestorage/catalog.fs var/filestorage/another.fs
The additional backups have to be stored separate from the Data.fs backup. That’s done by appending the file’s name and creating extra backup directories named that way:
var/backups_catalog var/snapshotbackups_catalog var/backups_another var/snapshotbackups_another
The various backups are done one after the other. They cannot be done at the same time with repozo. So they are not completely in sync. The “other” databases are backed up first as a small difference in the catalog is just mildly irritating, but the other way around users can get real errors.
If you want more control of the filestorage source path, you can explicitly set it (with or without the blobstorage path). For example:
[backup] recipe = collective.recipe.backup additional_filestorages = foo ${buildout:directory}/var/filestorage/foo/foo.fs ${buildout:directory}/var/blobstorage-foo bar ${buildout:directory}/var/filestorage/bar/bar.fs
In the additional_filestorages option you can define different filestorage using this syntax:
additional_filestorages = storagename1 [datafs1_path [blobdir1]] storagename2 [datafs2_path [blobdir2]]
If the datafs_path is missing, then the default value will be used (var/filestorage/storagename1.fs). If you do not specify a blobdir, then this means no blobs will be backed up for that storage. Note that if you specify blobdir you must specify datafs_path as well.
Note that collective.recipe.filestorage creates additional filestorages in a slightly different location, but you can explictly define the paths of filestorage and blobstorage for all the parts defined in the recipe. Work is in progress to improve this.
Blob storage
New in this recipe (since version 2.0) is that we backup the blob storage. Plone 4 uses a blob storage to store files (Binary Large OBjects) on the file system. In Plone 3 this is optional. When this is used, it should be backed up of course. You must specify the source blob_storage directory where Plone (or Zope) stores its blobs. As indicated earlier, when we do not set it specifically, we try to get the location from other parts, for example the plone.recipe.zope2instance recipe:
[buildout] parts = instance backup [instance] recipe = plone.recipe.zope2instance user = admin:admin blob-storage = ${buildout:directory}/var/somewhere [backup] recipe = collective.recipe.backup
If needed, we can tell buildout that we only want to backup blobs or specifically do not want to backup the blobs. Specifying this using the backup_blobs and only_blobs options might be useful in case you want to separate this into several scripts:
[buildout] newest = false parts = filebackup blobbackup [filebackup] recipe = collective.recipe.backup backup_blobs = false [blobbackup] recipe = collective.recipe.backup blob_storage = ${buildout:directory}/var/blobstorage only_blobs = true
With this setup bin/filebackup now only backs up the filestorage and bin/blobbackup only backs up the blobstorage.
rsync
By default we use rsync to create backups. We create hard links with this tool, to save disk space and still have incremental backups. This probably requires a unixy (Linux, Mac OS X) operating system. It is based on this article by Mike Rubel: http://www.mikerubel.org/computers/rsync_snapshots/
We have not tried this on Windows. Reports are welcome, but best is probably to set the use_rsync = false option in the backup part. Then we simply copy the blobstorage directory.
Contributors
collective.recipe.backup is basically a port of ye olde instancemanager’s backup functionality. That backup functionality was coded mostly by Reinout van Rees and Maurits van Rees, both from Zest software
Creating the buildout recipe was done by Reinout with some fixes by Maurits.
The snapshotrestore script was added by Nejc Zupan (niteoweb).
The fullbackup script was added by Tom ‘Spanky’ Kapanka.
Archive blob backups feature added by Matej Cotman (niteoweb).
Change history
2.14 (2013-09-09)
Archive blob backups with buildout option gzip_blob. [matejc]
2.13 (2013-07-15)
When printing that we halt the execution due to an error running repozo, actually halt the execution. [maurits]
2.12 (2013-06-28)
Backup directories are now created when we launch backup or fullbackup or snapshotbackup scripts, no more during initialization. [bsuttor]
2.11 (2013-05-06)
Print the names of filestorages and blobstorages that will be restored. Issue #8. [maurits]
Added a new command-line argument : --no-prompt disables user input when restoring a backup or snapshot. Useful for shell scripts. [bouchardsyl]
Fixed command-line behavior with many arguments and not only a date. [bouchardsyl]
2.10 (2013-03-30)
Added fullbackup script that defaults to full=true. This could have been handled by making a new part, but it seemed like overkill to have to generate a complete new set of backup scripts, just to get one for full. [spanky]
2.9 (2013-03-06)
Fixed possible KeyError: blob_snapshot_location. [gforcada]
2.8 (2012-11-13)
Fixed possible KeyError: blob_backup_location. https://github.com/collective/collective.recipe.backup/issues/3 [maurits]
2.7 (2012-09-27)
additional_filestorages improved: blob support and custom location. [mamico]
2.6 (2012-08-29)
Added pre_command and post_command options. See the documentation. [maurits]
2.5 (2012-08-08)
Moved code to github: https://github.com/collective/collective.recipe.backup [maurits]
2.4 (2011-12-20)
Fixed silly indentation error that prevented old blob backups from being deleted when older than keep_blob_days days. [maurits]
2.3 (2011-10-05)
Quit the rest of the backup or restore when a repozo call gives an error. Main use case: when restoring to a specific date repozo will quit with an error when no files can be found, so we should also not try to restore blobs then. [maurits]
Allow restoring the blobs to the specified date as well. [maurits]
2.2 (2011-09-14)
Refactored script generation to make a split between initialization code and script arguments. This restores compatibility with zc.buildout 1.5 for system pythons. Actually we no longer create so called ‘site package safe scripts’ but just normal scripts that work for all zc.buildout versions. [maurits]
Added option keep_blob_days, which by default specifies that only for partial backups we keep 14 days of backups. See the documentation. [maurits]
Remove old blob backups when doing a snapshot backup. [maurits]
2.1 (2011-09-01)
Raise an error when the four backup location options (blobbackuplocation, blobsnapshotlocation, location and snapshotlocation) are not four distinct locations (or empty strings). [maurits]
Fixed possible TypeError: ‘Option values must be strings’. Found by Alex Clark, thanks. [maurits]
2.0 (2011-08-26)
Backup and restore blobs, using rsync. [maurits]
Ask if the user is sure before doing a restore. [maurits]
1.7 (2010-12-10)
Fix generated repozo commands to work also when recipe is configured to have a non Data.fs main db plus additional filestorages. e.g.: datafs= var/filestorage/main.fs additional = catalog [hplocher]
1.6 (2010-09-21)
Added the option enable_snapshotrestore so that the creation of the script can be removed. Backwards compatible, if you don’t specify it the script will still be created. Rationale: you may not want this script in a production buildout where mistakenly using snapshotrestore instead of snapshotbackup could hurt. [fredvd]
1.5 (2010-09-08)
Fix: when running buildout with a config in a separate directory (like bin/buildout -c conf/prod.cfg) the default backup directories are no longer created inside that separate directory. If you previously manually specified one of the location, snapshotlocation, or datafs parameters to work around this, you can probably remove those lines. So: slightly saner defaults. [maurits]
1.4 (2010-08-06)
Added documentation about how to get the required bin/repozo script in your buildout if for some reason you do not have it yet (like on Plone 4 when you do not have a zeo setup). Thanks to Vincent Fretin for the extra buildout lines. [maurits]
1.3 (2009-12-08)
Added snapshotrestore script. [Nejc Zupan]
1.2 (2009-10-26)
The part name is now reflected in the created scripts and var/ directories. Originally bin/backup, bin/snapshotbackup, bin/restore and var/backups plus var/snapshotbackups were hardcoded. Those are still there when you name your part [backup]. With a part named [NAME], you get bin/NAME, bin/NAME-snapshot, bin/NAME-restore and var/NAMEs plus var/NAME-snapshots. Request by aclark for plone.org. [reinout]
1.1 (2009-08-21)
Run the cleanup script (removing too old backups that we no longer want to keep) for additional file storages as well. Fixes https://bugs.launchpad.net/collective.buildout/+bug/408224 [maurits]
Moved everything into a src/ subdirectory to ease testing on buildbot (which would grab all egss in the eggs/ dir that buildbot’s mechanism creates. [reinout]
1.0 (2009-02-06)
Quote all paths and arguments so that it works on paths that contain spaces (specially on Windows). [sidnei]
0.9 (2008-12-05)
Windows path compatibility fix. [Juan A. Diaz]
0.8 (2008-09-23)
Changed the default for gzipping to True. Adding gzip = true to all our server deployment configs gets tired pretty quickly, so doing it by default is the best default. Stuff like this needs to be changed before a 1.0 release :-) [reinout]
Backup of additional databases (if you have configured them) now takes place before the backup of the main database (same with restore). [reinout]
0.7 (2008-09-19)
Added $BACKUP-style enviroment variable subsitution in addition to the tilde expansion offered by 0.6. [reinout, idea by Fred van Dijk]
0.6 (2008-09-19)
Fixed the test setup so both bin/test and python setup.py test work. [reinout+maurits]
Added support for ~ in path names. And fixed a bug at the same time that would occur if you call the backup script from a different location than your buildout directory in combination with a non-absolute backup location. [reinout]
0.5 (2008-09-18)
Added support for additional_filestorages option, needed for for instance a split-out catalog.fs. [reinout]
Test setup fixes. [reinout+maurits]
0.4 (2008-08-19)
Allowed the user to make the script more quiet (say in a cronjob) by using ‘bin/backup -q’ (or –quiet). [maurits]
Refactored initialization template so it is easier to change. [maurits]
0.3.1 (2008-07-04)
Added ‘gzip’ option, including changes to the cleanup functionality that treats .fsz also as a full backup like .fs. [reinout]
Fixed typo: repoze is now repozo everywhere… [reinout]
0.2 (2008-07-03)
Extra tests and documentation change for ‘keep’: the default is to keep 2 backups instead of all backups. [reinout]
If debug=true, then repozo is also run in –verbose mode. [reinout]
0.1 (2008-07-03)
Added bin/restore. [reinout]
Added snapshot backups. [reinout]
Enabled cleaning up of older backups. [reinout]
First working version that runs repozo and that creates a backup dir if needed. [reinout]
Started project based on zopeskel template. [reinout]
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Hashes for collective.recipe.backup-2.14.zip
Algorithm | Hash digest | |
---|---|---|
SHA256 | 753212ca39f651079d79580ad1ca4bb54d6cdc9f8b6d25472aad194cc586a809 |
|
MD5 | 83c97a1f17d59cd9b1d63d2054fa12dd |
|
BLAKE2b-256 | e0678604c32a14e6f8c38c0d667846a1ccce17a35b860dad24bf291bdf3c4e3b |