gallery-dl-archive-manager
Scripts to manage a (currently twitter only) archive using gallery-dl. Much of the code came from a need to augment pre-existing, outdated archives that were originally created from the twittermediadownloader browser extension.
Config
This repo uses its own config.json in order to save media in the same format as twittermediadownloader. The scripts depend on the media being saved in this format.
Scripts
node run-downloadDb.js
Runs a full download of all users listed in the db.json of the archive (the provided --path). If db.json is not present, one will be created. If any user ends on skipped media during the /media check, the /search check will be skipped.
Args:
--site={"twitter"|"bluesky"}
--path={/path/to/your/archive}
--threads={#}
--args={gallery-dl args}
--usersPerBatch={#}
--waitTime={#}
--skipMediaAfter={#}
--skipSearchAfter={#}
Example:
node run-downloadDb.js --path=/mnt/data/archive --threads=3 --args="-r 2.5M --no-skip"
will run a full download (/media followed by /search, starting from the oldest pulled file from /media) of all the users in the /mnt/data/archive/db.json file, limiting concurrent download threads to 3. It will pass the additional args -r 2.5M --no-skip to the gallery-dl bin being executed; -r 2.5M --no-skip corresponds to limiting the download rate to 2.5M and downloading all files without skipping (for the sake of example).
Adding --usersPerBatch={#} and --waitTime={#} together activates a batching mechanism that splits the userList in db.json into chunks of usersPerBatch users and waits waitTime seconds between batches in order to throttle downloads. Without this, downloading 100+ users in a short amount of time could introduce problems, whereas, for example, ~30 users with ~5 minutes between each batch tends to avoid problems.
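The batching described above can be sketched roughly as follows. This is a minimal illustration, not the repo's actual code: chunk, sleep, runInBatches, and the downloadBatch callback are hypothetical names introduced here, and waitTime is assumed to be in seconds as stated above.

```javascript
// Hypothetical helper: split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Resolve after the given number of seconds.
const sleep = (seconds) =>
  new Promise((resolve) => setTimeout(resolve, seconds * 1000));

// Hypothetical driver: download one batch at a time and pause between
// batches (but not after the last one).
async function runInBatches(users, usersPerBatch, waitTime, downloadBatch) {
  const batches = chunk(users, usersPerBatch);
  for (let i = 0; i < batches.length; i++) {
    await downloadBatch(batches[i]);
    if (i < batches.length - 1) {
      await sleep(waitTime);
    }
  }
}
```

With this sketch, ~30 users per batch and ~5 minutes between batches would correspond to runInBatches(users, 30, 300, downloadBatch).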
run-downloadUsers.js
Adds new user(s) to the db and initiates a full download similar to run-downloadDb.js. If db.json is not present, one will be created. If any user ends on skipped media during the /media check, the /search check will be skipped.
Args:
--users={comma,separated,userlist}
--site={"twitter"|"bluesky"}
--path={/path/to/your/archive}
--threads={#}
--args={gallery-dl args}
--skipMediaAfter={#}
--skipSearchAfter={#}
run-convertDb.js
Converts db.json to the latest version. See ./lib/schema.js for the full db.json schema.
Args:
--path={/path/to/your/archive}
Historical Versions:
- v0: simple array of users with user, lastUpdated, lastError fields
- v1 (CURRENT): object with version and userList fields, userList containing key-value entries where the key is the username and the value is an informational object regarding that username.
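As a rough illustration of the v0 to v1 change, the conversion might look like the sketch below. The function name convertV0ToV1 is hypothetical, and two details are assumptions rather than facts from the schema: that version is the number 1, and that the per-user object simply carries over lastUpdated and lastError. See ./lib/schema.js for the real shape.

```javascript
// Hypothetical sketch: convert a v0 db (array of user records) into a
// v1 db (object with `version` and a `userList` keyed by username).
function convertV0ToV1(v0) {
  const userList = {};
  for (const { user, lastUpdated, lastError } of v0) {
    // Assumption: the v1 value keeps the v0 bookkeeping fields as-is.
    userList[user] = { lastUpdated, lastError };
  }
  // Assumption: the version field is stored as the number 1.
  return { version: 1, userList };
}
```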
Args
Standard args:
--path={/path/to/your/archive}
The path to the archive. This is a parent directory with a list of child directories which correspond to users.
--threads={#}
Max number of concurrent download threads. At most this many gallery-dl download threads run at any given time; the remaining users are queued.
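One common way to get this queueing behavior is a small worker pool, sketched below. This is an assumption about how such a limit could be implemented, not a description of the repo's code; runWithLimit is a hypothetical name, and each task is assumed to be a function that returns a promise (for example, one that spawns gallery-dl for a single user).

```javascript
// Hypothetical sketch: run async tasks with at most `limit` in flight.
// Results are returned in the same order as the input tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0; // index of the next queued task

  // Each worker pulls the next queued task until the queue is empty.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With --threads=3, such a pool would be called as runWithLimit(tasks, 3), so a fourth user starts only once one of the first three finishes.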
--args={gallery-dl args}
Additional args to pass to gallery-dl. See gallery-dl CLI options for reference. Note that these are not currently checked for duplicates of args that this repo already passes.
--skipMediaAfter={#}
Appends -A # to the args of gallery-dl during the /media round, which stops the download early after # skipped media files.
--skipSearchAfter={#}
Appends -A # to the args of gallery-dl during the /search round, which stops the download early after # skipped media files.
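The way these two options feed into the gallery-dl args for each round can be sketched as below. buildRoundArgs is a hypothetical helper, not a function from this repo; the only behavior taken from the description above is that -A # is appended when a skip limit is given.

```javascript
// Hypothetical sketch: build the gallery-dl args for one round
// (/media or /search), appending `-A <n>` only when a limit is set.
function buildRoundArgs(baseArgs, skipAfter) {
  const args = [...baseArgs]; // copy so the caller's array is not mutated
  if (skipAfter !== undefined && skipAfter !== null) {
    args.push('-A', String(skipAfter));
  }
  return args;
}

// Example: separate limits for the two rounds.
const userArgs = ['-r', '2.5M'];
const mediaArgs = buildRoundArgs(userArgs, 20);  // --skipMediaAfter=20
const searchArgs = buildRoundArgs(userArgs, 5);  // --skipSearchAfter=5
```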
TODO
run-renameUser.js
Should rename an existing user in the db, optionally renaming their existing archive and its contents if --full=true.
Args:
--from={'username'}
--to={'username'}
--full={true|false}
--path={/path/to/your/archive}
--args={gallery-dl args}