
gallery-dl-archive-manager

Scripts to manage a media archive using gallery-dl. The archive was originally Twitter only; the --site argument now also accepts bluesky. Much of the code came from a need to augment pre-existing, outdated archives that were originally created with the twittermediadownloader browser extension.

Config

This repo uses its own config.json in order to save media in the same format as twittermediadownloader. The scripts depend on the media being saved in this format.

Scripts

run-downloadDb.js

Runs a full download of all users listed in the db.json of the archive (the provided --path). If db.json is not present, one will be created. If any user ends on skipped media during the /media check, the /search check will be skipped.

Args:

  • --site={"twitter"|"bluesky"}
  • --path={/path/to/your/archive}
  • --threads={#}
  • --args={gallery-dl args}
  • --usersPerBatch={#}
  • --waitTime={#}
  • --skipMediaAfter={#}
  • --skipSearchAfter={#}

Example:

  • node run-downloadDb.js --path=/mnt/data/archive --threads=3 --args="-r 2.5M --no-skip" runs a full download of all the users in the /mnt/data/archive/db.json file: /media first, followed by /search starting from the oldest file pulled from /media. At most 3 download threads run concurrently. The additional args -r 2.5M --no-skip are passed to the gallery-dl binary being executed; for the sake of example, they limit the download rate to 2.5M and download all files without skipping.
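
The --key=value form used in the example above can be read with a few lines of Node. The following is only a sketch of one way to do it; parseArgs is a hypothetical helper name and is not taken from this repo's code. Note that the shell removes the quotes around --args, so the script sees a single argument containing the whole gallery-dl arg string.

```javascript
// Hypothetical helper (not from this repo): turn ["--key=value", ...] into an object.
// Values stay strings; a flag without "=" is stored as true.
function parseArgs(argv) {
  const parsed = {};
  for (const arg of argv) {
    if (!arg.startsWith("--")) continue; // ignore anything that is not a --flag
    const eq = arg.indexOf("=");
    if (eq === -1) {
      parsed[arg.slice(2)] = true;
      continue;
    }
    // Split on the FIRST "=" only, so values may themselves contain "=".
    parsed[arg.slice(2, eq)] = arg.slice(eq + 1);
  }
  return parsed;
}

// In a script this would typically be called as parseArgs(process.argv.slice(2)).
const example = parseArgs([
  "--path=/mnt/data/archive",
  "--threads=3",
  "--args=-r 2.5M --no-skip",
]);
console.log(example);
```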

Adding --usersPerBatch={#} and --waitTime={#} together activates a batching mechanism that splits the userList in db.json into chunks of usersPerBatch users and waits waitTime seconds between batches in order to throttle downloads. Without this, downloading 100+ users in a short amount of time could introduce problems, whereas, for example, ~30 users with ~5 minutes between batches tends to avoid them.
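
As a rough illustration of the batching described above, the sketch below splits a list of usernames into chunks and pauses between them. The names chunkUsers, sleep, runInBatches and the downloadBatch callback are hypothetical and stand in for whatever the scripts actually do; the sketch also assumes no wait is needed after the final batch.

```javascript
// Hypothetical helper: split usernames into batches of `usersPerBatch`.
function chunkUsers(userNames, usersPerBatch) {
  if (!Number.isInteger(usersPerBatch) || usersPerBatch < 1) {
    throw new RangeError("usersPerBatch must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < userNames.length; i += usersPerBatch) {
    batches.push(userNames.slice(i, i + usersPerBatch));
  }
  return batches;
}

// Resolve after the given number of seconds.
const sleep = (seconds) =>
  new Promise((resolve) => setTimeout(resolve, seconds * 1000));

// Hypothetical driver: run `downloadBatch` on each batch in order and
// wait `waitTime` seconds between batches (assumed: not after the last one).
async function runInBatches(userNames, usersPerBatch, waitTime, downloadBatch) {
  const batches = chunkUsers(userNames, usersPerBatch);
  for (let i = 0; i < batches.length; i++) {
    await downloadBatch(batches[i]);
    if (i < batches.length - 1) {
      await sleep(waitTime);
    }
  }
}
```

With --usersPerBatch=30 and --waitTime=300, a call shaped like runInBatches(names, 30, 300, downloadBatch) would correspond to the "~30 users with ~5 minutes between batches" case.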

run-downloadUsers.js

Adds new user(s) to the db and initiates a full download similar to run-downloadDb.js. If db.json is not present, one will be created. If any user ends on skipped media during the /media check, the /search check will be skipped.

Args:

  • --users={comma,separated,userlist}
  • --site={"twitter"|"bluesky"}
  • --path={/path/to/your/archive}
  • --threads={#}
  • --args={gallery-dl args}
  • --skipMediaAfter={#}
  • --skipSearchAfter={#}

run-convertDb.js

Converts db.json to the latest version. See ./lib/schema.js for full db.json schema.

Args:

  • --path={/path/to/your/archive}

Historical Versions:

  • v0: simple array of users with user, lastUpdated, lastError fields
  • v1 (CURRENT): object with version and userList fields. userList contains key-value entries where the key is the username and the value is an object holding information about that user.
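
To make the difference between the two versions concrete, here is a minimal sketch of a v0 to v1 conversion. It is not the code in run-convertDb.js: convertV0ToV1 is a hypothetical name, the numeric version value 1 is an assumption, and carrying lastUpdated and lastError over unchanged into the per-user object is also an assumption. See ./lib/schema.js for the real schema.

```javascript
// Assumed v0 shape: [{ user, lastUpdated, lastError }, ...]
// Assumed v1 shape: { version: 1, userList: { [username]: { ...info } } }
function convertV0ToV1(v0Db) {
  if (!Array.isArray(v0Db)) {
    throw new TypeError("expected a v0 db (an array of users)");
  }
  const userList = {};
  for (const { user, ...info } of v0Db) {
    // Assumption: the remaining v0 fields are kept as-is under the username key.
    userList[user] = info;
  }
  return { version: 1, userList };
}

const v1 = convertV0ToV1([
  { user: "alice", lastUpdated: "2024-02-18", lastError: null },
]);
console.log(JSON.stringify(v1, null, 2));
```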

Args

Standard args:

--path={/path/to/your/archive}

The path to the archive. This is a parent directory whose child directories each correspond to a user.

--threads={#}

Max number of concurrent download threads. At most this many gallery-dl downloads run at a given time; the remaining users are queued.
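
One common way to get this "run N at a time, queue the rest" behaviour in Node is a small pool of async workers pulling from a shared index. The sketch below shows that pattern only; runWithLimit and the downloadUser callback are hypothetical names, and the repo may implement its queue differently.

```javascript
// Hypothetical sketch: run `downloadUser` for every user, with at most
// `threads` calls in flight at once. Results keep the input order.
async function runWithLimit(users, threads, downloadUser) {
  if (!Number.isInteger(threads) || threads < 1) {
    throw new RangeError("threads must be a positive integer");
  }
  const results = new Array(users.length);
  let next = 0; // index of the next queued user

  // Each worker keeps taking the next queued user until none are left.
  async function worker() {
    while (next < users.length) {
      const index = next++;
      results[index] = await downloadUser(users[index]);
    }
  }

  const workerCount = Math.min(threads, users.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

Because JavaScript runs these workers on a single thread, reading and incrementing next needs no locking; the "threads" here are concurrent child downloads, not OS threads.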

--args={gallery-dl args}

Additional args to pass to gallery-dl. See the gallery-dl CLI options for reference. Note that these are not currently checked against the args this repo already passes, so duplicates are possible.

--skipMediaAfter={#}

Appends -A # to the args of gallery-dl during the /media round, which stops the download early after # consecutive skipped media files.

--skipSearchAfter={#}

Appends -A # to the args of gallery-dl during the /search round, which stops the download early after # consecutive skipped media files.
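
The way these two options map onto the /media and /search rounds can be sketched as below. buildRoundArgs is a hypothetical helper, not a function from this repo, and it assumes the gallery-dl args are held as an array of strings; only the -A flag itself comes from the description above.

```javascript
// Hypothetical helper: return the gallery-dl args for one round
// ("media" or "search"), appending "-A <n>" when a limit is set for it.
function buildRoundArgs(baseArgs, round, { skipMediaAfter, skipSearchAfter } = {}) {
  const limit = round === "media" ? skipMediaAfter : skipSearchAfter;
  const args = [...baseArgs]; // copy so the caller's array is not mutated
  if (limit !== undefined) {
    args.push("-A", String(limit));
  }
  return args;
}

const base = ["-r", "2.5M"];
console.log(buildRoundArgs(base, "media", { skipMediaAfter: 20 }));
console.log(buildRoundArgs(base, "search", { skipMediaAfter: 20 }));
```

Keeping the limit per round means a low --skipMediaAfter does not shorten the /search round, and vice versa.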

TODO

run-renameUser.js

Should rename an existing user in the db, optionally renaming their existing archive and its contents if --full=true.

Args:

  • --from={'username'}
  • --to={'username'}
  • --full={true|false}
  • --path={/path/to/your/archive}
  • --args={gallery-dl args}