lib log typo 2024-02-19 00:05:45 -05:00
.editorconfig init 2024-02-09 17:27:28 -05:00
.gitignore init 2024-02-09 17:27:28 -05:00
config.json fix negative running single user db save fail 2024-02-15 23:34:43 -05:00
LICENSE.md docs 2024-02-09 23:03:22 -05:00
package-lock.json init 2024-02-09 17:27:28 -05:00
package.json init 2024-02-09 17:27:28 -05:00
README.md Update README.md 2024-02-19 12:32:30 -05:00
run-convertDb.js v0 -> v1 db, support adding multiple users 2024-02-18 23:50:08 -05:00
run-dedupeIds.js use UTC for timestamps and provide deduplicator 2024-02-12 18:45:34 -05:00
run-downloadDb.js v0 -> v1 db, support adding multiple users 2024-02-18 23:50:08 -05:00
run-downloadUsers.js v0 -> v1 db, support adding multiple users 2024-02-18 23:50:08 -05:00
run-initDb.js v0 -> v1 db, support adding multiple users 2024-02-18 23:50:08 -05:00

gallery-dl-archive-manager

Scripts to manage a (currently Twitter-only) archive using gallery-dl. Much of the code grew out of a need to augment pre-existing, outdated archives that were originally created with the twittermediadownloader browser extension.

Config

This repo uses its own config.json in order to save media in the same format as twittermediadownloader. The scripts depend on the media being saved in this format.

Scripts

node run-initDb.js

Initializes a user database from existing folders. Useful if you have a pre-existing archive of users.

Args:

  • --path={/path/to/your/archive}

Example:

  • node run-initDb.js --path=/mnt/data/archive will read all child directories in /mnt/data/archive (e.g. /mnt/data/archive/userA, /mnt/data/archive/userB, etc.) and create a db.json file in /mnt/data/archive listing the users.
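As a rough illustration of the behavior described above, the sketch below scans an archive directory and writes a db.json listing one entry per child directory. The function names, the numeric `version: 1`, and the per-user fields are assumptions made for this example; the real shape is defined in ./lib/schema.js.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: build a v1-shaped db object from the child directories
// of the archive. The per-user fields here are assumed, not taken from the repo.
function buildDbFromFolders(archivePath) {
  const userList = {};
  for (const entry of fs.readdirSync(archivePath, { withFileTypes: true })) {
    if (!entry.isDirectory()) continue; // skip db.json and any other loose files
    userList[entry.name] = { lastUpdated: null, lastError: null };
  }
  return { version: 1, userList };
}

// Hypothetical helper: write the db next to the user folders as db.json.
function writeDb(archivePath) {
  const db = buildDbFromFolders(archivePath);
  const dbPath = path.join(archivePath, "db.json");
  fs.writeFileSync(dbPath, JSON.stringify(db, null, 2));
  return db;
}
```

Calling `writeDb("/mnt/data/archive")` in this sketch would create /mnt/data/archive/db.json with userA, userB, etc. as keys.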

node run-downloadDb.js

Runs a full download of all users listed in the db.json of the archive (the provided --path).

Args:

  • --path={/path/to/your/archive}
  • --threads={#}
  • --args={gallery-dl args}

Example:

  • node run-downloadDb.js --path=/mnt/data/archive --threads=3 --args="-r 2.5M --no-skip" will run a full download (/media, followed by /search starting from the oldest file pulled from /media) of all the users in /mnt/data/archive/db.json, limiting concurrent download threads to 3. It passes the additional args -r 2.5M --no-skip to the gallery-dl binary being executed; for the sake of example, these limit the download rate to 2.5M and download all files without skipping.

node run-downloadUsers.js

Adds one or more new users to the db and initiates a full download for them, similar to run-downloadDb.js.

Args:

  • --path={/path/to/your/archive}
  • --users={comma,separated,userlist}
  • --threads={#}

node run-convertDb.js

Converts db.json to the latest version. See ./lib/schema.js for full db.json schema.

Args:

  • --path={/path/to/your/archive}

Historical Versions:

  • v0: simple array of users with user, lastUpdated, lastError fields
  • v1 (CURRENT): object with version and userList fields. userList contains key-value entries where the key is the username and the value is an informational object about that user.
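To make the difference between the two versions concrete, here is a minimal sketch of a v0 to v1 conversion. It assumes that the v1 `version` field holds the number 1 and that each user's informational object simply carries over the remaining v0 fields; neither is confirmed by this README, so check ./lib/schema.js for the real schema.

```javascript
// Hypothetical converter: v0 is an array of { user, lastUpdated, lastError };
// v1 is { version, userList } with userList keyed by username.
function convertV0ToV1(v0) {
  const userList = {};
  for (const { user, ...info } of v0) {
    userList[user] = info; // assumed: keep lastUpdated and lastError as-is
  }
  return { version: 1, userList };
}

// Arrays are treated as v0; anything else is assumed to be current already.
function convertDb(db) {
  return Array.isArray(db) ? convertV0ToV1(db) : db;
}
```

Keying userList by username means a lookup or duplicate check is a single property access rather than a scan of the array.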

Args

Standard args:

--path={/path/to/your/archive}

The path to the archive. This is a parent directory whose child directories each correspond to a user.

--threads={#}

Max number of concurrent download threads. At most this many gallery-dl downloads run at a given time; the remaining users are queued.
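The queueing behavior described here can be sketched as a small worker pool. This is an illustration only, not the repo's implementation: `runQueued` and its `job` callback are hypothetical names, and in practice `job` would start a gallery-dl process for the given user.

```javascript
// Hypothetical worker pool: run one async job per user, with at most
// `threads` jobs in flight. Remaining users wait in the queue.
async function runQueued(users, threads, job) {
  const queue = [...users];
  const results = {};

  async function worker() {
    while (queue.length > 0) {
      const user = queue.shift();
      try {
        results[user] = await job(user);
      } catch (err) {
        results[user] = err; // record the failure and keep draining the queue
      }
    }
  }

  // Start no more workers than there are users, and always at least one.
  const workerCount = Math.max(1, Math.min(threads, users.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With `--threads=3`, this sketch would start three workers; each picks up the next queued user as soon as its current download finishes.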

--args={gallery-dl args}

Additional args to pass to gallery-dl. See the gallery-dl CLI options for reference. Note that these aren't currently checked against args this repo already passes, so duplicates are possible.
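As a sketch of how an --args value might be forwarded, the snippet below extracts it from a process.argv-style array and splits it on whitespace. The helper names are hypothetical, and this simple split does not handle quoted values that contain spaces; the repo's actual parsing may differ.

```javascript
// Hypothetical helper: find --args=... and split its value on whitespace.
// The shell has already removed the surrounding quotes by the time Node sees it.
function parseExtraArgs(argv) {
  const raw = argv.find((a) => a.startsWith("--args="));
  if (!raw) return [];
  return raw.slice("--args=".length).trim().split(/\s+/).filter(Boolean);
}

// Hypothetical helper: extra args first, then the URL to download.
function buildGalleryDlArgs(url, argv) {
  return [...parseExtraArgs(argv), url];
}
```

The resulting array could be handed to `child_process.spawn("gallery-dl", args)`, which passes each element as a separate argument without going through a shell.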

TODO

run-updateDb.js

Should pull from the user database and update the archive without doing a full download. The DB should save a lastUpdated field for each user, which should be used as the start date for the /search API. This is preferred when it's been a long time since a user was last updated and/or the user has uploaded a significant amount of media since lastUpdated.

Note: if you've updated the DB recently, it may be faster to run node run-downloadDb.js with --args="-A {#}" to simply run the /media check instead of the /search check, where -A aborts the user's run after {#} consecutive skipped files.

Args:

  • --path={/path/to/your/archive}
  • --threads={#}
  • --args={gallery-dl args}
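One way the lastUpdated field could feed the /search step of run-updateDb.js is sketched below. It assumes lastUpdated is stored as an ISO 8601 UTC timestamp and that the search is expressed with Twitter's `from:` and `since:` operators; both are assumptions for illustration, and `buildSearchUrl` is a hypothetical name.

```javascript
// Hypothetical helper: build a search URL limited to posts by `user`
// on or after the UTC date of `lastUpdated`.
function buildSearchUrl(user, lastUpdated) {
  const since = new Date(lastUpdated).toISOString().slice(0, 10); // YYYY-MM-DD (UTC)
  const query = `from:${user} since:${since}`;
  return `https://twitter.com/search?q=${encodeURIComponent(query)}`;
}
```

Using the UTC date keeps the cutoff stable regardless of the machine's local timezone.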

run-renameUser.js

Should rename an existing user in the db, optionally renaming their existing archive and its contents if --full=true.

Args:

  • --from={'username'}
  • --to={'username'}
  • --full={true|false}
  • --path={/path/to/your/archive}
  • --args={gallery-dl args}
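For the db side of a rename, a minimal sketch against the v1 shape (userList keyed by username) might look like the following. `renameUserInDb` is a hypothetical name, and renaming the folder and its contents for --full=true is not covered here.

```javascript
// Hypothetical helper: move a user's entry to a new key in userList,
// returning a new db object and leaving the input untouched.
function renameUserInDb(db, from, to) {
  if (!Object.hasOwn(db.userList, from)) {
    throw new Error(`unknown user: ${from}`);
  }
  if (Object.hasOwn(db.userList, to)) {
    throw new Error(`user already exists: ${to}`);
  }
  const { [from]: info, ...rest } = db.userList;
  return { ...db, userList: { ...rest, [to]: info } };
}
```

Refusing to overwrite an existing target keeps two archives from being merged by accident.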

Rename Detection

Should detect renames when running /search. Occasionally /media will fail because a user has been renamed, but /search will still return results, causing a full download from /search and adding the user to the db without notice. Instead, this should stop the download and print an error stating that the user has been renamed from the old username to the new username. Because the command will have already finished /media and thus be halfway through the process, the rename should be handled manually after the command has finished.