pad.libre-service.eu-etherpad/src/bin/dirty-db-cleaner.py
John McLear 2ea8ea1275 restructure: move bin/ and tests/ to src/
Also add symlinks from the old `bin/` and `tests/` locations to avoid
breaking scripts and other tools.

Motivations:

  * Scripts and tests no longer have to do dubious things like:

        require('ep_etherpad-lite/node_modules/foo')

    to access packages installed as dependencies in
    `src/package.json`.

  * Plugins can access the backend test helper library in a non-hacky
    way:

        require('ep_etherpad-lite/tests/backend/common')

  * We can delete the top-level `package.json` without breaking our
    ability to lint the files in `bin/` and `tests/`.

    Deleting the top-level `package.json` has downsides: It will cause
    `npm` to print warnings whenever plugins are installed, npm will
    no longer be able to enforce a plugin's peer dependency on
    ep_etherpad-lite, and npm will keep deleting the
    `node_modules/ep_etherpad-lite` symlink that points to `../src`.

    But there are significant upsides to deleting the top-level
    `package.json`: It will drastically speed up plugin installation
    because `npm` doesn't have to recursively walk the dependencies in
    `src/package.json`. Also, deleting the top-level `package.json`
    avoids npm's horrible dependency hoisting behavior (where it moves
    stuff from `src/node_modules/` to the top-level `node_modules/`
    directory). Dependency hoisting causes numerous mysterious
    problems such as silent failures in `npm outdated` and `npm
    update`. Dependency hoisting also breaks plugins that do:

        require('ep_etherpad-lite/node_modules/foo')
2021-02-04 17:15:08 -05:00

48 lines
1.5 KiB
Python
Executable file

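#!/usr/bin/env python3
#
# Created by Bjarni R. Einarsson, placed in the public domain. Go wild!
#
# dirty.db is an append-only log with one JSON record per line. When a key
# is written more than once, every version stays in the file and the last
# one wins. This script keeps only the last line for each key.
#
import json
import os
import sys

ok = len(sys.argv) > 1
if ok:
    dirtydb_input = sys.argv[1]
    dirtydb_output = '%s.new' % dirtydb_input
    # Explicit checks instead of assert, which `python -O` strips out.
    ok = os.path.exists(dirtydb_input) and not os.path.exists(dirtydb_output)
if not ok:
    print()
    print('Usage: %s /path/to/dirty.db' % sys.argv[0])
    print()
    print('Note: Will create a file named dirty.db.new in the same folder,')
    print('      please make sure permissions are OK and a file by that')
    print('      name does not exist already. This script works by omitting')
    print('      duplicate lines from the dirty.db file, keeping only the')
    print('      last (latest) instance. No revision data should be lost,')
    print('      but be careful, make backups. If it breaks you get to keep')
    print('      both pieces!')
    print()
    sys.exit(1)

dirtydb = {}
lines = 0
# dirty.db is JSON text, so read and write it as UTF-8 whatever the locale.
with open(dirtydb_input, 'r', encoding='utf-8') as fd:
    print('Reading %s' % dirtydb_input, flush=True)
    for line in fd:
        lines += 1
        # A last line without "\n" must not run into the next record when
        # the kept lines are written back out in a different order.
        if not line.endswith('\n'):
            line += '\n'
        try:
            data = json.loads(line)
            # Later lines overwrite earlier ones (last write wins); the dict
            # keeps each key at the position where it first appeared.
            dirtydb[data['key']] = line
        except (ValueError, KeyError, TypeError):
            print('Skipping invalid JSON on line %d!' % lines)
        if lines % 10000 == 0:
            # Progress dots; flush so they appear even without a newline.
            sys.stderr.write('.')
            sys.stderr.flush()
print()
print('OK, found %d unique keys in %d lines' % (len(dirtydb), lines))
with open(dirtydb_output, 'w', encoding='utf-8') as fd:
    for data in dirtydb.values():
        fd.write(data)
print('Wrote data to %s. All done!' % dirtydb_output)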
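The heart of the cleaner is a last-write-wins compaction of a line-oriented log. The sketch below restates the read/dedupe step as a pure function so it can be tried without touching the filesystem. `compact_lines` is a hypothetical name, not part of Etherpad, and the sample records are made up; like the script, it assumes each valid record is a JSON object with a `key` field, lets the last line for a key win, keeps keys in the order they first appeared, and skips lines it cannot parse. It also makes sure every kept line ends in a newline, so a final line without one cannot run into the next record when the lines are concatenated.

```python
import json
from typing import Iterable, List, Tuple


def compact_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Hypothetical helper: keep only the last line for each JSON "key".

    Returns (kept_lines, skipped_count). Keys come out in first-seen order
    (Python dicts preserve insertion order); each key's line is its latest.
    """
    latest = {}
    skipped = 0
    for line in lines:
        # Newline-terminate every record so a final line without "\n" cannot
        # run into the next record when the kept lines are concatenated.
        if not line.endswith('\n'):
            line += '\n'
        try:
            latest[json.loads(line)['key']] = line
        except (ValueError, KeyError, TypeError):
            # Not JSON, not an object with a "key", or an unhashable key.
            skipped += 1
    return list(latest.values()), skipped


log = [
    '{"key":"pad:a","val":1}\n',
    '{"key":"pad:b","val":2}\n',
    'garbage\n',
    '{"key":"pad:a","val":3}',  # last line, no trailing newline
]
kept, skipped = compact_lines(log)
print(kept)     # ['{"key":"pad:a","val":3}\n', '{"key":"pad:b","val":2}\n']
print(skipped)  # 1
```

Note that `pad:a` keeps its original first position but carries its latest value, which is exactly the case where a missing trailing newline would otherwise glue two records together.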
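The usage text says no revision data should be lost but still asks for backups. One way to check a particular run, sketched here under the same assumption that every record is a JSON object with a `key` field, is to replay both files the way a last-write-wins loader would and compare the resulting key-to-record maps. `replay` and `same_final_state` are hypothetical names; unparseable lines are ignored, mirroring how the cleaner skips them, and the comparison is on parsed JSON, so whitespace and line-ending differences do not count.

```python
import json
from typing import Dict, Iterable


def replay(lines: Iterable[str]) -> Dict[object, object]:
    """Hypothetical helper: rebuild the final state of a dirty.db-style log.

    Each valid line is a JSON object with a "key"; a later line for the same
    key replaces the earlier one. Lines that cannot be parsed are ignored.
    """
    state = {}
    for line in lines:
        try:
            record = json.loads(line)
            state[record['key']] = record
        except (ValueError, KeyError, TypeError):
            continue
    return state


def same_final_state(original_path: str, compacted_path: str) -> bool:
    """Return True if both files replay to the same key -> record mapping."""
    with open(original_path, encoding='utf-8') as a, \
            open(compacted_path, encoding='utf-8') as b:
        return replay(a) == replay(b)
```

For example, `same_final_state('dirty.db', 'dirty.db.new')` should return `True` after a successful run; a `False` is a signal to keep the original file and investigate before swapping them.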