2020-11-19 21:09:14 +00:00
# YaCy Maintenance Scripts

Scripts for easily maintaining your YaCy instance

## Requirements
- Running YaCy in Docker
- Using `yacynet` as the Docker container name
- The default Solr collection must be called `collection1`

These requirements are only needed to **make these scripts work right out of the box**.

You can make **any modifications to the scripts** to suit your needs (for example, if you don't use Docker or use different parameters).
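
Before running anything, it can help to confirm that the container requirement is met. A minimal sketch, assuming the `docker` CLI is on your `PATH`; the function name `yacy_container_running` is hypothetical and not part of this repo:

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repo): succeeds if a running
# container has exactly the given name (default: yacynet).
yacy_container_running() {
  local name="${1:-yacynet}"
  # `docker ps` lists running containers; --format prints one name per line.
  # grep -Fx matches the whole line literally, -q suppresses output.
  docker ps --format '{{.Names}}' | grep -Fxq -- "$name"
}
```

For example, `yacy_container_running && echo "yacynet is up"` prints the message only when a container with that name is running.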
## Installation

Just **clone the repo somewhere** on your server

## Usage

You must **mount the directory of this project** into your Docker container at `/scripts`.

You can also **place it wherever you want** and change the wrapper scripts to **suit your needs**.
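
As an illustration of the mount, here is a minimal sketch that only prints a `docker run` command with this project bind-mounted at `/scripts`, so you can review it before running it. The image name `yacy/yacy_search_server:latest`, the helper name, and the omission of port and data-volume options are assumptions; adapt them to how you already start YaCy:

```bash
# Hypothetical helper: print (not run) a docker command that mounts the
# given directory at /scripts inside a container named yacynet.
yacy_run_command() {
  local scripts_dir="$1"
  # -v HOST_DIR:CONTAINER_DIR is Docker's bind-mount syntax.
  printf 'docker run -d --name yacynet -v "%s:/scripts" %s\n' \
    "$scripts_dir" "yacy/yacy_search_server:latest"
}

# Example: run from the root of the cloned repo.
yacy_run_command "$PWD"
```

If the container already exists, the bind mount has to be added when the container is recreated, since mounts cannot be attached to a running container.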

Execute the wrapper scripts located in `x/`, depending on the **action you want to perform**.

These scripts can also be **called from a cron job** directly.
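
For the cron case, a minimal sketch that builds a crontab line for one of the wrappers. The install path `/opt/yacy-scripts`, the schedule, the log file, and the helper name are all assumptions, not something this repo defines:

```bash
# Hypothetical helper: print a crontab line that runs one wrapper from x/.
# $1 = cron schedule, $2 = directory of this repo, $3 = wrapper name
yacy_cron_line() {
  local schedule="$1" repo_dir="$2" action="$3"
  # cd first so the wrapper runs from the repo root, as in the ./x/... examples.
  printf '%s cd %s && ./x/%s >> /tmp/yacy-maintenance.log 2>&1\n' \
    "$schedule" "$repo_dir" "$action"
}

# Example: clear rejected URLs every day at 03:00.
yacy_cron_line '0 3 * * *' /opt/yacy-scripts clear-rejected-urls
```

The printed line can be pasted into the editor opened by `crontab -e`.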
# Usage Examples!
```bash
./x/blist-add-alldomain google.com       # block the entire google.com domain
./x/blist-add-domainonly www.google.es   # block only the given domain/subdomain
./x/clear-rejected-urls                  # clear all rejected URLs

# All language codes: ar bg ca cz da de el en es eu fa fi fr ga gl hi hu hy id it ja lv nl no pt ro ru sv th tr
./x/purge-by-lang fr                     # purge all URLs in the given language (example: French)
./x/purge-by-regex amazon.com            # purge all URLs that contain the given string in the domain part of the URL
./x/terminate-all                        # terminate all crawling tasks right now (different crawling jobs start if Auto-Crawl is ON)
```
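
To apply one wrapper to many arguments, a loop is enough. A minimal sketch, assuming a plain-text file with one entry per line; the function name `yacy_for_each` and the file `domains.txt` are hypothetical:

```bash
# Hypothetical helper: call a wrapper once per line of a list file.
# $1 = wrapper to run (e.g. ./x/blist-add-alldomain)
# $2 = file with one entry per line
yacy_for_each() {
  local wrapper="$1" list="$2" entry
  # The `|| [ -n "$entry" ]` part also handles a last line without a newline.
  while IFS= read -r entry || [ -n "$entry" ]; do
    # Skip blank lines and lines starting with '#'.
    case "$entry" in ''|'#'*) continue ;; esac
    # Redirect stdin so the wrapper cannot swallow the rest of the list.
    "$wrapper" "$entry" < /dev/null
  done < "$list"
}
```

For example, `yacy_for_each ./x/blist-add-alldomain domains.txt` would block every domain listed in `domains.txt`.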