From 930f13744ea3f07a4b90d4f39acc243694f8d913 Mon Sep 17 00:00:00 2001
From: Mahdi Dibaiee
Date: Mon, 17 Apr 2017 14:53:12 +0430
Subject: [PATCH] chore: add readme

---
 README.md | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)
 create mode 100644 README.md

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..2c3ddb5
--- /dev/null
+++ b/README.md
@@ -0,0 +1,31 @@
+web-scraper
+===========
+
+A simple script that scrapes a website, extracting texts in a CSV file with the format below, and saving images.
+
+| Page      | Tag                             | Text         | Link              | Image                  |
+|-----------|---------------------------------|--------------|-------------------|------------------------|
+| page path | element tag (h{1,6}, a, p, etc) | text content | link url (if any) | image address (if any) |
+
+## Usage
+First, install dependencies (python3):
+
+```
+pip install -r requirements
+```
+
+Then create a file containing urls of the websites you want to scrape, one line for each website, for example (I'll call this file `test_websites`):
+
+```
+https://theread.me
+https://theguardian.com
+```
+
+Now you are ready to execute the script:
+
+```
+python index.py test_websites
+ # path to your file ^
+```
+
+To see available options, try `python index.py -h`.
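
The README in this patch describes the CSV layout but does not show how rows are produced. As a rough illustration only, the sketch below builds rows in the `Page, Tag, Text, Link, Image` format from an HTML string using Python's standard library. It is not taken from `index.py`: the names `RowExtractor`, `extract_rows`, `rows_to_csv` and the `TEXT_TAGS` set are hypothetical, and the real script may treat nested elements, images, and links differently.

```python
import csv
import io
from html.parser import HTMLParser

# Assumed subset of tags, based on the README's "h{1,6}, a, p, etc".
TEXT_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "a", "p"}


class RowExtractor(HTMLParser):
    """Collects [page, tag, text, link, image] rows from one HTML document."""

    def __init__(self, page):
        super().__init__()
        self.page = page
        self.rows = []
        self._stack = []  # currently open text tags: [tag, link, text_parts]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # Images get a row of their own with only the Image column filled.
            self.rows.append([self.page, "img", "", "", attrs.get("src") or ""])
        elif tag in TEXT_TAGS:
            link = (attrs.get("href") or "") if tag == "a" else ""
            self._stack.append([tag, link, []])

    def handle_data(self, data):
        # Text is credited to the innermost open tag only (a simplification).
        if self._stack:
            self._stack[-1][2].append(data)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1][0] == tag:
            tag, link, parts = self._stack.pop()
            text = " ".join("".join(parts).split())  # collapse whitespace
            if text:
                self.rows.append([self.page, tag, text, link, ""])


def extract_rows(page, html):
    """Parse one page's HTML and return its rows."""
    parser = RowExtractor(page)
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_csv(rows):
    """Serialize rows to CSV text with the header from the README table."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Page", "Tag", "Text", "Link", "Image"])
    writer.writerows(rows)
    return buf.getvalue()
```

Calling `rows_to_csv(extract_rows("/", html))` on a fetched page would yield one CSV line per heading, paragraph, link, or image. Fetching pages and saving image files are left out of this sketch.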