PixieBot Image Scraper 4.0
By: K0NxT3D

------------------------------------------------------

Overview:
------------------------------------------------------

PixieBot Image Scraper 4.0 is a Python-based tool for extracting images from websites.
It is aimed at developers, researchers, content creators, and anyone else who needs automated image scraping, and it includes features for handling large scraping jobs.

Key features include directory management, concurrent processing, progress tracking, error handling, and duplicate image skipping.

------------------------------------------------------

Table of Contents:
------------------------------------------------------

1. Requirements
2. Dependencies
3. Purpose and Key Features
4. How to Use
5. Future Updates and Enhancements
6. Licensing and Author Information
7. Contact and Support

------------------------------------------------------

Requirements:
------------------------------------------------------

- Python 3.x (Python 3.7 or later recommended)
- pip (Python package manager)

Ensure Python is installed on your machine by running the following in your terminal or command prompt (on some systems the command is `python3` instead of `python`):

    python --version

If Python is not installed, download and install it from: https://www.python.org
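
The same check can be made from inside Python. The following is a minimal sketch, not part of PixieBot itself; the names `MINIMUM_VERSION` and `meets_minimum` are hypothetical and used only for illustration.

```python
import sys

# Recommended minimum taken from the requirements list (an assumption that
# the script should warn rather than refuse to run on older versions).
MINIMUM_VERSION = (3, 7)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    if not meets_minimum():
        sys.exit("Python 3.7 or later is recommended for PixieBot.")
    print("Python version OK:", sys.version.split()[0])
```

Comparing tuples avoids parsing the version string by hand.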

------------------------------------------------------

Dependencies:
------------------------------------------------------

Before running the script, install the required dependencies. These can be installed via pip:

    pip install -r requirements.txt

The `requirements.txt` includes the following packages:

- `requests`: For making HTTP requests to fetch pages and download images.
- `beautifulsoup4`: For parsing HTML content and extracting image URLs.
- `tqdm`: For displaying a progress bar during the download process.
- `pyfiglet`: For the splash screen text display.
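
To confirm that the packages are importable before running the scraper, a small check like the one below can help. It is a sketch under stated assumptions: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for these packages (`beautifulsoup4` is imported as `bs4`).

```python
import importlib.util

# Import names for the packages listed above. The `beautifulsoup4`
# package is imported under the name `bs4`.
REQUIRED_MODULES = ("requests", "bs4", "tqdm", "pyfiglet")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If anything is reported as missing, rerun the `pip install` command above.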

------------------------------------------------------

Purpose and Key Features:
------------------------------------------------------

PixieBot Image Scraper 4.0 offers a range of features to make image scraping as efficient and customizable as possible:

1. **Advanced Directory Management**:
   - Automatically creates and organizes folders for storing images.
   - Images are saved in a sub-folder named after the website’s domain, keeping everything structured.

2. **Customizable Scraping Options**:
   - **Follow External Links**: Decide whether to scrape images from external websites linked from the starting URL.
   - **Depth Control**: Set the maximum depth to follow external links, controlling how deep the scraper navigates.

3. **Concurrent Processing**:
   - Uses Python’s `ThreadPoolExecutor` to download multiple images concurrently, which speeds up large scraping jobs.

4. **Progress Tracking**:
   - Uses `tqdm` to display real-time progress, including download speed, size, and estimated time remaining.

5. **Robust Error Handling**:
   - Handles errors such as 403 (Forbidden) and 404 (Not Found), so the scrape continues even when some resources are inaccessible.

6. **Duplicate Image Skipping**:
   - The scraper skips images that already exist on disk, based on their filenames.
   - **New in version 4.1**: Duplicate images are tracked in memory to prevent multiple downloads, even with concurrent tasks.
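
The concurrency, error-handling, and in-memory duplicate-tracking features can be sketched together as follows. This is an illustration only, not PixieBot's actual code: `SeenTracker`, `download_all`, and the `fetch` callback are hypothetical names, duplicates are keyed by URL (an assumption), and `fetch` stands in for the real HTTP call so the example runs without network access.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SeenTracker:
    """Thread-safe record of image URLs that have already been claimed."""

    def __init__(self):
        self._seen = set()
        self._lock = threading.Lock()

    def claim(self, url):
        """Return True only for the first caller that claims this URL."""
        with self._lock:
            if url in self._seen:
                return False
            self._seen.add(url)
            return True


def download_all(urls, fetch, max_workers=8):
    """Fetch each distinct URL once, using a pool of worker threads.

    `fetch` takes a URL and returns (status_code, body_bytes). Any status
    other than 200 (for example 403 or 404) and any raised exception is
    recorded as a failure instead of stopping the whole run.
    """
    tracker = SeenTracker()
    results = {"saved": [], "skipped": [], "failed": []}
    results_lock = threading.Lock()

    def worker(url):
        if not tracker.claim(url):
            outcome = "skipped"  # another task already took this URL
        else:
            try:
                status, _body = fetch(url)
                outcome = "saved" if status == 200 else "failed"
            except Exception:
                outcome = "failed"
        with results_lock:
            results[outcome].append(url)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Consume the iterator so every task finishes before returning.
        list(pool.map(worker, urls))
    return results
```

The lock around the set matters here: without it, two threads could both see a URL as new and download it twice.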

------------------------------------------------------

How to Use:
------------------------------------------------------

Follow the steps below to run PixieBot Image Scraper 4.0:

1. **Clone or Download the Repository**:
   - Download the PixieBot 4.0 source code, or clone the repository to your local machine.

2. **Install Dependencies**:
   - Run the following command to install the necessary dependencies:

        pip install -r requirements.txt

3. **Run the Scraper**:
   - To execute the script, use the following command:

        python pixiebot.py

4. **Configure Scraping Parameters**:
   - You’ll be prompted to provide the following inputs:
     - **Start URL**: The URL of the webpage to begin scraping images from.
     - **Follow External Links**: Choose whether to follow external links to scrape images from other pages (y/n).
     - **Max Depth for External Links**: Set the maximum depth to follow external links.

5. **Monitor Progress**:
   - As the scraper downloads images, it will display a progress bar with:
     - Number of images downloaded
     - Current download speed
     - Estimated time remaining

6. **Finish Scraping**:
   - After the task completes, the scraper will display a success message. You can choose to scrape another URL or exit the tool.
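
How the "Follow External Links" and "Max Depth" inputs might interact can be sketched as a depth-limited, breadth-first traversal. This is a sketch under assumptions rather than PixieBot's actual logic: `pages_to_scrape` and `get_links` are hypothetical names, a link counts as external when its host differs from the start URL's host, and same-site links are not followed at all.

```python
from collections import deque
from urllib.parse import urlparse


def pages_to_scrape(start_url, get_links, follow_external, max_depth):
    """Return the pages to scrape, in breadth-first order.

    `get_links` is a caller-supplied function that returns the list of
    URLs linked from a page; it stands in for fetching and parsing HTML.
    """
    start_host = urlparse(start_url).netloc
    visited = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])

    while queue:
        url, depth = queue.popleft()
        # Stop expanding when following is disabled or the limit is hit.
        if not follow_external or depth >= max_depth:
            continue
        for link in get_links(url):
            if link in visited:
                continue
            if urlparse(link).netloc == start_host:
                continue  # same-site link, not external in this sketch
            visited.add(link)
            order.append(link)
            queue.append((link, depth + 1))
    return order
```

With a depth of 1, only pages linked directly from the start URL are visited; each extra level of depth adds one more hop.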

Default Behavior:
   - Images are stored in a subfolder named after the website’s domain (e.g., `scraped_images/example.com`).
   - The scraper checks whether an image already exists and skips duplicates based on filenames.
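
The folder layout and the filename-based duplicate check can be sketched as follows. This is a minimal illustration, not the tool's actual code: `target_path` and `should_download` are hypothetical names, and the `unnamed` fallback for URLs without a filename is an assumption.

```python
import os
from urllib.parse import urlparse


def target_path(page_url, image_url, base_dir="scraped_images"):
    """Build the save path <base_dir>/<page domain>/<image filename>."""
    domain = urlparse(page_url).netloc
    # Use only the path part of the image URL so query strings are dropped.
    filename = os.path.basename(urlparse(image_url).path) or "unnamed"
    folder = os.path.join(base_dir, domain)
    os.makedirs(folder, exist_ok=True)  # create the folder if needed
    return os.path.join(folder, filename)


def should_download(path):
    """Skip the download when a file with that name already exists."""
    return not os.path.exists(path)
```

Because the check uses filenames only, two different images that share a name in the same domain folder would be treated as duplicates.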

------------------------------------------------------

Future Updates and Enhancements:
------------------------------------------------------

PixieBot 4.0 will continue to evolve with the following planned enhancements:

1. **Support for Complex Websites**:
   - Support for solving CAPTCHAs and scraping JavaScript-rendered content, to handle more complex websites.

2. **Streamlined User Interface**:
   - An improved configuration process that makes the tool easier to use for users of all technical levels.

3. **Increased Customization**:
   - More advanced filters for image types, sizes, and formats.
   - Greater control over how external links are followed.

4. **Cloud Storage Integration**:
   - Future releases will allow users to save scraped images directly to cloud storage platforms like AWS S3 or Google Drive.

------------------------------------------------------

Licensing and Author Information:
------------------------------------------------------

- **Author**: K0NxT3D
- **License**: MIT License

PixieBot Image Scraper 4.0 is open-source and free to use. Contributions are welcome! If you encounter bugs or have feature suggestions, feel free to submit an issue or pull request on the repository.

------------------------------------------------------

Contact and Support:
------------------------------------------------------

For questions or further assistance, you can reach the author at:

- **Email**: k0nxt3d@example.com
- **GitHub Repository**: https://github.com/K0NxT3D/PixieBot

------------------------------------------------------
