A Python service for instant tracking of posts shared on chosen subreddits.

Reddit Crawler

The Reddit Crawler is a Python application that crawls posts from a specified subreddit and stores them in a SQLite database. It also provides an API server to retrieve and filter the crawled posts.

Features

  • Crawls posts from a specified subreddit using the Reddit API.
  • Stores the crawled posts in a SQLite database.
  • Provides an API server to retrieve and filter the stored posts.
  • Supports error handling and logging mechanisms.
  • Implements unit tests for different modules.
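The crawling step can be pictured as a small parser over the listing JSON that the Reddit API returns. The sketch below is a hypothetical helper, not the repository's actual crawler code; the field names follow Reddit's listing format.

```python
def extract_posts(listing):
    """Flatten a Reddit listing payload into plain dicts.

    `listing` is the JSON body of a listing endpoint such as
    GET /r/<subreddit>/new, shaped like
    {"data": {"children": [{"data": {...}}, ...]}}.
    """
    posts = []
    for child in listing.get("data", {}).get("children", []):
        d = child.get("data", {})
        posts.append({
            "id": d.get("id"),
            "title": d.get("title"),
            "author": d.get("author"),
            "ups": d.get("ups", 0),
            "created_utc": d.get("created_utc"),
        })
    return posts
```

Each flattened dict maps directly onto a row in the SQLite database.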

Missions

  • Ability to log in.
  • Storage of crawled posts in the database.
  • Real-time monitoring of posts.
  • Serving of posts through the API.
  • Testing of all written code.
  • Dockerizing the application.

Requirements

  • Python 3.9 or higher
  • SQLite database

Installation

  1. Clone the repository:

    git clone <repository_url>
    
  2. Change into the repository directory:

    cd <repository_name>
    
  3. Install the requirements:

    pip3 install -r requirements.txt
    

Create the database and apply the schema by running the following command:

sqlite3 reddit_posts.db < schema.sql
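The authoritative schema lives in schema.sql. For illustration, an equivalent table can be created from Python with the standard sqlite3 module; the column names below are assumptions for the sketch, not the repository's actual schema.

```python
import sqlite3

def init_db(path="reddit_posts.db"):
    """Create the posts table if it does not exist yet and return the connection.

    NOTE: the columns here are illustrative; the real definitions are in schema.sql.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               author TEXT,
               ups INTEGER DEFAULT 0,
               created_utc REAL
           )"""
    )
    conn.commit()
    return conn
```

Using `CREATE TABLE IF NOT EXISTS` makes initialization idempotent, so running it on an existing reddit_posts.db is harmless.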

Usage

Initialize the SQLite database and start crawling posts:

python3 main.py

Run the API server:

python3 api_server.py

Access the API at http://localhost:5001/posts
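Query parameters (described under Filtering and Sorting below) are appended to this URL. A small, hypothetical helper for building such URLs, assuming the default port 5001:

```python
from urllib.parse import urlencode

BASE = "http://localhost:5001/posts"

def posts_url(**params):
    """Build a /posts URL from keyword arguments, dropping unset filters."""
    query = {k: v for k, v in params.items() if v is not None}
    return BASE + ("?" + urlencode(query) if query else "")
```

For example, `posts_url(keyword="python", min_upvotes=10)` yields a URL filtering on both parameters, while `posts_url()` returns the bare endpoint.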

Docker Build

docker build -t [APP_NAME] .
docker run -p [host_port]:[container_port] [APP_NAME]

Configuration

  • Reddit API credentials:

Set the CLIENT_ID, CLIENT_SECRET, and USER_AGENT variables in reddit_crawler.py to your Reddit API credentials.

Otherwise, crawling fails with: Error occurred during crawling: received 401 HTTP response

  • Database configuration:

Modify the reddit_posts.db file path in database/database_handler.py if desired.

Testing

To run the unit tests, use the following command:

pytest
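A typical test in this style exercises the database layer against an in-memory SQLite database, so the suite never touches reddit_posts.db. The sketch below is illustrative (the table and column names are assumptions), not one of the repository's actual tests.

```python
import sqlite3

def make_test_db():
    # In-memory database: isolated per test, discarded on close.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT, ups INTEGER)")
    return conn

def test_insert_and_fetch():
    conn = make_test_db()
    conn.execute("INSERT INTO posts VALUES ('p1', 'Hello', 5)")
    row = conn.execute("SELECT title, ups FROM posts WHERE id = 'p1'").fetchone()
    assert row == ("Hello", 5)
```

pytest discovers any function named `test_*` in files named `test_*.py` automatically.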

Filtering and Sorting

To filter posts containing a specific keyword in the title:

/posts?keyword=python

To filter posts within a specific date range, include the start_date and end_date parameters:

/posts?start_date=2023-01-01&end_date=2023-06-30

To filter posts by a minimum or maximum number of upvotes:

/posts?min_upvotes=10&max_upvotes=100
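Server-side, these parameters amount to filtering the stored rows. A minimal sketch of that logic as a pure function (covering the keyword and upvote parameters; the repository's actual implementation lives in the api module):

```python
def apply_filters(posts, keyword=None, min_upvotes=None, max_upvotes=None):
    """Filter a list of post dicts the way the /posts query parameters suggest."""
    result = []
    for p in posts:
        # Case-insensitive keyword match against the title.
        if keyword is not None and keyword.lower() not in p["title"].lower():
            continue
        if min_upvotes is not None and p["ups"] < min_upvotes:
            continue
        if max_upvotes is not None and p["ups"] > max_upvotes:
            continue
        result.append(p)
    return result
```

Unset parameters are simply skipped, so `/posts` with no query string returns every stored post.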