A Python service for instant tracking of posts shared on chosen subreddits.

Reddit Crawler

The Reddit Crawler is a Python application that crawls posts from a specified subreddit and stores them in a SQLite database. It also provides an API server to retrieve and filter the crawled posts.

Features

  • Crawls posts from a specified subreddit using the Reddit API.
  • Stores the crawled posts in a SQLite database.
  • Provides an API server to retrieve and filter the stored posts.
  • Supports error handling and logging mechanisms.
  • Implements unit tests for different modules.
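The crawling step can be pictured as a small parser over the listing JSON that the Reddit API returns. The sketch below is a hypothetical helper, not the repository's actual crawler code; the field names follow Reddit's listing format.

```python
def extract_posts(listing):
    """Flatten a Reddit listing payload into plain dicts.

    `listing` is the JSON body of a listing endpoint such as
    GET /r/<subreddit>/new, shaped like
    {"data": {"children": [{"data": {...}}, ...]}}.
    """
    posts = []
    for child in listing.get("data", {}).get("children", []):
        d = child.get("data", {})
        posts.append({
            "id": d.get("id"),
            "title": d.get("title"),
            "author": d.get("author"),
            "ups": d.get("ups", 0),
            "created_utc": d.get("created_utc"),
        })
    return posts
```

Each flattened dict maps directly onto a row in the SQLite database.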

Missions

  • Ability to log in.
  • Storage of crawled posts in the database.
  • Real-time monitoring of posts.
  • Serving of posts through the API.
  • Testing of all written code.
  • Dockerizing the application.

Requirements

  • Python 3.9 or higher
  • SQLite database

Installation

  1. Clone the repository:

    git clone <repository_url>
    
  2. Change into the repository directory:

    cd <repository_name>
    
  3. Install the requirements:

    pip3 install -r requirements.txt
    

Create the database and apply the schema by running the following command:

sqlite3 reddit_posts.db < schema.sql
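The authoritative schema lives in schema.sql. For illustration, an equivalent table can be created from Python with the standard sqlite3 module; the column names below are assumptions for the sketch, not the repository's actual schema.

```python
import sqlite3

def init_db(path="reddit_posts.db"):
    """Create the posts table if it does not exist yet and return the connection.

    NOTE: the columns here are illustrative; the real definitions are in schema.sql.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               id TEXT PRIMARY KEY,
               title TEXT NOT NULL,
               author TEXT,
               ups INTEGER DEFAULT 0,
               created_utc REAL
           )"""
    )
    conn.commit()
    return conn
```

Using `CREATE TABLE IF NOT EXISTS` makes initialization idempotent, so running it on an existing reddit_posts.db is harmless.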

Usage

Initialize the SQLite database and start crawling posts:

python3 main.py

Run the API server:

python3 api_server.py

Access the API at http://localhost:5001/posts
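Query parameters (described under Filtering and Sorting below) are appended to this URL. A small, hypothetical helper for building such URLs, assuming the default port 5001:

```python
from urllib.parse import urlencode

BASE = "http://localhost:5001/posts"

def posts_url(**params):
    """Build a /posts URL from keyword arguments, dropping unset filters."""
    query = {k: v for k, v in params.items() if v is not None}
    return BASE + ("?" + urlencode(query) if query else "")
```

For example, `posts_url(keyword="python", min_upvotes=10)` yields a URL filtering on both parameters, while `posts_url()` returns the bare endpoint.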

Docker Build

docker build -t [APP_NAME] .
docker run -p [host_port]:[container_port] [APP_NAME]

Configuration

  • Reddit API credentials:

Set the CLIENT_ID, CLIENT_SECRET, and USER_AGENT variables in reddit_crawler.py to your Reddit API credentials.

Otherwise, crawling fails with: Error occurred during crawling: received 401 HTTP response

  • Database configuration:

Modify the reddit_posts.db file path in database/database_handler.py if desired.

Testing

To run the unit tests, use the following command:

pytest
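A typical test in this style exercises the database layer against an in-memory SQLite database, so the suite never touches reddit_posts.db. The sketch below is illustrative (the table and column names are assumptions), not one of the repository's actual tests.

```python
import sqlite3

def make_test_db():
    # In-memory database: isolated per test, discarded on close.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id TEXT PRIMARY KEY, title TEXT, ups INTEGER)")
    return conn

def test_insert_and_fetch():
    conn = make_test_db()
    conn.execute("INSERT INTO posts VALUES ('p1', 'Hello', 5)")
    row = conn.execute("SELECT title, ups FROM posts WHERE id = 'p1'").fetchone()
    assert row == ("Hello", 5)
```

pytest discovers any function named `test_*` in files named `test_*.py` automatically.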

Filtering and Sorting

To filter posts containing a specific keyword in the title:

/posts?keyword=python

To filter posts within a specific date range, include the start_date and end_date parameters:

/posts?start_date=2023-01-01&end_date=2023-06-30

To filter posts by a minimum or maximum number of upvotes:

/posts?min_upvotes=10&max_upvotes=100
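Server-side, these parameters amount to filtering the stored rows. A minimal sketch of that logic as a pure function (covering the keyword and upvote parameters; the repository's actual implementation lives in the api module):

```python
def apply_filters(posts, keyword=None, min_upvotes=None, max_upvotes=None):
    """Filter a list of post dicts the way the /posts query parameters suggest."""
    result = []
    for p in posts:
        # Case-insensitive keyword match against the title.
        if keyword is not None and keyword.lower() not in p["title"].lower():
            continue
        if min_upvotes is not None and p["ups"] < min_upvotes:
            continue
        if max_upvotes is not None and p["ups"] > max_upvotes:
            continue
        result.append(p)
    return result
```

Unset parameters are simply skipped, so `/posts` with no query string returns every stored post.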