From c08650229939d22baa5e6aa507837958071e80a3 Mon Sep 17 00:00:00 2001
From: Sherlock <130759470+actuallysherlock@users.noreply.github.com>
Date: Sat, 3 May 2025 16:51:20 +0500
Subject: [PATCH] =?UTF-8?q?=F0=9F=A7=A9=20Fix:=20README?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Made the following changes:

- Improved README structure by making the list items consistent.
- Renumbered the duplicated "4. Search Setup" step and restored the Flask
  server setup instructions as step 5.

---
 README.md | 80 ++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 46 insertions(+), 34 deletions(-)

diff --git a/README.md b/README.md
index 050eeeb..b6def0c 100644
--- a/README.md
+++ b/README.md
@@ -8,76 +8,88 @@ Sloth Search is a project that aims to recreate Google, including crawling, inde
 The project is divided into the following folders:
 
 - **Client**: Contains the front-end code, providing a user interface similar to Google search, where users can enter queries and view search results.
+
 - **Search**: Contains the core components of Sloth Search, which replicate the three main parts of Google:
+
   - **Crawling**: The web crawler that collects information from the web.
+
   - **Indexing**: Processing and storing the content collected by the crawler for efficient searching.
+
   - **Serving (PageRank)**: Serving search results based on their relevance and the PageRank algorithm.
+
 - **Server**: Contains the search API used to handle client requests and provide search results.
 
 ## Installation and Setup
 
-1. **Clone the Repository**
+**1. Clone the Repository**
 
-   ```sh
-   git clone https://github.com/The-CodingSloth/sloth-search.git
-   cd sloth-search
-   ```
+```sh
+git clone https://github.com/The-CodingSloth/sloth-search.git
+cd sloth-search
+```
 
-2. ## Install the necessary Python dependencies, run:
+**2. Install the necessary Python dependencies**
 
 ```sh
 pip install -r requirements.txt
 ```
 
-3. **Client Setup**
+**3. Client Setup**
 
-   - The client contains the HTML, CSS, and JavaScript code to run the front-end.
-   - Open the `index.html` file in your browser, or use a static file server to serve the client code locally.
-   - You can also use the live server extension.
+- The client contains the HTML, CSS, and JavaScript code to run the front-end.
 
-4. **Search Setup**
+- Open the `index.html` file in your browser, or use a static file server to serve the client code locally.
+
+- You can also use the Live Server extension.
+
+**4. Search Setup**
+
+- The `search` directory contains the code for crawling, indexing, and serving.
 
-- The `Search` directory contains the code for crawling, indexing, and serving.
 - You can start the process by running:
-  ```sh
-  python search/complete_examples/advanced_pagerank.py
-  ```
+
+```sh
+python search/complete_examples/advanced_pagerank.py
+```
+
 - This will crawl, index, and prepare the content for searching.
-- If you want to run any other files do the same process:
+
+- If you want to run any other files, do the same process:
 
 ```sh
 python search/
 ```
 
-4. **Search Setup**
-   - The server uses Flask to provide an API for search queries.
-   - Start the Flask server by navigating to the `Server` directory and running:
-   ```sh
-   python google_search_api.py
-   ```
-
+**5. Server Setup**
+
+- The server uses Flask to provide an API for search queries.
+
+- Start the Flask server by navigating to the `Server` directory and running:
+
+```sh
+python google_search_api.py
+```
+
 ## How It Works
 
-1. **Crawling**
+**1. Crawling**
 
-   - The crawler starts with a set of seed URLs and collects links and content from the web.
-   - It respects `robots.txt` to avoid being blocked and to ensure ethical crawling.
-   - Parsed data is stored in a format ready for indexing.
+- The crawler starts with a set of seed URLs and collects links and content from the web.
 
-2. **Indexing**
+- It respects `robots.txt` to avoid being blocked and to ensure ethical crawling.
 
-   - The indexing module processes the crawled pages.
-   - The content is tokenized, cleaned, stemmed, and stop words are removed using the NLTK library.
-   - The resulting indexed data is saved to be used by the search API.
+- Parsed data is stored in a format ready for indexing.
 
-3. **Serving and PageRank**
-   - The PageRank algorithm is used to rank pages based on their importance.
-   - When a user searches for a query through the client, the server uses the indexed data and PageRank scores to return the most relevant pages.
+**2. Indexing**
+
+- The indexing module processes the crawled pages.
+
+- The content is tokenized, cleaned, stemmed, and stop words are removed using the NLTK library (a sketch follows at the end of this section).
+
+- The resulting indexed data is saved to be used by the search API.
+
+**3. Serving and PageRank**
+
+- The PageRank algorithm is used to rank pages based on their importance (see the sketch below).
+
+- When a user searches for a query through the client, the server uses the indexed data and PageRank scores to return the most relevant pages.
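+
+The two sketches below make the indexing and ranking steps concrete. They are illustrative only: the helper names are hypothetical, and the project's actual implementations live in the `search` directory and may differ.
+
+First, a minimal indexing-style preprocessing pass with NLTK (assumes `nltk` is installed and downloads its `punkt` and `stopwords` data on first run):
+
+```python
+import nltk
+from nltk.corpus import stopwords
+from nltk.stem import PorterStemmer
+from nltk.tokenize import word_tokenize
+
+# One-time downloads of the tokenizer model and the stop-word list.
+nltk.download("punkt", quiet=True)
+nltk.download("stopwords", quiet=True)
+
+def preprocess(text):
+    """Tokenize, clean, remove stop words, and stem a document."""
+    stemmer = PorterStemmer()
+    stop_words = set(stopwords.words("english"))
+    tokens = word_tokenize(text.lower())                 # tokenize
+    tokens = [t for t in tokens if t.isalnum()]          # drop punctuation
+    tokens = [t for t in tokens if t not in stop_words]  # remove stop words
+    return [stemmer.stem(t) for t in tokens]             # stem
+
+print(preprocess("Sloths are slowly searching the entire web."))
+```
+
+Second, a sketch of iterative (power-iteration) PageRank with damping, in the spirit of `search/complete_examples/advanced_pagerank.py`:
+
+```python
+def pagerank(links, damping=0.85, iterations=50):
+    """links maps each page to the list of pages it links to."""
+    pages = set(links) | {t for targets in links.values() for t in targets}
+    n = len(pages)
+    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
+    for _ in range(iterations):
+        new_rank = {p: (1.0 - damping) / n for p in pages}
+        for page in pages:
+            targets = links.get(page, [])
+            if targets:
+                share = damping * rank[page] / len(targets)
+                for t in targets:  # pass a share of rank along each out-link
+                    new_rank[t] += share
+            else:  # dangling page: spread its rank evenly across all pages
+                for p in pages:
+                    new_rank[p] += damping * rank[page] / n
+        rank = new_rank
+    return rank
+
+# Tiny three-page web: "a" should end up with the highest rank.
+print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
+```
+
+Both snippets are self-contained and can be run directly with `python`; they are not wired into the project's pipeline.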
 
 ## Important Notes
 
 - **Respecting Websites**: The crawler respects `robots.txt` rules. Please make sure not to overload any websites.
+
 - **PageRank Algorithm**: The implementation of the PageRank algorithm uses an iterative approach to rank pages based on the links between them.
+
 - **Data Storage**: The crawler and indexer use CSV files for data storage (`advanced_pagerank_inverted_index.csv` and `advanced_pagerank.csv`). Make sure these files are writable during execution.
 
 ## Contributing