Build me a super simple search engine in Python

Creating a super simple search engine in Python takes only a few basic steps. Real search engines are complex systems, so keep in mind that this is a very rudimentary version: it searches a set of documents (such as text files) for keywords and returns the documents that contain them.

Here’s a basic outline for a simple search engine:

  1. Data Collection: Have a set of documents to search through. This could be any collection of text files.
  2. Indexing: Process these documents to create an index. This is usually a mapping of terms to the documents that contain them.
  3. Search Query: Accept a user’s search query and find relevant documents.
  4. Ranking (optional): Rank these documents based on some relevance criteria, like the frequency of the search term.
  5. Return Results: Display the search results to the user.
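
To make step 2 concrete, here is a minimal sketch of an inverted index built from a few in-memory strings instead of files. The `build_inverted_index` helper and the sample documents are made up for illustration only.

```python
import re

def build_inverted_index(docs):
    """Map each lowercase term to the set of document names containing it."""
    index = {}
    for name, text in docs.items():
        # set() removes duplicates so each term is processed once per document.
        for term in set(re.findall(r'\w+', text.lower())):
            index.setdefault(term, set()).add(name)
    return index

# Hypothetical sample documents, invented for this illustration.
docs = {
    "a.txt": "Python makes search simple",
    "b.txt": "A simple search engine",
}
index = build_inverted_index(docs)
print(sorted(index["simple"]))  # ['a.txt', 'b.txt']
print(sorted(index["python"]))  # ['a.txt']
```

Looking up a term is then a single dictionary access, which is why the index is built once up front rather than scanning every document per query.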

Below is a simple Python script that demonstrates these concepts (it leaves out the optional ranking step):

import os
import re

class SimpleSearchEngine:
    def __init__(self, directory):
        self.index = {}
        self.documents = {}
        self.directory = directory
        self.build_index()

    def build_index(self):
        # Build an inverted index: each term maps to the set of files containing it.
        for filename in os.listdir(self.directory):
            if filename.endswith(".txt"):
                filepath = os.path.join(self.directory, filename)
                with open(filepath, 'r', encoding='utf-8') as file:
                    # Lowercase the text so matching is case-insensitive.
                    text = file.read().lower()
                self.documents[filename] = text
                # \w+ extracts runs of letters, digits, and underscores as terms.
                for term in set(re.findall(r'\w+', text)):
                    self.index.setdefault(term, set()).add(filename)

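    def search(self, query):
        # Return the files that contain every term in the query (AND search).
        query_terms = set(re.findall(r'\w+', query.lower()))
        results = None  # None means "no term processed yet", unlike an empty set
        for term in query_terms:
            postings = self.index.get(term, set())
            # Copy on first use so callers cannot modify the index through the result.
            results = set(postings) if results is None else results & postings
            if not results:
                break  # an unknown term or empty intersection means no match
        return results if results is not None else set()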

# Usage
search_engine = SimpleSearchEngine("path_to_documents")
results = search_engine.search("search query")
for result in results:
    print(result)

How to Use:

  1. Place the text files you want to search through in a directory.
  2. Replace "path_to_documents" with the path to this directory.
  3. Replace "search query" with the actual query.
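
If you have no text files on hand, one way to try the steps above is to generate a small sample directory first. This is only a sketch: the `make_sample_corpus` helper and the file contents are invented for illustration and are not part of the script.

```python
import tempfile
from pathlib import Path

def make_sample_corpus():
    """Create a temporary directory holding a few small .txt files and return its path."""
    directory = Path(tempfile.mkdtemp(prefix="search_docs_"))
    # Hypothetical sample contents, invented for this illustration.
    samples = {
        "cats.txt": "Cats sleep most of the day.",
        "dogs.txt": "Dogs love long walks and play.",
        "pets.txt": "Cats and dogs are popular pets.",
    }
    for name, text in samples.items():
        (directory / name).write_text(text, encoding="utf-8")
    return str(directory)

path = make_sample_corpus()
print(path)  # use this path in place of "path_to_documents"
```

With these files, a search for "cats dogs" should match only pets.txt, because it is the only file that contains both words.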

Note: This script is extremely basic. It keeps the whole index and every document in memory, matches exact words only, and returns results in no particular order. A full-fledged search engine adds features such as natural language processing, more efficient index storage, and advanced ranking algorithms. Still, it's a good starting point for understanding the basics.
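
As one possible next step, the optional ranking stage from the outline could be sketched by counting how often the query terms occur in each document. The `rank_by_term_frequency` helper and the sample texts below are assumptions for illustration. Note that this sketch scores any document containing at least one query term, and raw term frequency is a deliberately crude relevance measure.

```python
import re
from collections import Counter

def rank_by_term_frequency(query, documents):
    """Return (name, score) pairs for documents matching any query term, best first.

    `documents` maps a name to its lowercased text, like `self.documents` in the script.
    """
    query_terms = set(re.findall(r'\w+', query.lower()))
    scores = {}
    for name, text in documents.items():
        counts = Counter(re.findall(r'\w+', text))
        # A Counter returns 0 for terms that do not occur in the document.
        score = sum(counts[term] for term in query_terms)
        if score > 0:
            scores[name] = score
    # Sort by descending score, then by name so ties come out in a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical sample texts, invented for this illustration.
documents = {
    "a.txt": "search engines index text so search is fast",
    "b.txt": "a simple search example",
    "c.txt": "nothing relevant here",
}
print(rank_by_term_frequency("search", documents))
# [('a.txt', 2), ('b.txt', 1)]
```

Inside the class, the same helper could be called with `self.documents` as its second argument, since that dictionary already stores the lowercased text of each file.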