Click Here
06.18.03


By John Coggeshall

Every user expects a search engine on any Web site they visit. Unfortunately, search engines are extremely complex to develop. They require massive amounts of resources and time to implement. Today I'll show you how to use a MySQL (or any SQL database) to create your own basic search engine.

Note: This article requires that an SQL-compliant database be installed (such as MySQL). If a SQL database is not available, you can download one from http://www.mysql.com/. Please consult the PHP documentation on how to use MySQL with PHP.

The search engine concept

Although search engines can be incredibly complex applications, the concept behind them is fairly easy to grasp. In its simplest form, a search engine is nothing more than a program that determines if a given item (such as a word) exists in a given set of items (such as a document).

Today, I will refer to the item that is being searched for as the "needle" and the set of items being searched as the "haystack". Through the use of two SQL tables, I will take this needle-and-haystack concept, and turn it into a working and powerful search engine.
The SQL tables

You will be using two SQL tables to drive the backend of the search engine. The first of these tables is labeled as keywords_list, and is defined by an SQL CREATE statement as follows:

CREATE TABLE keywords_list(key_id INT(4) AUTO_INCREMENT PRIMARY KEY NOT NULL,
page_id INT(4) NOT NULL, keys VARCHAR(255) NOT NULL);


Because this table can only be described in reference to the second of the SQL tables you will be creating, it is best that the individual descriptions of the columns in the keywords_list table be saved for later. For now, let's continue by describing the second table, which I will call page_data:

CREATE TABLE page_data(page_id INT(4) AUTO_INCRMENT PRIMARY KEY NOT NULL,
url VARCHAR(255) NOT NULL, title VARCHAR(60) NOT NULL, desc TEXT);


Because the purpose of these two tables may not be immediately clear, I will describe the purpose of each briefly starting with the page_data table. As defined above, the page_data table is used to store all of the information you would expect to find in a single result from a search engine search, including:
  • the URL to the Web site

  • the title of the page

  • a brief description or summary of the page in question
Also included is an auto-incrementing integer defined as the primary key, called page_id. This integer is a unique identifier that allows you to refer to each separate page in the table uniquely. This is a very important behavior; its purpose will be described next.

As the name implies, the keywords_list table stores the keywords associated with the Web pages that the search engine has indexed. Specifically, the purpose of this database is to associate each keyword in the table individually with a Web page that is described in the page_data table. Setting the page_id value of a given keyword in the keywords_list table to the appropriate page_id of the Web page described in the page_data table accomplishes this task.

To better illustrate this concept, consider the following diagram:


The above is a representation of our two tables. As shown, the page_id values for those keywords on the left (the keywords_list table) are associated with the Web sites descriptions on the right (the page_data table).

In practice, the above relations are created through the use of SQL JOIN statements. Because the actual details of this concept are complex, it is recommended that you consult your SQL manual (for MySQL users, see the MySQL site) for more information on relations between tables.

Why not just one table?

If you have minimal SQL experience, you may ask: "Why not just use a single table?" Of course, the above two SQL tables can be combined into a single SQL table, which seems simpler in theory. However, in practice, as you will see, the above style is not only easier to implement, but is also much more efficient than a single table.

Click Here to Read the Full Article

About the Author:
John Coggeshall is a PHP consultant and author who started losing sleep over PHP around five years ago. Lately you'll find him losing sleep meeting deadlines for books or online columns on a wide range of PHP topics. He maintains a PHP website packed full of PHP-related materials, tutorials, projects and more at http://www.coggeshall.org/.


Read This Newsletter at: http://www.devwebnews.com/2003/0618.html

 

 

 

 

 

 

 

 



 

-- DevWebNews is an iEntry, Inc. publication --
2003 iEntry, Inc. All Rights Reserved Privacy Policy Legal