06.18.03

By
John Coggeshall
Every user expects a search engine on any Web site they visit. Unfortunately,
search engines are extremely complex to develop. They require massive amounts
of resources and time to implement. Today I'll show you how to use a MySQL (or
any SQL database) to create your own basic search engine.
Note: This article requires that an SQL-compliant database be
installed (such as MySQL). If a SQL database is not available, you can download
one from http://www.mysql.com/. Please consult
the PHP documentation on how to use MySQL
with PHP. The search engine concept
Although search engines can be incredibly complex applications, the concept behind
them is fairly easy to grasp. In its simplest form, a search engine is nothing
more than a program that determines if a given item (such as a word) exists in
a given set of items (such as a document).
Today, I will refer to the item that is being searched for as the "needle" and
the set of items being searched as the "haystack". Through the use of two SQL
tables, I will take this needle-and-haystack concept, and turn it into a working
and powerful search engine. |
The SQL tables
You will be using two SQL tables to drive the backend of the search engine. The
first of these tables is labeled as keywords_list, and is defined
by an SQL CREATE statement as follows: CREATE
TABLE keywords_list(key_id INT(4) AUTO_INCREMENT PRIMARY KEY NOT NULL,
page_id INT(4) NOT NULL, keys VARCHAR(255) NOT NULL);
Because this table can only be described in reference to the second of the SQL
tables you will be creating, it is best that the individual descriptions of the
columns in the keywords_list table be saved for later. For now, let's
continue by describing the second table, which I will call page_data:
CREATE TABLE page_data(page_id INT(4) AUTO_INCRMENT
PRIMARY KEY NOT NULL,
url VARCHAR(255) NOT NULL, title VARCHAR(60) NOT NULL, desc TEXT);
Because the purpose of these two tables may not be immediately clear, I will describe
the purpose of each briefly starting with the page_data table. As
defined above, the page_data table is used to store all of the information
you would expect to find in a single result from a search engine search, including:
- the URL to the Web site
- the title of the page
- a brief description or summary of the page in question
Also included is an auto-incrementing integer defined as the primary key, called
page_id. This integer is a unique identifier that allows you to refer
to each separate page in the table uniquely. This is a very important behavior;
its purpose will be described next.
As the name implies, the keywords_list table stores the keywords
associated with the Web pages that the search engine has indexed. Specifically,
the purpose of this database is to associate each keyword in the table individually
with a Web page that is described in the page_data table. Setting
the page_id value of a given keyword in the keywords_list
table to the appropriate page_id of the Web page described in the
page_data table accomplishes this task.
To better illustrate this concept, consider the following diagram:
The above is a representation of our two tables. As shown, the page_id
values for those keywords on the left (the keywords_list table) are
associated with the Web sites descriptions on the right (the page_data
table).
In practice, the above relations are created through the use of SQL JOIN statements.
Because the actual details of this concept are complex, it is recommended that
you consult your SQL manual (for MySQL users, see the MySQL site) for more information
on relations between tables. Why not just one table?
If you have minimal SQL experience, you may ask: "Why not just use a single table?"
Of course, the above two SQL tables can be combined into a single SQL table, which
seems simpler in theory. However, in practice, as you will see, the above style
is not only easier to implement, but is also much more efficient than a single
table.
Click
Here to Read the Full Article
About the Author:
| John Coggeshall is a PHP consultant and author who started losing
sleep over PHP around five years ago. Lately you'll find him losing sleep meeting
deadlines for books or online columns on a wide range of PHP topics. He maintains
a PHP website packed full of PHP-related materials, tutorials, projects and more
at http://www.coggeshall.org/. |
|
Read This Newsletter at: http://www.devwebnews.com/2003/0618.html |
|
|
|