INTELLIGENT SEARCH AGENTS

BY DARREN GREAVES




4. IMPLEMENTATION OVERVIEW
4.1. INTRODUCTION
4.2. PLATFORM
4.3. CHOICE OF LANGUAGE
4.3.1. Languages Available
4.3.2. Criteria for Evaluation
4.3.3. Final Choice
4.4. CHOICE OF METHODOLOGY/DEVELOPMENT STRATEGY
4.4.1. Methodologies Available
4.4.2. Criteria for Evaluation
4.4.3. Final Choice
4.5. DESIGN
4.5.1.1. Introduction
4.5.1.2. Perform Searches On Internet/Intranet Content
4.5.1.3. Add New Filters To Perform Different Searches
4.5.2. Design One
4.5.3. Evaluation of Design One
4.5.4. Design Two
4.5.5. Evaluation of Design Two

4. IMPLEMENTATION OVERVIEW
4.1. INTRODUCTION
This section presents the choices available to me regarding development strategy and language. I first consider the platform used, then the languages available, and then the development strategies. Finally, I describe my designs for the program in detail.
4.2. PLATFORM
I have a strong preference for being able to work on this project from home. This lets me work at any time, not just when the University is open, and guarantees access to my machine. Finding a free machine at the University that has all the required software is becoming increasingly rare. The platform for development at home would be Microsoft Windows NT4 Workstation. This does not yet restrict which platforms the end product can run on.
4.3. CHOICE OF LANGUAGE
4.3.1. LANGUAGES AVAILABLE
C
C++
Java
Visual Basic
KQML (see 3.2.2.1 above)
IBM Agent Building Environment (ABE) (see 3.2.3.1 above)
I could of course have made a list of languages to fill two pages, so there had to be an even more basic criterion for what made it onto the list. The criterion was simply that I was either familiar with the language (C, C++, Java and Visual Basic) or felt it had something special to offer within the intelligent agent paradigm (KQML and the IBM ABE). The familiarity was required so I could judge its ability to satisfy the more complex criteria listed in 4.3.2 below.
4.3.2. CRITERIA FOR EVALUATION
The languages are judged on two criteria: the end product has to run sufficiently fast, and the language needs to support all the features necessary to produce the system. I also prefer that the language environment offers the development support a reasonably sized project requires. By development support I mean a development environment with all the necessary features, such as a text editor, a fast compiler (where applicable), and an easy-to-use GUI. The GUI may not seem like an important requirement, but in my opinion learning a complex command-line interface to a program is unproductive time unless I am going to use that program again in the future.
The main criterion, then, is that the language supports the following required features:
Support for downloading pages off web servers
Although I could write code to deal with the raw HTTP protocol, that would be unproductive given that code already exists in other languages.
Support for creating a Windows GUI
GUIs are certainly the norm now for any Windows program and it is no longer acceptable to produce a program that does not use one.
Support for creating plug-in filters
The language also needs to allow new filters to be dropped in after the program is completed.
There may be a requirement to support multithreaded code.
A program of this complexity may require the flexibility of multiple threads of execution.
The generated program must run as fast as can be expected
It is anticipated that there will be a high degree of text parsing and searching taking place, which will require a fast, efficient program.


Given the requirements listed for the language I can now evaluate each of my choices listed in 4.3.1 above.
C
C's ability to generate fast programs is well known, so it certainly satisfies that criterion. It can also support the creation of GUIs using Windows libraries and handle the filters using Windows DLL files. It can also handle HTTP coding using a pre-defined Windows interface. So C satisfies all the criteria.
C++
C++ also satisfies all the criteria mentioned for C, but by using C++ in conjunction with Microsoft Foundation Classes (MFC) there is an entire framework of C++ classes geared towards writing Windows programs. Using MFC can dramatically decrease the time spent developing programs compared to using C. MFC also integrates well with ActiveX, which could be ideal for the plug-in filters.
Java
Java can support the creation of a GUI with no problems as it too has a framework of classes to deal with this. Its framework also includes code for dealing with the HTTP protocol. The Java Beans technology can be used for plug-in filters too. The real failing with Java is that it is an interpreted language, that is, it does not generate true executable code and therefore does not execute very fast. Another question mark about Java is its lack of stability, by which I mean that new versions of the language and compiler are released quite regularly, often breaking code that worked previously. Sometimes it is necessary to upgrade a version to gain a new feature or a new library of code, and having other code fail due to an upgrade can be very frustrating.
Visual Basic
Visual Basic (VB) satisfies all but the last two requirements. It cannot create multithreaded code and does not generate fast executables either. For these reasons I am loath to use it to tackle this project. I have had some previous experience with VB and found that while some things are very easy to do, other seemingly simple things can be very hard or even impossible to do. I would hate to be half-way through my development and find out I could not do a simple thing because of a restriction in the language.
KQML (see 3.2.2.1 above)
KQML is not a language for creating programs in itself, but rather a language for communicating between mobile Agents. However, several projects have integrated KQML into other mainstream languages such as C, C++ and Java. Unfortunately, these projects are experimental and thus the reliability and stability of anything created using them cannot be guaranteed. Also, support for any problems cannot be relied upon either.
IBM Agent Building Environment (ABE) (see 3.2.3.1 above)
Again, as with KQML, the ABE is supported from within other languages, in this case C++ and Java. The downside with the ABE is that it is not really geared up for searching the Internet, but more for monitoring web sites for key information then triggering a response. It is also unsupported which again could be a problem.
4.3.3. FINAL CHOICE
Having evaluated all the choices against the criteria above, the choice really comes down to C and C++ as the only languages that can do everything required. My choice from these two is C++, because by using MFC I can develop a program much more quickly than by using C alone. The only downside to using MFC is that I am tied to the Windows platform. However, Microsoft Windows is one of the most popular operating systems in the world, so this should not be seen as a major restriction. The compiler I will use is Microsoft Visual C++ version 5.0 (MSVC).
I have mentioned MFC already in relation to C++. I will briefly explain what it is. MFC is a framework of code for building Windows applications. It is part of MSVC and provides a vast body of pre-written and tested code that can greatly speed up development of any project that uses it. I have been using this framework of code for over a year and consider myself reasonably experienced with it.

4.4. CHOICE OF METHODOLOGY/DEVELOPMENT STRATEGY
4.4.1. METHODOLOGIES AVAILABLE
Again, as with the choice of languages, I will evaluate the choices available, and choose based on ability to meet my criteria.
My initial restrictions are based on methodologies I have some familiarity with. The choices are as follows:
SSADM
RAD
Prototyping
4.4.2. CRITERIA FOR EVALUATION
The criterion a methodology has to fulfil is that it can support the development of a project of this size. An important point is that the project will be managed by a one-person team, which means that any advantages a methodology has for team management and communication are largely irrelevant. The project has a reasonably short timeframe to completion, so a flexible approach would be an advantage. A further criterion is that the methodology fits in well with the language and development environment that I am using. I will now review each methodology against my criteria.
SSADM
SSADM is based on the waterfall methodology, which means that the project is divided into phases and each phase has a sign-off stage at which it is declared complete. There is typically little or no movement back to a stage once it has been completed. I find this approach inflexible because at the outset of the project I am unsure what the end product will look like and exactly what features it will offer. I have a general idea, but not a detailed specification. This would make it very difficult to sign off a stage as complete when I am unsure whether I have included every required piece of functionality.
RAD
RAD is seen by many as the alternative to the range of waterfall methodologies. It tries to overcome the shortcomings of the waterfall's inflexible approach by involving the users of the end product during stages of development. It also allows a lot of scope for movement between different phases of the project. The advantages of involving the end-user have a small bearing on the project as regards meetings with the supervisor and such. However, I feel that RAD is slightly too heavy-weight for this project, as its central theme is many meetings with users, developers and sponsors. If you strip away these meetings, the main technique left in the RAD methodology is simply Prototyping.
Prototyping
Prototyping is, as stated above, one of the central techniques within RAD. I do not believe it is a true methodology, but simply a technique for use within other methodologies. However, I do feel it is ideal for my project as it can be used as a light-weight methodology. The advantage for me is that it allows cyclic development, where a prototype can be constructed and then changed very easily to suit new developments in the project. It is suited to my development environment as my compiler supports the creation of user interfaces very easily and quickly. The lack of support for meetings and communications between team members is irrelevant for this project, so it cannot be seen as a disadvantage.

4.4.3. FINAL CHOICE
So, based on how each methodology stood up to my criteria, I have decided to use Prototyping as my methodology. With this in mind the overall plan would be to draw up an overall design, breaking each part down into components. I would normally not go into great detail for my designs unless I felt there was a particular problem I was having difficulty with.
In summary then, I feel the advantages of prototyping for this project are as follows:
It provides a very quick feedback mechanism, allowing me to implement solutions to problems with the minimum of time wasted.
It provides a less-structured framework that allows creativity to flourish and encourages novel solutions to problems.
4.5. DESIGN
4.5.1.1. INTRODUCTION
In this section the overall design for the project will be mapped out. First, a brief reminder of the main aim of the project, which is to produce a program that can do the following two things:
Perform searches on Internet/intranet content.
Add new filters to perform different types of searches.

4.5.1.2. PERFORM SEARCHES ON INTERNET/INTRANET CONTENT
This will be achieved in the following manner.
An interface that will allow the user to specify the following information:
A list to select which filters to match documents with. The scope of the project will include only one simple text filter.
The ability to specify a root domain to start searching from.
Text to search for.
The search will be performed simply by traversing HTML files for links to further files. Matching will be performed by passing each web page to each search filter in turn, and each filter will return details of any matches found.
When the search is finished, the results will be presented to the user.

4.5.1.3. ADD NEW FILTERS TO PERFORM DIFFERENT SEARCHES
The key to this feature is implementing the initial filter in such a way that it is simple to add more filters. Example filters could be a synonym filter, or a graphic filter. I plan to use ActiveX to create the filters as it is the main component technology for Windows platforms.
This will be expanded upon in 5.3.2.4 below.
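To illustrate what "simple to add more filters" might mean in practice, the following is a minimal sketch using a plain C++ abstract class. It is not ActiveX or COM code; it only shows the shape of a common interface that every filter could implement. The names SearchFilter, TextFilter and findMatches are hypothetical, and the matching shown is an assumed case-sensitive substring search.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical common interface that every filter would implement.
class SearchFilter {
public:
    virtual ~SearchFilter() {}
    // Name shown to the user when choosing filters.
    virtual std::string name() const = 0;
    // Returns the character offsets in `page` at which `query` matches.
    virtual std::vector<std::size_t> findMatches(const std::string& page,
                                                 const std::string& query) const = 0;
};

// The one simple text filter: case-sensitive substring matching.
class TextFilter : public SearchFilter {
public:
    std::string name() const override { return "Simple text"; }

    std::vector<std::size_t> findMatches(const std::string& page,
                                         const std::string& query) const override {
        std::vector<std::size_t> offsets;
        if (query.empty()) return offsets;  // nothing to search for
        std::size_t pos = page.find(query);
        while (pos != std::string::npos) {
            offsets.push_back(pos);
            pos = page.find(query, pos + 1);  // continue after this match
        }
        return offsets;
    }
};
```

Under this arrangement a synonym filter or a graphic filter would be another class derived from the same interface, so the code that calls the filters would not need to change when a new one is added.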

4.5.2. DESIGN ONE

Figure 2 - Design One
4.5.3. EVALUATION OF DESIGN ONE
Whilst this design seems quite effective at first glance, upon closer inspection I found two major problems with it.
How do the user's options get passed to the Filter Processor when there is no link to it? It would of course be possible to pass them from the User Options on to the Search Engine and then on to the Filter Processor. But that means the Search Engine is dealing with data that has no meaning to it. That is not very sensible from an object-oriented point of view, as an object should only deal with data important to itself.
The User Options Screen needs to know what filters are available for the user to choose from but only the Filter Processor will know this. Again the problem is of the Filter Processor communicating with the User Interface.
I then developed a second design that would overcome these shortcomings.
4.5.4. DESIGN TWO

Figure 3 - Design Two
4.5.5. EVALUATION OF DESIGN TWO
The fundamental difference between figure 3 and the previous design in figure 2 is that the Filter Processor is now the central component that all the others talk to. This makes a lot more sense, as everything needs to communicate with the Filter Processor for various reasons. The problem with this, of course, is that it could develop into a bottleneck. I anticipate that at the scale of this project it will not be a problem, although a high-performance application might require other solutions.
A strength of this design is that everything plugs into the Filter Processor. This is similar to a micro-kernel design in operating systems and has the advantage that different components can be plugged in and out as needs change. A further advantage is that components would not all have to exist on the same machine, which raises the possibility of running different filters on a machine located elsewhere on a network. Some of this is speculative, as I am unlikely to develop all of this functionality within the project, but it is useful that the basis for it will already exist.
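The hub arrangement can be sketched as below. This is an assumption-laden illustration in standard C++ rather than the design's actual interfaces: the class name FilterProcessor follows the text, but every member name (registerFilter, availableFilters, setOptions, process, results) is hypothetical, and a filter is reduced to a simple callable for brevity. The point it tries to show is that the user options screen, the search engine and the results display each talk only to the central component, so the search engine never handles option data that has no meaning to it.

```cpp
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

class FilterProcessor {
public:
    // Assumed stand-in for a filter: returns true if `page` matches `query`.
    using Filter = std::function<bool(const std::string& page,
                                      const std::string& query)>;

    struct Result {
        std::string url;
        std::string filterName;
    };

    // Plugs a filter into the hub under a display name.
    void registerFilter(const std::string& name, Filter filter) {
        filters_[name] = std::move(filter);
    }

    // Called by the user options screen to list the filters on offer.
    std::vector<std::string> availableFilters() const {
        std::vector<std::string> names;
        for (const auto& entry : filters_) names.push_back(entry.first);
        return names;
    }

    // Called by the user options screen once the user has made a choice.
    void setOptions(const std::set<std::string>& selected,
                    const std::string& query) {
        selected_ = selected;
        query_ = query;
    }

    // Called by the search engine for each downloaded page.
    // The engine passes only the page; it never sees the user's options.
    void process(const std::string& url, const std::string& page) {
        for (const auto& entry : filters_) {
            if (selected_.count(entry.first) != 0 && entry.second(page, query_)) {
                results_.push_back(Result{url, entry.first});
            }
        }
    }

    // Called by the results display when the search has finished.
    const std::vector<Result>& results() const { return results_; }

private:
    std::map<std::string, Filter> filters_;
    std::set<std::string> selected_;
    std::string query_;
    std::vector<Result> results_;
};
```

Because every call passes through this one object, it is also the place where the bottleneck mentioned above would show up if the volume of pages grew large.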
The design in figure 3 is the design I shall be using then for the program. I don't expect it to change drastically from this design but it is always an option in case something unexpected turns up. The ability to take a step back and redesign if a problem does occur is one of the great strengths of a Prototyping approach in my opinion.