Intelligent Search Agents

BY DARREN GREAVES

ACKNOWLEDGEMENTS
My Project Supervisor:
Mike Van Kleef

Ex-colleagues from my placement time at Braid Systems:
Bryan Childs
Sean Batten

Someone who pushed me to work hard all the time:
Christine

3. RESEARCH

3.1. INTRODUCTION

3.2. AGENT RELATED PROJECTS

3.2.1. Large Scale Projects

3.2.1.1. Harvest

3.2.1.2. Harvest Evaluation

3.2.1.3. ARPA Knowledge Sharing Effort [ARP WWW]

3.2.1.4. ARPA Evaluation

3.2.2. Agent Programming Languages & Paradigms

3.2.2.1. KQML Knowledge Query and Manipulation Language

3.2.2.2. KQML Evaluation

3.2.2.3. Agent-Oriented Programming Paradigm

3.2.2.4. AOPP Evaluation

3.2.2.5. LALO Langage D'agents Logiciel Objet

3.2.2.6. LALO Evaluation

3.2.3. Downloadable Agent Development Toolkits

3.2.3.1. Agent Building Environment (ABE)

3.2.3.2. ABE Evaluation

3.2.3.3. Aglets Self-contained Java Objects

3.2.3.4. Aglets Evaluation

3.3. CONCLUSIONS

3. RESEARCH

3.1. INTRODUCTION

In the research component of this project I shall be doing the following things:
· A study of a range of different projects within the intelligent agent paradigm.
· A critical assessment of the current state of the intelligent agent paradigm.
The context for this range of topics is that they are all attempts to tackle the problem described in 2.1 above, namely that of finding useful and relevant information on the Internet. By looking at this sample of different approaches to the problem I hope to gain an insight into the main problems in this field and perhaps some possible solutions.

3.2. AGENT RELATED PROJECTS

3.2.1. LARGE SCALE PROJECTS

3.2.1.1. HARVEST

Harvest - The Harvest Information Discovery and Access System [HAR WWW]
Harvest is one of the better-known projects concerned not just with finding data but also with filtering it. The authors use the term 'digest' to describe the filtering of masses of data into concise, relevant information. Harvest can also replicate information across the Internet automatically, which reduces load on the Internet because users can access the copy of the information nearest to them.
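Harvest's real replication and caching machinery is more involved than anything shown here. The short sketch below only illustrates the benefit claimed above, assuming a client has measured the round-trip latency to each mirror and treats the lowest latency as "nearest". The mirror names, the latency figures and the `nearest_replica` helper are all hypothetical.

```python
def nearest_replica(replicas, latency_ms):
    """Return the replica with the lowest measured latency.

    Hypothetical helper: replicas without a latency measurement are treated
    as unreachable and skipped.
    """
    reachable = [r for r in replicas if r in latency_ms]
    if not reachable:
        raise LookupError("no reachable replica")
    return min(reachable, key=latency_ms.__getitem__)


# Hypothetical mirrors of the same index and latencies measured from one client.
mirrors = ["mirror.us.example", "mirror.uk.example", "mirror.jp.example"]
latencies = {"mirror.us.example": 180.0, "mirror.uk.example": 25.0}

print(nearest_replica(mirrors, latencies))
```

A client in the UK would be sent to the UK mirror, so requests no longer all travel to a single distant server.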
The Harvest project was designed and created by a team of academics and people from industry. The group announced Harvest in November 1994 after conducting extensive tests over the previous four months. The primary aim of the project was to ease access to the many different information resources available over the Internet. In the paper 'Harvest: A scalable, customisable discovery and access system' [HA2 WWW] the team explain the problems that Harvest tries to resolve. They attribute these problems to the rapid growth of both the amount of data on the Internet and the diversity of its types. They argue that these factors make it difficult to locate data on the Internet and use it effectively, and it is easy to agree with them.
The Harvest project is introduced as a means of reducing this problem by providing a set of integrated tools to build indexes of related data (possibly from diverse sources), search the indexes flexibly and replicate them across the Internet. The team say these tools will reduce the problem, and in a public press release on their web site they cite a number of successful example projects as proof of its effectiveness.
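The following is not Harvest's code. It is a minimal sketch, assuming a simple keyword-based inverted index, of what "building an index of related data from diverse sources and searching it" involves. The document identifiers and their contents are hypothetical placeholders for pages, files and news articles gathered from different sources.

```python
from collections import defaultdict

# Hypothetical documents gathered from different kinds of source.
documents = {
    "http://example.org/agents.html": "mobile agents move between hosts",
    "ftp://example.org/kqml.txt": "KQML messages let agents communicate",
    "news:comp.ai": "intelligent agents filter usenet news",
}


def build_index(docs):
    """Map each lower-cased word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word of the query (AND search)."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy, so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return results


index = build_index(documents)
print(sorted(search(index, "agents")))
print(sorted(search(index, "mobile agents")))
```

Because the index is built once and then queried many times, copies of it can be replicated to other sites without re-reading the original sources.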
Officially, the Harvest project has been closed since August 1996; however, this refers only to its funding. The project will continue to be used and developed by volunteers, who have just released a new version of the tools. More importantly, there are several commercial developments that make use of the Harvest tools. These mainly use Harvest's caching ability and include Netscape's catalogue server.

3.2.1.2. HARVEST EVALUATION

One measure of any project's success is whether it is being used. By this measure Harvest can be considered a relative success. As well as the use by Netscape mentioned above, a range of projects make use of the Harvest technology. One is a UNIX file searching program known as Glimpse [GLI WWW], which has also been used to create a web searching facility. The main web site also claims that Harvest is superior to both the Lycos search engine and the World Wide Worm (two examples of WWW indexing engines). If this is the case, one wonders why it is not being used for large-scale indexing by the main search engines.
A review [ROB WWW] of the major search engines, listing the software each one uses to build its index, shows that none of them uses Harvest. The reason for this may simply be that each search engine is happy with its existing technology and has no desire to change. Most of the major Internet search engines were already in existence before Harvest was released, and it is possible that Harvest simply needs more time to establish itself. I believe, then, that Harvest has the potential to fulfil the promising claims made about it, but only time will tell whether it does.

3.2.1.3. ARPA KNOWLEDGE SHARING EFFORT [ARP WWW]

The Defense Advanced Research Projects Agency (DARPA) is a research agency of the United States Department of Defense with the following mission statement: "DARPA's primary responsibility is to help maintain U.S. technological superiority and guard against unforeseen technological advances by potential adversaries."
DARPA is a large, well-funded research and development organisation whose primary aim is to keep the United States ahead of the rest of the world in technological innovation. One of its projects is the ARPA Knowledge Sharing Effort [ARP WWW]. This project involves a consortium of companies and educational institutions whose main aim is to work together to build a framework for sharing knowledge. They aim to create projects on a scale that would not be possible for companies working alone. Their primary effort is to develop a system whereby a knowledge-based system can be built simply by re-using existing components. Much of the work done by the members of this group is related to Intelligent Agents and can be seen in other projects examined here.

3.2.1.4. ARPA EVALUATION

Whilst researching this project I came across many agent projects run by university departments and research centres whose web pages stated that their project was to be the next big step forward in agent technology. At first this all seemed very exciting, but after looking at several such sites I began to realise that each of these projects seemed to exist in a vacuum. By this I mean that they were all trying to solve the same problem (and more than likely making the same mistakes along the way), but none of them seemed to take account of similar work that had already been done.
When I came across the ARPA project I realised that here was a project taking a more collaborative approach, bringing together different groups to develop a framework for knowledge sharing. The advantage of using a framework is that much of the groundwork has already been done, so problem solvers can spend more of their time finding unique and novel solutions to their problems. As this is such a large project covering such a wide field, it is likely to be many years before its real benefits are seen. However, the following section (3.2.2 below) already shows some results of this work.

3.2.2. AGENT PROGRAMMING LANGUAGES & PARADIGMS

A range of different languages and paradigms currently exist that aim to increase the knowledge base surrounding the intelligent agent paradigm. Many of these are developed as research projects at universities or industry-funded research centres. The criterion for inclusion in this report was whether I felt I could learn something from a project or whether it represented a major effort in the intelligent agent paradigm.

3.2.2.1. KQML KNOWLEDGE QUERY AND MANIPULATION LANGUAGE

KQML [KQM WWW] is a language designed for communication between an application program and an intelligent agent, or between two intelligent agents. It was created by the External Interfaces Working Group as part of the ARPA Knowledge Sharing Effort (see 3.2.1.3 above). The group was formed with the primary purpose of developing methods of communication amongst Intelligent Agents, and KQML is the principal result of this effort. KQML defines both the format of the messages and the protocol used to communicate them. Various projects have made KQML available from other languages such as Java, C, C++ and LISP, so it is relatively straightforward to integrate it into other agent-related projects. In fact it has already been used in other agent projects (see 3.2.2.5 below).
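KQML messages are written as Lisp-style s-expressions: a performative such as ask-one or tell, followed by keyword parameters such as :sender, :receiver, :content, :language and :ontology. The sketch below only formats such a message as a string; it is not one of the KQML libraries mentioned above, the helper name `kqml_message` is hypothetical, and the agent names and message content are illustrative.

```python
def kqml_message(performative, **params):
    """Format a KQML message as an s-expression string.

    Python keyword names use underscores (reply_with); they are converted to
    the hyphenated KQML parameter names (:reply-with). Values are inserted
    verbatim, so :content should already be a valid expression in the
    language named by :language.
    """
    parts = [performative]
    for name, value in params.items():
        parts.append(f":{name.replace('_', '-')} {value}")
    return "(" + " ".join(parts) + ")"


# An agent asking a (hypothetical) stock server for one answer about a price.
msg = kqml_message(
    "ask-one",
    sender="joe",
    receiver="stock-server",
    reply_with="ibm-stock",
    language="LPROLOG",
    ontology="NYSE-TICKS",
    content="(PRICE IBM ?price)",
)
print(msg)
```

Keeping the performative and addressing parameters separate from the :content is what lets agents written in different languages exchange messages: only the outer envelope has to be understood by both sides.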

3.2.2.2. KQML EVALUATION

KQML is an excellent example of the point I made about re-use in 3.2.1.4 above. It is already being used in many projects, and I expect this use to increase as time goes on. KQML may become the standard for communication between Intelligent Agents, just as TCP/IP became the standard for communication over the Internet. It is too early to say whether this will happen, but it is certainly true that without the collaborative efforts of the ARPA Knowledge Sharing Effort we would not even be considering a potential world standard for intelligent agent communication.

3.2.2.3. AGENT-ORIENTED PROGRAMMING PARADIGM

Yoav Shoham et al. present a paper [SHO WWW] that introduces Agent-Oriented Programming as a new paradigm (abbreviated here as AOPP). It is part of an ARPA-funded research project conducted by the Centre for the Study of Language and Information at Stanford University. The authors draw on a variety of sources to develop the idea of an agent that has a formal mental state, and of agents that interact within communities. The agents can use their mental state as a basis for co-operating, or even competing, to complete a task.
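Shoham's formalism is considerably richer than anything shown here (for example, beliefs and commitments are tied to points in time). As a rough illustration only, the hypothetical `Agent` class below reduces a mental state to a set of beliefs plus a list of commitments, and uses one rule loosely in the spirit of the paper's commitment rules: accept a request if the action is within the agent's capabilities and nothing it believes rules the action out. The agent name, capabilities and the ("blocked", action) belief format are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """A toy agent whose 'mental state' is a set of beliefs plus commitments.

    Hypothetical simplification: there is no notion of time, and the only
    rule is the one in receive_request.
    """
    name: str
    capabilities: set
    beliefs: set = field(default_factory=set)
    commitments: list = field(default_factory=list)

    def inform(self, fact):
        """Another agent informs us of a fact; we simply adopt it as a belief."""
        self.beliefs.add(fact)

    def receive_request(self, sender, action):
        """Commit to an action if capable of it and not believing it is blocked."""
        if action in self.capabilities and ("blocked", action) not in self.beliefs:
            self.commitments.append((sender, action))
            return True
        return False


searcher = Agent("searcher", capabilities={"search_web", "filter_results"})
searcher.inform(("blocked", "filter_results"))
print(searcher.receive_request("user", "search_web"))      # True
print(searcher.receive_request("user", "filter_results"))  # False: believed blocked
print(searcher.receive_request("user", "book_flight"))     # False: not capable
print(searcher.commitments)                                 # [('user', 'search_web')]
```

The point of the example is that the agent's behaviour depends on its beliefs, which other agents can change by communicating with it; this is what makes co-operation (or competition) within a community possible.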

3.2.2.4. AOPP EVALUATION

This is another idea that has been developed with the aid of ARPA. It would have been easy to ignore this paper were it not for another project I found that builds on it (LALO, see 3.2.2.5 below). Seeing this and other work re-used in other projects strengthens my belief that this collaborative effort is the way forward for agent projects.

3.2.2.5. LALO LANGAGE D'AGENTS LOGICIEL OBJET

LALO is a project based on the concepts of the Agent-Oriented Programming Paradigm (see 3.2.2.3 above). It is being developed at the Centre de recherche informatique de Montréal in Canada. LALO is the name of the language itself and stands for Langage d'Agents Logiciel Objet, which translates roughly as Object Software Agent Language. Agents written in LALO can be translated into C++ by the provided translator. The framework can only be downloaded after sending a written request by fax explaining your interest in the project. Although LALO is a language for writing agents, it also makes use of KQML (see 3.2.2.1 above) to communicate with other agents.

3.2.2.6. LALO EVALUATION

Projects like this are a perfect example of the benefits of the research being done in this area by the ARPA Knowledge Sharing Effort. LALO makes use of both the Agent-Oriented Programming Paradigm (see 3.2.2.3 above) and KQML (see 3.2.2.1 above). I hope that more projects of this kind will become possible as more of the groundwork is done.

3.2.3. DOWNLOADABLE AGENT DEVELOPMENT TOOLKITS

There currently exist a range of different toolkits, SDKs (Software Development Kits) and class libraries that allow users to develop their own agents. I will take a brief look at two of these projects here, as I feel each represents a different approach.

3.2.3.1. AGENT BUILDING ENVIRONMENT (ABE)

The ABE [ABE WWW], developed by IBM, is a freely downloadable toolkit for the Windows, OS/2 and AIX platforms. It is not an officially supported product, although there is a contact email address and a FAQ list. It takes the form of a C++ class library that can be used to add intelligent agents to existing applications. The agents can be given a set of rules that make them wait for a certain event and then trigger an action. The authors give the example of an agent that monitors a stock price over the Internet and pages the user if the quote drops below a certain level. The toolkit also comes with what are called 'adapters', which are interfaces to the outside world. There is already an HTTP adapter for the web and an NNTP adapter for Usenet, and developers can write their own adapters in either C++ or Java. The ABE is just one of a number of projects being carried out by the IBM Intelligent Agents Research programme.
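ABE itself is a C++ class library, and its rule language and adapter interfaces are not reproduced here. The sketch below only illustrates the "wait for an event, then trigger an action" idea behind the stock-quote example, assuming that an adapter delivers each quote as a small dictionary. The `Rule` and `RuleAgent` classes, the symbol and the 100.0 threshold are hypothetical, and paging the user is simulated by recording a message in a list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Fire `action` whenever an incoming event satisfies `condition`."""
    condition: Callable[[dict], bool]
    action: Callable[[dict], None]


class RuleAgent:
    """Hypothetical rule-driven agent: adapters push events, rules react."""

    def __init__(self, rules):
        self.rules = rules

    def handle_event(self, event):
        """Called by an adapter (for example, one polling a quote page over HTTP)."""
        for rule in self.rules:
            if rule.condition(event):
                rule.action(event)


pages = []  # stands in for a real pager service


def page_user(event):
    pages.append(f"{event['symbol']} fell to {event['price']}")


low_price_rule = Rule(
    condition=lambda e: e["symbol"] == "IBM" and e["price"] < 100.0,
    action=page_user,
)
agent = RuleAgent([low_price_rule])

# A simulated stream of quotes, as an adapter might deliver them.
for quote in [{"symbol": "IBM", "price": 104.5},
              {"symbol": "XYZ", "price": 50.0},
              {"symbol": "IBM", "price": 98.25}]:
    agent.handle_event(quote)

print(pages)  # ['IBM fell to 98.25']
```

Separating the adapter (where events come from) from the rules (what to do about them) is what makes it possible to write new behaviour as a list of rules once the adapters exist.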

3.2.3.2. ABE EVALUATION

I downloaded the ABE toolkit and took a quick tour of its capabilities. It comes with a comprehensive tutorial introducing its features. Although I did not have sufficient time to create anything useful with it, I certainly think it has the potential to make useful applications simple to create. This is because it already includes pre-written interface adapters, so a large proportion of the work is already done for you. All that remains is to create a series of rules that tell the agent what to do.

3.2.3.3. AGLETS SELF-CONTAINED JAVA OBJECTS

From the IBM Aglets Web Site: "An Aglet is a Java object that can move from one host on the Internet to another. That is, an Aglet that executes on one host can suddenly halt execution, dispatch to a remote host, and resume execution there. When the Aglet moves, it takes along its program code as well as its state (data). A built-in security mechanism makes it safe for a computer to host untrusted Aglets." [AGL WWW]
The fundamental point here is the movement of self-contained objects. This is quite different from typical search engines, which reside on a server with a fast connection to the Internet and send and receive data from there.
The white paper on Aglets available from the IBM Aglets Web Site mentions this difference and describes it as a paradigm shift. It describes the current paradigm as distributed but stationary, with interaction taking place through synchronous message passing. It notes that this paradigm is incomplete and that enhancements such as asynchronous message passing, object mobility and active objects are required. It claims that mobile agents can provide this extra functionality within a single, uniform paradigm, which the authors regard as fundamental to distributed object computing.
The authors justify calling this a paradigm shift, rather than just a new way of using agents, on the grounds that it will allow new types of web service and even new web businesses. However, they do not give any examples of what these services or businesses might be.
A Java toolkit is available for download; it has very recently come out of beta testing and is now a full release.
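Aglets are Java objects, and the Aglets API is not reproduced here. The following Python simulation, with hypothetical host names and fares, only illustrates the idea in the quotation above: an agent halts, its state is serialised, and it resumes on another host. Unlike a real Aglet, which carries its program code with it, this sketch serialises only the state (as JSON) and assumes the `FareAgent` class is already available on every host. It also models the practical limitation that a host can only receive an agent if it is running an agent daemon.

```python
import json


class Host:
    """A simulated Internet host holding some local data (flight fares)."""

    def __init__(self, name, runs_daemon, fares):
        self.name = name
        self.runs_daemon = runs_daemon
        self.fares = fares

    def receive(self, agent_class, payload):
        """Accept a serialised agent, rebuild it and let it resume here."""
        if not self.runs_daemon:
            raise ConnectionRefusedError(f"{self.name} is not running an agent daemon")
        agent = agent_class.from_state(json.loads(payload))
        agent.on_arrival(self)
        return agent


class FareAgent:
    """Hypothetical mobile agent: its state travels with it between hosts."""

    def __init__(self, route):
        self.route = route
        self.visited = []
        self.best = None  # {"host": ..., "fare": ...} once a fare has been seen

    @classmethod
    def from_state(cls, state):
        """Rebuild an agent from its serialised state without calling __init__."""
        agent = cls.__new__(cls)
        agent.__dict__.update(state)
        return agent

    def on_arrival(self, host):
        """Resume execution on the new host: inspect its local fares."""
        self.visited.append(host.name)
        for fare in host.fares:
            if self.best is None or fare < self.best["fare"]:
                self.best = {"host": host.name, "fare": fare}

    def dispatch(self, host):
        """Halt here, serialise the agent's state and resume on `host`."""
        return host.receive(type(self), json.dumps(self.__dict__))


hosts = {
    "airline-a": Host("airline-a", True, [320, 295]),
    "airline-b": Host("airline-b", True, [310]),
    "airline-c": Host("airline-c", False, [150]),  # no daemon: unreachable
}

agent = FareAgent(route=["airline-a", "airline-b", "airline-c"])
for name in agent.route:
    try:
        agent = agent.dispatch(hosts[name])
    except ConnectionRefusedError as err:
        print("skipped:", err)

print(agent.visited, agent.best)
```

In this run the cheapest fare is on the host without a daemon, so the agent never finds it, which mirrors the practical limitation on Aglet take-up at present.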

3.2.3.4. AGLETS EVALUATION

The claims made about Aglets are certainly intriguing and exciting, but at present very few sites make use of them. This suggests to me that Aglets are still some way from being a paradigm shift, if you take the word paradigm to mean 'a viewpoint shared by the majority'. Probably the biggest reason for this slow take-up is simply that a web server must be running an Aglet daemon before any Aglet can transfer itself to that server. As such servers are very rare at present, Aglet agents cannot use them to traverse the whole of the Internet. Now that the SDK is out of beta, the web site has been able to announce a site that is successfully using Aglets for a flight ticket finding service; unfortunately, the site was in Japanese, so I was unable to try it out.

3.3. CONCLUSIONS

Early in this project I made a distinction between mobile agents and non-mobile agents (see 2.2.4 above). Non-mobile agents are the more developed of the two types, with a larger body of research and projects in existence. At present most agent projects are geared towards locating information on the Internet, but web sites such as IBM Aglets present a vision of agents that do more than just find information: they also perform tasks on our behalf. The authors of IBM Aglets refer to a paradigm shift that is taking place to allow this to happen.
Dr Pattie Maes (one of the world's foremost authorities on Software Agents) presents a similar viewpoint in the paper "Intelligent Software" [MAE WWW]. This idea is expanded upon further in her paper "Agents that reduce Work and Information Overload" [MAES94]. Dr Maes argues that agents can bring great benefits to our lives by learning from our actions and assisting us. Of course, the fundamental question raised by this prediction is how it will be achieved.
Dr Maes' view is that "artificial evolution" will be the way forward. This involves agents that improve themselves as they learn. Agents that learn useful things will be used more and can continue evolving; agents that are less useful will eventually no longer be used and will in effect die. At present these systems are still mostly theoretical, although Dr Maes and her colleagues have built a simple system that learns and improves the more it is used.
A mobile agent paradigm would be ideal for allowing these evolutionary agents to interact. Given the size of the Internet, there would be endless scope for agents to learn and evolve. It would seem to be the ideal way of delivering this sort of functionality, as it avoids re-inventing the wheel: if an agent already exists that can do something, it is easier for your agent to ask that agent to do it on its behalf than to learn how to do it itself.
I feel sure that mobile agents are the way forward for agent technology. The only remaining question, then, is how long they will take to come to fruition. Projects such as IBM Aglets are using mobile agents now, but only in small-scale systems; the approach needs more time and larger-scale projects before its real benefits can be seen. As for evolutionary agents, they are still a fair way off from being a viable system, but as more research is done this will change. I expect to see a lot of growth in these areas over the next five to ten years.
ARPA's efforts are a prime example of how progress will be made in this area. Large-scale systems development and paradigm shifts can only come about when groups start collaborating (the Internet is one such example). As I stated earlier, this will also take time to come to fruition. We may be at the dawn of a new mobile agent paradigm, or it may prove more difficult than we first thought and not happen for 20 years. I feel sure, though, that it will happen eventually, because we want it to happen, and if we want something enough, we can usually find a way.