The first scientific search engine was CiteSeer, which was launched in 1997 and focuses on freely available literature in computer and information science. The search engine is still in use today in a further-developed form under the name CiteSeerX. Beyond its thematic focus, CiteSeer is regarded as the first search service to collect and index documents from the personal websites of scientists and from increasingly available repositories, using crawlers and parsers. Another special feature is the automatic extraction and indexing of the references contained in the documents. On this basis, CiteSeer offered, for the first time, a tool for citation analysis and processing of scientific documents from the web.

CiteSeer picks up on the approach of Garfield's "Citation Indexes for Science" (the evaluation and compilation of the literature sources cited in publications) and, for each citation, determines its context in the citing publication, which is reproduced as an excerpt in the query results. Furthermore, CiteSeer calculates hubs (articles that cite many highly cited articles) and authorities (highly cited articles). CiteSeer determines similar documents using several algorithms: a comparison of word vectors to find articles that use similar words, a similarity comparison of titles, and the so-called "Common Citation × Inverse Document Frequency" measure, which weights the citations shared by two articles by their inverse document frequency in order to find articles with similar citations. An initial graphical presentation of query results was also offered: a graph showing the number of citations by publication year. Thus CiteSeer was the first tool that made it possible to find scientific documents available on the web and to analyze them with methods from citation analysis.
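The two citation-based measures described above can be illustrated with a small sketch. It is not CiteSeer's actual code: the function names `ccidf_scores` and `hits`, the toy `citations` data, and the use of Kleinberg's HITS iteration for the hub and authority scores are assumptions made for illustration only. The similarity function sums, for each reference shared with the query document, the inverse of the number of documents that cite it.

```python
from collections import Counter
from math import sqrt


def ccidf_scores(query_id, citations):
    """Rank documents by the summed inverse frequency of references shared with query_id."""
    # Document frequency: in how many documents each reference occurs.
    doc_freq = Counter(ref for refs in citations.values() for ref in refs)
    query_refs = citations[query_id]
    scores = {}
    for doc_id, refs in citations.items():
        if doc_id == query_id:
            continue
        shared = query_refs & refs
        if shared:
            # Rarely cited shared references count more than ubiquitous ones.
            scores[doc_id] = sum(1.0 / doc_freq[ref] for ref in shared)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def _normalize(scores):
    """Scale scores to unit Euclidean length so the iteration stays bounded."""
    norm = sqrt(sum(v * v for v in scores.values())) or 1.0
    return {node: v / norm for node, v in scores.items()}


def hits(citations, iterations=50):
    """Hub and authority scores via the HITS iteration (assumed here, not confirmed for CiteSeer)."""
    nodes = set(citations) | {ref for refs in citations.values() for ref in refs}
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of the documents citing this node.
        auth = {node: 0.0 for node in nodes}
        for doc_id, refs in citations.items():
            for ref in refs:
                auth[ref] += hub[doc_id]
        auth = _normalize(auth)
        # Hub: sum of the authority scores of the works this node cites.
        hub = _normalize(
            {node: sum(auth[ref] for ref in citations.get(node, ())) for node in nodes}
        )
    return hub, auth


# Hypothetical toy data: document id -> set of cited work ids.
citations = {
    "A": {"r1", "r2", "r3"},
    "B": {"r1", "r2"},
    "C": {"r1"},
    "D": {"r4"},
}

ranked = ccidf_scores("A", citations)
hub, auth = hits(citations)
print(ranked)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

In this toy data, document B ranks above C as a neighbour of A because it shares two references with A, one of them cited less often than the reference C shares. A production system would additionally have to resolve differently formatted citations of the same work to one identifier before counting them.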

Language English

Launched 1997
Closed 2008, renamed to CiteSeerX

Developer NEC Research Institute, Princeton, New Jersey (by Steve Lawrence, Lee Giles and Kurt Bollacker)

Country of Origin USA

1997 - 2008 NEC Laboratories America, Inc. and College of Information Sciences and Technology - Penn State

Topic Academic, Scientific or Educational Search engine

Region No Limitation

Technical functionalities
Robot/Crawler based, algorithmic search
SeEn with analysis and data mining tools
Search engine for databases, repositories, portals and other closed (deep web) or open content collections
Targeted Web Spider

Used SeEn CiteSeer / CiteSeerX

Older Version Internet Archive / WebCite

»CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking using the method of autonomous citation indexing.
CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, New Jersey, by Steve Lawrence, Lee Giles and Kurt Bollacker. The service transitioned to the Pennsylvania State University's College of Information Sciences and Technology in 2003. Since then, the project has been led by Lee Giles with technical and administrative direction by Isaac Councill.
After serving as a public search engine for nearly ten years, CiteSeer, originally intended as a prototype only, began to scale beyond the capabilities of its original architecture. Since its inception, the original CiteSeer grew to index over 750,000 documents and served over 1.5 million requests daily, pushing the limits of the system's capabilities. Based on an analysis of problems encountered by the original system and the needs of the research community, a new architecture and data model was developed for the "Next Generation CiteSeer," or CiteSeerX, in order to continue the CiteSeer legacy into the foreseeable future.
CiteSeerX is a scientific literature digital library and search engine that focuses primarily on the literature in computer and information science. CiteSeerX aims to improve the dissemination of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in the access of scientific and scholarly knowledge.
Rather than creating just another digital library, CiteSeerX attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeerX indexes PostScript and PDF research articles on the Web. « Source


Critical points

Features & Functionality


References & further Publications

Wikipedia (EN):
Wikipedia (Others): n.a.

Other Sources

NEC Laboratories America & Pennsylvania State University School of Information Sciences and Technology. (2007). About CiteSeer URL:
Ortega, J. L. (2014). Academic search engines: a quantitative outlook. Oxford: Chandos Publishing. URL:
Garfield, E. (1955). Citation Indexes for Science: A New Dimension in Documentation Through Association of Ideas. Science, 122(3159), 108–111. URL:
Lawrence, S., Giles, C. L., & Bollacker, K. (1999). Digital Libraries and Autonomous Citation Indexing. IEEE Computer, 32(6), 67–71. URL:
About CiteSeerX URL:

Created: 2015-10-03