The first scientific search engine is CiteSeer, which was launched in 1997 and focuses on freely available literature in the field of computer and information sciences. The search engine is still in use today, further developed under the name CiteSeerX. Besides its thematic focus, CiteSeer is regarded as the first search service to collect and index documents from scientists' personal websites and from increasingly available repositories using crawlers and parsers. Another special feature is the automatic retrieval and indexing of the references contained in the documents. Based on this, CiteSeer offered, for the first time, a tool for citation analysis and for the preparation of scientific documents from the web.

CiteSeer picks up on the approach of Garfield's "Citation Indexes for Science" (the evaluation and compilation of literature sources from publications) and determines the context of a citation in the citing publication, which is shown as an excerpt in the query results. Furthermore, CiteSeer calculates hubs (articles that cite many highly cited articles) and authorities (highly cited articles). CiteSeer determines similar documents using several algorithms: a comparison of the word vectors of different articles, a similarity analysis of titles, and the product of the so-called "Common Citation" measure with the inverse document frequency, which finds articles with similar citations. An initial graphical presentation of query results was also offered: a graph showing the number of citations by publication year. CiteSeer thus offered, for the first time, a tool that made it possible to find scientific documents available on the web and to analyze them by their citation relationships.
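The idea of weighting common citations by inverse document frequency can be illustrated with a minimal sketch. It is not CiteSeer's actual implementation: the function name `ccidf_scores`, the weighting `1 / (number of citing documents)`, and the toy data are assumptions chosen for illustration only.

```python
from collections import Counter


def ccidf_scores(citations, query_id):
    """Rank documents by shared citations, weighted by citation rarity.

    citations: dict mapping a document id to the set of works it cites.
    The weighting scheme here is an illustrative assumption.
    """
    # Count how many documents in the collection cite each work.
    citing_counts = Counter(ref for refs in citations.values() for ref in refs)
    # A work cited by few documents is more distinctive, so it weighs more.
    weight = {ref: 1.0 / n for ref, n in citing_counts.items()}

    query_refs = citations[query_id]
    scores = {}
    for doc_id, refs in citations.items():
        if doc_id == query_id:
            continue
        shared = query_refs & refs  # the "common citations"
        if shared:
            scores[doc_id] = sum(weight[ref] for ref in shared)
    # Highest score first: most similar citation profile.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy collection: document id -> cited works.
citations = {
    "A": {"r1", "r2", "r3"},
    "B": {"r1", "r2"},
    "C": {"r3", "r4"},
    "D": {"r3"},
    "E": {"r5"},
}

ranked = ccidf_scores(citations, "A")
for doc_id, score in ranked:
    print(doc_id, round(score, 3))
```

In this toy collection, document B shares two fairly rare citations with A and therefore ranks above C and D, which share only the more widely cited work r3. Document E shares no citation with A and does not appear in the result.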

In 2008, a new architecture and a new data model were developed and CiteSeer was renamed to CiteSeerX.

Language: English

Launched: 2008, previously known as CiteSeer
Closed: No

Developer: College of Information Sciences and Technology - Penn State

Country of Origin: USA

2008 - [...] College of Information Sciences and Technology - Penn State

Topic: Academic, Scientific or Educational Search engine

Region: No Limitation

Technical functionalities:
Robot/Crawler based, algorithmic search
Search engine for special file formats

Used SeEn: CiteSeer / CiteSeerX

Older Version: Internet Archive / WebCite

»CiteSeer was the first digital library and search engine to provide automated citation indexing and citation linking using the method of autonomous citation indexing.
CiteSeer was developed in 1997 at the NEC Research Institute, Princeton, New Jersey, by Steve Lawrence, Lee Giles and Kurt Bollacker. The service transitioned to the Pennsylvania State University's College of Information Sciences and Technology in 2003. Since then, the project has been led by Lee Giles with technical and administrative direction by Isaac Councill.
After serving as a public search engine for nearly ten years, CiteSeer, originally intended as a prototype only, began to scale beyond the capabilities of its original architecture. Since its inception, the original CiteSeer grew to index over 750,000 documents and served over 1.5 million requests daily, pushing the limits of the system's capabilities. Based on an analysis of problems encountered by the original system and the needs of the research community, a new architecture and data model was developed for the "Next Generation CiteSeer," or CiteSeerx, in order to continue the CiteSeer legacy into the foreseeable future.
CiteSeerx is a scientific literature digital library and search engine that focuses primarily on the literature in computer and information science. CiteSeerx aims to improve the dissemination of scientific literature and to provide improvements in functionality, usability, availability, cost, comprehensiveness, efficiency, and timeliness in the access of scientific and scholarly knowledge.
Rather than creating just another digital library, CiteSeerx attempts to provide resources such as algorithms, data, metadata, services, techniques, and software that can be used to promote other digital libraries. CiteSeerx indexes PostScript and PDF research articles on the Web « Source


"Next Generation CiteSeer" Source

Critical points

Features & Functionality


Example results page for "sand": Source

References & further Publications

Wikipedia (EN):
Wikipedia (Others): n.a.

Other Sources

About CiteSeerX URL:
NEC Laboratories America & Pennsylvania State University School of Information Sciences and Technology. (2007). About CiteSeer URL:
Ortega, J. L. (2014). Academic search engines: a quantitative outlook. Oxford: Chandos Publishing. URL:
Garfield, E. (1955). Citation Indexes for Science: A New Dimension in Documentation Through Association of Ideas. Science, 122(3159), 108–111. URL:
Lawrence, S., Giles, C. L., & Bollacker, K. (1999). Digital Libraries and Autonomous Citation Indexing. IEEE Computer, 32(6), 67–71. URL:

Created: 2015-10-03