Looking for an old paper on a circuit-board information retrieval system implementation?
Many years ago I read a paper on a hardware implementation of an information retrieval system. It was implemented as a circuit board: the query was set by placing jumpers on one side of the board, and the result was indicated by LEDs or the equivalent on another side. The math behind it was very insightful, and I'd love to find it again, but I've been unable to. The paper was written (probably well) before 1975, perhaps even in the 1950s. I vaguely remember that the primary author's name began with an S, but that's as far as I've gotten. (I'm not thinking of Vannevar Bush's Memex.)