The Research on Ajax Dynamic Parse and Application of Ajax Plugin in Nutch

Abstract:

Nutch is an excellent open-source web search engine. Its JavaScript parsing plugin can extract static links but not dynamic ones, so Nutch cannot reach deep-web content that is generated only when scripts such as JavaScript are executed. To address this problem, this paper proposes crawling dynamic page content by providing a JavaScript engine that executes the scripts on a page during the crawling phase. This enables the existing JSParseFilter plugin to extract as many links from deep-web pages as possible. Experiments indicate that the proposed method noticeably improves Nutch's ability to access the deep web.
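The gap between static and dynamic links can be illustrated with a toy example. The sketch below is not Nutch code and does not use a real JavaScript engine or the JSParseFilter API. `run_scripts` is a hypothetical stand-in that only evaluates `document.write` calls whose arguments are string literals joined with `+`, and the regular expressions are simplifications. It shows that a link assembled at run time is invisible to a parser that reads only the page as served. Once the script output has been materialized, the same link becomes extractable, which is the idea behind executing JavaScript during crawling.

```python
import re

# Toy patterns for illustration only; real pages need a proper HTML parser.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.IGNORECASE | re.DOTALL)
HREF_RE = re.compile(r"""<a\s[^>]*href=["']([^"']+)["']""", re.IGNORECASE)
WRITE_RE = re.compile(r"document\.write\((.*?)\);", re.DOTALL)
LITERAL_RE = re.compile(r'"([^"]*)"' + r"|'([^']*)'")


def static_links(html: str) -> list[str]:
    """Links an HTML-only parser sees in the page as served (script bodies dropped)."""
    return HREF_RE.findall(SCRIPT_RE.sub("", html))


def run_scripts(html: str) -> str:
    """Hypothetical stand-in for a JavaScript engine: replace each <script> block
    with the markup its document.write(...) calls would emit. Only arguments made
    of string literals joined by '+' are understood; other JavaScript is ignored."""
    def render(script: re.Match) -> str:
        out = []
        for call in WRITE_RE.finditer(script.group(1)):
            # Concatenate every string literal in the call's argument list.
            for lit in LITERAL_RE.finditer(call.group(1)):
                out.append(lit.group(1) if lit.group(1) is not None else lit.group(2))
        return "".join(out)

    return SCRIPT_RE.sub(render, html)


def dynamic_links(html: str) -> list[str]:
    """Links found after the (toy) scripts have been executed."""
    return HREF_RE.findall(run_scripts(html))


PAGE = """<html><body>
<a href="/static.html">Static</a>
<script>
document.write('<a href="/deep/' + 'page1.html">Deep</a>');
</script>
</body></html>"""

print(static_links(PAGE))   # ['/static.html']
print(dynamic_links(PAGE))  # ['/static.html', '/deep/page1.html']
```

In a real crawler, `run_scripts` would be replaced by an actual JavaScript engine with a DOM, and the regular expressions by a proper HTML parser; the toy version only makes the static/dynamic distinction concrete.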

Category: Oral Presentation
Time: Wednesday, October 31, 2012 - 13:30 to 15:00
