UDC 004.41

THE PARSER DEVELOPMENT FOR EXTRACTING AND ANALYZING DATA IN THE COURT-RELATED FIELD

Shchukova Kristina Borisovna
National Research Tomsk Polytechnic University

Abstract
The article analyzes data obtained from the websites of the regional and district courts of Tomsk. The process of comparing the structure of web pages and parsing HTML pages using PHP and C# is considered in detail. Near-duplicate detection with shingling, together with regular expressions and Levenshtein distance, is used to analyze and compare texts, sentences, and words. With these algorithms, the necessary items can be extracted effectively and quite accurately.
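
As an illustration of the two comparison techniques named above, the following is a minimal sketch and not the author's implementation. The article reports using PHP and C#; Python is used here only for brevity. The function names (levenshtein, shingles, resemblance), the shingle width w = 3, and the sample sentences are all assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def shingles(text: str, w: int = 3) -> set:
    """Set of overlapping w-word sequences (shingles) in the text.
    The width w = 3 is an assumed default, not taken from the article."""
    words = text.lower().split()
    if len(words) < w:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def resemblance(a: str, b: str, w: int = 3) -> float:
    """Jaccard similarity of the two shingle sets, in the range 0.0 to 1.0."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(resemblance("the court ruled on the case",
                      "the court ruled on the appeal"))
```

In a sketch like this, Levenshtein distance suits word-level comparison (for example, matching names with small spelling differences), while shingle resemblance suits whole texts, where a value close to 1.0 suggests near-duplicate documents.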

Keywords: Data Mining, OLAP


Article reference:
The parser development for extracting and analyzing data in the court-related field // Modern technics and technologies. 2015. № 11 [Electronic journal]. URL: https://technology.snauka.ru/en/2015/11/8162
