UDC 004.41

THE PARSER DEVELOPMENT FOR EXTRACTING AND ANALYZING DATA IN THE COURT-RELATED FIELD

Shchukova Kristina Borisovna
National Research Tomsk Polytechnic University

Abstract
The article analyzes data obtained from the websites of the regional and district courts of Tomsk. It describes in detail how the structure of the web pages is compared and how the HTML pages are parsed using PHP and C#. Near-duplicate detection by shingling, regular expressions, and the Levenshtein distance are used to analyze and compare texts, sentences, and words. With these algorithms, the required units of data can be extracted efficiently and with reasonable accuracy.
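The abstract names shingling and the Levenshtein distance but does not show them. The Python sketch below illustrates both techniques in their textbook form. It is not the author's PHP or C# implementation. Three things are illustrative assumptions: the word-level shingle size `k=3`, the use of Jaccard similarity to compare shingle sets, and the 0.8 near-duplicate threshold. All function names are hypothetical. The regular expression is used here only to split text into words, not for the author's extraction rules.

```python
import re


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles of a text (hypothetical helper)."""
    # Lowercase and split into word tokens with a simple regular expression.
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # A text shorter than k words yields one shingle made of all its words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def is_near_duplicate(t1: str, t2: str, k: int = 3, threshold: float = 0.8) -> bool:
    """Treat two texts as near-duplicates if their shingle sets are similar enough.

    k and threshold are assumed values, not taken from the article.
    """
    return jaccard(shingles(t1, k), shingles(t2, k)) >= threshold


def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # The distance is symmetric, so keep the shorter string in the inner loop.
    if len(s) < len(t):
        s, t = t, s
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from the first i chars of s to ""
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free if chars match)
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` returns 3. Consider two sentences that differ only in their last word, such as "the court ruled in favour of the plaintiff" and "... of the defendant". They share 5 of their 7 distinct 3-word shingles, so their Jaccard similarity is 5/7. That falls below the assumed 0.8 threshold. Word shingles suit the comparison of whole texts, while the character-level Levenshtein distance suits the matching of individual words or short strings such as names.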

Keywords: Data Mining, OLAP

Article reference:
The parser development for extracting and analyzing data in the court-related field // Modern technics and technologies. 2015. № 11 [Electronic journal]. URL: https://technology.snauka.ru/en/2015/11/8162

The full text of this article is available only in Russian.
