Chinese Internet Text Viewpoint Extraction and Management Software
Time
From 2017-08 to 2017-12
Project Introduction
This was my first project in cooperation with the Intelligence Institute (BISTI) after I joined the laboratory. It aims to condense one Internet news article, or a batch of them, into a single-sentence message for the user.
Procedure
By crawling a large number of Internet text files and applying data cleaning, preprocessing, word vector conversion, and other operations, the software extracts the core viewpoints of Internet text. In other words, it is an application of automatic text summarization. Two approaches were compared. The first is a word-frequency-based summarization algorithm built on traditional machine learning (e.g. TextRank). The second is based on deep learning with seq2seq. Through this project, I became familiar with the background knowledge and related research in my field, and followed several development directions and frontier algorithms in NLP.
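To illustrate the first approach, here is a minimal sketch of TextRank-style extractive summarization. It is not the project's actual code. The function names (`split_sentences`, `tokenize`, `similarity`, `textrank_summary`) and the parameters `top_n`, `damping`, and `iterations` are hypothetical. The tokenizer is a deliberate simplification that treats each Chinese character as a token; a real pipeline would more likely use a word segmenter and the cleaning steps described above.

```python
import math
import re


def split_sentences(text):
    """Split text on Chinese and Western sentence-ending punctuation."""
    parts = re.findall(r"[^。！？.!?]+[。！？.!?]?", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Assumed tokenizer: Latin words plus single CJK characters."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", sentence.lower())


def similarity(a, b):
    """Shared-token count normalized by the log lengths of both sentences."""
    if not a or not b:
        return 0.0
    overlap = len(set(a) & set(b))
    denom = math.log(len(a)) + math.log(len(b))
    return overlap / denom if overlap and denom > 0 else 0.0


def textrank_summary(text, top_n=1, damping=0.85, iterations=50):
    """Return the top_n highest-ranked sentences in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Weighted, undirected sentence graph (no self-loops).
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    # Fixed number of PageRank-style update passes.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    best = sorted(range(n), key=lambda i: scores[i], reverse=True)[:top_n]
    return [sentences[i] for i in sorted(best)]
```

Calling `textrank_summary(article_text, top_n=1)` returns a list with the single sentence that shares the most vocabulary with the rest of the article, which matches the one-sentence output the project targets. The seq2seq approach generates new text rather than selecting an existing sentence, so it is not covered by this sketch.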