DaCENA (Data Context for News Articles) is a web application conceived, designed and developed by the Department of Informatics, Systems and Communication (DISCO) of the University of Milano-Bicocca and the DensityDesign Research Lab. DaCENA is a tool for exploring Knowledge Graphs, that is, graphs in which nodes are entities and edges are relations between two entities. The web application showcases a new approach to reading online news articles with the support of a data context built from interlinked facts available in the Web of Data. Given a source article, the facts estimated to be most interesting to readers are extracted from the Web and presented using tailored information visualization methods and an interactive user interface. By looking at these facts, the reader gains access to background factual knowledge that supports the interpretation of the news content and suggests connections to related topics for further exploration.
Given a source article, DaCENA finds the entities mentioned in the text using Named Entity Recognition (NER) tools. It then searches for paths that connect each of these entities to the others, considering only paths of length at most 3. DaCENA extracts the paths from the DBpedia Knowledge Graph. Example: if DaCENA finds the entities Hillary Clinton and Bernie Sanders in an article, one path the application could find through the DBpedia SPARQL endpoint connects both Bernie Sanders and Hillary Clinton to the entity Democratic Party via the relation party.
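The path search described above can be illustrated with a minimal Python sketch. It does not query DBpedia: the `TRIPLES` list is a hypothetical in-memory stand-in for the Knowledge Graph, and the function names are illustrative assumptions, not DaCENA's actual code. Edges are followed in both directions, and paths are limited to 3 edges.

```python
from collections import defaultdict

# Hypothetical toy triples (subject, property, object) standing in for DBpedia.
TRIPLES = [
    ("Bernie_Sanders", "party", "Democratic_Party"),
    ("Hillary_Clinton", "party", "Democratic_Party"),
    ("Hillary_Clinton", "spouse", "Bill_Clinton"),
    ("Bill_Clinton", "party", "Democratic_Party"),
]


def build_index(triples):
    """Map each node to its incident edges, keeping the direction as a label only."""
    index = defaultdict(list)
    for subject, prop, obj in triples:
        index[subject].append((prop, "->", obj))   # edge followed forwards
        index[obj].append((prop, "<-", subject))   # same edge followed backwards
    return index


def find_associations(triples, source, target, max_length=3):
    """Return all paths from source to target with at most max_length edges.

    Edge direction is ignored during traversal and no node is visited twice.
    """
    index = build_index(triples)
    results = []

    def extend(node, path, visited):
        if node == target and path:
            results.append(list(path))
            return
        if len(path) == max_length:
            return
        for prop, direction, neighbour in index[node]:
            if neighbour in visited:
                continue
            path.append((node, prop, direction, neighbour))
            extend(neighbour, path, visited | {neighbour})
            path.pop()

    extend(source, [], {source})
    return results


if __name__ == "__main__":
    for path in find_associations(TRIPLES, "Bernie_Sanders", "Hillary_Clinton"):
        print(" ".join(f"{s} {d}[{p}]{d} {o}" for s, p, d, o in path))
```

On this toy data the search returns the two-edge path through Democratic Party from the example, plus a three-edge path that also passes through Bill Clinton. Against a real SPARQL endpoint the same idea would typically be expressed as a set of query patterns, one per path length and direction combination.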
Such a path is a semi-walk on the graph (a multigraph), and we call it a semantic association. DaCENA extracts many associations from the graph and stores them in a database. To evaluate the associations, a new measure has been developed. We call this measure serendipity; its main aim is to find associations that are both interesting (relevant) and new (unexpected) for a user. Serendipity is built as a combination of two measures, relevance and rarity. An association is relevant if the concatenation of the entities' abstracts (short pieces of text that describe each entity) is textually similar (e.g. by cosine similarity) to the text of the article. An association is rare when it is composed of properties (edges) that are used less frequently inside the Knowledge Graph (e.g. DBpedia). The value of serendipity can be calculated using the following formula:
serendipity(Association, Article) = α · relevance(Association, Article) + β · rarity(Association)
where α and β are two values that sum to 1 and can be adjusted in the interface to personalize the results.
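As a rough illustration of how the serendipity formula combines the two measures, here is a minimal Python sketch. Only the weighted sum with α + β = 1 comes from the text above. The bag-of-words cosine similarity is just one possible textual similarity, and the `rarity` function (one minus the mean property frequency, normalized by the most frequent property) is an assumption made for this example, since the exact definition is not given here. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevance(abstracts, article):
    """Similarity between the concatenated entity abstracts and the article text."""
    return cosine_similarity(bag_of_words(" ".join(abstracts)), bag_of_words(article))


def rarity(properties, property_counts):
    """ASSUMED definition: 1 minus the mean normalized usage count of the properties."""
    max_count = max(property_counts.values())
    mean_freq = sum(property_counts[p] / max_count for p in properties) / len(properties)
    return 1.0 - mean_freq


def serendipity(rel, rar, alpha=0.5):
    """Weighted sum alpha * relevance + beta * rarity, with beta = 1 - alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * rel + (1.0 - alpha) * rar


if __name__ == "__main__":
    # Hypothetical usage counts for two properties.
    counts = {"party": 100, "spouse": 25}
    rel = relevance(
        ["Bernie Sanders is an American politician.",
         "The Democratic Party is a political party in the United States."],
        "Sanders and Clinton debated the future of the Democratic Party.",
    )
    rar = rarity(["party", "party"], counts)
    print(serendipity(rel, rar, alpha=0.7))
```

Passing a single `alpha` and deriving β as `1 - alpha` keeps the constraint that the two weights sum to 1, which matches a single slider in a user interface.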
Palmonari, M., Uboldi, G., Cremaschi, M., Ciminieri, D., & Bianchi, F. (2015, May). DaCENA: Serendipitous News Reading with Data Contexts. In European Semantic Web Conference (pp. 133-137). Springer International Publishing.