Università degli Studi di Udine OpenUniud - Institutional archive of doctoral theses


Use this identifier to cite or link to this document: http://hdl.handle.net/10990/819

Authors: Pavan, Marco
University-affiliated supervisor: MIZZARO, STEFANO
Title: Effectiveness of Data Enrichment on Categorization: Two Case Studies on Short Texts and User Movements
Abstract (in English): The widespread diffusion of mobile devices, e.g., smartphones and tablets, has enabled a huge increase in user-generated data. Nowadays, about a billion users interact daily on online social media, where they share information and discuss a wide variety of topics, sometimes including the places they visit. Furthermore, mobile devices make available a large amount of data tracked by integrated sensors, which monitor several user activities, again including position. The content produced by users is composed of few elements, such as a few words in a social post or a single GPS position, and is therefore a poor source of information to analyze. On this basis, a data enrichment process may provide additional knowledge by exploiting other related sources to extract additional data. The aim of this dissertation is to analyze the effectiveness of data enrichment for categorization, in particular in two domains: short texts and user movements. We describe the concept behind our experimental design, where users’ content is represented as abstract objects in a geometric space, with distances representing relatedness and similarity values, and contexts representing regions close to each object where other related objects can be found, making them suitable as a data enrichment source. Regarding short texts, our research involves a novel approach to short text enrichment and categorization, and an extensive study of the properties of data used as enrichment. We analyze the temporal context and a set of properties that characterize data from an external source in order to properly select and extract additional knowledge related to the textual content that users produce. We use Twitter as the source of short texts to build datasets for all experiments. Regarding user movements, we address the problem of place categorization by recognizing important locations that users visit frequently and intensively. We propose a novel approach to place categorization based on a feature space that models users’ movement habits. We analyze both temporal and spatial context to find additional information to use as data enrichment and to improve the importance recognition process. We use an in-house dataset of GPS logs and the public GeoLife dataset for our experiments. Experimental evaluations in both studies highlight how the enrichment phase has a considerable impact on each process, and the results demonstrate its effectiveness. In particular, the short text analysis shows that news articles are particularly suitable as an enrichment source, and that their freshness is an important property to consider. The user movement analysis demonstrates how context with additional data helps, even with user trajectories that are difficult to analyze. Finally, we provide an early-stage study on user modeling. We exploit the data extracted through enrichment of the short texts to build a richer user profile. The enrichment phase, combined with a network-based approach, improves the profiling process, providing higher scores in similarity computation where expected.
Keywords: Enrichment; Categorization; Short texts; User movements; User modeling
MIUR: Sector ING-INF/05 - Information Processing Systems
Language: eng
Date: 3-Apr-2017
Doctoral programme: PhD in Computer Science
Doctoral cycle: 28
Degree-awarding university: Università degli Studi di Udine
Place of defence: Udine
Other information: Co-supervisor: Ivan Scagnetto
Citation: Pavan, M. Effectiveness of Data Enrichment on Categorization: Two Case Studies on Short Texts and User Movements. (Doctoral Thesis, Università degli Studi di Udine, 2017).
In: 01 - Doctoral theses

Full text:

File | Description | Size | Format | Access
Pavan_PhD_thesis.pdf | Final thesis | 2.74 MB | Adobe PDF | View/open

All documents archived in DSpace are protected by copyright. All rights reserved.
