Solid documents from the LiquidPub Project

Documents under this tab represent stable snapshots of development and evolution. It is this type of document that eventually becomes a deliverable, paper or other 'conventional' publication. The documents are grouped by topic area.

LiquidPub members can find here information about how to acknowledge support of LiquidPub.

Foundational Publications

The world of scientific publications has been largely oblivious to the advent of the Web and to advances in ICT. Scientific knowledge dissemination is still based on the traditional notion of “paper” publication and on peer review as the quality assessment method. The current approach encourages authors to write many (possibly incremental) papers to get more “tokens of credit”, often generating unnecessary dissemination overhead for themselves and for the community of reviewers. Furthermore, it does not encourage or support reuse and evolution of publications: whenever (possibly small) progress is made on a certain subject, a new paper is written, reviewed, and published, often after several months.

We propose a paradigm shift in the way scientific knowledge is created, disseminated, evaluated and maintained. This shift is enabled by the notion of Liquid Publications, which are evolutionary, collaborative, and composable scientific contributions. Many Liquid Publication concepts in this document are based on a parallel between scientific knowledge artifacts and software artifacts, and hence on lessons learned in (agile, collaborative, open source) software development. Liquid Publication concepts are reified by a model based on i) Scientific Knowledge Objects (SKOs), which are the digital instantiation of liquid publications, ii) the processes involved in their creation, evolution, and quality assessment, and iii) the people and roles that contribute to knowledge creation (authors, reviewers, bloggers, ...). Various models (including social reputation models) are developed to analyze and improve publication quality assessment and the process for attributing credit to and measuring the reputation of individuals.

LP Deliverables

For each deliverable, the latest version is linked in the deliverable title, while previous versions are linked as v1, v2, ...

This report presents an overview of the State of the Art in topics related to the on-going research in the Liquid Publications Project. The Liquid Publications Project (LiquidPub) aims to bring fundamental changes to the processes by which scientific knowledge is created, disseminated, evaluated and maintained. In order to accomplish this, many processes and areas will have to be modified. We group the topics involved in this change into four areas: creation and evolution of scientific knowledge, evaluation processes (primarily, peer-review processes and their evaluation), computational trust and reputation mechanisms, and business and process models. Due to the size and complexity of each of these four areas, we will only discuss topics that are directly related to our proposed research.

This deliverable is intended to provide a last version of the design of the SKO structural model applied within the scope of the LiquidPub project. The structural model includes variable granularity components and relations between them (e.g. citations). This deliverable also includes considerations of features enabled by the SKO specification like evolution tracking, search and navigation, ownership and control models. All these features are oriented towards enabling the LiquidPub project’s applications (e.g. Liquid Journals, Liquid Books and Liquid Conferences).

This deliverable is intended to provide a description of the current status of the LiquidPub Core Platform SKO-based API. The LiquidPub SKO-based API is based on the concepts described in D1.2 and it is meant to become the knowledge bus and repository that connects the three use cases, while also offering a common framework for the implementation of services common to the platform. The actual implementation is based on a REST API returning XML representations of data content, thus hiding the actual data storage configuration and enabling automatic data browsing by third-party programs.
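
As a rough illustration of what "automatic data browsing by third-party programs" could look like on the client side, the sketch below parses an XML representation of an SKO into a plain dictionary. The XML is hard-coded rather than fetched, and every element and attribute name in it (sko, title, parts, part, id, state, ref) is an assumption made for this example, not the schema actually used by the platform.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML representation of an SKO; the element and attribute
# names are assumptions for illustration, not the platform's actual schema.
SAMPLE_RESPONSE = """
<sko id="42" state="liquid">
  <title>Example knowledge object</title>
  <parts>
    <part ref="43"/>
    <part ref="44"/>
  </parts>
</sko>
"""


def parse_sko(xml_text):
    """Turn the XML representation into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "id": root.get("id"),
        "state": root.get("state"),
        "title": root.findtext("title"),
        # Collect the identifiers of the objects this SKO is composed of.
        "parts": [part.get("ref") for part in root.iter("part")],
    }


sko = parse_sko(SAMPLE_RESPONSE)
print(sko)
```

A client written this way only depends on the XML representation, so it would not need to know how or where the data is stored; it could follow each ref value to browse related objects.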

This deliverable reports on copyright and licensing research in LiquidPub. It includes a review of copyright, trademarks and patents and their relationship to scientific discourse, a range of existing licensing models and use-cases, and a discussion of various key points of licensing philosophy. It proposes preliminary licensing models for the various Liquid Publication paradigms—liquid books, liquid journals and liquid conferences—and discusses alternative or extended possibilities to the models proposed. In addition it discusses role and process models for the different Liquid Publication paradigms. Finally, it reports on the implementation and validation activities in the third year.

This deliverable describes our efforts to reach disciplines other than computer science. Besides presenting our research in presentations and papers to audiences other than computer scientists (cf. D6.2 and D6.4), we mainly reached other disciplines through a) the survey on Web2.0 & scientific publishing, b) the liquid conferences organized on the platform Interdisciplines, and c) collaboration between the University of Trento and the Museum of Archaeology and Anthropology in Cambridge. All three achievements will be described in this deliverable. 

This document reports on the LiquidPub Management System (LPMaSys) and the Gelee System. These tools, developed in LiquidPub, deal with the management of research processes. In particular, LPMaSys deals with the specification and the automatic execution of the processes concerned with the creation, dissemination, and evaluation of research work. Gelee allows users to manage the lifecycle of any artifact identifiable by a URI (e.g., SKOs such as deliverables, papers), to monitor progress, and to automatically execute actions on a resource when it enters specific lifecycle states.
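
The idea of attaching actions to lifecycle states can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Gelee's actual model or API: the Lifecycle class, the state names, the logged actions and the example URI are all hypothetical.

```python
class Lifecycle:
    """A tiny lifecycle: named states, allowed transitions, entry actions."""

    def __init__(self, transitions, initial):
        self.transitions = transitions  # state -> set of reachable states
        self.state = initial
        self.entry_actions = {}         # state -> list of callables

    def on_enter(self, state, action):
        """Register an action to run whenever `state` is entered."""
        self.entry_actions.setdefault(state, []).append(action)

    def move_to(self, new_state, resource_uri):
        """Change state if the transition is allowed, then run entry actions."""
        if new_state not in self.transitions.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        for action in self.entry_actions.get(new_state, []):
            action(resource_uri)


log = []
lifecycle = Lifecycle(
    transitions={"draft": {"review"}, "review": {"draft", "final"}},
    initial="draft",
)
lifecycle.on_enter("review", lambda uri: log.append(f"notify reviewers of {uri}"))
lifecycle.on_enter("final", lambda uri: log.append(f"publish {uri}"))

uri = "http://example.org/deliverables/d1"  # hypothetical resource URI
lifecycle.move_to("review", uri)
lifecycle.move_to("final", uri)
print(log)
```

Because the resource is referred to only by its URI, the same lifecycle definition could in principle be reused for a deliverable, a paper, or any other addressable artifact.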

In this paper we focus on the analysis of peer reviews and reviewers' behavior in conference review processes. We report on the development, definition and rationale of a theoretical model for peer review processes to support the identification of appropriate metrics to assess the processes' main properties. We then apply the proposed model and analysis framework to data sets about reviews of conference papers. We discuss in detail the results, their implications and their possible use toward improving the analyzed peer review processes. Conclusions and plans for future work close the paper.

When users rate objects, a sophisticated algorithm that takes into account ability or reputation may produce a fairer or more accurate aggregation of ratings than the straightforward arithmetic average. Recently a number of authors have proposed different co-determination algorithms where both user and object reputation are iteratively refined together, permitting accurate measures of both to be derived directly from the rating data. These algorithms are of direct relevance to the LiquidPub project because they could find their application also in modern scientific publishing systems where scientists would be allowed to evaluate papers written by others.
Using various artificial datasets, we perform a comparative test of several co-determination ranking algorithms and identify their respective realms of use. In most practical rating systems only a limited range of discrete values (such as the 5-star system) is employed. We test different scales of discrete ratings and show that this seemingly minor modification in fact has a significant impact on the algorithms’ performance. Paradoxically, where rating resolution is low, increased noise in users’ ratings may even improve the overall performance of the system.
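
To make the notion of co-determination concrete, here is a minimal sketch of one generic scheme of this kind: object quality is a weighted mean of ratings, and each user's weight is the inverse of their mean squared deviation from the current quality estimates. It is not one of the specific algorithms compared in the deliverable; the function name, the smoothing constant eps, and the toy data are assumptions made for this example.

```python
def co_determine(ratings, iterations=20, eps=0.1):
    """ratings: dict user -> dict object -> rating.

    Returns (object_quality, user_weight) after iteratively refining both.
    """
    users = list(ratings)
    objects = sorted({o for r in ratings.values() for o in r})
    weight = {u: 1.0 for u in users}  # start from the plain average
    quality = {}
    for _ in range(iterations):
        # Object quality: weighted mean of the ratings it received.
        for o in objects:
            raters = [u for u in users if o in ratings[u]]
            total = sum(weight[u] for u in raters)
            quality[o] = sum(weight[u] * ratings[u][o] for u in raters) / total
        # User weight: inverse of the mean squared deviation from quality;
        # eps keeps the weight finite for a user who matches it exactly.
        for u in users:
            errors = [(r - quality[o]) ** 2 for o, r in ratings[u].items()]
            weight[u] = 1.0 / (sum(errors) / len(errors) + eps)
    return quality, weight


# Toy data on a 5-star scale: "cat" disagrees with the other two users.
ratings = {
    "ann": {"p1": 5, "p2": 2, "p3": 4},
    "bob": {"p1": 5, "p2": 1, "p3": 4},
    "cat": {"p1": 1, "p2": 5, "p3": 3},
}
quality, weight = co_determine(ratings)
plain_mean_p1 = sum(r["p1"] for r in ratings.values()) / len(ratings)
print(quality, weight, plain_mean_p1)
```

In this toy case the dissenting user ends up with the lowest weight, so the refined estimate for p1 lies above the arithmetic mean. Whether that is fairer depends entirely on the assumption that agreement with the weighted consensus signals ability.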

In many systems, objects from a given set (be it movies in The Internet Movie Database or books on Amazon) can be rated by individual users. A similar situation occurs in Liquid Journals, where readers may be allowed to rate papers and journals. A sophisticated algorithm, taking into account user ability or reputation, may produce a better aggregation of ratings than the simple arithmetic average. Various co-determination algorithms are available to this end, in which user and object reputation are iteratively refined together, resulting in improved measures of both derived directly from the rating data. However, none of the proposed algorithms has been studied on real data. We use various distinct real datasets to test several ranking algorithms, compare their results, and identify the advantages and limits of each algorithm.

D4.1 - Credit Attribution for Liquid Publications

This document is concerned with the issue of credit attribution in Liquid Publications. Credit attribution is crucial for providing researchers with incentives for using the system as well as encouraging them to behave ‘properly’. We propose the use of novel information sources for computing the reputation of both researchers and their research work. We also introduce the notion of liquidity of reputation measures through the OpinioNet propagation algorithm that illustrates how the reputation of one entity may influence the reputation of another. We then argue that our proposal results in ‘fair’ measures that encourage ‘good’ research behaviour in the system, such as giving priority to the quality over the quantity of research work.

D4.3 - Development of the analysis and evaluation plug-ins

This document reports on the OpinioNet reputation module and on Research Impact Evaluation (ResEval), developed in LiquidPub. OpinioNet computes the reputation of both researchers and research work (or SKOs), while ResEval computes the research impact of researchers and SKOs using different metrics (at the moment, citation-based metrics). The integration of the two tools is described in D5.2v1 Integration of plugins.

D5.1 - Design of the Liquid Publications Integrated Platform

This document presents the three use cases identified for the LiquidPub project and the design of the architecture of the Liquid Publication integrated platform. This new version not only introduces advances in the development of the platform but also a more mature understanding of the requirements and possibilities, explored in the three use cases covering the aspects of knowledge creation, dissemination and evaluation.

D5.2 - Integration of Plugins

This deliverable describes how the services of the LiquidPub platform are integrated.

D6.1 - Innovative publisher services for liquid publications

This report is the first version of Deliverable 6.1. It provides the preliminary results from the ongoing research activities as described in Task 6.1 of the LiquidPub project. The aim of the deliverable is to describe the evolution of the scientific publishing industry over time and to explore how the Liquid Publication model might affect it.

D6.2 - Report on Communication, Dissemination and Validation Activities and Annex

In what follows, we provide a detailed description of the dissemination, communication, and validation activities we have set up in our second year. We have listed these activities along the following six axes: (1) contacts with private and institutional partners that may contribute to the implementation of the project, (2) project meetings and inbound communication activities, (3) outbound dissemination activities for a larger audience, (4) scholarly communication: publications and presentations, (5) the LiquidPub survey on scientific publishing and Web 2.0, (6) validation activities. We further account for the first year reviews, assess the fulfilment of the plans for the second year and propose plans for the communication, dissemination and validation activities in the third year.

Scientific Knowledge Objects

Scientific Knowledge Objects v.1  F. Giunchiglia, R. Chenu.

This document introduces the SKO and its associated structures as a response to the needs of a collaborative platform for the creation, dissemination and publication of Complex Artifacts, and also as an alternative to the current paper-centered scientific publication practices. The approach presented is based on three Organization levels (Data, Knowledge and Collection) and three States (Gas, Liquid, Solid) that regulate the properties and operations allowed at each level.

Scientific Knowledge Objects Appendix v.1  F. Giunchiglia, R. Chenu.

This document complements the Scientific Knowledge Objects technical report. Its chapters roughly correspond to the chapters of the main document and give implementation considerations and other details that were not included there but are still considered important for the approach.

Empirical Analysis

Exploring and Understanding Scientific Metrics in Citation Network  M. Krapivin, M. Marchese and F. Casati.

This paper explores scientific metrics in citation networks in scientific communities, how they differ in ranking papers and authors, and why. In particular we focus on network effects in scientific metrics and explore their meaning and impact. We initially take as examples three main metrics that we believe significant: the standard citation count, the increasingly popular h-index, and a variation we propose of PageRank applied to papers (called PaperRank) that is appealing as it mirrors proven and successful algorithms for ranking web pages and captures relevant information present in the whole citation network. As part of analyzing them, we develop generally applicable techniques and metrics for qualitatively and quantitatively analyzing such network-based indexes that evaluate content and people, as well as for understanding the causes of their different behaviors. We put the techniques to work on a dataset of over 260K ACM papers, and discovered that the difference in ranking results is indeed very significant (even when restricting to citation-based indexes), with half of the top-ranked papers differing in a typical 20-element long search result page for papers on a given topic, and with the top researcher being ranked differently over half of the time in an average job posting with 100 applicants.
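
A small worked example shows how two of these metrics can disagree. The sketch below computes the total citation count and the h-index (the largest h such that h papers have at least h citations each) for two authors; the authors and their citation counts are invented for illustration and are not taken from the ACM dataset.

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for position, count in enumerate(ranked, start=1):
        if count >= position:
            h = position
        else:
            break
    return h


# Hypothetical per-paper citation counts for two authors.
author_a = [10, 1, 1]  # one highly cited paper
author_b = [4, 4, 4]   # three moderately cited papers

total_a, total_b = sum(author_a), sum(author_b)
h_a, h_b = h_index(author_a), h_index(author_b)
print(total_a, total_b, h_a, h_b)
```

Both authors have 12 citations in total, yet their h-indexes are 1 and 3, so a ranking by citation count ties them while a ranking by h-index does not.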

Unsupervised Key-Phrases Extraction from Scientific Papers Using Domain and Linguistic Knowledge  Mikalai Krapivin, Maurizio Marchese, Andrei Yadrantsau, Yanchun Liang.

The domain of Digital Libraries presents specific challenges for unsupervised information extraction to support both the automatic classification of documents and the enhancement of users’ navigation in the digital content. In this paper, we propose a combined use of machine learning techniques (i.e. Support Vector Machines) and Natural Language Processing techniques (i.e. Stanford NLP parser) to tackle the problem of unsupervised key-phrase extraction from scientific papers. The proposed method strongly depends on the robust structural properties of a scientific paper as well as on the lexical knowledge that we are able to mine from its text. For the experimental assessment we have used a subset of ACM papers in the Computer Science domain containing 400 documents. Preliminary evaluation of the approach shows promising results that improve – on the same data-set – on the state-of-the-art Bayesian learning system KEA by a minimum of 27% to a maximum of 77%, depending on KEA parameter tuning and the specific evaluation set. Our assessment is performed by comparison with key-phrases assigned by human experts in the specific domain and freely available through the ACM portal.

Focused Page Rank in Scientific Papers Ranking  Mikalai Krapivin, Maurizio Marchese.

We propose an adaptation of the Focused Page Rank (FPR) algorithm to the problem of ranking scientific papers. FPR is based on the Focused Surfer model, where the probability of following a reference in a paper is proportional to the citation count of the referenced paper. Evaluation on the content of the Citeseer autonomous digital library showed that the proposed model is a tradeoff between traditional citation count and basic Page Rank (PR). In contrast to basic Page Rank, the proposed Focused Surfer model suffers less from the "outbound links" problem. We believe that the FPR algorithm is closer to reality because highly cited papers are more visible and tend to attract more citations in the future. This is in accordance with one of the most significant principles of Scientometrics. The strong points of the proposed model are that it needs no lexical analysis of the domain corpus and is simple to implement, which makes the proposed ranking technique attractive for academic digital libraries.
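
The Focused Surfer idea can be sketched as a small power iteration in which a reader leaves a paper through one of its references with probability proportional to that reference's citation count. This is a minimal sketch, not the authors' implementation: the damping factor of 0.85, the uniform jump for papers without references, the function name and the toy citation graph are all assumptions made here.

```python
def focused_page_rank(references, damping=0.85, iterations=50):
    """references: dict paper -> list of papers it cites (no duplicates)."""
    papers = sorted(
        set(references) | {r for refs in references.values() for r in refs}
    )
    # Citation count of each paper = number of papers referencing it.
    citations = {p: 0 for p in papers}
    for refs in references.values():
        for r in refs:
            citations[r] += 1
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            refs = references.get(p, [])
            if not refs:
                # Assumption: a paper citing nothing spreads its rank uniformly.
                for q in papers:
                    new_rank[q] += damping * rank[p] / n
                continue
            total = sum(citations[r] for r in refs)
            for r in refs:
                # Focused surfer: follow r with probability citations[r] / total.
                new_rank[r] += damping * rank[p] * citations[r] / total
        rank = new_rank
    return rank


# Toy citation graph: "c" is cited three times, "d" twice.
references = {"a": ["c", "d"], "b": ["c"], "e": ["c", "d"], "c": [], "d": []}
rank = focused_page_rank(references)
print(rank)
```

Replacing citations[r] / total with 1 / len(refs) would give basic Page Rank on the same graph, which makes it easy to compare the two models side by side.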

Is peer review any good? An analysis framework and large-scale experiments. Fabio Casati, Maurizio Marchese, Azzurra Ragone, Matteo Turrini

Presented at the European Computer Science Summit (ECSS 2009), 5th Annual Informatics Europe Meeting, 8-9 October 2009, Paris. Slides of the talk given at the Summit can be found here.

Lifecycle, Processes, and Resources

Universal Resource Lifecycle Management  M. Baez, F. Casati, M. Marchese.

This paper presents a model and a tool that allow Web users to define, execute, and manage lifecycles for any artifact available on the Web. In the paper we show the need for lifecycle management of Web artifacts, and we show in particular why it is important that non-programmers are also able to do this. We then discuss why current models do not allow this, and we present a model and a system implementation that achieve lifecycle management for any URI-identifiable and accessible object. The most challenging parts of the work lie in the definition of a simple but universal model and system (and in particular in allowing universality and simplicity to coexist) and in the ability to hide from the lifecycle modeler the complexity intrinsic in having to access and manage a variety of resources, which differ in nature, in the operations that are allowed on them, and in the protocols and data formats required to access them.

Gelee presentation at WISS ICDE workshop, Shanghai, March. M. Baez, F. Casati, M. Marchese.

Gelee: Cooperative Lifecycle Management for (Composite) Artifacts. Marcos Báez, Cristhian Parra, Fabio Casati, Maurizio Marchese, Florian Daniel, Kasia di Meo, Silvia Zobele, Carlo Menapace, Beatrice Valeri. ICSOC/ServiceWave 2009: 645-646, DOI: 10.1007/978-3-642-10383-4_50

Credit Attribution

Propagation of Opinions in Structural Graphs  N. Osman, C. Sierra, J. Sabater-Mir.

Trust and reputation measures are crucial in distributed open systems where agents need to decide whom or what to choose. Existing work has mainly focused on the reputation of single entities, neglecting their position amongst others and its effect on the propagation of trust. This paper presents an algorithm for the propagation of reputation in structural graphs. It allows agents to infer their opinion about unfamiliar entities based on their view of related entities. The proposed mechanism focuses on the “part of” relation to illustrate how reputation may flow (or propagate) from one entity to another. The paper bases its reputation measures on opinions, which it defines as probability distributions over an evaluation space, providing a richer representation of opinions.

Publishing Industry and Business Models

How Science 2.0 is affecting the scientific publishing industry: an analysis of the Web 2.0 initiatives for scientific knowledge production and dissemination R. Cuel, D. Ponte, A. Rossi. Poster at CERN workshop on Innovations in Scholarly Communication (OAI6).

"Internet-based review models for scientific knowledge: a radical innovation?", Pier Franco Camussone, Roberta Cuel, Diego Ponte. Accepted at the 11th European Conference on Knowledge Management, Famalicao, Portugal, September 2-3, 2010.


Infraestructura web para comunidades y recursos cientificos - Modelo, servicios y metricas - Cristhian Daniel Parra Trepowski
Degree thesis (in Spanish), whose content has been developed as part of the LiquidPub project itself.

With the advent of the Web era, and especially with the current boom of social networking, the scenario in which scientific knowledge is created and disseminated has radically changed. The opportunities of this new scenario have not been fully harnessed yet, especially when it comes to the exploration of groups or communities that naturally arise, evolve and disappear in the scope of scientific research. In this thesis we analyze the problems and challenges of defining and modeling communities. We stress the importance of identifying such communities with the goal of improving search capability and establishing the foundations for fairer evaluation methods. This thesis outlines (i) a conceptual model for the definition, discovery, maintenance and use of such communities, (ii) the design and implementation of a service infrastructure to interact with scientific entities and communities of the model, and (iii) the design and implementation of a resource space management system that provides seamless access to papers, experiments, presentations, authors or any other scientific resource available and identifiable by a URI (uniform resource identifier). Furthermore, new evaluation metrics are proposed for scientific work and scientists based on the communities concept as a starting point for future work on the subject.

Discovering scientific communities using conference network - Alejandro Mussi
The book presents a complete model and a tool for the detection of scientific communities based on the relations between conferences (Degree thesis).

