PROVIT - PROVenance Integration Tools

PROVIT is a lightweight, decentralized data provenance and documentation tool. It lets users track workflows and modifications of data files.

PROVIT works in a fully decentralized way: all information is stored in .prov files (as JSON-LD RDF graphs) alongside the corresponding data files in the file system. No additional database or server setup is needed.

A small subset of the W3C PROV-O vocabulary is implemented.
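To make the storage model more concrete, the following is a minimal sketch of what a provenance file next to a data file could look like, using only the Python standard library. The file naming (`data.csv.prov`), the `example.org` identifiers, the helper names `sidecar_path` and `minimal_graph`, and the choice of PROV-O terms are assumptions made for illustration; the files PROVIT actually writes may be structured differently.

```python
import json
import tempfile
from pathlib import Path

PROV_NS = "http://www.w3.org/ns/prov#"  # W3C PROV-O namespace


def sidecar_path(data_file: Path) -> Path:
    # Assumption: the provenance file sits next to the data file and is
    # named by appending ".prov" to the full file name.
    return data_file.with_name(data_file.name + ".prov")


def minimal_graph(entity_id: str, agent_id: str, activity_id: str) -> dict:
    # Hypothetical, hand-written JSON-LD document; not PROVIT's actual output.
    return {
        "@context": {"prov": PROV_NS},
        "@id": entity_id,
        "@type": "prov:Entity",
        "prov:wasAttributedTo": {"@id": agent_id, "@type": "prov:Agent"},
        "prov:wasGeneratedBy": {"@id": activity_id, "@type": "prov:Activity"},
    }


with tempfile.TemporaryDirectory() as tmp:
    # Create a throwaway data file so the example is self-contained.
    data_file = Path(tmp) / "data.csv"
    data_file.write_text("id,value\n1,42\n")

    # Write the provenance graph next to the data file.
    prov_file = sidecar_path(data_file)
    graph = minimal_graph(
        "http://example.org/data.csv",
        "http://example.org/agents/alice",
        "http://example.org/activities/cleanup",
    )
    prov_file.write_text(json.dumps(graph, indent=2))

    # Read it back with nothing but the json module: no database, no server.
    loaded = json.loads(prov_file.read_text())
    print(prov_file.name)   # data.csv.prov
    print(loaded["@type"])  # prov:Entity
```

Because the provenance travels as a plain file beside the data, copying or archiving a directory keeps data and provenance together.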

PROVIT aims to provide an easy-to-use interface for users who have never worked with provenance tracking before. If you feel limited by PROVIT, have a look at more extensive implementations, e.g. prov.

Full documentation is available at provit.readthedocs.io.

Requirements

This software was tested on Linux with Python 3.5 and 3.6.

Installation

Installation via pip is recommended for end users. We strongly encourage end users to make use of a virtualenv.

pip

Create a virtual environment (optional) and install PROVIT into it with pip.

$ mkvirtualenv provit
$ pip install provit

git / Development

Clone the repository and create a virtualenv.

$ git clone https://github.com/diggr/provit
$ mkvirtualenv provit

Install it with pip in editable mode:

$ pip install -e ./provit

Usage

PROVIT provides a command line client which can be used to enrich any file-based data with provenance information.

PROVIT also includes an (experimental) web-based interface (PROVIT Browser).

Command Line Client

Usage:

Open PROVIT Browser:

$ provit browser

Add a provenance event to a file:

$ provit add FILEPATH [OPTIONS]

Options:

-a AGENT, --agent AGENT
 Provenance information: agent (multiple=True)
--activity ACTIVITY
 Provenance information: activity
-d DESCRIPTION, --desc DESCRIPTION
 Provenance information: Description of the data manipulation process
-o ORIGIN, --origin ORIGIN
 Provenance information: Data origin
-s SOURCES, --sources SOURCES
 Provenance information: Source files (multiple=True)
--help Show this message and exit.
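When provenance is recorded from a script, it can help to assemble the `provit add` call programmatically. The sketch below builds the argument list from the options listed above. The helper `build_add_command` and the example values are hypothetical, and the sketch assumes that options marked `multiple=True` are passed by repeating the flag once per value.

```python
import shlex
from typing import List, Optional, Sequence


def build_add_command(
    filepath: str,
    agents: Sequence[str] = (),
    activity: Optional[str] = None,
    description: Optional[str] = None,
    origin: Optional[str] = None,
    sources: Sequence[str] = (),
) -> List[str]:
    """Assemble the argument list for `provit add` (hypothetical helper)."""
    cmd = ["provit", "add", filepath]
    # Assumption: multiple=True options are repeated once per value.
    for agent in agents:
        cmd += ["--agent", agent]
    if activity is not None:
        cmd += ["--activity", activity]
    if description is not None:
        cmd += ["--desc", description]
    if origin is not None:
        cmd += ["--origin", origin]
    for source in sources:
        cmd += ["--sources", source]
    return cmd


cmd = build_add_command(
    "results.csv",
    agents=["alice", "bob"],
    activity="cleanup",
    description="Removed duplicate rows",
    sources=["raw.csv"],
)
# Print a shell-safe rendering of the command for inspection.
print(" ".join(shlex.quote(part) for part in cmd))
```

With PROVIT installed, the resulting list could be handed to `subprocess.run(cmd, check=True)`; keeping it as a list avoids shell quoting problems in descriptions that contain spaces.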

Provenance Class

from provit import Provenance

# load prov data for a file, or create new prov for file
# ("data.csv" is a placeholder; use the path to your own data file)
prov = Provenance("data.csv")

# add provenance metadata
prov.add(agents=[ "agent" ], activity="activity", description="...")
prov.add_primary_source("primary_source")
prov.add_sources([ "filepath1", "filepath2" ])

# return provenance as json tree
prov_dict = prov.tree()

# save provenance metadata into "<filename>.prov" file
prov.save()

Roadmap

General roadmap of the next steps in development:

  • Tests
  • Tutorials
  • Windows support
  • Agent management in PROVIT Browser

Overview

Authors: P. Mühleder <muehleder@ub.uni-leipzig.de>, F. Rämisch <raemisch@ub.uni-leipzig.de>
License: MIT
Copyright: 2018, Peter Mühleder and Universitätsbibliothek Leipzig