
Modelling


The elusive database / process-based
provenance convergence:
"Is there a common notion of provenance
that can view wf provenance
and db provenance as different flavors
of the same broad model?"

DFL - IS'08

Hidders, Jan, Natalia Kwasnikowska, Jacek Sroka, Jerzy Tyszkiewicz, and Jan Van den Bussche. “DFL: A dataflow language based on Petri nets and nested relational calculus.” Information Systems 33 (2008): 261-284. 


IPAW'08

Kwasnikowska, Natalia, and Jan Van den Bussche. “Mapping the NRC Dataflow Model to the Open Provenance Model.” In Provenance and Annotation of Data and Processes, edited by Juliana Freire, David Koop, and Luc Moreau, 5272:3-16. Springer Berlin / Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89965-5_3. 


PVLDB'12 - the PigLatin paper

Olston, Christopher, Benjamin Reed, Utkarsh Srivastava, Ravi Kumar, and Andrew Tomkins. “Pig latin: a not-so-foreign language for data processing.” In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, 1099-1110. New York, NY, USA: ACM, 2008. http://doi.acm.org/10.1145/1376616.1376726. 


Contributions

A framework that marries database-style and workflow provenance models, capturing internal state as well as fine-grained dependencies in workflow provenance, when data operations are expressed using Pig Latin expressions.

A graph-based representation of fine-grained provenance for workflows, which also captures module invocations and module state changes.
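
As a rough illustration of what such a graph could look like, the sketch below stores data items, module invocations, and module states as nodes, with edges pointing at whatever each node depends on. The class name `ProvenanceGraph`, the node kinds, and the example identifiers (`d1`, `inv1`, `s0`, ...) are all hypothetical; this is not the formal model from the papers above, only a minimal sketch of the idea that an output depends on specific inputs and on the module state at invocation time.

```python
class ProvenanceGraph:
    """Toy provenance graph: nodes are data items, invocations, or states."""

    def __init__(self):
        self.kind = {}     # node id -> "data" | "invocation" | "state"
        self.parents = {}  # node id -> set of node ids it directly depends on

    def add_node(self, node_id, kind):
        self.kind[node_id] = kind
        self.parents.setdefault(node_id, set())

    def add_dependency(self, node_id, depends_on):
        self.parents[node_id].add(depends_on)

    def lineage(self, node_id):
        """Return every node that node_id transitively depends on."""
        seen = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = ProvenanceGraph()
for node_id, kind in [
    ("d1", "data"), ("d2", "data"), ("o1", "data"), ("o2", "data"),
    ("inv1", "invocation"), ("inv2", "invocation"),
    ("s0", "state"), ("s1", "state"),
]:
    graph.add_node(node_id, kind)

# First invocation reads d1 and the initial module state s0,
# then produces output o1 and the updated state s1.
graph.add_dependency("inv1", "d1")
graph.add_dependency("inv1", "s0")
graph.add_dependency("o1", "inv1")
graph.add_dependency("s1", "inv1")

# Second invocation reads d2 and the state left behind by the first one.
graph.add_dependency("inv2", "d2")
graph.add_dependency("inv2", "s1")
graph.add_dependency("o2", "inv2")

lineage_o1 = graph.lineage("o1")
lineage_o2 = graph.lineage("o2")
```

In this toy example, `o2` depends on `d1` only through the module state carried over from the first invocation, which is the kind of dependency a state-free model would miss.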


Joint modelling of execution traces
and process specification


Privacy-preserving provenance

position / early papers

WANDS'10 - Davidson et al.

Davidson, Susan B., Sanjeev Khanna, Sudeepa Roy, and Sarah Cohen Boulakia. “Privacy issues in scientific workflow provenance.” In First International Workshop on Workflow Approaches to New Data-centric Science (WANDS’10), edited by Paolo Missier, Vasa Curcin, and Susan Davidson. Indianapolis: ACM, 2010. http://portal.acm.org/citation.cfm?id=1833398.1833401. 


SSDBM'11 - Dey, Zinn, et al.

Dey, Saumen, Daniel Zinn, and Bertram Ludäscher. “ProPub: Towards a Declarative Approach for Publishing Customized, Policy-Aware Provenance.” In Scientific and Statistical Database Management, edited by Judith Bayard Cushing, James French, and Shawn Bowers, 6809:225-243. Springer Berlin / Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-22351-8_13. 


PODS'11 - Davidson et al.

Davidson, Susan B., Sanjeev Khanna, Tova Milo, Debmalya Panigrahi, and Sudeepa Roy. “Provenance views for module privacy.” In Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, 175-186. New York, NY, USA: ACM, 2011. http://doi.acm.org/10.1145/1989284.1989305. 


Collecting

Merging Provenance from different executions:
the virtual experiment

the DataONE "Data Tree of Life" project

Missier, Paolo, Bertram Ludäscher, Shawn Bowers, Manish Kumar Anand, Ilkay Altintas, Saumen Dey, Anandarup Sarkar, Biva Shrestha, and Carole Goble. “Linking Multiple Workflow Provenance Traces for Interoperable Collaborative Science.” In Proc. 5th Workshop on Workflows in Support of Large-Scale Science (WORKS), 2010. 


Secure / tamper-evident /
trustworthy provenance for workflows
Police forces are becoming very interested in this

The fake Picasso paper (FAST 2009)

Hasan, Ragib, Radu Sion, and Marianne Winslett. “The Case of the Fake Picasso: Preventing History Forgery with Secure Provenance.” In FAST 2009, 1-14, 2009. http://www.usenix.org/events/fast09/tech/full_papers/hasan/hasan.pdf.

Secure and tamper-evident provenance chain: no one can add or remove entries from the middle of the chain without detection.

The primary threat we guard against in this paper is undetected rewrites of history, which occur when malicious entities forge provenance chains to match illicit document writes and metadata modifications.
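
The chaining idea can be sketched with a plain hash chain, in which each entry's digest covers the previous entry's digest. This is a simplified stand-in and not the scheme from the paper: it leaves out the signatures and keys a real design needs, so it only detects edits if the verifier trusts the stored digests (an attacker who can recompute every later digest would go unnoticed). The function names, the `GENESIS` constant, and the record fields are assumptions made for illustration.

```python
import hashlib
import json

# Assumed starting value for the first entry in the chain.
GENESIS = "0" * 64


def entry_digest(prev_digest, record):
    """Hash the previous digest together with a canonical form of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(prev_digest.encode("ascii") + payload).hexdigest()


def append_entry(chain, record):
    """Append a record, linking it to the digest of the current last entry."""
    prev = chain[-1]["digest"] if chain else GENESIS
    chain.append({"record": record, "digest": entry_digest(prev, record)})


def verify_chain(chain):
    """Recompute every digest in order; any broken link fails verification."""
    prev = GENESIS
    for entry in chain:
        if entry["digest"] != entry_digest(prev, entry["record"]):
            return False
        prev = entry["digest"]
    return True


chain = []
for action in ("create", "edit", "publish"):
    append_entry(chain, {"actor": "alice", "action": action})

intact = verify_chain(chain)

# Drop the middle entry: the last entry no longer links to its predecessor.
without_middle = chain[:1] + chain[2:]
removal_detected = not verify_chain(without_middle)

# Splice a forged entry into the middle: the entry after it no longer matches.
forged_record = {"actor": "mallory", "action": "edit"}
forged = {
    "record": forged_record,
    "digest": entry_digest(chain[0]["digest"], forged_record),
}
with_insert = chain[:1] + [forged] + chain[1:]
insertion_detected = not verify_chain(with_insert)
```

Removing or inserting an entry in the middle breaks the link to the entry that follows it, which is why verification fails in both cases.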

trusted provenance @Tapp 10

Secure data management 2009

Zhang, Jing, Adriane Chapman, and Kristen LeFevre. “Do You Know Where Your Data’s Been? – Tamper-Evident Database Provenance.” In Secure Data Management, edited by Willem Jonker and Milan Petkovic, 5776:17-32. Springer Berlin / Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-04219-5_2. 


Collecting and warehousing provenance:
PBase / GoldenTrail

IDCC'11 conference on data curation

Missier, Paolo, Bertram Ludäscher, Shawn Bowers, Ilkay Altintas, Saumen Dey, and Michael Agun. “Golden Trail: Retrieving the Data History that Matters from a Comprehensive Provenance Repository.” In Proc. 7th International Digital Curation Conference. Bristol, UK, 2011. http://www.dcc.ac.uk/events/idcc11. 
