Vizzuality playbook in progress
We want a minimum set of checks applied to different aspects of each project to ensure that the end product meets our high standards. We have identified several key criteria to check across four broad areas of a project's life cycle.
A number of discovery activities are included in the main Project workflow, and these are broken into smaller steps in the long version of the Research Workflow. In this document we outline the steps we need to take to ensure each output is delivered to a high standard.
- **Write in plain English.** Not all clients will know what a persona is, so present clearly what each output can be used for. Consider using Hemingway to test the readability of your work.
- **Give adequate time for review.** Notify colleagues ahead of time when to expect a draft, and once they have the document, give them enough time to read it.
- **Keep reflecting on the research methods doc.** Every project is an opportunity to try a new approach or to refine our existing methodology. After each discovery period the team should reflect on what was achieved and note down any improvements in that document.
While individual projects and datasets may differ significantly, broadly speaking we should, where possible, aim to meet the following criteria:
**Aim for open source and open access.** Datasets should be held within projects in a way that makes them openly accessible, with licenses that enable their reuse.
**Keep a clear chain of custody and metadata.** We should aim to make data provenance clear: document the point of origin and chain of custody, and make explicit any transformations or pipelines applied to datasets that may have modified their contents. Datasets should also be described with metadata.
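As a minimal sketch of what this can look like in practice, the snippet below writes a small provenance record alongside a dataset file. The field names, file naming convention, and example values are illustrative assumptions, not a fixed Vizzuality schema; adapt them to whatever metadata standard the project uses.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def write_metadata(dataset_path: str, source_url: str, license_name: str,
                   transformations: list[str]) -> Path:
    """Write a provenance/metadata record next to a dataset file.

    Fields are illustrative; swap them for the metadata standard your
    project follows (e.g. a frictionless datapackage descriptor).
    """
    data = Path(dataset_path)
    record = {
        "name": data.name,
        "source_url": source_url,                     # point of origin
        "license": license_name,                      # what enables reuse
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        "sha256": hashlib.sha256(data.read_bytes()).hexdigest(),  # custody check
        "transformations": transformations,           # pipeline steps applied
    }
    out = data.with_suffix(data.suffix + ".meta.json")
    out.write_text(json.dumps(record, indent=2))
    return out

# Hypothetical usage, with made-up dataset and source:
# write_metadata("data/forest_cover.csv",
#                "https://example.org/raw/forest_cover.csv",
#                "CC-BY-4.0",
#                ["dropped rows with null geometry", "reprojected to EPSG:4326"])
```

Keeping a checksum alongside the source URL means anyone can later verify that the file they hold is the one the record describes.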
**Encourage dataset reuse across projects.** We should avoid siloing datasets and pipelines within individual projects. Datasets, and the knowledge of how to work with them, are resources we will ideally want to use again in the future. This is particularly important as we have many projects with overlapping spheres of interest. Where possible, projects should build on the usability and knowledge of datasets already used in the company, and in turn share new datasets.
**Ensure data transformations and pipelines are clear and reproducible.** At minimum, we should be able to easily understand and replicate any modification to the original datasets, and reproduce any derivative data generated over the course of a project. Key practices that support this include the use of technology like Docker and Jupyter notebooks.
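One pattern that makes replication easy, sketched below under assumptions of our choosing, is structuring a pipeline as an ordered list of named, pure transformation functions, so the whole run can be repeated end to end (ideally inside a pinned Docker image). The column names, paths, and use of pandas here are hypothetical examples, not a prescribed implementation.

```python
import pandas as pd

# Each step is a named, pure function: given the same input it always
# produces the same output, which is what makes the pipeline replayable.
def drop_null_geometries(df: pd.DataFrame) -> pd.DataFrame:
    return df.dropna(subset=["geometry"])

def normalise_country_codes(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(iso3=df["iso3"].str.upper())

PIPELINE = [drop_null_geometries, normalise_country_codes]

def run_pipeline(raw_path: str, out_path: str) -> None:
    df = pd.read_csv(raw_path)
    for step in PIPELINE:
        df = step(df)
        print(f"applied {step.__name__}: {len(df)} rows")  # simple audit trail
    df.to_csv(out_path, index=False)

# Hypothetical usage:
# run_pipeline("data/raw/countries.csv", "data/processed/countries.csv")
```

The printed step names double as a record of exactly which transformations produced a derivative dataset, which pairs naturally with the metadata record above.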
**Ensure we are delivering as much value as is reasonably possible.** Often, the majority of time is spent on the preparation, data engineering, and bug fixing side of a project's data tasks. In addition to that, we should make a sufficient effort to deliver value beyond simply engineering the datasets into a minimally usable format: we should aim to deliver products that derive meaning and insights from raw data.