Have you ever been so engrossed in a data analysis project that you were sure you would never forget a single important detail? Soon enough, it’s a year and 10 projects later, and you can’t remember the many components that made that project a success. Who decided to filter out the data? Why did we choose these specific business rules?
Documentation takes time to do, but in the long run, it saves much more time than it takes. Imagine all the time spent debugging a code notebook to find the issue. That process could take even longer if you didn’t write the code. Documentation helps you collaborate and share with others and is the proverbial “breadcrumb trail” to your future self when you have to troubleshoot or enhance something you created three projects ago.
Dataiku, a collaborative data science software platform, makes documentation easy. The visual coding flow allows you to quickly identify where each major command took place. Here are three areas for additional documentation that are well worth the time:
- Project Description
- Recipe Short Description
- Prepare Recipe Comment
The data used for this example is from a Kaggle project using financial transaction data to detect fraudulent transactions and money laundering.
Project Description
When creating a project, include a clear project description on the project page. Not only does it make clear why the project exists, but it also opens the opportunity for other projects to reuse it. For example, if the project identifies target markets, another project could use this data for further analysis instead of repeating the same data cleansing.
The project description should include:
- The project purpose or goal.
- A project sponsor or department owner.
- A short description of the data sources.
- A description of any data that was filtered before entering this project.
To view and edit your project description, go to your project homepage by clicking the project title near the top left corner of the screen. Add a project short description under the title for quick reference and searching.
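To keep descriptions consistent across projects, the four items listed above can be captured in a small template. The following is a minimal sketch in plain Python, not part of Dataiku: the `ProjectDescription` class, its field names, and the owner value are hypothetical, and it assumes the project description field renders Markdown the way the recipe long description does.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectDescription:
    """Hypothetical container for the four recommended description items."""
    purpose: str
    owner: str
    data_sources: list[str]
    upstream_filters: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        """Render the description as Markdown text to paste into the project page."""
        lines = [
            f"**Purpose:** {self.purpose}",
            f"**Owner:** {self.owner}",
            "",
            "**Data sources:**",
        ]
        lines += [f"- {source}" for source in self.data_sources]
        lines += ["", "**Filtered before this project:**"]
        # State explicitly when nothing was filtered, so readers do not have to guess.
        lines += [f"- {rule}" for rule in self.upstream_filters] or ["- None"]
        return "\n".join(lines)


description = ProjectDescription(
    purpose="Detect fraudulent transactions and money laundering.",
    owner="Fraud Analytics team (example value)",
    data_sources=["Kaggle financial transaction dataset"],
)
print(description.to_markdown())
```

The printed text can then be pasted into the project description by hand. Listing "None" under the filter heading is a deliberate choice: an empty section could mean either "nothing was filtered" or "nobody wrote it down."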
Recipe Short Description
Imagine opening a new project and being able to understand the way information is flowing and changing through the project without clicking on anything. Visual recipes make it easy to see the functionality of each recipe, but you can follow the builder’s train of thought even better by adding a recipe short description. This description is displayed when you hover over a recipe, allowing for a simple overview of each step in the flow without clicking into the recipes themselves.
The recipe short description should include:
- The purpose of the recipe.
- Business logic or reasons for removal of rows.
- Key facts that would allow the user to understand the flow.
To edit the short description, click on the recipe and go to the information panel on the right side of the screen. Under the Details header, navigate to About and click Edit. Enter the short description and click Save. You can also add a long description, which appears in the Details panel on the right. The long description supports Markdown formatting.
Prepare Recipe Comment
Within the prepare recipe, Dataiku allows you to create steps as part of the preparation script. While these steps give a lot of transparency into the actions taken on the data, it is still important to add comments to any step that isn’t straightforward, or steps you want to remember the reasoning behind.
There should be a comment on the step if:
- Business logic is incorporated.
- Rows are removed.
- A column is duplicated or “find and replace” is used.
- It is a formula step.
- A column is renamed.
To add a comment to a preparation recipe step, open the preparation recipe and navigate to the step in the script panel on the left. Click the “i” symbol on the step to add or edit a comment. Select Always show comment to keep it in view.
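The checklist above can also be applied as a quick audit of a script. The sketch below is plain Python under stated assumptions: each step is represented as a dictionary with hypothetical `type`, `comment`, and `business_logic` keys, and the step type names are illustrative labels rather than Dataiku's internal identifiers.

```python
# Step types that should carry a comment, following the checklist above.
# These names are hypothetical labels, not Dataiku's internal identifiers.
NEEDS_COMMENT = {
    "remove_rows",
    "duplicate_column",
    "find_replace",
    "formula",
    "rename_column",
}


def steps_missing_comments(steps: list[dict]) -> list[int]:
    """Return 1-based positions of steps that need a comment but have none."""
    missing = []
    for position, step in enumerate(steps, start=1):
        required = step.get("type") in NEEDS_COMMENT or step.get("business_logic", False)
        # Treat a missing, empty, or whitespace-only comment as no comment.
        has_comment = bool((step.get("comment") or "").strip())
        if required and not has_comment:
            missing.append(position)
    return missing


# Example script with made-up steps and comments.
script = [
    {"type": "parse_date", "comment": ""},
    {"type": "remove_rows", "comment": "Drop test accounts per Finance."},
    {"type": "formula", "comment": ""},
    {"type": "rename_column"},
    {"type": "fill_empty", "business_logic": True, "comment": "  "},
]
print(steps_missing_comments(script))  # [3, 4, 5]
```

Returning step positions rather than step names keeps the result easy to match against the order of steps in the script panel.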
A few minutes of documentation will save you time, help you collaborate with others, and make future troubleshooting easier. Dataiku’s visual recipes already provide valuable documentation out of the box, alongside other features such as wikis, dashboards, and the timeline on the project page. Combined with the additional documentation listed here, these features will keep your project clean, tidy, and ready to reuse.