THREAD®️

THREAD®️ together your Dataiku datasets and created definitions to formalize a catalog for improved understanding and governance.

POWERFUL CATALOGING AT YOUR FINGERTIPS

WELCOME TO THREAD®️

At Snow Fox Data, we specialize in understanding the unique challenges our customers face throughout their data journey. Time and again, we heard clients express challenges with documenting data and viewing upstream/downstream data lineage. Clients like you asked for help, and we delivered. A free, lightweight cataloging tool, THREAD®️ ties together your data lineage and provides a single location to document data connected to Dataiku.

THREAD®️ is proud to be recognized as one of two finalists for the 2022 Dataiku Frontrunner Awards in the Partnership Acceleration Category.

FEATURES AND BENEFITS


Are you challenged to find the right data cataloging tool for your business? See what our THREAD®️ plugin can do for you:

  • Increase efficiency and visibility
  • Improve governance
  • Improve trust in your data
  • Streamline training
  • Reduce costs

 

Ready to install the THREAD®️ plugin?

Use the installation guide below to get started.

YOUR GUIDE TO INSTALLING THREAD®️

INSTALLATION

To install THREAD®️ in DSS, open the Apps menu, click Plugins, and search for Thread. You can also download a zipped version through the button below. 

Download THREAD️

1.  Create a new blank project in Dataiku. Suggested name is “Thread Catalog”.

2.  In this new project, navigate to the “Webapps” landing page.


3.  Click “Create your first webapp” and select “Visual Webapp”.


4.  Select the “Thread” plugin option and name this new Webapp “Thread Catalog”.


5.  On the following page, select “Auto-start backend”, then “Save and view webapp”.



For “Run backend as”, select a user with sufficient privileges. Your THREAD®️ installation can index only the projects that the selected backend user can access.

6.  Navigate to the link on the THREAD®️ plugin page from Step 5.
7.  Click the “Scan” icon in the Thread UI to start the Dataiku server scan.


NOTE:  Indexing can be limited to specific projects using project tags. To specify which tags to include, add a “limit_to_tags” project variable to the Thread project.
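The screenshot that showed the original example is not reproduced here, so the following is only a minimal sketch of what the variable might look like. It assumes that “limit_to_tags” takes a JSON list of tag names. That value format, the helper name add_limit_to_tags, and the sample tags are illustrative assumptions, not confirmed THREAD®️ behavior.

```python
import json


def add_limit_to_tags(project_variables: dict, tags: list) -> dict:
    """Return a copy of the project variables with "limit_to_tags" set.

    Hypothetical helper: the value shape (a list of tag names) is an
    assumption, not something the THREAD documentation confirms.
    """
    updated = dict(project_variables)      # leave the caller's dict untouched
    updated["limit_to_tags"] = list(tags)  # assumed shape: JSON list of strings
    return updated


# Sample data only: replace with your own variables and project tags.
current = {"some_existing_variable": 1}
updated = add_limit_to_tags(current, ["finance", "certified"])

# Print the JSON so it can be compared with the project's variables.
print(json.dumps(updated, indent=2))
```

With this in place, only projects carrying at least one of the listed tags would be expected to be indexed; confirm the accepted format against your own THREAD®️ installation before relying on it.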

 

Automating THREAD®️ Full Rescan

THREAD®️ can be configured to rescan automatically on a schedule by adding a new project variable, “rescan_cron”, to the installed THREAD®️ project. This variable defines the cron schedule on which a full automated scan runs. This scan WILL NOT delete any definitions.

Cron reference: https://crontab.guru/
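As a rough illustration, the sketch below adds a “rescan_cron” entry to a dictionary of project variables. It assumes a standard five-field cron expression (minute, hour, day of month, month, day of week), which is the format crontab.guru explains. The helper names and the sample schedule are hypothetical, and the field-count check is deliberately shallow: it does not validate the individual fields.

```python
import json


def looks_like_five_field_cron(expression: str) -> bool:
    """Shallow check: a standard cron expression has exactly five fields."""
    return len(expression.split()) == 5


def add_rescan_cron(project_variables: dict, cron_expression: str) -> dict:
    """Return a copy of the project variables with "rescan_cron" set.

    Hypothetical helper; assumes the variable holds the cron string as-is.
    """
    if not looks_like_five_field_cron(cron_expression):
        raise ValueError(f"expected five cron fields, got: {cron_expression!r}")
    updated = dict(project_variables)
    updated["rescan_cron"] = cron_expression
    return updated


# "0 2 * * *" is standard cron for every day at 02:00.
scheduled = add_rescan_cron({}, "0 2 * * *")
print(json.dumps(scheduled, indent=2))
```

Checking the field count before saving catches the most common mistake (a missing or extra field) early, but the schedule itself is still worth verifying with the cron reference above.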


Congratulations! You have successfully installed the THREAD®️ plugin!

Can't wait to start using the THREAD®️ plugin?

These user instructions are here to help you get started.

HOW TO USE THREAD®️

USER INSTRUCTIONS

Getting Started

You must be logged in to Dataiku to access THREAD®️. To access THREAD®️ data from within Dataiku, go to your THREAD®️ project and then to Webapps.

Select "Thread Data".

To open this as a public webapp, click the link.

 

Home Screen

Search

This search bar allows you to search for columns, datasets, projects, and definitions. Uncheck a box to exclude that type from the results.

Dataiku Instance Stats

This is a summary of all indexed projects, datasets, columns, and definitions on this Dataiku node only.

 

View Search Results from the Home Page

Project

You can see on this page:

  1. Project Name
  2. Project ID
  3. Project owner
  4. Project folder
  5. Project Tags
  6. Datasets tab
    1. Name: All datasets in this project. Clicking a name takes you to that dataset’s results page
    2. Documentation Status: How many columns have been documented in each dataset
  7. DSS tab
    1. Project flow in Dataiku
  8. Documentation Status of all columns in the entire project
  9. Options
    1. Open project in new tab: new tab to view this project information page
    2. Rescan project: brings in new datasets and columns from Dataiku

Dataset

You can see on this page:

  1. Dataset Name
  2. Storage type of the dataset
  3. Project Name of this dataset. You can click this to view the project results page
  4. Columns
    1. Name: Column Name of all columns in this dataset
    2. Type: Data type
    3. Description: Column Description
  5. Lineage
    1. Source dataset (even if in a different project)
    2. Current dataset highlighted in blue
    3. Final dataset(s)
    4. Numbers between the boxes indicate count of datasets between these datasets
    5. The blue arrow in the top right corner of each box allows you to jump to that dataset’s results page
  6. DSS: View the dataset explore page in Dataiku
  7. Documentation Status: Total number and percentage of columns in this dataset that have been defined
  8. Options
    1. Open Dataset in a new Tab

Column

You can see on this page:

  1. Column Name
  2. Data type
  3. Project Name. You can click this to view the project results page
  4. Dataset Name. You can click this to view the dataset results page
  5. Definition
    1. Definition Name
    2. Definition Tags
    3. Description of definition
  6. Lineage
    1. Source dataset of this column (even if in a different project)
    2. The dataset containing the column you are viewing, highlighted in blue
    3. Final dataset(s) this column goes to
    4. Numbers between the boxes indicate count of datasets between these datasets
    5. The blue arrow in the top right corner of each box allows you to jump to that dataset’s results page
  7. DSS: View the dataset containing this column on the Dataiku explore page
  8. Documentation Status: Whether or not this column has a definition
  9. Options
    1. Open Dataset in a new Tab

Create & Edit Definitions

To create, edit, or remove a definition for a column, find the column using the home screen search.

Apply a Definition to a Column

Click “Add Definition”.

If a definition for this column already exists, use “Search for Definition” in the bottom left corner. Find the correct definition and apply it to this column.

If this is a brand-new definition, enter:

  1. Name of definition: This is a business-friendly name for the definition. It may be applied to many columns, even if they do not share the same name.
  2. Tags: Apply new tags or existing tags by typing the tag name
  3. Description: The definition of the term

Select Applied Data Sets: Select the datasets in the lineage that you would like this definition to apply to. Datasets where the definition was applied previously are selected automatically and can be deselected.

Catalog Screen

View all definitions on this node of Dataiku.  One definition may be applied to many columns, even if they don’t have the same name.

You are able to view:

  1. Definition name: Sort this alphabetically ascending or descending by pressing the arrow next to the title. Click the name to edit the definition, or to delete it using Options
  2. Description: Sort this alphabetically ascending or descending by pressing the arrow next to the title
  3. Tag: Customizable metadata added when assigned to a column

Navigation

  1. Search by name, definition or tag using the top right search bar
  2. Filter by tag in the filter bar
  3. Filter by tag by clicking a tag

Click on the name to edit or delete a definition.

From this page you can view or edit:

  1. Column Name
  2. Name of the definition (this does not need to match the column name)
  3. Tags
  4. Applied to: Clicking a dataset takes you to that dataset’s results page
  5. Description: The definition in business terms

To delete this definition, select Options > Delete.

Adding New Data

To add new projects, datasets, and columns from Dataiku to THREAD®️, rescan DSS using the arrow next to your username.

Sending Links

You can share a link to a specific page in THREAD®️ with anyone who has security access in Dataiku and permission to “read project content” in the THREAD®️ project.

You've now completed the user guide and you're ready to start using the plugin!

You asked. We Answered.

Our FAQs are here to help you along the way.

FREQUENTLY ASKED QUESTIONS

Why can’t I see my project, dataset, or column in the search results or lineage screen?

Thread needs to index any new projects, datasets, or columns. You can do this for all projects, datasets, and columns in your instance by using the arrow in the top right corner next to your username.

Alternatively, you can do this per project by searching for that project on the home screen and using Options > Rescan Project. This method is suggested for instances with many projects.

 
How do I know if a definition already exists for this column?

From the home screen, search for the column you would like to define. When you add a definition, use “Search for Definition” in the bottom left corner. If you find the definition, you can apply it there.

Alternatively, you can view all existing definitions in the catalog before going to edit the column.

Can everyone see all the data in my instance?

Users must have Dataiku access and the “read project content” permission on the Thread project to use Thread. All such users can see the column names and definitions for all projects on the instance. If a user doesn’t have access to a particular project, they cannot view the DSS tab in that project’s project, dataset, or column results screens.

Who can view, edit or delete definitions?

Any Dataiku user with the “read project content” permission on the Thread project can view, edit, and delete definitions in Thread and the column descriptions in Dataiku.

What if I change the name of a column?

If a column was renamed in a prep recipe, Thread recognizes it as the same column and keeps it in the same lineage.

How do I know how much is documented?

You can view the documentation status by project, dataset, or column.

Can I have two Thread webapps running at the same time?

Yes, you can have two Thread webapps, but they need to be in separate projects.


OUR EXPERTS ARE HERE TO HELP

Whether you need additional training, want to talk with us about your data journey, or are ready to maximize your data strategy, our team of data experts is here for you.
TALK TO AN EXPERT