Ryan Moore | May 14, 2024 | 9 min read

Exploring Dataiku DSS 12.6: Harness the Power of Generative AI

As generative AI has swept through the technology world, software vendors across nearly every industry have worked to integrate their products with services like ChatGPT, Google Gemini, and others. Data science and engineering tools are no exception, with many products adding code assistance and development chat features. Dataiku has followed this trend from the start and has done an exceptional job of integrating a backbone called “LLM Mesh”, along with a number of other product integrations that speed up development and documentation. In this article, we look at the new integrations available in DSS 12.6 and highlight some of the possibilities each one offers.

Getting Started

To enable the features we’ll be reviewing, start in the Administration settings of your DSS instance. If you don’t have administrator privileges, you may need to work with a platform admin for this setup.

LLM Connection

If you’ve used Dataiku before, you’re likely familiar with “Connections”, which are typically settings that provide access to a data source or destination. New with the DSS LLM Mesh is the LLM Mesh connection, which integrates with LLM services such as the OpenAI API, as well as related services such as the Pinecone vector database. Before we can fully enable the LLM integrations in DSS, we need to set up at least one of these connections. We won’t cover the details of setting up these connections in this article, but you’ll find them on the Dataiku website.

To use all of the features we’ll cover today, your instance needs at least one accessible LLM connection. In our example, we use an OpenAI API connection.

Dataiku DSS 12.6 (1)

Enabling LLM Features

Once you have an LLM connection ready to use, the next configuration step is in the Dataiku Administration section. In the “Settings” tab, you’ll find a new section under “Other” called “AI Services”, as shown below.

Dataiku DSS 12.6 (2)

This settings page contains three features that we will enable for the new DSS LLM integration. As shown in the previous screenshot, you’ll want to enable “AI Prepare”, “AI Explain”, and “AI Code Assistant”. When you enable “AI Code Assistant”, you’ll be required to select a Default LLM Connection, which the Code Assistant uses for its code assistance capabilities. Select one of the LLM connections you have created on your DSS instance, keeping in mind that code assistance requests go through that connection and may incur charges.

Exploring LLM Features

 

AI Code Assistant

If your role involves writing code, you’ve likely seen the massive trend toward AI code assistant features in software development tools such as Visual Studio Code. Services like GitHub Copilot provide impressive assistance and can greatly increase your productivity. Thankfully, this capability has also been added to the development environments in DSS.

Jupyter Integration

The default environment for writing Python code in Dataiku is the embedded Jupyter notebook, which is what we’ll use for this example. If you open a Jupyter notebook in a DSS project, you can enable the AI Code Assistant features by adding the following line near the top of your notebook:

%load_ext ai_code_assistant

After running this line, you should receive a message confirming that the Assistant is loaded and indicating that you can use the %aihelp command for more information on its capabilities.

Dataiku DSS 12.6 (3)

Once the assistant is loaded, you can take advantage of three primary functions. As a reminder, each of these functions makes calls to the LLM selected in the configuration section above.

  • %%aiask: Allows you to ask a technical question related to development and/or the code in your notebook.
  • %%aiwrite: Generates Python code per your request. The generated code is placed directly into a notebook cell, with comments explaining its usage.
  • %%aiexplain: Explains existing code in your notebook. For example, you may have inherited a project or codebase and need help understanding a function definition. This feature can answer your questions right inside your Jupyter notebook.
 
Function: %%aiask

The %%aiask function is a quick and robust way to ask, from inside your development environment, the questions you might otherwise pose to Google or Stack Overflow. Asking directly in your notebook can reduce context-switching during development. As an example, the screenshot below shows the question “how can I convert a column in a pandas dataframe from string to date?” posed to %%aiask.

Dataiku DSS 12.6 (4)

After the cell runs, its output is a description and code sample that answer our question accurately!
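The answer from the screenshot isn’t reproduced here, but a typical pandas solution to this question uses pd.to_datetime. The following is a minimal sketch of such an answer, not the assistant’s actual output; the order_date column and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: dates stored as strings, including one bad value
df = pd.DataFrame({"order_date": ["2014-12-31", "2015-01-15", "not a date"]})

# Parse the strings into datetime values. An explicit format avoids
# ambiguous guesses, and errors="coerce" turns unparseable entries into
# NaT (missing) instead of raising an exception.
df["order_date"] = pd.to_datetime(df["order_date"], format="%Y-%m-%d", errors="coerce")

print(df.dtypes)
print(df)
```

Using errors="coerce" is a choice made in this sketch; leave it out if you would rather have invalid values raise an error so you notice them.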

Function: %%aiwrite

The following screenshot shows an example of using the %%aiwrite function to generate a Python function based on our prompt “write a function to compute the correlation between all numeric columns in a pandas dataframe”.

Dataiku DSS 12.6 (5)

As you can see, the LLM not only generated code for our task quickly but also documented that code with comments. This example shows how the %%aiwrite function in DSS can help you write code much faster. As the comments themselves note, be sure to validate the generated code, because LLMs can produce code that looks plausible but is incorrect.
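Validating generated code is easier when you know what a reasonable answer looks like. Below is a hand-written sketch for the same prompt, not the assistant’s output from the screenshot; the function name numeric_correlation and the sample data are hypothetical. It relies on pandas’ DataFrame.select_dtypes and DataFrame.corr.

```python
import pandas as pd


def numeric_correlation(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Return the pairwise correlation matrix of all numeric columns in df.

    method is passed through to DataFrame.corr ("pearson", "kendall", or "spearman").
    """
    # Keep only numeric columns so text columns don't break the calculation
    numeric_df = df.select_dtypes(include="number")
    return numeric_df.corr(method=method)


# Usage example: the text column "region" is ignored
sales = pd.DataFrame({
    "units": [1, 2, 3, 4],
    "revenue": [10.0, 20.0, 30.0, 40.0],
    "region": ["north", "south", "east", "west"],
})
corr = numeric_correlation(sales)
print(corr)
```

Comparing the assistant’s function against a simple baseline like this on a small, known dataset is a quick way to catch mistakes such as non-numeric columns slipping into the calculation.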

Function: %%aiexplain

Like %%aiwrite, the %%aiexplain function is easy to use in your DSS Jupyter notebooks. Add %%aiexplain as the first line of a notebook cell, above the code you’d like to understand. When you run the cell, the Python kernel does not execute the contained code; instead, your LLM connection generates an explanation and writes it as the cell output. In the following screenshot, you’ll see %%aiexplain used to explain a very common cell in a Dataiku Python notebook, which establishes a reference to a DSS dataset and loads the contents of that dataset as a pandas DataFrame.

Dataiku DSS 12.6 (6)

In this case, the %%aiexplain function was able to accurately break down the functionality of these two lines of code in a way that is easy to understand.

Visual Studio Code Integration

We’ve just covered the code assistant integration in Jupyter, but the same functionality is also available in Visual Studio Code through a DSS Code Studio. You can find the setup details on the Dataiku website.

AI Prepare

One of the reasons Dataiku is such a great data development tool is its strong support for both coders and “clickers” in accomplishing technical tasks. Like the AI Code Assistant, the AI Prepare feature generates project logic from text prompts. With AI Prepare, however, the output is a set of steps in a visual prepare recipe, which can improve productivity and help you learn the capabilities of the DSS prepare recipe.

Note: Unlike the Code Assistant functionality, this feature does NOT use your defined LLM connection and therefore does not incur charges to your organization. Instead, metadata about processed datasets (and, optionally, sample data) is sent to Dataiku and third-party services for processing. Be sure to review whether this is in line with your organization’s policies.

Creating a Step

You’re likely familiar with the DSS visual prepare recipe and its Add a new step button, which lets you choose from the 100+ processors available in a visual recipe. The new AI Prepare capability adds an option under this button, appropriately titled AI Prepare.

Dataiku DSS 12.6 (7)

Selecting this option lets you enter a text prompt describing the preparation step(s) you would like to create. In the following screenshot, we use the prompt “Remove all rows with an order_date before 2015”, where order_date is a column in the input dataset of our prepare recipe.

Dataiku DSS 12.6 (8)

As a result, you’ll see a newly created step group containing a description and a single step that accomplishes the task requested in our prompt. For a more complex request, the group may contain multiple steps, which are executed sequentially. Occasionally, though, a generated step is invalid, for example because it contains a malformed formula. When this happens, Dataiku displays a message and automatically disables the newly generated step, which you may be able to fix by editing it.

Dataiku DSS 12.6 (9)

The AI Prepare feature is a great way to quickly create prepare steps and to get ideas about how to accomplish a task within a recipe. As with the previous disclaimers, be sure to double-check any steps created using this feature to avoid any unintended results.
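One way to double-check a generated step is to reproduce its effect independently on a sample of the data. For the prompt used above, a pandas equivalent might look like the sketch below. This is not what AI Prepare produces (it creates visual steps, not code), and the sample data is hypothetical. The sketch makes one decision explicit: rows whose order_date is missing or unparseable are kept, because they are not known to be before 2015. Check how the generated step treats such rows in your own recipe.

```python
import pandas as pd

# Hypothetical input resembling the prepare recipe's input dataset
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "order_date": ["2014-06-30", "2015-01-01", "2016-03-12", None],
})
orders["order_date"] = pd.to_datetime(orders["order_date"], format="%Y-%m-%d", errors="coerce")

cutoff = pd.Timestamp("2015-01-01")

# "Remove all rows with an order_date before 2015": drop rows strictly
# earlier than the cutoff. Comparisons with NaT evaluate to False, so rows
# with a missing date are not flagged and are kept.
before_2015 = orders["order_date"] < cutoff
filtered = orders[~before_2015]

print(filtered)
```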

AI Explain

One of the many capabilities that make DSS a great tool for collaborative development is the ability to add metadata and documentation to elements within your projects. Specifically, we can add large bodies of text as project, dataset, recipe, or flow zone descriptions, which provide additional context for future development. With the new AI Explain feature, we can use an LLM to generate descriptions of these elements as a starting point for project documentation.

Note: Unlike the previous Code Assistant functionality, this feature does NOT utilize your defined LLM connection and therefore does not incur charges to your organization. Metadata about processed datasets and recipes is instead sent to Dataiku and third-party services for processing. Be sure to review whether this is in line with your organization’s policies.

Explain Flow

The first of the AI Explain features is found under the Flow Actions button in the lower-right corner of the flow.

Dataiku DSS 12.6 (10)

When you select Explain Flow, Dataiku goes through a series of steps to determine the purpose and function of each zone in your flow and then ties them together in a summary. You can adjust the generated text’s Language, Purpose, and Length, and when it’s complete, place it directly into your overall project description. Although the result may not always be perfect, it can be a great starting point for documentation.

Dataiku DSS 12.6 (11)

Explain Zone

A relatively new feature in the DSS flow is the ability to add metadata to Flow Zones. The image below shows a selected flow zone, where we can edit the Short Description, Long Description, and Tags (just as for a dataset or recipe) and use the new Explain button to generate a description automatically.

Dataiku DSS 12.6 (12)

As with the Explain Flow option, the resulting dialog contains a detailed description of the datasets and recipes in the selected flow zone, along with options to change the language, purpose, and length of the content. Here, the dialog includes a Use as Zone Description button, which sets the zone’s Long Description to this content. Once the description is set, you can easily edit it to correct errors or add context.

Dataiku DSS 12.6 (13)

Powerful Generative AI Tools for Productivity and Collaboration

The integration of generative AI features, particularly the LLM Mesh, into Dataiku’s Data Science Studio represents a significant advancement in the capabilities of data science and engineering platforms. Through LLM connections to services like the OpenAI API, along with AI services provided through Dataiku, DSS now offers powerful tools for code assistance, project logic generation, and project documentation. These features not only streamline development but also enhance productivity and collaboration within your data teams.

Ready to increase productivity and collaboration with your team, but need help moving the needle?



Ryan Moore

With over 20 years of experience as a Lead Software Architect, Principal Data Scientist, and technical author, Ryan Moore is the Head of Delivery and Solutions at Snow Fox Data and our resident Dataiku Neuron. He provides Data Science architecture and implementation consultation to organizations around the globe.
