Skip to content

Models

A model consists of code that populates a single table or view, along with metadata properties that define how your users can interact with the model.

Models define a data processing step and frequently represent some bit of business logic. They can be written in SQL or Python. Models can reference each other, and SBT will automatically build your complete data processing pipeline by parsing the model code for dependencies. A bundle of models powers an app on the Insight Cloud platform.

Models all live in the models/ directory of your SBT project. A model consists of two parts:

  • A definition - a python or sql file that contains the code for that model. The name of the file will become the unique identifier for the model.
  • A configuration - a yaml file that contains metadata for the model. Must have a named config matching the model name from the definition file.

Example

Let's look at a simple example to illustrate how models work.

First we have the directory structure of the project:

Directory Structure
sbt-project/
├── sources/
│   └── sources.yml
└── models/
    ├── my_first_model.sql
    └── configs.yml

The model definition is a SQL file. It uses standard SQL plus some special syntax to access the Seek Run Context object. We can access sources, variables, and other model outputs using the seek object. All of these variables are compiled at runtime based on the user's request. In this example, we are reading from an expected source table and adding a user-defined parameter to the output.

my_first_model.sql
select
  src.*,
  '{seek.params[TEST_VAR]}' as user_param_value
from {seek.sources[vendor.table]} src

Next let's look at the model config file. It contains a yaml block like the below, which gives us the config for the model my_first_model. The variables key is a list of variables that users will be able to modify when running your app. The values_query key is a query that returns the possible values for the variable - this is useful for powering dropdowns in Insight Cloud.

configs.yml
models:
  - name: my_first_model
    config:
      language: sql
      materialized: table
      type: output
      variables:
        - name: TEST_VAR
          display_name: Test Variable
          type: VARCHAR
          default: No test var set
          values_query: values_test

Model Configs

The model config is contained in a yaml file. The filename doesn't matter, you can create multiple yaml files in nested directories if that suits your needs, or you can have one centralized yaml that holds all your model configs. SBT will search all yaml files for models: keys to find a config that matches the model definition.:

Model Object Fields

Field Description Available Values
name The name of the model, must be unique. Used as the name of tables created by this model. string, must be unique
config.packages A list of python packages that need to be installed in the app environment to run this model. list of strings, typical pip version pinning is supported
config.materialized The type of db object to create in snowflake table or view
config.language The language of the model. python or sql
config.type The type of model. output or mapping
config.variables A list of model variables that can be set when running the model. list of objects
config.py_modules A list of python wheel files that need to be installed in the snowflake environment to run this model. These should be located in the modules directory to be included with model deployments list of strings

Model Variable Fields

Field Description Available Values
name The name of the variable string
display_name The name of the variable as it will be displayed to users string
type The snowflake data type of the variable string
default The default value of the variable string
values_query A query that returns the possible values for the variable string
models:
  - name: python_test
    config:
      packages:
        - "snowflake-snowpark-python"
        - "numpy"
        - "scikit-learn"
        - "pandas"
      materialized: table
      language: python
      type: output
      variables:
        - name: TEST_VAR
          display_name: Test Variable
          type: VARCHAR
          default: No test var set
          values_query: values_test
      py_modules:
        - seek_cpg.whl

Seek Run Context

All of the magic of SBT happens via the Seek Run Context object. Models have access to an object called seek which should be used to interact with the Seek platform. The seek object has properties for accessing sources, params, other models, and other current run context.

class SeekRunCtx(BaseModel):
    model_name: str
    target_name: str
    current_db: str
    params: dict
    sources: dict
    models: dict

Keep reading to see how to use the seek object in your models.

Model Defintion

SQL Models

SQL models are select statements stored in a .sql file. The SQL query defined in the file will eventually be wrapped in CREATE TABLE/VIEW AS SELECT statements. Wrap any referenes to the SeekRunContext in single curly braces - remember this is going to end up getting formatted in python.

Note

For SQL models only - don't wrap the name of the source/model/param you are accessing in quotes. The below example is correctly not quoting atlas_luminate.atlas_itemlist_custom

SELECT * EXCLUDE EPOCH
       RENAME "ITEM NBR" AS item_nbr
FROM {seek.sources[atlas_luminate.atlas_itemlist_custom]}

Python Models

Python models are regular python .py files. They must contain at least one function called model which takes two arguments, session and seek. The session arg is a snowpark session used for interacting with snowflake. The seek arg is the SeekRunContext object. The model function should return a dataframe - could be snowpark or pandas. There can be as many helper functions as you like in the file, but the model function is required.

Below is an example python model definition. This model reads from a source, does some transformations, and writes to an output table.

from snowflake.snowpark.functions import lit

def model(session, seek):
    # Accessing seek sources works
    base_table = session.table(seek.sources["atlas.view_items"])
    pivot_values = base_table.select("Brand Desc").distinct()
    # Accessing defined params works
    var_df = pivot_values.with_column("vars", lit(seek.params["TEST_VAR"]))
    # Accessing seek object properties works
    other_df = var_df.with_column("database", lit(seek.model_name))
    # Casting to pandas works
    pandas_df = other_df.to_pandas()
    # Return table gets written to output table
    snowpark_df = session.createDataFrame(pandas_df)
    return snowpark_df