Schemas
SBT allows you to define schema specs in your model yaml files. Those schemas can then be assigned to sources or models. When a schema is assigned to a source or model, sbt
can validate the model output or source table/view against the schema definition. The use of schemas is what powers Insight Cloud's ability to evaluate if a user's data is compatible with an app. This is key to Seek's ability to run your app logic against any user's data.
Schema specs include the following fields:
Field | Description |
---|---|
name | The name of the schema |
description | An optional description of the schema |
columns | A list of column definitions |
Column definitions include the following fields:
Field | Description |
---|---|
name | The name of the column |
type | The type of the column |
description | An optional description of the column |
constraints | A optional list of constraints |
Column Constraints
Schema column constraints are used to enforce rules on the data in the column. This allows your app to catch data quality issues upstream and avoid providing users with confusing analysis.
There are 3 types of constraints currently:
not_null
- Ensures that the column has no null valuesunique
- Ensures that the column has all unique valuesexpr
- Allows for a custom expression to be evaluated on the column. The expression is a string that is used in acount
query to evaluate the column. If the count is zero, the constraint is satisfied. The expression can be any valid SQL expression that can be used in awhere
clause. For example,> 5
,< CURRENT_DATE()
,= FALSE
,not in ('US', 'CA')
.
SBT Commands
sbt
has a few commands for checking schema adherence:
sbt validate-model
- compares a model instantiated in Snowflake against a desired schemasbt validate-source
- compares a Snowflake source against a desired schemasbt validate-app
- runs validations for all models and sources in the app with a schema assigned- Use the
--validate
flag on therun-model
orrun-app
command to trigger a schema validation after running the model or app
Example YAML
Here is an example schema object definition:
schemas:
- name: sample_target_schema
description: an optional description
columns:
- name: CITY
type: string
constraints:
- type: not_null
- type: unique
- name: DC NBR
type: integer
description: an optional description
constraints:
- type: expr
expression: '>7500'
- name: LATITUDE
type: float
description: an optional description
- name: ISMANUALGEOCODE
type: boolean
- name: ADDRESS
type: str
constraints:
- type: not_null
description: an optional description
- name: STATE/PROV
type: text
description: an optional description, this column is optional
optional: true
To assign a schema to a model or schema, add a target_schema
to the yaml config. You only need to assign the name of the schema, sbt will handle looking up the details of the named schema. We do this so that multiple models or sources can share the same schema definition.
models:
- name: sql_test
config:
language: sql
materialized: table
type: output
target_schema: sample_target_schema
sources:
- name: atlas_luminate
description: "Atlas-prepped Luminate Tables "
database: SEEK_TEST_DATA
tables:
- name: luminate_stores
- name: luminate_distributioncenters
target_schema: sample_target_schema
Notes
- Column names are case sensitive, column types are not
- A full list of all the types available in snowflake is here