BigQuery
Learn more about the shape of your data like column names, data types, and other information.
To use the data from the Awell platform, it's essential to understand some Awell terminology first.
Below, we explain the terminology that you'll find back in the data schema. If you have any further questions about this, please contact your Awell Customer Success Manager.
In the Awell domain, a care flow definition is a care flow template designed in the Awell Studio, representing the general structure and components of a care plan without being tied to any specific patient.
A care flow on the other hand is a patient-specific instance derived from a care flow definition that tracks and manages an individual patient's care journey. For example, if you created the care flow definition "post-operative follow-up" in Awell Studio, all your patients that get included in this care flow definition will have an individual care flow.
Similar to care flow definitions and care flows, a data point definition is a care flow component as designed in Awell Studio while a data point is the patient-specific instance of that data point definition. For example, if you collect the weight of a patient in a care flow, the data point for a specific patient could be 80 kg (or 176lbs).
To avoid having to work with randomly generated IDs for data point definitions, we allow users to define a human-readable identifier in Awell Studio for all data point definitions. In the data repository, we combine this human-readable identifier with the source to form a 'key'. Let's imagine that you are building a patient form to collect the patient's weight and height, with the intent of calculating a BMI score. You set the form key to bmi, and set the question keys to height and weight respectively. This will result in data point definitions with the following keys:
In Awell Studio you can see the list of published versions for a given care flow definition with an auto incremented version number. This version number is only used for display purposes. Behind the scenes we assign a unique release identifier to each published version, which you can retrieve through the Get published pathway definitions query.
The release identifier is guaranteed to be globally unique so it can be safely used as input to build analytics query on the data set.
The data repository contains three different types of data: data points, orchestration data and patient data.
Data points refers to any atomic piece of data collected from a user or system in care flows (e.g. form responses, calculation results, etc.). These are the datapoints that you can find in the data catalog in Awell Studio and that you can use to build care flows.
Orchestration data refers to data generated by the orchestration of the care flow itself, i.e. data about which actions were orchestrated when in care flows.
Patient data refers to the data you explicitly provide on the patients you enroll in your care flows. When using anonymous patients this only contains identifiers.
The data points are stored in two tables: data_points
and data_point_definitions
. The data_points
table uses a normalized structure to handle the fact that atomic data can have different value types (string, boolean, numeric, date etc), and can come from many different sources (forms, calculations, Extensions, etc.).
Table name: data_points
.
Column | Type | Description |
---|---|---|
id | STRING | Unique identifier. |
definition_id | STRING | Version agnostic identifier linking to the `definition_id` column in the `data_point_definitions` table, acting as a foreign key to this table, together with `release_id`. |
release_id | STRING | An internal identifier for the published version_number of the care flow definition. Refers to the `release_id` in the `published_careflows` table, serving as a foreign key, together with `definition_id` for connecting to the proper care flow definition. |
care_flow_id | STRING | Identifier of the care flow in which the data point was collected. Refers to the `id` column in the `care_flows` table, serving as a foreign key to the `care_flows` table. |
care_flow_definition_id | STRING | Identifier of the care flow definition (designed care flow template) from which the care flow was instantiated. Refers to the `definition_id` in the `published_careflows` table, serving as a foreign key, together with `release_id` for connecting to the proper care flow definition. |
activity_id | STRING | Identifier of the activity in which the data point was collected. Refers to the `id` column in the `activities` table, acting as a foreign key to the `activities` table. |
value_raw | STRING | Serialised value of the data point. Better to use type dedicated columns. |
value_boolean | BOOLEAN | Typed value of the data point. This column is only populated for rows with a value type of `boolean`. |
value_numeric | NUMERIC | Typed value of the data point. This column is only populated for rows with a value type of `number`. |
value_date | TIMESTAMP | Typed value of the data point. This column is only populated for rows with a value type of `date`. |
value_type | STRING | Primitive type of the value before serialisation (boolean, date, number, string). |
label | STRING | Descriptive label associated with the value, providing a human-readable description. Example: for value_numeric 0, the label might be "Female" or "Ocassionally". Especially useful for data points collected in a form. |
value_type | STRING | Primitive type of the value before serialisation (boolean, date, number, string, numbers_array). |
date | TIMESTAMP | Data point collection time (UTC Timestamp). |
last_synced_at | TIMESTAMP | [IRRELEVANT FOR ANALYSIS] Recorded timestamp of importing data to BigQuery. |
status | STRING | [IRRELEVANT FOR ANALYSIS] It will always be `created` indicating collection of a data point. |
Table name: data_point_definitions
.
Column | Type | Description |
---|---|---|
id | STRING | Unique identifier. Not to be used as foreign key when joining with other tables. |
definition_id | STRING | Version agnostic identifier of designed data point. |
release_id | STRING | An internal identifier for the published version_number of the care flow definition. Refers to the `release_id` in the `published_careflows` table. |
source_definition_id | STRING | |
category | STRING | Identifies how/where the data point is collected. Examples: `form`, `calculation`, `step`. |
key | STRING | Human readable qualified key which defines the meaning of the collected data. It is usually formed with a dot notation of category name and data point name. The naming convention may vary (e.g., snake_case, camelCase, ...). Example: Email.CompletionDate |
options | RECORD | Nested field with an array of objects, each representing a valid option with value and label. Example: "value": "1", "label": "Yes", "value": "0", "label": "No" . |
value_type | STRING | The expected primitive type for the collected data (boolean, date, number, string, numbers_array). |
last_synced_at | TIMESTAMP | [IRRELEVANT FOR ANALYSIS] Recorded timestamp of importing data to BigQuery. |
Table name: care_flows
.
Column | Type | Description |
---|---|---|
id | STRING | Unique identifier. |
patient_id | STRING | Identifier of the patient enrolled in the care flow. Refers to the `id` column in the `patients` table, serving as a foreign key to the `patients` table. |
definition_id | STRING | Identifier of the care flow definition (designed care flow template) from which the care flow was instantiated. Refers to the `definition_id` in the `published_careflows` table, serving as a foreign key, together with `release_id` for connecting to the proper care flow definition. |
title | STRING | Title (Name) of the care flow definition. |
release_id | STRING | An internal identifier for the published version_number of the care flow definition. Refers to the `release_id` in the `published_careflows` table, serving as a foreign key, together with `definition_id` for connecting to the proper care flow definition. |
status | STRING | Current care flow status. Possible values: `active`, `stopped`, `completed`, `missing_baseline_info` |
start_date | TIMESTAMP | Recorded start date of the care flow (UTC). It is always available. |
stop_date | TIMESTAMP | Recorded stop date of the care flow (UTC). Populated only for stopped flows, otherwise NULL. |
complete_date | TIMESTAMP | Recorded completion date of the care flow (UTC). Populated only for completed flows, otherwise NULL. |
last_synced_at | TIMESTAMP | [IRRELEVANT FOR ANALYSIS] Recorded timestamp of importing data to BigQuery. |
Table name: activities
.
Activities use a generic structure to be able to describe any action that needs to be performed by a human or system.
Go to the Domain Model reference to find more information about how activities are structured, including examples.
Column | Type | Description |
---|---|---|
id | STRING | Unique identifier. |
care_flow_id | STRING | Identifier of the care flow associated with the activity. Refers to the `id` column in the `care_flows` table, serving as a foreign key to the `care_flows` table. |
care_flow_definition_id | STRING | Identifier of the care flow definition (designed care flow template) from which the care flow was instantiated. Refers to the `definition_id` in the `care_flows` table. |
status | STRING | The current activity status. One of: `active`, `done`, `failed`, `canceled`, `expired`. Status done indicates complete resolution of the activity, such as a sent message being read or a form being fully completed. Done refers to completed activity. |
resolution | STRING | An internal system status reflecting the outcome of executing the activity, indicating `success`, `failure` (e.g., if a plugin call fails), or `NULL` for activities yet to be resolved or not applicable. |
date | TIMESTAMP | The creation date of the activity (UTC). Usually refers to start date. |
scheduled_date | TIMESTAMP | The date when the scheduled activity is set to start (UTC). Relevant only for scheduled activities. |
completion_date | TIMESTAMP | Completion date of `done` activities. |
action | STRING | |
action_component_name | STRING | The name of the action component holding the primary object, such as a message, form, api_call, calculation. In care flow design, this is typically referred to simply as an action. |
object_type | STRING | Type of primary object this activity relates to. Example values: action, api_call, calculation, form, message, pathway, plugin_action, reminder, step, track. |
object_name | STRING | The name of the primary object the activity is associated with. For messages, this is the subject; for forms the form name. |
object_id | STRING | Id of the primary object. |
indirect_object_type | STRING | Type of indirect/secondary object this activity relates to. Examples: `patient`, `stakeholder`, `plugin`. |
indirect_object_name | STRING | The name of the related indirect/secondary object the activity relates to. It points to who should engage with or is targeted by the activity. It could be a system (for example plugin name) or a human (for example care provider name). |
step_name | STRING | Name of the step the activity belongs to. [IMPORTANT NOTE] Relevant only for activities within steps. If the `object_type` is step, this value will be NULL and step name will be in `object_name` field. |
track_name | STRING | Name of the track the activity belongs to. [IMPORTANT NOTE] Relevant only for activities within track. If the `object_type` is track, this value will be NULL and track name will be in `object_name` field. |
track_id | STRING | Identifier of the track the activity belongs to. [IMPORTANT NOTE] Relevant only for activities within track. If the `object_type` is track, this value will be NULL. |
last_synced_at | TIMESTAMP | [IRRELEVANT FOR ANALYSIS] Recorded timestamp of importing data to BigQuery. |
Table name: patients
Column | Type | Description |
---|---|---|
id | STRING | Unique identifier. |
profile_id | STRING | Unique identifier of the associated patient profile. Acts as a foreign key referring to the id property in the patient_profiles table |
status | STRING | Indicates patient status within the system. Currently, `active_record` is the only available value, indicating that patient is present/not deleted. |
last_synced_at | TIMESTAMP | [IRRELEVANT FOR ANALYSIS] Recorded timestamp of importing data to BigQuery. |
Table name: patient_profiles
Column | Type | Description |
---|---|---|
id | STRING | Unique identifier. |
name | STRING | Concatenation of the first name and last name. |
first_name | STRING | First name of the patient. |
last_name | STRING | Last name of the patient. |
STRING | Email address of the patient. | |
birth_date | DATE | Birth date of the patient. |
sex | STRING | Sex of the patient in ISO_IEC-5218 standard. One of `0` (Not known), `1` (Male), `2` (Female). |
national_registry_number | STRING | National registry number of the patient. |
patient_code | STRING | Arbitrary identifier associated to the patient. You can use this to facilitate the reconciliation of patient records between Awell and your domain. |
phone | STRING | Phone number in the E164 format. |
mobile_phone | STRING | Phone number in the E164 format. |
address_street | STRING | |
address_city | STRING | |
address_zip | STRING | |
address_state | STRING | |
address_country | STRING | |
last_synced_at | TIMESTAMP | [IRRELEVANT FOR ANALYSIS] Recorded timestamp of importing data to BigQuery. |
status | STRING | Indicates patient status within the system. Currently, `active_record` is the only available value, indicating that patient is present/not deleted. |