Data Types
Kaskada operates on typed Timestreams. Similar to how every Pandas DataFrame
has an associated dtype
, every Kaskada Timestream
has an associated PyArrow data type returned by kaskada.Timestream.data_type
. The set of supported types is based on the types supported by Apache Arrow.
Each Timestream
contains points of the corresponding type. We’ll often say that the “type” of a Timestream
is the type of the values it contains.
Kaskada’s type system describes several kinds of values. Scalar types correspond to simple values, such as the string "hello"
or the integer 57
. They correspond to a stream containing values of the given type, or null
. Composite types are created from other types. For instance, records may be created using scalar and other composite types as fields. An expression producing a record type is a stream that produces a value of the given record type or null
.
Scalar Types
Scalar types include booleans, numbers, strings, timestamps, durations and calendar intervals.
Types |
Description |
---|---|
|
Booleans represent true or false. Examples: |
|
Unsigned integer numbers of the specified bit width. Examples: |
|
Signed integer numbers of the specified bit width. Examples: |
|
Floating point numbers of the specified bit width. Examples: |
|
Unicode strings. Examples: |
|
Points in time relative the Unix Epoch (00:00:00 UTC on January 1, 1970). Time unit may be seconds (s), milliseconds (ms), microseconds (us) or nanoseconds (ns). Examples: |
|
A duration of a fixed amount of a specific time unit. Time unit may be seconds (s), milliseconds (ms), microseconds (us) or nanoseconds (ns). Examples: |
|
A calendar interval corresponding to the given amount of the corresponding time. The length of an interval depends on the point in time it is added to. For instance, adding 1 `interval_month` to a timestamp will shift to the same day of the next month. Examples: |
Record Types
Records allow combining 1 or more values of potentially different types into a single value. Records are unnamed - any two records with the same set of field names and value types are considered equal. Fields within a record may have different types. Field names must start with a letter.
For example, {name: string, age: u32 }
is a record type with two fields and {name: 'Ben', age: 33 }
is corresponding value.
NOTE: Record types may be nested.
Collection Types
Kaskada also supports collections – lists and maps.
The type list<T>
describes a list of elements of type T
. For example, list<i64>
is a list of 64-bit integers.
Similarly, map<K, V>
describes a map containing keys of type K
and values of type V
. For example, map<str, i64>
is a map from strings to 64-bit integers.
Type Coercion
Kaskada implicitly coerces numeric types when different kinds of numbers are combined. For example adding a 64-bit signed integer value to a 32-bit floating point value produces a 64-point floating point value
Type coercion will never produce an integer overflow or reduction in numeric precision. If needed, such conversions must be explicitly specified using as
.
The coercion rules can be summarized with the following rules:
- Unsigned integers can be widened:
u8
⇨u16
⇨u32
⇨u64
. - Integers can be widened:
i8
⇨i16
⇨i32
⇨i64
. - Floating point numbers can be widened:
f16
⇨f32
⇨f64
. - Unsigned integers can be promoted to the next wider integer
u8
⇨i16
,u16
⇨i32
,u32
⇨i64
. - All numbers may be converted to
f64
. - Strings may be implicitly converted to timestamps by attempting to parse them as RFC3339 values. The timestamp will be null for strings that don’t successfully parse.
One aspect of the coercion rules is that when an operation is applied to two different numeric types the result may be a third type which they may both be coerced to. The type promotion table shows the type resulting from a binary operation involving two different numeric types.
u8 |
u16 |
u32 |
u64 |
i8 |
i16 |
i32 |
i64 |
f16 |
f32 |
f64 |
|
---|---|---|---|---|---|---|---|---|---|---|---|
u8 |
u8 |
u16 |
u32 |
u64 |
i16 |
i16 |
i32 |
i64 |
f16 |
f32 |
f64 |
u16 |
u16 |
u16 |
u32 |
u64 |
i32 |
i32 |
i32 |
i64 |
f16 |
f32 |
f64 |
u32 |
u32 |
u32 |
u32 |
u64 |
i64 |
i64 |
i64 |
i64 |
f32 |
f32 |
f64 |
u64 |
u64 |
u64 |
u64 |
u64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
i8 |
i16 |
i32 |
i64 |
f64 |
i8 |
i16 |
i32 |
i64 |
f16 |
f32 |
f64 |
i16 |
i16 |
i32 |
i64 |
f64 |
i16 |
i16 |
i32 |
i64 |
f16 |
f32 |
f64 |
i32 |
i32 |
i32 |
i64 |
f64 |
i32 |
i32 |
i32 |
i64 |
f16 |
f32 |
f64 |
i64 |
i64 |
i64 |
i64 |
f64 |
i64 |
i64 |
i64 |
i64 |
f16 |
f32 |
f64 |
f16 |
f16 |
f16 |
f16 |
f16 |
f16 |
f16 |
f16 |
f16 |
f16 |
f32 |
f64 |
f32 |
f32 |
f32 |
f32 |
f32 |
f32 |
f32 |
f32 |
f32 |
f32 |
f32 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |
f64 |