Quick Start
This page shows the bare minimum needed to get started with Kaskada.
Install
Install the latest version. The version specifier kaskada>=0.6.0-a.3 ensures that the pre-release version is installed. Quote the specifier so that the shell does not treat > as an output redirect.
pip install "kaskada>=0.6.0-a.3"
See the section on installation to learn more about installing Kaskada.
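To confirm which version ended up installed, one option is to ask the Python standard library for the package metadata. The following is a minimal sketch, not part of Kaskada itself; the helper name installed_version is hypothetical and introduced only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed.

    Hypothetical helper for illustration; it only wraps importlib.metadata.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed kaskada version string, or None if kaskada is missing.
print(installed_version("kaskada"))
```

If this prints None, the install step did not succeed in the active environment.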
Write a query
The following Python code imports the Kaskada library, creates a session, and loads some CSV data. It then runs a query to produce a Pandas DataFrame.
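import asyncio
import kaskada as kd

# Initialize the Kaskada session.
kd.init_session()

# Inline CSV data; some rows deliberately leave m and/or n empty.
content = "\n".join(
    [
        "time,key,m,n",
        "1996-12-19T16:39:57,A,5,10",
        "1996-12-19T16:39:58,B,24,3",
        "1996-12-19T16:39:59,A,17,6",
        "1996-12-19T16:40:00,A,,9",
        "1996-12-19T16:40:01,A,12,",
        "1996-12-19T16:40:02,A,,",
    ]
)

# Top-level await works in a notebook or IPython. In a plain script, run this
# inside an async function driven by asyncio.run().
source = await kd.sources.CsvString.create(content, time_column="time", key_column="key")

# Keep m and n, add sum_m (the sum of m), and return a pandas DataFrame.
source.select("m", "n").extend({"sum_m": source.col("m").sum()}).to_pandas()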
| | _time | _key | sum_m | m | n |
|---|---|---|---|---|---|
| 0 | 1996-12-19 16:39:57 | A | 5 | 5.0 | 10.0 |
| 1 | 1996-12-19 16:39:58 | B | 24 | 24.0 | 3.0 |
| 2 | 1996-12-19 16:39:59 | A | 22 | 17.0 | 6.0 |
| 3 | 1996-12-19 16:40:00 | A | 22 | NaN | 9.0 |
| 4 | 1996-12-19 16:40:01 | A | 34 | 12.0 | NaN |
| 5 | 1996-12-19 16:40:02 | A | 34 | NaN | NaN |
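In the output above, sum_m behaves as a running total of m within each key: rows with a missing m leave the total unchanged, and key B is accumulated separately from key A. As a rough analogy only, the same numbers can be reproduced in plain pandas. This sketch does not use Kaskada and is not a description of how Kaskada computes the result; in particular, it assumes a missing m can be treated as zero, which may not match Kaskada for a key whose first values are all missing.

```python
import io

import pandas as pd

# Same CSV rows as the quick-start example.
content = "\n".join(
    [
        "time,key,m,n",
        "1996-12-19T16:39:57,A,5,10",
        "1996-12-19T16:39:58,B,24,3",
        "1996-12-19T16:39:59,A,17,6",
        "1996-12-19T16:40:00,A,,9",
        "1996-12-19T16:40:01,A,12,",
        "1996-12-19T16:40:02,A,,",
    ]
)

df = pd.read_csv(io.StringIO(content), parse_dates=["time"])

# Assumption: a missing m contributes nothing to the total.
# Accumulate m separately for each key, keeping the original row order.
df["sum_m"] = df["m"].fillna(0).groupby(df["key"]).cumsum().astype("int64")

print(df[["time", "key", "sum_m", "m", "n"]])
```

The sum_m values printed here are 5, 24, 22, 22, 34, 34, matching the column in the Kaskada result.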