Data Ingest
Let's start by adding a new table and some data to your lake.
You have two options for adding data to your lake:
- Add a parquet file to the bucket with the key load/<database>/<table>/<filename>.parquet. Either a new table will be created automatically or the data will be appended to an existing table under /stg. Once loaded, files are deleted from the load/ directory.
- Add some data to the bucket with the key prepare/<database>/<table>/<filename>.<filetype> in one of the supported filetypes: JSON, CSV, Avro, ORC. Data dropped here will be automatically batched up and converted into compatible parquet files, then moved to /load for writing to the lake.
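The two key layouts above can be sketched in Python as follows. This is a minimal illustration, not part of Lagoon itself: the helper names (load_key, prepare_key, ingest_key, upload) are hypothetical, and the file extensions for the supported filetypes (.json, .csv, .avro, .orc) are an assumption. The upload helper uses the boto3 S3 client and needs AWS credentials with write access to your bucket.

```python
from pathlib import PurePosixPath

# Assumed extensions for the supported prepare/ file types (JSON, CSV, Avro, ORC).
PREPARE_EXTENSIONS = {".json", ".csv", ".avro", ".orc"}


def load_key(database: str, table: str, filename: str) -> str:
    """Build a key of the form load/<database>/<table>/<filename>.parquet."""
    if PurePosixPath(filename).suffix.lower() != ".parquet":
        raise ValueError("load/ expects a .parquet file")
    return f"load/{database}/{table}/{filename}"


def prepare_key(database: str, table: str, filename: str) -> str:
    """Build a key of the form prepare/<database>/<table>/<filename>.<filetype>."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in PREPARE_EXTENSIONS:
        raise ValueError(f"unsupported file type: {suffix or filename}")
    return f"prepare/{database}/{table}/{filename}"


def ingest_key(database: str, table: str, filename: str) -> str:
    """Route parquet files to load/ and the other supported types to prepare/."""
    if PurePosixPath(filename).suffix.lower() == ".parquet":
        return load_key(database, table, filename)
    return prepare_key(database, table, filename)


def upload(local_path: str, bucket: str, key: str) -> None:
    """Upload a local file to the bucket; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the key helpers work without it

    boto3.client("s3").upload_file(local_path, bucket, key)
```

For example, upload("test.parquet", "<your-lagoon-bucket>", ingest_key("mytiki", "users", "test.parquet")) would place the file at load/mytiki/users/test.parquet, where the bucket name is a placeholder for your own.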
Example
- Download the test.parquet file
- Upload the test.parquet file to your Lagoon bucket with the key load/mytiki/users/test.parquet
- Open your Lagoon bucket and look for the keys under stg/mytiki/users. You should see /data and /metadata folders with Iceberg files under each. Your original test.parquet file should be gone from the initial load/ directory.
- Open up AWS Athena in the same region as your Lagoon. You should see the workgroup mytiki-lagoon with the database mytiki and the table users.
- Run select * from mytiki.users and you should get a result like:

| # | name | favorite_number | favorite_color | _etl_loaded_at |
|---|------|-----------------|----------------|----------------|
| 1 | Alyssa | 256 | | 2024-05-31 18:57:58.179000 |
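The checks in this walkthrough can also be scripted. The sketch below is one possible approach using boto3, not an official Lagoon tool: the function names are hypothetical, and it assumes your credentials can list the bucket and that the mytiki-lagoon workgroup already has a query result location configured.

```python
import time


def ingest_status(keys, database, table, filename):
    """Compare a list of bucket keys against the checks in the walkthrough."""
    stg = f"stg/{database}/{table}/"
    return {
        "has_data": any(k.startswith(stg + "data/") for k in keys),
        "has_metadata": any(k.startswith(stg + "metadata/") for k in keys),
        "load_cleared": f"load/{database}/{table}/{filename}" not in keys,
    }


def list_keys(bucket, prefix):
    """List every key under a prefix; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so ingest_status works without it

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def run_query(sql, workgroup="mytiki-lagoon", poll_seconds=1.0):
    """Run a query in Athena and return the raw result rows."""
    import boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql, WorkGroup=workgroup
    )["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state not in ("QUEUED", "RUNNING"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query ended in state {state}")
    results = athena.get_query_results(QueryExecutionId=query_id)
    return results["ResultSet"]["Rows"]
```

A typical use would be to pass the combined output of list_keys(bucket, "stg/mytiki/users/") and list_keys(bucket, "load/mytiki/users/") to ingest_status, then call run_query("select * from mytiki.users") once all three checks are true. Create the boto3 clients in the same region as your Lagoon.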