Data Ingest

Let's start by adding a new table and some data to your lake.

To add data to your lake, you have two options:

  1. Add a Parquet file to the bucket with the key load/<database>/<table>/<filename>.parquet. Lagoon either creates a new table automatically or appends the data to the existing table under /stg. Once loaded, the file is deleted from load/.
  2. Add data to the bucket with the key prepare/<database>/<table>/<filename>.<filetype> in one of the supported file types: JSON, CSV, Avro, or ORC. Data dropped here is automatically batched, converted into compatible Parquet files, and moved to /load for writing to the lake.
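The key layout for the two options can be sketched in Python. This is a minimal illustration, not part of Lagoon: the helper names ingest_key and upload are hypothetical, and the sketch assumes the file extension is the lowercase form of the file type (for example .csv or .json). The upload helper additionally assumes boto3 is installed and AWS credentials are configured.

```python
from pathlib import PurePosixPath

# File types accepted under prepare/, per the list above.
# Assumption: the extension is the lowercase file type name.
PREPARE_TYPES = {"json", "csv", "avro", "orc"}


def ingest_key(database: str, table: str, filename: str) -> str:
    """Return the object key for a file, choosing load/ or prepare/ by extension."""
    suffix = PurePosixPath(filename).suffix.lstrip(".").lower()
    if suffix == "parquet":
        prefix = "load"      # Parquet goes straight to load/
    elif suffix in PREPARE_TYPES:
        prefix = "prepare"   # other supported types are converted first
    else:
        raise ValueError(f"unsupported file type: {filename!r}")
    return f"{prefix}/{database}/{table}/{filename}"


def upload(bucket: str, database: str, table: str, path: str) -> str:
    """Upload a local file to the bucket (hypothetical helper, needs boto3)."""
    import os
    import boto3  # imported lazily so ingest_key works without boto3

    key = ingest_key(database, table, os.path.basename(path))
    boto3.client("s3").upload_file(path, bucket, key)
    return key


print(ingest_key("mytiki", "users", "test.parquet"))  # load/mytiki/users/test.parquet
print(ingest_key("mytiki", "users", "events.csv"))    # prepare/mytiki/users/events.csv
```

Routing on the extension keeps the caller from having to remember which prefix applies to which file type; anything outside the supported list is rejected before an upload is attempted.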

Example

  1. Download the test.parquet file
  2. Upload the test.parquet file to your Lagoon bucket with the key load/mytiki/users/test.parquet
  3. Open your Lagoon bucket and look for the keys under stg/mytiki/users. You should see /data and /metadata folders with Iceberg files under each. Your original test.parquet file should be gone from the initial load/ directory.
  4. Open AWS Athena in the same region as your Lagoon. You should see the workgroup mytiki-lagoon with the database mytiki and the table users.
  5. Run select * from mytiki.users and you should get a result like:

    #  name    favorite_number  favorite_color  _etl_loaded_at
    1  Alyssa  256                              2024-05-31 18:57:58.179000
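
The query step can also be run from a script instead of the Athena console. The sketch below is a hedged illustration using the boto3 Athena client: the function name run_query is hypothetical, and it assumes the mytiki-lagoon workgroup already has a query result location configured, so no output location is passed.

```python
import time


def run_query(athena, sql: str, workgroup: str, poll_seconds: float = 1.0):
    """Run sql in the given Athena workgroup and return rows as lists of strings."""
    query_id = athena.start_query_execution(
        QueryString=sql, WorkGroup=workgroup
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"query {query_id} ended in state {state}")

    # Only the first page of results is read, which is enough for a small table.
    result = athena.get_query_results(QueryExecutionId=query_id)
    # The first row holds the column names; null cells have no VarCharValue.
    return [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in result["ResultSet"]["Rows"]
    ]
```

With boto3 installed and credentials for the Lagoon account, calling run_query(boto3.client("athena"), "select * from mytiki.users", "mytiki-lagoon") should return a header row followed by the loaded rows.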


What’s Next

Transform your raw data