<!--
Thanks for opening a pull request!
-->
<!-- In the case this PR will resolve an issue, please replace
${GITHUB_ISSUE_ID} below with the actual Github issue id. -->
<!-- Closes #${GITHUB_ISSUE_ID} -->
# Rationale for this change
Add two new Make commands: `make notebook` spins up a Jupyter notebook, and
`make notebook-infra` spins up a Jupyter notebook along with the integration
test infrastructure.
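The two targets might look roughly like the sketch below. This is an assumption about the shape of the change, not the actual Makefile diff: the recipe bodies, the `jupyter lab` flags, and reusing `test-integration-exec` as the infra step are all hypothetical.

```makefile
# Hypothetical sketch only; the real recipes in this PR may differ.
.PHONY: notebook notebook-infra

notebook:  ## Install notebook dependencies and launch Jupyter Lab
	pip install jupyterlab
	jupyter lab --notebook-dir=notebooks

notebook-infra:  ## Bring up the Docker test infra, then launch the notebook
	$(MAKE) test-integration-exec
	$(MAKE) notebook
```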
### PyIceberg Example Notebook
The PyIceberg example notebook (`notebooks/pyiceberg_example.ipynb`) is
based on the
[Getting Started with PyIceberg](https://py.iceberg.apache.org/#getting-started-with-pyiceberg) page and
doesn't require additional test infrastructure.
### Spark Example Notebook
The Spark integration example notebook
(`notebooks/spark_integration_example.ipynb`) is based on
[Spark Getting Started](https://iceberg.apache.org/docs/nightly/spark-getting-started/) and
requires the integration test infrastructure (Spark, Iceberg REST catalog, S3).
With Spark Connect (#2491) and our testing setup, we can quickly spin up
a local environment with `make test-integration-exec`, which includes:
* spark
* iceberg rest catalog
* hive metastore
* minio
In the Jupyter notebook, connecting to Spark is easy:
```python
from pyspark.sql import SparkSession
# Create SparkSession against the remote Spark Connect server
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.sql("SHOW CATALOGS").show()
```
## Are these changes tested?
Yes. I ran both `make notebook` and `make notebook-infra` locally and
executed the example notebooks.
## Are there any user-facing changes?
<!-- In the case of user-facing changes, please add the changelog label.
-->
**`mkdocs/docs/contributing.md`**

## Notebooks for Experimentation

PyIceberg provides Jupyter notebooks for quick experimentation and learning. Two Make commands are available, depending on your needs:
### PyIceberg Examples (`make notebook`)
For basic PyIceberg experimentation without additional infrastructure:
```bash
make notebook
```
This will install notebook dependencies and launch Jupyter Lab in the `notebooks/` directory.
**PyIceberg Example Notebook** (`notebooks/pyiceberg_example.ipynb`) is based on the [Getting Started with PyIceberg](https://py.iceberg.apache.org/#getting-started-with-pyiceberg) page. It demonstrates basic PyIceberg operations like creating catalogs, schemas, and querying tables without requiring any external services.
### Spark Integration (`make notebook-infra`)

For working with PyIceberg alongside Spark, use the infrastructure-enabled notebook environment:
```bash
make notebook-infra
```
This command spins up the full integration test infrastructure via Docker Compose, including:
- **Spark** (with Spark Connect)
- **Iceberg REST Catalog** (using the [`apache/iceberg-rest-fixture`](https://hub.docker.com/r/apache/iceberg-rest-fixture) image)
- **Hive Metastore**
- **S3-compatible object storage** (MinIO)
**Spark Example Notebook** (`notebooks/spark_integration_example.ipynb`) is based on the [Spark Getting Started](https://iceberg.apache.org/docs/nightly/spark-getting-started/) guide. This notebook demonstrates how to work with PyIceberg alongside Spark, leveraging the Docker-based testing setup for a complete local development environment.
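The statements the notebook walks through are along the lines of the guide's `demo.nyc.taxis` example. This is a sketch, not the notebook's exact cells, and the `demo` catalog name and column list are assumptions borrowed from the Spark Getting Started guide:

```sql
-- Run via spark.sql(...) against the Spark Connect session
CREATE TABLE demo.nyc.taxis (
    vendor_id bigint,
    trip_id bigint,
    fare_amount double
) USING iceberg;

INSERT INTO demo.nyc.taxis VALUES (1, 1000371, 18.0), (2, 1000372, 12.5);

SELECT * FROM demo.nyc.taxis;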
After running `make notebook-infra`, open `spark_integration_example.ipynb` in the Jupyter Lab interface to explore Spark integration capabilities.
## Code standards
Below are the formalized conventions that we adhere to in the PyIceberg project. The goal is to have a common agreement on how to evolve the codebase, but also to provide guidelines for newcomers to the project.
**`mkdocs/docs/index.md`** (4 additions, 0 deletions)
@@ -198,6 +198,10 @@ Since the catalog was configured to use the local filesystem, we can explore how
find /tmp/warehouse/
```
## Try it yourself with Jupyter Notebooks
PyIceberg provides Jupyter notebooks for hands-on experimentation with the examples above and more. Check out the [Notebooks for Experimentation](contributing.md#notebooks-for-experimentation) guide.
## More details
For the details, please check the [CLI](cli.md) or [Python API](api.md) page.