Supporting DELETE and UPDATE
The Presto engine provides APIs to support row-level SQL DELETE and UPDATE.
To implement DELETE or UPDATE, a connector must:

- Layer an UpdatablePageSource on top of the connector’s ConnectorPageSource.
- Define ConnectorMetadata methods to get a rowId column handle.
- Start the operation using beginUpdate() or beginDelete().
- Finish the operation using finishUpdate() or finishDelete().
DELETE and UPDATE Data Flow
DELETE and UPDATE have a similar flow:

1. For each split, the connector creates an UpdatablePageSource instance, layered over the connector’s ConnectorPageSource, to read pages on behalf of the Presto engine and to write deletions or updates to the underlying data store.
2. The connector’s UpdatablePageSource.getNextPage() implementation fetches the next page from the underlying ConnectorPageSource, optionally reformats the page, and returns it to the Presto engine.
3. The Presto engine performs filtering and projection on the page read, producing a page of filtered, projected results.
4. The Presto engine passes that filtered, projected page of results to the connector’s UpdatablePageSource.deleteRows() or updateRows() method. Those methods persist the deletions or updates in the underlying data store.
5. When all the pages for a specific split have been processed, the Presto engine calls UpdatablePageSource.finish(), which returns, as a CompletableFuture, a Collection<Slice> of fragments representing connector-specific information about the rows processed by the calls to deleteRows() or updateRows().
6. When all pages for all splits have been processed, the Presto engine calls ConnectorMetadata.finishDelete() or finishUpdate(), passing a collection containing all the fragments from all the splits. The connector does what is required to finalize the operation, for example, committing the transaction.
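The engine-side loop described in these steps can be sketched as a small, self-contained Java model. Everything in it is hypothetical and simplified: ToyUpdatablePageSource, InMemorySplitSource, ToyMetadata and ToyDeleteDriver are illustrative names, not Presto SPI types; a page is modeled as a list of Object[] rows whose last element is the rowId; fragments are plain strings instead of Slice values; and getNextPage() returning null stands in for the real end-of-split signalling. Only the DELETE path is shown; UPDATE has the same shape with updateRows() in place of deleteRows().

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;

// Hypothetical stand-in for UpdatablePageSource: a "page" is a list of rows,
// each row an Object[] whose last element is the rowId.
interface ToyUpdatablePageSource {
    List<Object[]> getNextPage();                    // null once the split is exhausted
    void deleteRows(List<Object> rowIds);            // persist deletions
    CompletableFuture<Collection<String>> finish();  // per-split fragments
}

// Hypothetical stand-in for the finishDelete() part of ConnectorMetadata.
interface ToyMetadata {
    void finishDelete(Collection<String> fragments);
}

// Toy split source that records which rowIds were deleted.
class InMemorySplitSource implements ToyUpdatablePageSource {
    private final List<List<Object[]>> pages;
    private int nextPage;
    final List<Object> deletedIds = new ArrayList<>();

    InMemorySplitSource(List<List<Object[]>> pages) {
        this.pages = pages;
    }

    public List<Object[]> getNextPage() {
        return nextPage < pages.size() ? pages.get(nextPage++) : null;
    }

    public void deleteRows(List<Object> rowIds) {
        deletedIds.addAll(rowIds); // a real connector would write to its data store
    }

    public CompletableFuture<Collection<String>> finish() {
        return CompletableFuture.completedFuture(List.of("deleted=" + deletedIds.size()));
    }
}

// Mimics the engine side of DELETE FROM t WHERE <predicate>.
class ToyDeleteDriver {
    static void runDelete(List<? extends ToyUpdatablePageSource> splits,
                          Predicate<Object[]> predicate,
                          ToyMetadata metadata) {
        List<String> allFragments = new ArrayList<>();
        for (ToyUpdatablePageSource source : splits) {
            List<Object[]> page;
            while ((page = source.getNextPage()) != null) {
                // Filter and project: keep matching rows, reduced to their rowId.
                List<Object> rowIds = new ArrayList<>();
                for (Object[] row : page) {
                    if (predicate.test(row)) {
                        rowIds.add(row[row.length - 1]);
                    }
                }
                source.deleteRows(rowIds); // same instance that produced the page
            }
            allFragments.addAll(source.finish().join());
        }
        metadata.finishDelete(allFragments); // called once, after every split
    }
}
```

The model makes two properties of the flow explicit: deleteRows() always goes to the source instance that produced the page, and the fragments from every split are gathered before the single finishDelete() call.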
The rowId Column Handle Abstraction
The Presto engine and connectors use a rowId column handle abstraction to agree on the identities of rows to be updated or deleted. The rowId column handle is opaque to the Presto engine. Depending on the connector, the rowId column handle abstraction could represent several physical columns.
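As an illustration of a rowId that spans several physical columns, the sketch below models a hypothetical file-based connector that identifies a row by its file path and its position within that file. PhysicalColumn, CompositeRowIdHandle and FileRowId are invented names, and the string encoding is used purely for illustration; the point is only that the connector, not the engine, knows how to pack and unpack the value.

```java
import java.util.List;

// Hypothetical connector-side types. The engine treats the rowId column handle
// as opaque; only the connector knows it stands for several physical columns.
record PhysicalColumn(String name, String type) {}

record CompositeRowIdHandle(List<PhysicalColumn> physicalColumns) {}

// Toy rowId value for a file-based layout: which file, and which row within it.
record FileRowId(String filePath, long rowPosition) {
    // Pack both physical values into one opaque value for the engine to carry.
    String encode() {
        return filePath + "#" + rowPosition;
    }

    // Unpack on the connector side, for example inside deleteRows().
    static FileRowId decode(String encoded) {
        int separator = encoded.lastIndexOf('#');
        return new FileRowId(encoded.substring(0, separator),
                Long.parseLong(encoded.substring(separator + 1)));
    }
}
```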
The rowId Column Handle for DELETE
The Presto engine identifies the rows to be deleted using a connector-specific
rowId column handle, returned by the connector’s ConnectorMetadata.getDeleteRowIdColumnHandle()
method, whose full signature is:

    ColumnHandle getDeleteRowIdColumnHandle(
        ConnectorSession session,
        ConnectorTableHandle tableHandle)
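For a JDBC-style connector, where the rowId is usually the table's primary key, the method can be sketched as below. ToyColumnHandle, ToyJdbcColumn, ToyJdbcTable and ToyJdbcMetadata are hypothetical simplifications of the SPI types, the session parameter is dropped because the toy does not use it, and the toy assumes a single-column primary key and rejects tables that have none (a design choice of this sketch, not documented Presto behavior).

```java
import java.util.List;
import java.util.Optional;

// Hypothetical stand-ins for ColumnHandle and ConnectorTableHandle.
interface ToyColumnHandle {}

record ToyJdbcColumn(String name, String type) implements ToyColumnHandle {}

record ToyJdbcTable(String tableName, List<ToyJdbcColumn> columns, Optional<String> primaryKey) {}

class ToyJdbcMetadata {
    // Same shape as getDeleteRowIdColumnHandle(session, tableHandle), minus the
    // session, which this toy does not need.
    ToyColumnHandle getDeleteRowIdColumnHandle(ToyJdbcTable table) {
        String pk = table.primaryKey().orElseThrow(() ->
                new UnsupportedOperationException("DELETE needs a primary key on " + table.tableName()));
        // The primary key column alone identifies the row; no other fields are needed.
        return table.columns().stream()
                .filter(column -> column.name().equals(pk))
                .findFirst()
                .orElseThrow();
    }
}
```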
The rowId Column Handle for UPDATE
The Presto engine identifies rows to be updated using a connector-specific rowId column handle,
returned by the connector’s ConnectorMetadata.getUpdateRowIdColumnHandle()
method. In addition to the columns that identify the row, for UPDATE the rowId column will contain
any columns that the connector requires in order to perform the UPDATE operation.
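To show what "any columns that the connector requires" can mean, the sketch below assumes a hypothetical connector that rewrites whole rows on UPDATE, so besides the primary key it asks for every column that is not being updated. ToyColumn, ToyUpdateRowIdHandle and ToyRewritingMetadata are invented names, and splitting the handle into identity and extra columns is a choice of this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

record ToyColumn(String name, String type) {}

// Hypothetical rowId handle for UPDATE: the columns identifying the row plus any
// extra columns this toy connector needs to carry out the update.
record ToyUpdateRowIdHandle(List<ToyColumn> identityColumns, List<ToyColumn> extraColumns) {}

class ToyRewritingMetadata {
    private final List<ToyColumn> tableColumns;
    private final ToyColumn primaryKey;

    ToyRewritingMetadata(List<ToyColumn> tableColumns, ToyColumn primaryKey) {
        this.tableColumns = tableColumns;
        this.primaryKey = primaryKey;
    }

    // Same shape as getUpdateRowIdColumnHandle(session, tableHandle, updatedColumns),
    // minus session and table handle. Assumption: this connector rewrites whole rows,
    // so it needs every column that is neither the key nor being updated.
    ToyUpdateRowIdHandle getUpdateRowIdColumnHandle(List<ToyColumn> updatedColumns) {
        List<ToyColumn> extra = new ArrayList<>();
        for (ToyColumn column : tableColumns) {
            if (!column.equals(primaryKey) && !updatedColumns.contains(column)) {
                extra.add(column);
            }
        }
        return new ToyUpdateRowIdHandle(List.of(primaryKey), extra);
    }
}
```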
UpdatablePageSource API
As mentioned above, to support DELETE or UPDATE, the connector must define a subclass of
UpdatablePageSource, layered over the connector’s ConnectorPageSource. The interesting methods are:

- Page getNextPage(). When the Presto engine calls getNextPage(), the UpdatablePageSource calls its underlying ConnectorPageSource.getNextPage() method to get a page. Some connectors reformat the page before returning it to the Presto engine.
- void deleteRows(Block rowIds). The Presto engine calls the deleteRows() method of the same UpdatablePageSource instance that supplied the original page, passing a block of rowIds created by the Presto engine based on the column handle returned by ConnectorMetadata.getDeleteRowIdColumnHandle().
- void updateRows(Page page, List<Integer> columnValueAndRowIdChannels). The Presto engine calls the updateRows() method of the same UpdatablePageSource instance that supplied the original page, passing a page of projected columns: one for each updated column, plus a last one for the rowId column. The order of projected columns is defined by the Presto engine, and that order is reflected in the columnValueAndRowIdChannels argument. The job of updateRows() is to:
    1. Extract the updated column blocks and the rowId block from the projected page.
    2. Assemble them in whatever order the connector requires for storage.
    3. Store the update result in the underlying file store.
- CompletableFuture<Collection<Slice>> finish(). The Presto engine calls finish() when all the pages of a split have been processed. The connector returns a future containing a collection of Slice, representing connector-specific information about the rows processed. Usually this includes the row count, and it might include information such as the files or partitions created or changed.
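The three steps of updateRows() can be sketched with a toy in-memory page source. ToyUpdatableSource is a hypothetical name; a page is modeled as a list of column blocks (each a List<Object>), storage is a map from rowId to a row in table column order, and fragments are plain "rows=<n>" strings rather than Slice values. The sketch also assumes that entry i of columnValueAndRowIdChannels names the page channel holding the i-th updated column, in the order the columns were passed to beginUpdate(), with the rowId channel last.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

// Hypothetical toy page source. A "page" is a list of column blocks
// (each block a List<Object>); storage maps rowId -> row in table column order.
class ToyUpdatableSource {
    private final Map<Long, Object[]> storage;
    // Storage position of each updated column, in the order the columns were
    // given to beginUpdate() (table column order).
    private final int[] updatedColumnPositions;
    private long rowsUpdated;

    ToyUpdatableSource(Map<Long, Object[]> storage, int[] updatedColumnPositions) {
        this.storage = storage;
        this.updatedColumnPositions = updatedColumnPositions;
    }

    // Assumption: entry i of columnValueAndRowIdChannels is the page channel for
    // the i-th updated column, and the last entry is the rowId channel.
    void updateRows(List<List<Object>> page, List<Integer> columnValueAndRowIdChannels) {
        int rowIdIndex = columnValueAndRowIdChannels.size() - 1;
        // 1. Extract the rowId block and the updated column blocks.
        List<Object> rowIds = page.get(columnValueAndRowIdChannels.get(rowIdIndex));
        List<List<Object>> valueBlocks = new ArrayList<>();
        for (int i = 0; i < rowIdIndex; i++) {
            valueBlocks.add(page.get(columnValueAndRowIdChannels.get(i)));
        }
        for (int row = 0; row < rowIds.size(); row++) {
            long rowId = (Long) rowIds.get(row);
            // 2. Assemble the new row in storage (table column) order.
            Object[] updated = storage.get(rowId).clone();
            for (int i = 0; i < valueBlocks.size(); i++) {
                updated[updatedColumnPositions[i]] = valueBlocks.get(i).get(row);
            }
            // 3. Store the result (here, in memory).
            storage.put(rowId, updated);
            rowsUpdated++;
        }
    }

    // Per-split fragment; "rows=<n>" is this toy's own format.
    CompletableFuture<Collection<String>> finish() {
        return CompletableFuture.completedFuture(List.of("rows=" + rowsUpdated));
    }
}
```

Because the engine chooses the channel order, the connector never assumes that page channel 0 is the first updated column; it always goes through columnValueAndRowIdChannels.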
ConnectorMetadata DELETE API
A connector implementing DELETE must specify three ConnectorMetadata methods.

- getDeleteRowIdColumnHandle():

      ColumnHandle getDeleteRowIdColumnHandle(
          ConnectorSession session,
          ConnectorTableHandle tableHandle)

  The ColumnHandle returned by this method provides the rowId column handle used by the connector to identify rows to be deleted, as well as any other fields of the row that the connector will need to complete the DELETE operation. For a JDBC connector, that rowId is usually the primary key for the table, and no other fields are required. For other connectors, the information needed to identify a row usually consists of multiple physical columns.

- beginDelete():

      ConnectorDeleteTableHandle beginDelete(
          ConnectorSession session,
          ConnectorTableHandle tableHandle)

  As the last step in creating the DELETE execution plan, the connector’s beginDelete() method is called, passing the session and tableHandle. beginDelete() performs any orchestration needed in the connector to start processing the DELETE. This orchestration varies from connector to connector. beginDelete() returns a ConnectorDeleteTableHandle with any added information the connector needs when the handle is passed back to finishDelete() and the split generation machinery. For most connectors, the returned table handle contains a flag identifying it as a table handle for a DELETE operation.

- finishDelete():

      void finishDelete(
          ConnectorSession session,
          ConnectorDeleteTableHandle tableHandle,
          Collection<Slice> fragments)

  During DELETE processing, the Presto engine accumulates the Slice collections returned by UpdatablePageSource.finish(). After all splits have been processed, the engine calls finishDelete(), passing the table handle and that collection of Slice fragments. In response, the connector takes appropriate actions to complete the DELETE operation. Those actions might include committing the transaction, assuming the connector supports a transaction paradigm.
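A minimal sketch of the beginDelete()/finishDelete() pair follows, assuming the common pattern described above of a table handle that carries a DELETE flag. ToyTableHandle and ToyDeleteMetadata are hypothetical; the session parameter is omitted, fragments are "deleted=<n>" strings standing in for Slice values, and setting a field stands in for committing a transaction.

```java
import java.util.Collection;
import java.util.List;

// Hypothetical table handle; the boolean flag marks it as a DELETE handle.
record ToyTableHandle(String tableName, boolean forDelete) {}

class ToyDeleteMetadata {
    long committedDeletes;
    boolean committed;

    // Mirrors beginDelete(session, tableHandle): return a handle carrying what
    // finishDelete() and split generation will need later.
    ToyTableHandle beginDelete(ToyTableHandle tableHandle) {
        return new ToyTableHandle(tableHandle.tableName(), true);
    }

    // Mirrors finishDelete(session, tableHandle, fragments). The "deleted=<n>"
    // fragment format is this toy's assumption, produced by its page sources.
    void finishDelete(ToyTableHandle tableHandle, Collection<String> fragments) {
        if (!tableHandle.forDelete()) {
            throw new IllegalArgumentException("not a DELETE handle");
        }
        long total = 0;
        for (String fragment : fragments) {
            total += Long.parseLong(fragment.substring("deleted=".length()));
        }
        committedDeletes = total;
        committed = true; // a real connector might commit its transaction here
    }
}
```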
ConnectorMetadata UPDATE API
A connector implementing UPDATE must specify three ConnectorMetadata methods.

- getUpdateRowIdColumnHandle():

      ColumnHandle getUpdateRowIdColumnHandle(
          ConnectorSession session,
          ConnectorTableHandle tableHandle,
          List<ColumnHandle> updatedColumns)

  The updatedColumns list contains column handles for all columns updated by the UPDATE operation, in table column order. The ColumnHandle returned by this method provides the rowId used by the connector to identify rows to be updated, as well as any other fields of the row that the connector will need to complete the UPDATE operation.

- beginUpdate():

      ConnectorTableHandle beginUpdate(
          ConnectorSession session,
          ConnectorTableHandle tableHandle,
          List<ColumnHandle> updatedColumns)

  As the last step in creating the UPDATE execution plan, the connector’s beginUpdate() method is called, passing arguments that define the UPDATE to the connector. In addition to the session and tableHandle, the arguments include the list of the updated column handles, in table column order. beginUpdate() performs any orchestration needed in the connector to start processing the UPDATE. This orchestration varies from connector to connector. beginUpdate() returns a ConnectorTableHandle with any added information the connector needs when the handle is passed back to finishUpdate() and the split generation machinery. For most connectors, the returned table handle contains a flag identifying it as a table handle for an UPDATE operation. For some connectors that support partitioning, the table handle will reflect that partitioning.

- finishUpdate():

      void finishUpdate(
          ConnectorSession session,
          ConnectorTableHandle tableHandle,
          Collection<Slice> fragments)

  During UPDATE processing, the Presto engine accumulates the Slice collections returned by UpdatablePageSource.finish(). After all splits have been processed, the engine calls finishUpdate(), passing the table handle and that collection of Slice fragments. In response, the connector takes appropriate actions to complete the UPDATE operation. Those actions might include committing the transaction, assuming the connector supports a transaction paradigm.
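The beginUpdate()/finishUpdate() pair can be sketched the same way. In this hypothetical model (ToyColumn, ToyTableHandle and ToyUpdateMetadata are invented names), the presence of the updated-column list marks the handle as an UPDATE handle, beginUpdate() defensively checks the table-column-order guarantee using a toy ordinal field, and finishUpdate() sums "rows=<n>" fragments, standing in for Slice values, in place of committing a transaction. The session parameter is omitted.

```java
import java.util.Collection;
import java.util.List;
import java.util.Optional;

record ToyColumn(String name, int ordinal) {}

// Hypothetical handle: a present updatedColumns list marks it as an UPDATE handle.
record ToyTableHandle(String tableName, Optional<List<ToyColumn>> updatedColumns) {}

class ToyUpdateMetadata {
    long committedRows;

    // Mirrors beginUpdate(session, tableHandle, updatedColumns).
    ToyTableHandle beginUpdate(ToyTableHandle tableHandle, List<ToyColumn> updatedColumns) {
        // The engine passes updated columns in table column order; the toy verifies it.
        for (int i = 1; i < updatedColumns.size(); i++) {
            if (updatedColumns.get(i - 1).ordinal() >= updatedColumns.get(i).ordinal()) {
                throw new IllegalArgumentException("expected table column order");
            }
        }
        return new ToyTableHandle(tableHandle.tableName(), Optional.of(List.copyOf(updatedColumns)));
    }

    // Mirrors finishUpdate(session, tableHandle, fragments).
    void finishUpdate(ToyTableHandle tableHandle, Collection<String> fragments) {
        if (tableHandle.updatedColumns().isEmpty()) {
            throw new IllegalArgumentException("not an UPDATE handle");
        }
        long total = 0;
        for (String fragment : fragments) {
            total += Long.parseLong(fragment.substring("rows=".length()));
        }
        committedRows = total; // stands in for committing the transaction
    }
}
```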