Create the Data Flow
This section describes how to create a dataflow to migrate information from one repository to another. In the following example, a GetWeb processor downloads web pages and a PutFileSystem processor inserts them into a folder in the file system.
To create a dataflow to migrate data between repositories
- Add and configure a connector to retrieve information from the source repository. The connector must generate the AUTN_MIGRATION_URI document field (see Supported Connectors).
- Add a processor to copy the value of the AUTN_MIGRATION_URI metadata field to a FlowFile attribute named idol.put.migrationuri.
  - Add a processor by dragging the processor icon from the components toolbar to the canvas.
    The Add Processor dialog box opens.
  - In the Source list, click idol.nifi.
    The list of processors is filtered to show only NiFi Ingest processors.
  - Select the UpdateAttributeFromMetadata processor and click ADD.
    The processor is added to the canvas.
  - Right-click the processor and click Configure.
    The Configure Processor dialog box opens.
  - Click the Properties tab.
  - Click the add property icon and add a dynamic property:
    - Property Name: the FlowFile attribute to add or update:
      idol.put.migrationuri
    - Property Value: an XPath expression that selects the document metadata field, or document metadata field attribute, used to set the value of the FlowFile attribute:
      //AUTN_MIGRATION_URI
- Add and configure a connector to insert the information into the destination repository. For information about the connectors that you can use, see Supported Connectors. This example uses a File System Connector, so you need to add a PutFileSystem processor and set the following dynamic property:
  - Property Name: migration:rootDirectory
  - Example Property Value:
    - Windows: D:\path\to\migrated_files\
    - Linux: /path/to/migrated_files/
- Connect the processors:
  - Connect the success relationship of the Get* processor to the UpdateAttributeFromMetadata processor.
  - Connect the success relationship of the UpdateAttributeFromMetadata processor to the Put* processor.
- You can now start all of the processors in the dataflow. (Go to the Operate palette and click Start.)
  The web pages are downloaded and inserted into the file system.
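
To illustrate what the UpdateAttributeFromMetadata step in this procedure does, the following Python sketch applies the XPath expression //AUTN_MIGRATION_URI to a small XML document and copies the matching value into a dictionary that stands in for the FlowFile attributes. It is not the processor's implementation and does not use NiFi. The document structure, the sample URI, and the function name update_attributes_from_metadata are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical document metadata. The structure that a real connector
# produces may differ; only the AUTN_MIGRATION_URI field name comes from
# the procedure above.
SAMPLE_DOCUMENT = """
<DOCUMENT>
  <AUTN_MIGRATION_URI>https://www.example.com/index.html</AUTN_MIGRATION_URI>
</DOCUMENT>
"""


def update_attributes_from_metadata(document_xml, rules, attributes=None):
    """Return a copy of `attributes` updated from document metadata.

    `rules` maps an attribute name to an XPath expression, mirroring the
    dynamic property (name = attribute, value = XPath) described above.
    """
    root = ET.fromstring(document_xml.strip())
    updated = dict(attributes or {})
    for attribute_name, xpath in rules.items():
        # ElementTree rejects absolute paths on an element, so turn
        # "//FIELD" into the equivalent relative form ".//FIELD".
        expression = "." + xpath if xpath.startswith("//") else xpath
        match = root.find(expression)
        if match is not None and match.text is not None:
            updated[attribute_name] = match.text.strip()
    return updated


attrs = update_attributes_from_metadata(
    SAMPLE_DOCUMENT,
    {"idol.put.migrationuri": "//AUTN_MIGRATION_URI"},
)
print(attrs)
```

In this sketch, an attribute is left unset when the expression matches nothing. How the real processor behaves in that case is not covered here.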