Removes or tags duplicates after indexing.
This index action runs on a specified subset of the content and locates duplicates using a variety of methods. Duplicates can then be deleted, moved to a different database, or tagged in a specified field, depending on the value of DuplicateAction.
The DREDUPLICATE index action only removes duplicate documents within a single Content component, rather than removing duplicates over the whole distributed system. To remove all duplicates, you must ensure that duplicates of a document are all sent to the same instance of the Content component, for example by using DistributeByFields mode.
http://12.3.4.56:20001/DREDUPLICATE?DuplicateAction=Delete&ReferenceField=*/DREREFERENCE
In this example, duplicates are identified using the DREREFERENCE field, and any duplicates found are deleted.
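The request above can also be assembled programmatically. The following is a minimal sketch, not part of the product: the helper name `build_dreduplicate_url` is hypothetical, and the host and port are taken from the example. It only builds the URL string and does not send anything to the index port.

```python
from urllib.parse import urlencode


def build_dreduplicate_url(host, port, **params):
    """Build a DREDUPLICATE index action URL (hypothetical helper).

    Parameter names are passed through unchanged, so they must match
    the names in the parameter table (for example DuplicateAction).
    """
    # Keep '*' and '/' unescaped so that field paths such as
    # */DREREFERENCE appear exactly as in the documented example.
    query = urlencode(params, safe="*/")
    return f"http://{host}:{port}/DREDUPLICATE?{query}"


url = build_dreduplicate_url(
    "12.3.4.56",
    20001,
    DuplicateAction="Delete",
    ReferenceField="*/DREREFERENCE",
)
print(url)
```

Keyword arguments keep their order, so the generated query string lists the parameters in the order they are passed.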
| Parameter | Description | Required |
|---|---|---|
| ChecksumField | A reference field used to determine whether a match is exact. | |
| Database | The database to move duplicates to. | see Comments |
| DatabaseMatch | A list of databases to search for duplicates in. | |
| DuplicateAction | The action to perform on duplicates. | Yes |
| MaxID | The last DocID to find duplicates of. | |
| MinID | The first DocID to find duplicates of. | |
| ReferenceField | A reference field to use as the initial determination of whether two documents are a match. | Yes |
| TagField | The field to tag duplicates with. | see Comments |
| TagValue | The static value to tag duplicates with in the TagField. | |
| ThreadHashField | The field containing the thread hash values used to determine whether a match is a duplicate. | |
This index action accepts the following standard index action parameters.
| Parameter | Description |
|---|---|
| IgnoreMaxPendingItems | Whether to ignore the IndexQueueMaxPendingItems limit for this index action. |
| IndexUID | An identification code for any document tracking events. |
| NoArchive | Turn off configured archiving for the index action. |
| Priority | The priority for the index job. |
You must set Database when DuplicateAction is set to Database.
You must set TagField when DuplicateAction is set to Tag.
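These requirements can be checked on the client side before a request is sent. The following is a minimal sketch under stated assumptions: the function name `check_dreduplicate_params` is hypothetical, parameter names and values are compared case-sensitively as written in this page, and only the rules documented here are checked.

```python
def check_dreduplicate_params(params):
    """Return a list of problems; an empty list means the documented
    requirements are met (hypothetical client-side check)."""
    problems = []

    # DuplicateAction and ReferenceField are marked as required.
    for name in ("DuplicateAction", "ReferenceField"):
        if not params.get(name):
            problems.append(f"{name} is required")

    # Conditional requirements that depend on the chosen action.
    action = params.get("DuplicateAction", "")
    if action == "Database" and not params.get("Database"):
        problems.append("Database is required when DuplicateAction=Database")
    if action == "Tag" and not params.get("TagField"):
        problems.append("TagField is required when DuplicateAction=Tag")

    return problems


# Example: tagging duplicates without naming a TagField is reported.
issues = check_dreduplicate_params(
    {"DuplicateAction": "Tag", "ReferenceField": "*/DREREFERENCE"}
)
print(issues)
```

Returning a list rather than raising on the first failure lets a caller report every missing parameter at once.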