The text_to_docs function splits a file into multiple documents.
text_to_docs(doc, sectionName, filename)
| Argument | Description |
|---|---|
| doc | (LuaDocument) The document that you want to divide into multiple documents. |
| sectionName | (string) The name of the section in the CFS configuration file that contains the TextToDocs configuration parameters. For information about these parameters, see TextToDocs Task Parameters. |
| filename | (string) The file that contains the text to be converted (the original file that resulted in the document). |
Returns (LuaDocuments): a list of document objects representing the documents that are produced.
You might have a connector ingesting files from a repository, but want to split those files into multiple documents. The following example uses the get_filename function to find the path of the file associated with an ingested document, and uses the text_to_docs function to generate multiple documents. This example splits the file using settings in the [MyTextToDocs] section of the CFS configuration file. It then calls the ingest function to add the resulting documents to the ingest queue.
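function handler(document)
    -- Skip documents that text_to_docs already produced.
    if document:hasField("PROCESSED") then
        return true
    end
    -- Find the source file and split it using the [MyTextToDocs] settings.
    local file = get_filename(document)
    local docs = text_to_docs(document, "MyTextToDocs", file)
    -- Mark each generated document and add it to the ingest queue.
    for i, doc in ipairs(docs) do
        doc:addField("PROCESSED", "YES")
        ingest(doc)
    end
    return true
end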
In this example, the original documents are also indexed. If you want to index only the documents generated by the text_to_docs function, you could return false from the handler function.
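The variant that returns false can be sketched as follows. This is a minimal illustration, not a definitive implementation: so that it runs in a plain Lua interpreter, it defines stand-ins for get_filename, text_to_docs, and ingest, plus a hypothetical new_document helper and an ingested list that are not part of CFS. In a real CFS Lua script you would keep only the handler function, because CFS supplies the other functions.

```lua
-- Hypothetical stand-ins for the CFS environment (assumptions for this
-- sketch only). CFS provides the real get_filename, text_to_docs, and
-- ingest functions, and real LuaDocument objects.
ingested = {}

function new_document()
    local doc = { fields = {} }
    function doc:hasField(name) return self.fields[name] ~= nil end
    function doc:addField(name, value) self.fields[name] = value end
    return doc
end

function get_filename(document)
    return "example.txt"
end

function text_to_docs(document, sectionName, filename)
    -- Pretend the file was split into two documents.
    return { new_document(), new_document() }
end

function ingest(document)
    table.insert(ingested, document)
end

-- The handler itself: same logic as the example above, except that it
-- returns false for the original document.
function handler(document)
    -- Generated documents pass through the handler again; let them continue.
    if document:hasField("PROCESSED") then
        return true
    end
    local file = get_filename(document)
    local docs = text_to_docs(document, "MyTextToDocs", file)
    for _, doc in ipairs(docs) do
        doc:addField("PROCESSED", "YES")
        ingest(doc)
    end
    -- Do not index the original document.
    return false
end
```

The PROCESSED check still returns true, so the generated documents continue through ingestion when the handler runs on them.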