define({"0":{y:0,u:"../Content/Part_Overview.htm",l:-1,t:"Overview of Filter SDK",i:0.00115444194995053,a:"Overview of Filter SDK This section provides an overview of the OpenText File Content Extraction Filter SDK and describes how to get started with the C API. Introducing Filter SDK Getting Started"},"1":{y:0,u:"../Content/filter/intro_filtersdk.htm",l:-1,t:"Introducing Filter SDK",i:0.00164509035833651,a:"Introducing Filter SDK This section describes the File Content Extraction Filter SDK. "},"2":{y:0,u:"../Content/filter/Overview.htm",l:-1,t:"Overview",i:0.00135070131330492,a:"OpenText File Content Extraction Filter SDK enables you to incorporate text extraction functionality into your own applications. It extracts text and metadata from a wide variety of file formats on numerous  platforms , and can automatically recognize over 2000 document types. It supports both ..."},"3":{y:0,u:"../Content/filter/Package_Contents.htm",l:-1,t:"Package Contents",i:0.00115444194995053,a:"To get started with the Filter SDK, unzip the package to a directory on your machine. The Filter SDK packages for non-Windows platforms store the correct file permissions for the files in the SDK (for example, some files must be executable) . OpenText recommends that you unzip the SDK on the ..."},"4":{y:0,u:"../Content/Shared/_KV_License_Update.htm",l:-1,t:"License Information",i:0.0109367922474599,a:"Your license key controls whether you have the full version of the File Content Extraction SDK, or a trial version. It also determines whether the following advanced features are enabled: Advanced character set detection with the character set detection library (kvlangdetect). Advanced document ..."},"5":{y:0,u:"../Content/C/gettingstarted/KeyViewTutorial_Introduction.htm",l:-1,t:"Explore Filter SDK Features",i:0.00115444194995053,a:"You can use File Content Extraction by calling it from your own applications through one of its APIs. However, to help you get started, the SDK includes some non-production test utilities which allow you to use File Content Extraction from the command line and explore its functionality. This section ..."},"6":{y:0,u:"../Content/Chapter_GettingStarted.htm",l:-1,t:"Getting Started",i:0.0018413497216909,a:"Getting Started This section provides information about how to get started with the File Content Extraction Filter C API."},"7":{y:0,u:"../Content/C/gettingstarted/Getting_Started_Tutorial.htm",l:-1,t:"Getting Started with the C API",i:0.00171460066129868,a:"To help you get started with using the C API, you can use the programming tutorials:  C API Programming Tutorial . This tutorial helps you get started using the File Content Extraction Filter C SDK to filter files.  It aims to help you:  familiarize yourself with the Filter SDK C API. create a ..."},"8":{y:0,u:"../Content/C/gettingstarted/C_Tutorial_Basic.htm",l:-1,t:"C API Programming Tutorial",i:0.00207969307840928,a:"The File Content Extraction Filter SDK allows you to embed File Content Extraction functionality into your application. This section demonstrates how to get started with the Filter C API. OpenText also recommends that you refer to the filter_tutorial sample program, because the source code for that ..."},"9":{y:0,u:"../Content/C/gettingstarted/C_Tutorial_Advanced.htm",l:-1,t:"C API Advanced Programming Tutorial",i:0.00188316404382053,a:"This tutorial helps you to: familiarize yourself with more advanced Filter SDK functionality. work on streams, rather than files. This tutorial assumes that you have already completed the  C API Programming Tutorial . Using a Custom Stream In some cases you might want to get File Content Extraction ..."},"10":{y:0,u:"../Content/Part_UseSDK.htm",l:-1,t:"Use Filter SDK",i:0.00115444194995053,a:"Use Filter SDK This section explains how to perform some basic tasks by using the File Extraction and Filter APIs, and describes the sample programs. Use the File Extraction API Use the Filter API Use the Metadata API Sample Programs Advanced Topics Troubleshooting"},"11":{y:0,u:"../Content/extract/Extract_API.htm",l:-1,t:"Use the File Extraction API",i:0.00189088337523406,a:"Use the File Extraction API This section describes how to extract subfiles from a container file by using the File Extraction API. "},"12":{y:0,u:"../Content/extract/Introduction.htm",l:-1,t:"Introduction",i:0.00115444194995053,a:"A file can contain other files, which we call subfiles. Examples of subfiles include e-mail attachments and embedded OLE objects. A file that contains subfiles is called a container file. The following are examples of container files: Archive files such as the ZIP, TAR, and RAR formats. Mail ..."},"13":{y:0,u:"../Content/extract/Extract_Subfiles_C.htm",l:-1,t:"Extract Subfiles",i:0.00126347492959186,a:"To  filter  all files in a container file, you must open the container and extract its subfiles by using the File Extraction API. The extraction process is done repeatedly until all subfiles are extracted and exposed for  filtering . After a subfile is extracted, you can call  Filter  API functions ..."},"14":{y:0,u:"../Content/extract/Sanitize_paths.htm",l:-1,t:"Sanitize Absolute Paths",i:0.00663751493675019,a:"When you extract a subfile from a container and write it to disk, you specify an extract directory  and a path to extract the file to. To set the path, you might use the path in the container file that you are extracting from, as returned  from the function  fpGetSubFileInfo() .  However, if the ..."},"15":{y:0,u:"../Content/extract/_KV_xtract_c_SubFileInputStream.htm",l:-1,t:"Access a Subfile without Extracting the Entire File",i:0.00115444194995053,a:"Some operations do not require all of the data contained within a subfile. For example, format detection can often be performed using only the beginning and end of a file, without needing to extract the entire file to disk or memory. When filtering, you might process part of a file before deciding ..."},"16":{y:0,u:"../Content/extract/Extract_protected_containers.htm",l:-1,t:"Extract Protected Container Files",i:0.00115444194995053,a:"This section describes how to extract container files that are protected with a password or require other types of credentials. The following guidelines apply to specific file types. Lotus Notes NSF files. If you are running a Notes client with an active user connected to a Domino server, you must ..."},"17":{y:0,u:"../Content/Shared/_KV_xtract_Extract_Images.htm",l:-1,t:"Extract Images",i:0.00135070131330492,a:"You can use the File Extraction API  to extract images within a file. If you use this feature, images within the file  behave in the same way as any other subfile. Extracted images  have the name image[X].[Y], where [X] is an integer, and [Y] is the extension. The format of the image is the same as ..."},"18":{y:0,u:"../Content/extract/Understand_Subfile_Hierarchy.htm",l:-1,t:"Understand the Subfile Hierarchy",i:0.00673992807582333,a:"When you extract a container file, the paths or relationships between the subfiles might be irrelevant. For example, you might want to filter the subfiles contained in a ZIP archive, but you might not care about the file and folder structure. File Content Extraction provides information that enables ..."},"19":{y:0,u:"../Content/extract/_KV_ExtractMailFiles.htm",l:-1,t:"Extract Mail Files",i:0.00164509035833651,a:"A file that contains an e-mail message, for example a Microsoft Outlook .MSG file, is typically treated as a container even if it does not have any attachments. File Content Extraction provides access to the message text as a subfile.  When you extract a subfile that represents the text of a ..."},"20":{y:0,u:"../Content/extract/_KV_Set_CharSet_Extraction.htm",l:-1,t:"Specify a Target Character Set",i:0.00115444194995053,a:"File Content Extraction provides a way to specify the target character set when extracting subfiles. This is only used to set the character set when extracting the body of a mail message. OpenText recommends using UTF-8 or UTF-16 because these are widely supported and can encode a diverse range of ..."},"21":{y:0,u:"../Content/extract/Extract_Mailbox.htm",l:-1,t:"Mailbox Files",i:0.00115444194995053,a:"A Mailbox (MBX) file is a collection of individual emails compiled with RFC 822 and RFC 2045 - 2049 (MIME), and divided by message separators. There are many mail applications that export to an MBX format, such as Eudora Email and Mozilla Thunderbird.  In Eudora MBX files, attachments are inserted ..."},"22":{y:0,u:"../Content/extract/Extract_PST.htm",l:-1,t:"Outlook Personal Folders Files",i:0.00139976615414352,a:"File Content Extraction can extract Outlook items such as messages, appointments, contacts, tasks, notes, and journal entries from a PST file.  By default, an extracted message will include header information (the fields To, From, Sent, and so on) even though this information is stored in the PST ..."},"23":{y:0,u:"../Content/extract/Extract_LotusDominoXML.htm",l:-1,t:"Lotus Domino XML Language Files",i:0.00115444194995053,a:"You can make sure that dates and times extracted from Lotus Domino .DXL files are displayed in a uniform format. To extract custom date/time formats In the formats.ini file, set the DateTimeFormat option in the [dxlsr] section. For example: [dxlsr]\nDateTimeFormat=%m/%d/%Y %I:%M:%S %p In this ..."},"24":{y:0,u:"../Content/extract/Extract_LotusNotes_DB.htm",l:-1,t:"Lotus Notes Database Files",i:0.00115444194995053,a:"A Lotus Notes database is a single file that contains multiple documents called notes. Notes include design notes (such as forms, views, folders, navigators, outlines, pages, framesets, agents, and resources), data document notes, profile document notes, access control list notes, and collection ..."},"25":{y:0,u:"../Content/extract/Extract_PDF.htm",l:-1,t:"Extract Subfiles from PDF Files",i:0.00213573876672248,a:"File Content Extraction can extract document-level and page-level attachments from a PDF document. Document-level attachments are added by using the Attach A File tool, and can include links to or from the parent document or to other file attachments. Page-level attachments are added as comments by ..."},"26":{y:0,u:"../Content/Shared/_KV_PDF_ImprovePerformanceWithSmallImages.htm",l:-1,t:"Improve Performance for PDFs with Many Small Images",i:0.00139976615414352,a:"To improve performance when processing PDF files that contain many small images, you can choose to ignore images unless they exceed a minimum width and/or height. If an image is smaller than the minimum width or height, File Content Extraction does not extract the image.  For example, to ignore ..."},"27":{y:0,u:"../Content/extract/Extract_OLE.htm",l:-1,t:"Extract Embedded OLE Objects",i:0.00115444194995053,a:"The File Extraction API can extract embedded OLE objects from the following types of documents: Lotus Notes (DXL) Microsoft Excel Microsoft Word Microsoft PowerPoint Microsoft Outlook Microsoft Visio Microsoft Project OASIS Open Document Rich Text Format (RTF) When an embedded OLE object is ..."},"28":{y:0,u:"../Content/extract/_KV_xtract_c_Default_Filenames_Extracted_Subfiles.htm",l:-1,t:"Default File Names for Extracted Subfiles",i:0.00193184970641032,a:"When you do not specify a file name in the call to   fpExtractSubFile() , in some cases a default file name is applied to the extracted subfile. Default File Name for Mail Formats To avoid naming conflicts and problems with long file names, File Content Extraction applies its own names to the ..."},"29":{y:0,u:"../Content/Chapter_UseFilterAPI.htm",l:-1,t:"Use the Filter API",i:0.00189088337523406,a:"Use the Filter API This section describes how to perform some basic filtering tasks by using the Filter API."},"30":{y:0,u:"../Content/filter/Extract_Format_Info.htm",l:-1,t:"Obtain Format Information",i:0.00368088417578255,a:"The file format detection module (kwad) detects a file\u0027s format, and reports the information to your application. When detecting the file format, File Content Extraction uses the content of the file rather than the file extension. In some cases, the file extension can be an unreliable marker because ..."},"31":{y:0,u:"../Content/Shared/_KV_Code_Identification.htm",l:-1,t:"Source Code Identification",i:0.00170822542949287,a:"Source code identification attempts to identify files that contain a particular programming language as a specific format.  When you do not enable source code identification, files that contain source code are often identified as ASCII text files, which means the application treats them in the same ..."},"32":{y:0,u:"../Content/filter/Determine_Doc_Reader.htm",l:-1,t:"File Formats and Document Readers",i:0.00324579318078012,a:"The configuration file  formats.ini contains a section named [Formats]. Each line in this section matches a file format with the reader to use to parse the format. For most file formats there is only one suitable reader, but for some formats you can choose a reader to use. Each file format has the ..."},"33":{y:0,u:"../Content/Shared/_KV_Refine_Detection_of_Text.htm",l:-1,t:"Refine Detection of Text Files",i:0.00115444194995053,a:"During text detection, File Content Extraction analyzes the start and end of the document. It compares the usage of printable ASCII characters to other types of character to detect if the document is a plain text file. Depending on the type of documents you are working with, the default settings ..."},"34":{y:0,u:"../Content/Shared/_KV_AdditionalFormatInfo.htm",l:-1,t:"Additional Format Information",i:0.00131234493284956,a:"File Content Extraction returns basic information about a document\u0027s format, but sometimes it can be useful to have additional information. The file formats_description.tsv, which can be found in the bin directory, provides a mapping between file format ID, human-readable format description, and the ..."},"35":{y:0,u:"../Content/filter/Convert_Character_Sets.htm",l:-1,t:"Character Encoding",i:0.00274012068058066,a:"To ensure that all filtered text is output in the same character encoding, File Content Extraction performs character encoding conversion. In most cases, File Content Extraction can determine the character encoding used in a source file, and automatically converts the filtered text to the encoding ..."},"36":{y:0,u:"../Content/filter/Filter_Hidden_Data.htm",l:-1,t:"Choose What to Filter",i:0.00115444194995053,a:"Choose What to Filter File Content Extraction can be configured - through the API or its configuration file - to filter additional or hidden information, that is not filtered by default. You can also choose to skip certain types of information if they are not useful for your application."},"37":{y:0,u:"../Content/filter/Extract_Excel_Formulas.htm",l:-1,t:"Formulas - Microsoft Excel",i:0.00255245562598406,a:"When you filter a Microsoft Excel spreadsheet, File Content Extraction extracts the value of each cell. The value of a cell might be calculated from a formula, but the formula is not included in the output unless you configure File Content Extraction to include it. You can extract the cell value, ..."},"38":{y:0,u:"../Content/filter/Filter_Hidden_Text_Excel.htm",l:-1,t:"Hidden Text - Microsoft Excel",i:0.00255245562598406,a:"Normally, Filter does not filter hidden text from a Microsoft Excel spreadsheet because it is assumed the text should not be exposed. You can change this default behavior, and extract text from hidden rows, columns, and sheets from Excel spreadsheets by adding the following lines to the formats.ini ..."},"39":{y:0,u:"../Content/C/filter_api/Hidden_Data_Excel.htm",l:-1,t:"Hidden Data in Microsoft Excel Documents",i:0.00657858017613965,a:"The options in this section are deprecated in File Content Extraction version 24.4 and later. Use the  fpSetConfig()  function KVFLT_SHOWHIDDENTEXT and KVFLT_SHOWFORMULAS options instead. The options in this section are still available, but might be removed in a future release. There are several ..."},"40":{y:0,u:"../Content/filter/Filter_Worksheet_Names.htm",l:-1,t:"Worksheet Names",i:0.00115444194995053,a:"Worksheet Names Normally, Filter does not extract worksheet names from a spreadsheet because it is assumed that the text should not be exposed. To extract worksheet names, add the following lines to the formats.ini file: [Options]\ngetsheetnames=1"},"41":{y:0,u:"../Content/filter/Filter_Deleted_Text.htm",l:-1,t:"Deleted Text",i:0.00131234493284956,a:"Some applications have revision tracking features—such as Microsoft Word\u0027s Track Changes—that identify changes to a document. When these features are used, text that was deleted from a document might still be stored in the file. File Content Extraction does not filter deleted text by default, but ..."},"42":{y:0,u:"../Content/filter/Hidden_Data_HTML.htm",l:-1,t:"Hidden Text - HTML",i:0.00115444194995053,a:"File Content Extraction can filter comments from HTML documents. To enable comment filtering, you must set a flag in the formats.ini file. To enable filtering of comments from HTML files Open the formats.ini file in a text editor. Under [Options], set the following flag. GetHTMLHiddenInfo= 1"},"43":{y:0,u:"../Content/filter/Filter_Source_Code.htm",l:-1,t:"Source Code",i:0.00115444194995053,a:"When you enable  source code identification , file format detection can identify source code files by language. For example, a file containing Python code would be detected as Python_Fmt (531) rather than ASCII_Text_Fmt (2). The default reader for many of the source code formats is codesr. This ..."},"44":{y:0,u:"../Content/filter/Filter_Tagged_PDF_Content.htm",l:-1,t:"Tagged Content - PDF",i:0.00115444194995053,a:"A tagged PDF contains an additional layer of text for visually impaired readers. This text is used in text-to-speech features in various PDF viewing programs. You can enable filtering of tagged PDF text in the API. Filtering the extra layer of tagged content might result in duplicate text in the ..."},"45":{y:0,u:"../Content/filter/Skip_Embedded_Fonts.htm",l:-1,t:"Embedded Fonts - PDF",i:0.00115444194995053,a:"Text in PDF files sometimes contains embedded fonts. If you experience difficulties filtering embedded fonts, you can skip this type of text. If you skip embedded fonts, none of the content that contains embedded fonts is included in the output. To skip text that uses embedded fonts In the C API, ..."},"46":{y:0,u:"../Content/Shared/_KV_No_Phonetic_Guides.htm",l:-1,t:"Japanese Guide Text - Microsoft Excel",i:0.00115444194995053,a:"This option prevents output of Japanese phonetic guide text when Microsoft Excel (.xlsx) files are processed. To prevent output of Japanese phonetic guide text In the C API, call the function fpSetConfig and set the flag KVFLT_NOPHONETICGUIDES. In  formats.ini, set the following parameter. (This is ..."},"47":{y:0,u:"../Content/filter/FilterPwdProtectedFiles.htm",l:-1,t:"Filter Password Protected Files",i:0.00155766913704255,a:"Filter Password Protected Files This section describes how to filter password-protected files. To filter password-protected files In the C API, call the  fpSetConfig()  function with the following arguments. For example: (*fpSetConfig)(pKVFilter, KVFLT_SETSRCPASSWORD, 8, \"password\");"},"48":{y:0,u:"../Content/filter/Filter_PDF_Files.htm",l:-1,t:"Filter PDF Files",i:0.00131234493284956,a:"Filter PDF Files Filter has special configuration options that allow greater control over the conversion of Adobe Acrobat PDF files."},"49":{y:0,u:"../Content/filter/pdf2sr.htm",l:-1,t:"Use the pdf2sr Reader",i:0.00208184253119661,a:"The pdf2sr reader is an alternative that can be used instead of pdfsr for filtering PDF files. It uses a different parsing technology. The pdf2sr reader has the following features: supports standard and custom metadata (non-XMP) supports basic text extraction supports password protected PDFs ..."},"50":{y:0,u:"../Content/filter/Filter_PDF_LogicalOrder.htm",l:-1,t:"Filter PDF Files to a Logical Reading Order",i:0.00164509035833651,a:"The order of the text inside a PDF file has no relation to the layout of the text on the page or screen. By default, File Content Extraction extracts paragraphs in the order in which they are stored in the file, not the order in which they appear on the page. For example, a three-column article ..."},"51":{y:0,u:"../Content/filter/Rotated_Text.htm",l:-1,t:"Rotated Text",i:0.00115444194995053,a:"When a PDF that contains rotated text is filtered, the rotated text is extracted after the text at the end of the PDF page on which the rotated text appears. If the PDF is filtered with logical order enabled, and the amount of rotated text on a page surpasses a predefined threshold,  the page is ..."},"52":{y:0,u:"../Content/filter/Control_Hyphenation.htm",l:-1,t:"Control Hyphenation",i:0.00115444194995053,a:"There are two types of hyphens in a PDF document: A soft hyphen is added to a word by a word processor to divide the word across two lines. This is a discretionary hyphen and is used to ensure proper text flow in justified text. A hard hyphen is intentionally added to a word regardless of the word\u0027s ..."},"53":{y:0,u:"../Content/filter/Filter_Portfolio_PDF.htm",l:-1,t:"Filter Portfolio PDF Files",i:0.00115444194995053,a:"Filter Portfolio PDF Files Portfolio PDF files contain subfiles and an ActionScript interface for navigating between them. You can use the extraction API to extract the subfiles.  See  Extract Subfiles from PDF Files ."},"54":{y:0,u:"../Content/Shared/_KV_Table_Detection_PDF.htm",l:-1,t:"Table Detection for PDF Files",i:0.00115444194995053,a:"PDF files often contain data presented in a tabular form. However, there is no information about the table stored within the PDF itself – the text is simply placed in  an arrangement that looks like a table to the human eye. When this data is filtered, it can be very difficult to reconstruct the ..."},"55":{y:0,u:"../Content/C/filter_api/Filter_RMS_PDF_Files.htm",l:-1,t:"Filter RMS Protected PDF Files",i:0.00115444194995053,a:"RMS-protected PDF files have two parts. The first is an unencrypted \"outer\" PDF, which contains standard text stating that the document is protected. The second is an encrypted \"inner\" PDF, which is attached to the outer PDF and contains the actual content. To filter both parts separately, filter ..."},"56":{y:0,u:"../Content/filter/Filter_Spreadsheet_Files.htm",l:-1,t:"Filter Spreadsheet Files",i:0.00115444194995053,a:"Filter Spreadsheet Files Filter has special configuration options that enable greater control over the conversion of spreadsheet files."},"57":{y:0,u:"../Content/filter/Specify_Date_and_Time_Fo.htm",l:-1,t:"Specify Date and Time Format on UNIX Systems",i:0.00115444194995053,a:"In Microsoft Excel you can choose to format dates and times according to the system locale.  On Windows, File Content Extraction uses the system locale settings to determine how these dates and times should be formatted.  In other operating systems, File Content Extraction uses the U.S. short date ..."},"58":{y:0,u:"../Content/filter/large_numbers_excel.htm",l:-1,t:"Filter Very Large Numbers in Spreadsheet Cells to Precision Numbers",i:0.00115444194995053,a:"Numbers in Microsoft Excel files can be extracted and written to the output without formatting. By default, numbers are extracted in the format specified by the Excel file (for example, General, Currency and Date). Spreadsheets might contain cells that have very large numbers in them. Excel displays ..."},"59":{y:0,u:"../Content/Shared/_KV_Standardize_Cell_Formats.htm",l:-1,t:"Standardize Cell Formats",i:0.00115444194995053,a:"In Microsoft Excel you can format cell values. For example, the date \"15/09/2021\" could be formatted as \"15 September 2021\" or \"2021-09-15\". By default, File Content Extraction extracts cell values with formatting, as they would appear in Excel. If you prefer, you can configure File Content ..."},"60":{y:0,u:"../Content/filter/Tab_Delimited_Output.htm",l:-1,t:"Tab Delimited Output for Spreadsheets and Embedded Tables",i:0.00115444194995053,a:"You can use File Content Extraction to convert spreadsheets, embedded tables in Word Processing documents (for example, Microsoft Word documents), and tables detected by Optical Character Recognition (OCR), to tab-delimited form. In this format, File Content Extraction inserts a tab character ..."},"61":{y:0,u:"../Content/filter/Presentation_LogicalOrder.htm",l:-1,t:"Filter Presentation Files to a Logical Reading Order",i:0.00115444194995053,a:"With some file formats, for example Microsoft PowerPoint presentations, the order of the text inside the file has no relation to the layout of the text on the page or screen. Recently modified text might appear at the end of a file, even though that text belongs at the beginning of the document. You ..."},"62":{y:0,u:"../Content/filter/Filter_XML_Files.htm",l:-1,t:"Filter XML Files",i:0.00131234493284956,a:"File Content Extraction can detect many types of XML file, including: Generic XML Microsoft Office 2003 XML (Word, Excel, and Visio) StarOffice/OpenOffice XML (text document, presentation, and spreadsheet) When you filter XML, you can tell File Content Extraction which elements to treat as content ..."},"63":{y:0,u:"../Content/filter/Configure_Element_Extrac.htm",l:-1,t:"Configure Element Extraction for XML Documents",i:0.00254543388405379,a:"When filtering XML files, you can specify which elements and attributes to extract according to the file\u0027s format ID or root element. This option is useful when you want to extract only relevant text elements, such as abstracts from reports, or a list of authors from an anthology.  A root element is ..."},"64":{y:0,u:"../Content/filter/Optical_Character_Recognition.htm",l:-1,t:"Optical Character Recognition",i:0.00115444194995053,a:"When processing raster image files, File Content Extraction can perform Optical Character Recognition (OCR) to attempt to filter text that might be visible in the image. If text is detected to form part of a table, it will be filtered in the same way as tables in Word Processing documents. File ..."},"65":{y:0,u:"../Content/filter/OCR.htm",l:-1,t:"Optimize OCR Performance",i:0.00541587813898319,a:"The default settings for OCR attempt to detect as much text as possible. For example, File Content Extraction attempts to detect text in multiple languages and alphabets, and rotated text in increments of 90 degrees from upright. This increases the amount of text that can be detected, prioritizing ..."},"66":{y:0,u:"../Content/filter/OCR_Config_Examples.htm",l:-1,t:"Configure OCR",i:0.00115444194995053,a:"In the following examples, OCR is configured to process scanned pages that contain only English or only Japanese text. Providing information about the input can result in a performance improvement, but OCR may fail to recognize text that does not match your configuration. For more information about ..."},"67":{y:0,u:"../Content/filter/Export_to_HTML.htm",l:-1,t:"Export Files to HTML",i:0.00115444194995053,a:"The File Content Extraction Filter SDK allows you to extract text from many different file formats. File Content Extraction also supports conversion of many file formats into HTML so that documents can be viewed in a web browser. File Content Extraction has a dedicated HTML Export API, but if you ..."},"68":{y:0,u:"../Content/kv_RMS/_KV_RMS_ConfigureProxyForRMS.htm",l:-1,t:"Configure the Proxy for RMS",i:0.00115444194995053,a:"When File Content Extraction needs to access contents that are protected by the Microsoft Rights Management System (RMS), it must make HTTP requests. By default, File Content Extraction uses the system proxy settings for these requests.  To use different proxy settings, you can configure them  in ..."},"69":{y:0,u:"../Content/filter/DocumentRestrictions.htm",l:-1,t:"Document Restrictions",i:0.00387006560178366,a:"Some applications, and corresponding file formats, allow users to restrict the ways in which a document can be used. For example, you might be able to read a document but additional credentials (such as a password) could be required to modify the document content, add comments, or print the ..."},"70":{y:0,u:"../Content/filter/Unexpected_ZIP_Detection.htm",l:-1,t:"Unexpected ZIP Detection",i:0.00115444194995053,a:"Concatenating a ZIP file onto another file, such as a JPEG, is a well-known method for attempting to hide files from inspection. Users can zip up their sensitive files, then concatenate them on to the other file by using something like the Windows copy command-line tool. The result is a file that ..."},"71":{y:0,u:"../Content/filter/Enable_Diagnostic_Logging.htm",l:-1,t:"Enable Diagnostic Logging",i:0.00115444194995053,a:"You can enable logging in File Content Extraction to diagnose issues that might occur. You configure logging in the formats.ini configuration file, by creating a [Logging] section. Turning on logging can slow File Content Extraction down and quickly generate many large log files. OpenText recommends ..."},"72":{y:0,u:"../Content/metadata/_KV_Using_Metadata_API.htm",l:-1,t:"Use the Metadata API",i:0.0167173230072364,a:"Use the Metadata API This section describes how to use File Content Extraction to access metadata."},"73":{y:0,u:"../Content/metadata/_KV_What_Is_Metadata.htm",l:-1,t:"What is Metadata?",i:0.00193585889801337,a:"Documents may contain information about the document itself: we call this metadata. For instance, a raster image file contains metadata recording the image\u0027s width and height; a word processing document may contain metadata recording the document\u0027s author and title. Metadata can be represented by ..."},"74":{y:0,u:"../Content/metadata/Mail_Metadata.htm",l:-1,t:"Mail Metadata",i:0.00191801842621349,a:"An e-mail message body and any attachments are considered by File Content Extraction as subfiles of the mail container (such as an MSG or EML file). The metadata for the message (the header fields such as \"From\", \"Sent\", \"To\", \"CC\", \"Subject\", and so on) are typically stored in the container and can ..."},"75":{y:0,u:"../Content/metadata/_KV_Understanding_Metadata.htm",l:-1,t:"Field Standardization",i:0.00337554144077723,a:"Common metadata fields such as \"Title\", \"Author\", and \"Subject\" exist in many different file formats, but can be stored in different ways. For instance, one raster image format may store the image width as a key-value pair with key Width. Another format may store the image width in bytes 16-19 of ..."},"76":{y:0,u:"../Content/metadata/_KV_Process_Fields.htm",l:-1,t:"Metadata Elements",i:0.00115444194995053,a:"This section explains how to process metadata elements. Standardized Elements When File Content Extraction understands the meaning of a metadata field in a document, it outputs that data in a standardized element - a  KVMetadataElement  object where: eKey is the standard key, which indicates the ..."},"77":{y:0,u:"../Content/metadata/_KV_Metadata_Examples.htm",l:-1,t:"Metadata Examples",i:0.00115444194995053,a:"If you want to process both standardized and non-standardized metadata elements, you can loop through a  KVMetadataList  without checking the eKey member – both standardized and non-standardized metadata can be handled in the same way. However, standardization allows you to handle particular ..."},"78":{y:0,u:"../Content/metadata/_KV_Standardized_Metadata_Fields.htm",l:-1,t:"Standardized Metadata Elements",i:0.00253870782374737,a:"The following table describes the standardized metadata elements that File Content Extraction can create. Accessed, created, and modified dates are retrieved from values stored within the document, and are typically set by the creating application. For documents that exist on disk, this may differ ..."},"79":{y:0,u:"../Content/Chapter_Samples.htm",l:-1,t:"Sample Programs",i:0.00155839944807167,a:"Sample Programs This section describes the sample programs provided with the Filter SDK."},"80":{y:0,u:"../Content/C/samples/Introduction.htm",l:-1,t:"Introduction",i:0.00115444194995053,a:"The C sample programs demonstrate how to use the C implementation of the Filter API. The sample code is intended to provide a starting point for your own applications or to be used for reference purposes. The following C sample programs are provided: tstxtract filter The source code and makefile ( ..."},"81":{y:0,u:"../Content/C/samples/filter_tutorial.htm",l:-1,t:"filter_tutorial",i:0.00115444194995053,a:"The filter_tutorial sample program demonstrates how to use the File Extraction, Filter, and Metadata APIs together to get content from a file. Given an input file, it extracts text, optionally gets format information, extracts metadata, and extracts subfiles. It recursively processes each subfile ..."},"82":{y:0,u:"../Content/Shared/_KV_xtract_samples_c.htm",l:-1,t:"tstxtract",i:0.00164509035833651,a:"The tstxtract sample program demonstrates the File Extraction API. It opens a file, extracts subfiles from the file, and repeats the extraction process until all subfiles are extracted. It also demonstrates how to extract the default set of metadata and pass integer or string names to extract ..."},"83":{y:0,u:"../Content/C/samples/filter.htm",l:-1,t:"filter",i:0.00204340136989469,a:"The filter sample program demonstrates the advanced functionality of the Filter API. It is composed of the following files: filter.c—command line interface filtersupport.c—contains core functionality, such as file filtering, stream filtering, metadata extraction, and format detection. ..."},"84":{y:0,u:"../Content/filter/Performance_Optimization.htm",l:-1,t:"Performance Optimization",i:0.00115444194995053,a:"This section provides guidance and best practice for optimizing the performance when using File Content Extraction.  API Usage Initializing and shutting down a Filter session takes time. For best performance, initialize a session once before you process any files, and shut it down only when you have ..."},"85":{y:0,u:"../Content/kv_security/_KV_SecurityBestPractises.htm",l:-1,t:"Security Best Practices",i:0.00121757702110689,a:"This section outlines some security best practices to consider when using File Content Extraction. Keep File Content Extraction Up to Date. New releases may include security updates, including updates to third-party libraries. See  Third-Party Library Upgrade Policy . Protect the Temporary ..."},"86":{y:0,u:"../Content/kv_security/_KV_ProtectTempDir.htm",l:-1,t:"Protect the Temporary Directory",i:0.0019363911205699,a:"Filter writes temporary files to the temporary directory. These temporary files frequently include the contents of files that you are processing, including decrypted parts of encrypted files. Sensitive information is therefore exposed in the temporary directory, so it is important that only users ..."},"87":{y:0,u:"../Content/kv_security/_KV_RunMinimalPrivileges.htm",l:-1,t:"Run Filter with Minimal Privileges",i:0.00240851051081878,a:"OpenText recommends that you run Filter with only those privileges that are necessary for it to function correctly, which follows best practice for any application. In particular, Filter needs access only to the following directories: the Filter bin directory. any input and output locations. the ..."},"88":{y:0,u:"../Content/kv_security/_KV_DLLPreloading.htm",l:-1,t:"Mitigate Against DLL Pre-Loading",i:0.00136143458451161,a:"When an  application loads a shared library such as  kvfilter.dll or kvfilter.so , the Operating System or runtime linker might search several locations. This search can allow DLL pre-loading attacks if an attacker is able to place a malicious binary in one of the locations searched. It might also ..."},"89":{y:0,u:"../Content/Chapter_AdvancedTopics.htm",l:-1,t:"Advanced Topics",i:0.00131799141941252,a:"Advanced Topics This section describes some advanced topics that apply to both the Filter API and the Extraction API."},"90":{y:0,u:"../Content/filter/Architectural_Overview.htm",l:-1,t:"Architectural Overview",i:0.00269589747807064,a:"The general architecture of File Content Extraction is the same across all supported platforms and is illustrated in the following diagram. Each component is described in the following table. Out-of-Process Filtering By default, Filter runs independently from the calling application process. This ..."},"91":{y:0,u:"../Content/filter/Persist_the_Child_Proces.htm",l:-1,t:"Persist the Child Process",i:0.00200776229527765,a:"In out-of-process filtering, the parent process maintains a persistent connection with the child server after each file is filtered. While the connection is preserved in this way, subsequent filtering requests are processed more quickly because the server is already prepared to receive data.  You ..."},"92":{y:0,u:"../Content/filter/Run_Filter_In_Process.htm",l:-1,t:"Run Filter In Process",i:0.00183667139580426,a:"By default, Filter runs independently from the calling application process. This is called out-of-process filtering. Out-of-process filtering protects the stability of the calling application in the rare case when a malformed document causes Filter to fail. You can configure Filter to run in the ..."},"93":{y:0,u:"../Content/filter/Testing_poisonsr.htm",l:-1,t:"Test Error Handling and Recovery",i:0.00115444194995053,a:"The File Content Extraction Filter SDK includes a testing utility, poisonsr, which you can use to test error handling and recovery. This utility can replicate various error conditions, such as crashes and hangs. This reader is for testing purposes only. You must not redistribute it. Setup  The ..."},"94":{y:0,u:"../Content/filter/Out_of_Process_Dump_Files.htm",l:-1,t:"Troubleshooting with Out-of-Process Dump Files",i:0.00115444194995053,a:"On Linux platforms, the File Content Extraction child process can produce a dump file in the event of an abnormal termination.  To enable dump files Set the environment variable KVOOP_DUMP_ENABLE to 1. When you enable out-of-process dump files, if an abnormal termination occurs, File Content ..."},"95":{y:0,u:"../Content/Troubleshooting/Troubleshooting_Intro.htm",l:-1,t:"Troubleshooting",i:0.00131799141941252,a:"This section describes various methods you can use to diagnose issues if File Content Extraction is not working as you expect. It also includes guidance about when to contact OpenText Support, and the kind of information it is helpful to provide when you do. When you need to troubleshoot issues with ..."},"96":{y:0,u:"../Content/Troubleshooting/Common_Problems.htm",l:-1,t:"Common Problems and Solutions",i:0.00139976615414352,a:"This section describes some common problems you might come across while using File Content Extraction, and some suggestions for how to solve the problem. Output does not contain content that you expect If you do not see content that you expect to see in the output: Check that you are using the right ..."},"97":{y:0,u:"../Content/Troubleshooting/ErrorCodes.htm",l:-1,t:"Error Codes",i:0.00115444194995053,a:"Error Codes The following table describes various error codes that you might encounter while running File Content Extraction, and how to troubleshoot the problem."},"98":{y:0,u:"../Content/Troubleshooting/General_Considerations.htm",l:-1,t:"General Troubleshooting Considerations",i:0.00115444194995053,a:"This section provides some advice about things that might cause problems in File Content Extraction. Antivirus Software File Content Extraction generally works fine on a computer that has antivirus or anti-malware software installed. However, these programs can sometimes cause issues by preventing ..."},"99":{y:0,u:"../Content/Troubleshooting/Create_Support_Tickets.htm",l:-1,t:"Create a Support Ticket",i:0.00139240749508223,a:"The following section provides some guidelines for creating a ticket with OpenText support to ensure that it can be dealt with as quickly and effectively as possible.  Reproduce issues with a test program It is easier and faster for support to diagnose your issue if you can provide a command that ..."},"100":{y:0,u:"../Content/Part_C_API_Ref.htm",l:-1,t:"C API Reference",i:0.00115444194995053,a:"C API Reference This section provides detailed reference information for the C-language implementation of the File Extraction and Filter APIs. File Extraction API Functions File Extraction API Structures Filter API Functions Filter API Structures Enumerated Types"},"101":{y:0,u:"../Content/C/extract/XTRACT_functions.htm",l:-1,t:"File Extraction API Functions",i:0.00277037181190664,a:"This section describes the functions in the File Extraction API. The File Extraction functions open a container file, and extract the container’s subfiles so that the subfiles are exposed and available for  filtering . Subfiles can be files within a Zip archive, messages in a mail store, attachments ..."},"102":{y:0,u:"../Content/C/extract/fpCloseFile.htm",l:-1,t:"fpCloseFile()",i:0.0113127590609027,a:"This function frees the memory allocated by  fpOpenFile()  and closes the file. Syntax int (pascal *fpCloseFile) (void *pFile); Arguments Returns If the file is closed, the return value is KVERR_Success.  If the file is not closed, the return value is an error code. Lifetime and Memory Management ..."},"103":{y:0,u:"../Content/C/extract/fpCloseSubFile.htm",l:-1,t:"fpCloseSubFile()",i:0.00340603450843727,a:"Closes a stream opened by  fpOpenSubFile() . Syntax int (pascal *fpCloseSubFile) (\n        KVInputStream *stream); Arguments Returns If the subfile is closed, the return value is KVERR_Success If the subfile is not closed, the return value is an error code. Lifetime and Memory Management After you ..."},"104":{y:0,u:"../Content/C/extract/fpExtractSubFile.htm",l:-1,t:"fpExtractSubFile()",i:0.0230127762711939,a:"This function extracts a subfile from a container file to a user-defined path or output stream. This call returns file format information when file is extracted to a path. Syntax int (pascal *fpExtractSubFile)  (\n\t            void                          *pFile, \n\t\t    ..."},"105":{y:0,u:"../Content/C/extract/fpFreeStruct.htm",l:-1,t:"fpFreeStruct()",i:0.0102570502084212,a:"This function frees the memory allocated by  fpGetMainFileInfo() ,  fpGetSubFileInfo() ,  fpGetSubFileMetaData() , and  fpExtractSubFile() . Syntax int (pascal *fpFreeStruct) (\n    void      *pFile, \n    void      *obj);  Arguments Returns If the allocated memory is freed, the return value is ..."},"106":{y:0,u:"../Content/C/extract/fpGetExtractInfo.htm",l:-1,t:"fpGetExtractInfo()",i:0.00194709126798395,a:"This function returns information about a stream opened by  fpOpenSubFile() . Syntax int (pascal *fpGetExtractInfo) (\n       KVInputStream *stream,\n       KVSubFileExtractInfo *extractInfo);\n Arguments Returns If an issue occurs when obtaining the extraction information, the return value is an error ..."},"107":{y:0,u:"../Content/C/extract/fpGetExtractStatus.htm",l:-1,t:"fpGetExtractStatus()",i:0.00126347492959186,a:"This function returns the status of an input stream opened by  fpOpenSubFile() . Syntax int (pascal *fpGetExtractStatus) (\n       KVInputStream *stream);\n Arguments Returns If an error occurred when one of the input stream function pointers was last called, this function should return the associated ..."},"108":{y:0,u:"../Content/C/extract/fpGetMainFileInfo.htm",l:-1,t:"fpGetMainFileInfo()",i:0.0110945981323686,a:"This function determines whether a file is a container file—that is, whether it contains subfiles—and should be extracted further.  Syntax int (pascal *fpGetMainFileInfo) (\n    void               *pFile, \n    KVMainFileInfo     *fileInfo);  Arguments Returns If the file information is retrieved, the ..."},"109":{y:0,u:"../Content/C/extract/fpGetSubFileInfo.htm",l:-1,t:"fpGetSubFileInfo()",i:0.0148735115273566,a:"This function gets information about a subfile in a container file. Syntax int (pascal *fpGetSubFileInfo)  (\n    void                    *pFile, \n    int                      index,\n    KVSubFileInfo           *subFileInfo); Arguments Returns If the file information is retrieved, the return value is ..."},"110":{y:0,u:"../Content/C/extract/fpGetSubFileMetadataList.htm",l:-1,t:"fpGetSubFileMetadataList()",i:0.00747012379407303,a:"Containers can store metadata about their subfiles that is independent of the metadata stored within those subfiles. This function allows you to retrieve the metadata stored within the container about a particular subfile. Syntax KVErrorCode pascal fpGetSubFileMetadataList(\n    void* const pFile,\n   ..."},"111":{y:0,u:"../Content/C/extract/fpGetSubFileMetaData.htm",l:-1,t:"fpGetSubFileMetaData()",i:0.0046186614901253,a:"The function fpGetSubFileMetaData() is deprecated in KeyView 23.2.0 and later. OpenText recommends that you use the function  fpGetSubFileMetadataList()  instead. This function is still available for existing implementations, but it might be incompatible with new functionality and might be removed ..."},"112":{y:0,u:"../Content/C/extract/fpOpenDocumentFromSubfile.htm",l:-1,t:"fpOpenDocumentFromSubFile()",i:0.00261906846812695,a:"This function opens a subfile as a KVDocument, which can be passed directly to other FilterSDK interface functions. Syntax KVErrorCode (pascal* fpOpenDocumentFromSubFile)(\n    void* pFile,\n    KVOpenDocumentFromSubFileArg openArg,\n    KVDocument* ppDocument,\n    KVSubFileExtractInfo* extractInfo\n); ..."},"113":{y:0,u:"../Content/C/extract/fpOpenFile.htm",l:-1,t:"fpOpenFile()",i:0.0186774518484558,a:"This function opens a file to make the file accessible for subfile extraction. Syntax int (pascal *fpOpenFile) (\n    void                      *pContext,\n    KVOpenFileArg              openArg,\n    void                      **pFile); Arguments Returns If the file is opened, the return value is ..."},"114":{y:0,u:"../Content/C/extract/fpOpenFileFromFilterSession.htm",l:-1,t:"fpOpenFileFromFilterSession()",i:0.0084583581269067,a:"This function opens a container file so that you can extract its subfiles. Syntax KVErrorCode (pascal *fpOpenFileFromFilterSession)(\n    KVFilterSession session,\n    KVOpenFileArg openArg,\n    void** pFile\n); Arguments Returns If the file is opened successfully, the return value is KVERR_Success.  ..."},"115":{y:0,u:"../Content/C/extract/fpOpenSubFile.htm",l:-1,t:"fpOpenSubFile()",i:0.00654602287103193,a:"This function opens a subfile as a stream, which can be used directly or passed to other File Content Extraction interfaces. Syntax int (pascal *fpOpenSubFile) (\n        void                  *pFile,\n        KVExtractSubFileArg    extractArg,\n        KVInputStream        **stream); Arguments Returns ..."},"116":{y:0,u:"../Content/C/extract/XTRACT_struct.htm",l:-1,t:"File Extraction API Structures",i:0.00135070131330492,a:"File Extraction API Structures This section provides information on the structures used by the File Extraction API. These structures define the input and output parameters required to extract subfiles from a container file, and are defined in kvxtract.h. "},"117":{y:0,u:"../Content/C/extract/KVCredential.htm",l:-1,t:"KVCredential",i:0.0030917076952288,a:"This structure contains a count of the number of credential elements, and a pointer to the first element of the array of individual elements. The structure is initialized by calling  fpOpenFile() , and is defined in kvxtract.h. typedef struct  ..."},"118":{y:0,u:"../Content/C/extract/KVCredentialComponent.htm",l:-1,t:"KVCredentialComponent",i:0.0123647084431614,a:"This structure contains the value of a credential item. The structure is defined in kvxtract.h. typedef struct  tag_KVCredentialComponent\n{\n    KVCredKeyType      keytype;\n    union\n    {\n        void           *pkey;\n        char           *skey;\n        unsigned ..."},"119":{y:0,u:"../Content/C/extract/KVExtractInterface.htm",l:-1,t:"KVExtractInterface",i:0.00190058937948563,a:"The members of this structure are pointers to the file extraction functions described in  File Extraction API Functions . Calling the  fpGetExtractInterface()  function assigns the function pointers in the structure. The structure is defined in kvxtract.h.  typedef struct  ..."},"120":{y:0,u:"../Content/C/extract/KVExtractSubFileArg.htm",l:-1,t:"KVExtractSubFileArg",i:0.0367948732363682,a:"This structure defines the input parameters required to extract a subfile. See  fpExtractSubFile() . The structure is defined in kvxtract.h. typedef struct ..."},"121":{y:0,u:"../Content/C/extract/KVGetSubFileMetaArg.htm",l:-1,t:"KVGetSubFileMetaArg",i:0.00270654776482891,a:"This structure defines the metadata tags whose values are retrieved by  fpGetSubFileMetaData() . This structure is defined in kvxtract.h. typedef struct  ..."},"122":{y:0,u:"../Content/C/extract/KVGetSubFileMetadataListArg.htm",l:-1,t:"KVGetSubFileMetadataListArg",i:0.0020113229688611,a:"This structure defines the input parameters required to retrieve subfile metadata using the function  fpGetSubFileMetadataList() . This structure is defined in kvxtract.h. \ntypedef struct tag_KVGetSubfileMetadataListArg\n{\n    KVStructHeader;\n    int index; /*The sub file index*/\n    KVCharSet ..."},"123":{y:0,u:"../Content/C/extract/KVMainFileInfo.htm",l:-1,t:"KVMainFileInfo",i:0.00482485246803706,a:"This structure contains information about a main file that is open for extraction. It is initialized by calling  fpGetMainFileInfo() .  This structure is defined in kvxtract.h. typedef struct  ..."},"124":{y:0,u:"../Content/C/extract/KVMetadataElem.htm",l:-1,t:"KVMetadataElem",i:0.00256458120995637,a:"The KVMetadataElem structure is deprecated in KeyView 23.2.0 and later. OpenText recommends that you access metadata using the metadata API described in  Use the Metadata API . This structure is still available for existing implementations, but it might be incompatible with new functionality and ..."},"125":{y:0,u:"../Content/C/extract/KVMetaName.htm",l:-1,t:"KVMetaName",i:0.00294373027470151,a:"The KVMetaName structure is deprecated in KeyView 23.2.0 and later. OpenText recommends that you access metadata using the metadata API described in  Use the Metadata API . This structure is still available for existing implementations, but it might be incompatible with new functionality and might ..."},"126":{y:0,u:"../Content/C/extract/KVOpenDocumentFromSubfileArg.htm",l:-1,t:"KVOpenDocumentFromSubFileArg",i:0.00156611895093447,a:"This structure defines the input parameters required to open a subfile as a document. See  fpOpenDocumentFromSubFile() . The structure is defined in kvxtract.h. typedef struct tag_KVOpenDocumentFromSubFileArg\n{\n    KVStructHeader;\n\n    unsigned int index; /* The sub file index */\n    \n    DWORD      ..."},});