Difference between revisions of "Document Import Tool"

Revision as of 10:13, 21 January 2021

About the Hornbill Document Manager Document Import Tool

The utility provides a simple, safe and secure way to bulk-import documents into Hornbill Document Manager. The tool is designed to cater for the initial upload of content to Hornbill Document Manager and therefore does not perform updates to existing Document Manager documents.

The tool is designed to run behind your corporate firewall and connects to your Hornbill instance in the cloud over HTTPS/SSL. As long as you have standard internet access, you should be able to use the tool without making any firewall configuration changes.

Open Source

The Hornbill Document Import Tool is provided open source under the Hornbill Community Licence and can be found on GitHub.

Installation Overview

  • Download the OS and architecture-specific ZIP archive
  • Extract the ZIP archive into the folder you would like the application to run from, e.g. C:\docimport\
  • Open the CSV files and add the necessary configuration
  • Open a Command Line Prompt as Administrator
  • Change directory to the folder containing the utility, e.g. C:\docimport\
  • Run the command relevant to the OS of the machine you are running this on:

Windows:
goHornbillDocumentImport.exe -instanceid=yourinstanceid -apikey=yourapikey -csvd=docs_main.csv -csvc=docs_collections.csv -csvs=docs_shares.csv -csvt=docs_tags.csv -dryrun=true
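
If you launch the tool from a script rather than by hand, the command above can be assembled programmatically. The following Python sketch is illustrative only: the helper name build_import_command is hypothetical, and the CSV file names are simply the ones used in the example command.

```python
def build_import_command(instance_id, api_key, dry_run=True,
                         executable="goHornbillDocumentImport.exe"):
    """Return the argument list for one run of the import tool.

    Hypothetical helper: it only mirrors the example command shown above.
    """
    return [
        executable,
        f"-instanceid={instance_id}",
        f"-apikey={api_key}",
        "-csvd=docs_main.csv",
        "-csvc=docs_collections.csv",
        "-csvs=docs_shares.csv",
        "-csvt=docs_tags.csv",
        # The documented values for this flag are the strings true and false.
        f"-dryrun={'true' if dry_run else 'false'}",
    ]
```

The resulting list can be passed to subprocess.run from the folder containing the utility. Defaulting dry_run to True means a real import has to be requested explicitly.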

Command Line Parameters

  • apikey: API Key to use as Authentication when connecting to Hornbill Instance
  • apitimeout: Number of Seconds to Timeout an API Connection (default 60)
  • csvc: Name of the CSV file containing document collection data
  • csvd: Name of the CSV file containing main document data
  • csvs: Name of the CSV file containing document sharing data
  • csvt: Name of the CSV file containing document tag data
  • debug: true/false, defaults to false - Log extended debug information
  • dryrun: true/false, defaults to false - Allow the Import to run without Creating Documents
  • instanceid: The name of your Hornbill instance, found in the URL you use to navigate to it: live.hornbill.com/[instance name]/. For example, if you access your instance at live.hornbill.com/arescomputing/, your instance ID is "arescomputing". This value is case sensitive.
  • version: Output the tool's version number, and exit
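
As an illustration of the instanceid rule, the following Python sketch takes the first path segment of the URL you use to reach your instance. The function name instance_id_from_url is hypothetical, and the sketch assumes the URL follows the live.hornbill.com/[instance name]/ pattern described above.

```python
from urllib.parse import urlparse


def instance_id_from_url(url):
    """Return the first path segment of a Hornbill instance URL.

    Assumes the live.hornbill.com/[instance name]/ pattern; the case of
    the instance name is preserved because the value is case sensitive.
    """
    # urlparse only separates host from path when a scheme is present.
    if "://" not in url:
        url = "https://" + url
    segments = [part for part in urlparse(url).path.split("/") if part]
    if not segments:
        raise ValueError("no instance name found in URL: " + url)
    return segments[0]
```

For example, instance_id_from_url("live.hornbill.com/arescomputing/") returns "arescomputing".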

CSV Files Overview

This tool uses one or more CSV files to import documents into Document Manager. These CSV files contain metadata required to perform the imports. Demonstration CSV files are provided within the package.

csvd

The CSV file named by the -csvd command line argument is the main template and is used to create the documents in Hornbill. It contains the main details of the files being imported. An example CSV template is provided in the download package. The file must always contain the 6 columns listed below, in the order listed. If you don't wish to use an optional column, leave its values blank, but make sure the column itself still exists in the CSV file:

  • Filepath: MANDATORY - The full filepath of the file being imported into a document
  • Title: Optional - The title of the new document. If an empty string is provided, then the name of the source file (minus the extension) will be used as the document title
  • Status: MANDATORY - The status of the new document. Can be active, draft or retired
  • Description: Optional - The description of the new document
  • ReviewDate: Optional - The next review date for the new document (must be in the format "YYYY-mm-dd HH:mm:ss", that is, year-month-day followed by hours:minutes:seconds)
  • VersioningEnabled: MANDATORY - true/false, should versioning be enabled for the new document
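
The column rules above can be checked before an import is attempted. The following Python sketch is one possible pre-flight check, not part of the tool itself. The names validate_csvd_row and write_csvd are hypothetical, and it assumes that Status and VersioningEnabled values are lowercase, that hours are on a 24-hour clock, and that the file is UTF-8 encoded. None of these assumptions is confirmed by the tool's documentation.

```python
import csv
from datetime import datetime

# Column names and order as listed above.
CSVD_COLUMNS = ["Filepath", "Title", "Status", "Description",
                "ReviewDate", "VersioningEnabled"]
VALID_STATUSES = {"active", "draft", "retired"}


def validate_csvd_row(row):
    """Return a list of problems found in one csvd row (empty if none)."""
    problems = []
    if not row.get("Filepath"):
        problems.append("Filepath is mandatory")
    if row.get("Status") not in VALID_STATUSES:
        problems.append("Status must be active, draft or retired")
    if row.get("VersioningEnabled") not in ("true", "false"):
        problems.append("VersioningEnabled must be true or false")
    review_date = row.get("ReviewDate")
    if review_date:  # optional column, only checked when filled in
        try:
            datetime.strptime(review_date, "%Y-%m-%d %H:%M:%S")
        except ValueError:
            problems.append("ReviewDate must look like 2021-01-21 10:13:00")
    return problems


def write_csvd(path, rows):
    """Write rows (dicts keyed by column name) with the header row first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=CSVD_COLUMNS)
        writer.writeheader()
        writer.writerows(rows)
```

Columns missing from a row dictionary are written as empty values, which matches the advice to leave unused optional columns blank while keeping the column in the file.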

csvc

The CSV file named by the csvc argument lists the Collections that the imported files should be added to. The CSV should contain a header row and 2 columns, in this order:

  • Filepath: MANDATORY - The full filepath of the file being imported into a document - this should match Filepath values from the csvd file
  • Collection: MANDATORY - The Primary Key ID of the Collection that the new document should be added to

csvs

The CSV file named by the csvs argument contains the shares that should be created for the imported files. The CSV should contain a header row and 5 columns, in this order:

  • Filepath: MANDATORY - The full filepath of the file being imported into a document - this should match Filepath values from the csvd file
  • URN: MANDATORY - The URN of the Library or User to create the share against
  • Read: MANDATORY - true/false, should the share allow Read permissions
  • ModifyContent: MANDATORY - true/false, should the share allow Modify Content permissions
  • ModifyMetaData: MANDATORY - true/false, should the share allow Modify Meta Data permissions
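
A share row can be generated rather than typed by hand. The following Python sketch builds csvs rows from booleans and renders them as CSV text. The names share_row and shares_csv_text are hypothetical, and the URN used in the usage note is a placeholder rather than a real Hornbill URN.

```python
import csv
import io

# Column names and order as listed above.
CSVS_COLUMNS = ["Filepath", "URN", "Read", "ModifyContent", "ModifyMetaData"]


def share_row(filepath, urn, read=True, modify_content=False,
              modify_metadata=False):
    """Build one csvs row, converting booleans to true/false strings."""
    def flag(value):
        return "true" if value else "false"

    return {
        "Filepath": filepath,
        "URN": urn,
        "Read": flag(read),
        "ModifyContent": flag(modify_content),
        "ModifyMetaData": flag(modify_metadata),
    }


def shares_csv_text(rows):
    """Render share rows as CSV text, header row first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=CSVS_COLUMNS,
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

For instance, shares_csv_text([share_row("C:\\docs\\a.pdf", "PLACEHOLDER-URN")]) yields a header line followed by one read-only share row.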

csvt

The CSV file named by the csvt argument contains the Tags that should be associated with the imported files. The CSV should contain a header row and 2 columns, in this order:

  • Filepath: MANDATORY - The full filepath of the file being imported into a document - this should match Filepath values from the csvd file
  • Tag: MANDATORY - The Tag that should be associated with the new document
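
Because the csvc, csvs and csvt files all refer back to Filepath values in the csvd file, it can help to check for mismatches before running the import. The following Python sketch lists Filepath values in a secondary file that have no exact match in the main file. The function names are hypothetical, and the exact-match comparison is an assumption about how the tool pairs rows.

```python
import csv


def filepaths_in(lines):
    """Return the Filepath column from CSV lines that start with a header row."""
    return [row["Filepath"] for row in csv.DictReader(lines)]


def unmatched_filepaths(csvd_lines, other_lines):
    """Return Filepath values in a secondary CSV that are absent from csvd.

    Both arguments are iterables of CSV lines, such as open file handles.
    """
    known = set(filepaths_in(csvd_lines))
    return [path for path in filepaths_in(other_lines) if path not in known]
```

In practice both arguments would be files opened with newline="", for example docs_main.csv and docs_tags.csv. An empty result means every row in the secondary file points at a document defined in the main file.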

HTTP Proxies

If you use a proxy for all of your internet traffic, the HTTP_PROXY environment variable needs to be set. The https_proxy environment variable holds the hostname or IP address of your proxy server. It is a standard environment variable and, like any such variable, the specific steps you use to set it depend on your operating system.

On Windows machines, it can be set from the command line using the following:
set HTTP_PROXY=HOST:PORT
Where "HOST" is the IP address or host name of your Proxy Server and "PORT" is the specific port number.
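
If the tool is started from a script, the proxy setting can be supplied for that one process instead of the whole session. The following Python sketch builds such an environment using the same HOST:PORT form as the set command above. The function name proxy_environment is hypothetical, and it assumes HTTP_PROXY is the variable that needs setting, as in the command above.

```python
import os


def proxy_environment(host, port, base_env=None):
    """Return a copy of the environment with HTTP_PROXY set to HOST:PORT.

    Assumption: HTTP_PROXY is the variable to set, as in the command above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HTTP_PROXY"] = f"{host}:{port}"
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when launching the utility, leaving the calling session's own environment unchanged.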

Testing Overview

If you run the application with the argument -dryrun=true, no documents will be imported. The XML used to make the API calls is saved in the log file, so you can check that the values are correct before running the import.

goHornbillDocumentImport.exe -dryrun=true

Logging Overview

All logging output is saved in the log directory, which sits in the same directory as the executable. The file name contains the date and time the import was run, for example docimport_201511061426130000.log
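
To find the log for a particular run, the date and time can be read back out of the file name. The following Python sketch assumes the first 14 digits after docimport_ are the year, month, day, hour, minute and second, and ignores any trailing digits. That layout is inferred from the single example file name above and is not documented, and the name log_run_time is hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: docimport_ + YYYYMMDDHHMMSS + extra digits + .log
LOG_NAME = re.compile(r"^docimport_(\d{14})\d*\.log$")


def log_run_time(filename):
    """Return the run date and time encoded in a log file name."""
    match = LOG_NAME.match(filename)
    if match is None:
        raise ValueError("unexpected log file name: " + filename)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S")
```

Under that assumption, the example file name above corresponds to a run at 14:26:13 on 6 November 2015.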

Change Log

v1.0.0 - 11/04/2019

Initial Release