Difference between revisions of "ISO:Operations"

From Hornbill
 
(26 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 +
This document can now be found at its new location in the [https://docs.hornbill.com/hornbill-cloud/iso/operations/ Hornbill Document Library].
 +
 +
[[file:hornbill-document-library.png|ISO Operations|link=https://docs.hornbill.com/hornbill-cloud/iso/operations/]]
 +
 +
<!--
 
==Capacity management==
 
 
 
Network utilisation, disk utilisation and server load are monitored by the Nagios tools, which provide automatic alerts when pre-set thresholds are exceeded. The Chief Technical Officer is responsible for ensuring that this monitoring is conducted and that action is taken to resolve any issues.
+
Network utilisation, disk utilisation and server load are monitored, with automatic alerts raised when pre-set thresholds are exceeded. The Chief Technical Officer is responsible for ensuring that this monitoring is conducted and that action is taken to resolve any issues.
  
 
We have hardware available for the expected growth of Hornbill; this is reviewed and increased every 3 months through the purchase of additional hypervisors and rack space. If required, we can also quickly create an instance, or a complete replica of the Hornbill infrastructure, in AWS, so that capacity and scalability are never an issue. This scalability, along with the underlying server code, also removes limits on user growth, as new servers can be added as demand increases.
  
Network utilisation, disk utilisation and server load are monitored in real time through the collection of over 100 data points (CPU, RAM and HDD utilisation for all services, etc.) for use in graphing, and are additionally collected by Nagios. These tools, charts and engines provide automatic alerts when pre-set thresholds are exceeded. The Chief Technical Officer is responsible for ensuring that this monitoring is conducted and that action is taken to resolve any issues.
+
Network utilisation, disk utilisation and server load are monitored in real time through the collection of over 1000 data points (CPU, RAM and HDD utilisation for all services, etc.) for use in graphing and real-time monitoring. These tools, charts and engines provide automatic alerts when pre-set thresholds are exceeded. The Chief Technical Officer is responsible for ensuring that this monitoring is conducted and that action is taken to resolve any issues.
  
 
==Monitoring ==
 
All instances, services and hardware are monitored from several locations around the world (each monitor server acts as a backup to the primary, and the results are compared). We check over 100 different metrics per instance (and anything that an instance may require) every 5 minutes to ensure all is well. Any warning is logged and escalated to the Cloud Team.
+
All instances, services and hardware are monitored from several locations around the world (each monitor server acts as a backup to the primary, and the results are compared). We check over 1000 different metrics per instance (and anything that an instance may require) every 5 minutes to ensure all is well. Any warning is logged and escalated to the Cloud Team.
  
 
Checks include (not a comprehensive list):
Line 20: Line 25:
 
* Backups (sync checks, replication checks, off-instance checks, etc.)
* Sanity (checks for mail queues, expected load, etc.)
 +
* SIEM (APIs, resource usage, network traffic, and DB access and requests)
 +
 +
 +
Hornbill also maintains a fingerprint for each instance, for each hour of the day across different days, covering key metrics (APIs, resource usage, network traffic, DB access, count of emails in and out, etc.). These fingerprints are compared with the live instance metrics every 15 minutes. This allows us to detect, in near real time, abnormal patterns that may indicate internal issues, threats, security issues, misconfiguration or other anomalies. Any reading more than one standard deviation from normal for one or more of the key metrics in a fingerprint is subjected to further automatic review, the outcome of which is escalated to the Cloud Team under certain conditions. After review, this may be escalated to the instance contacts for clarification or notification of possible issues. In extreme circumstances (exceptional load, a possible security issue or similar), Hornbill will act to prevent harm to the instance or platform, and the contact for the instance will be informed of the action taken and the reason.
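As an illustration only, the fingerprint comparison described above could be sketched as follows. This is not Hornbill's implementation: the metric names, sample values and the <code>abnormal_metrics</code> helper are hypothetical, and the sketch assumes that the fingerprint for one hour-of-day slot is held as a list of historical samples per metric.

```python
from statistics import mean, pstdev


def abnormal_metrics(fingerprint, live, threshold=1.0):
    """Return the names of metrics whose live value lies more than
    `threshold` standard deviations from the fingerprint mean."""
    flagged = []
    for name, history in fingerprint.items():
        if name not in live or len(history) < 2:
            continue  # not enough data to compare against
        mu = mean(history)
        sigma = pstdev(history)
        deviation = abs(live[name] - mu)
        if sigma == 0:
            # A perfectly constant history: any change at all is abnormal.
            if deviation > 0:
                flagged.append(name)
        elif deviation > threshold * sigma:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical fingerprint for one instance and one hour-of-day slot.
fingerprint = {
    "api_calls": [1000, 1100, 900, 1050, 950],
    "emails_in": [20, 22, 18, 21, 19],
}
live = {"api_calls": 5000, "emails_in": 20}

# Metrics returned here would go forward for further automatic review.
print(abnormal_metrics(fingerprint, live))
```

In this sketch a flagged metric is only a candidate for review; the escalation conditions themselves are not described in this document and are not modelled.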
  
 
==Backups ==
 
Line 41: Line 50:
  
 
Therefore, should a failure occur on primary hardware, we can recover from replicated files (max. 15 minutes) or, in a complete disaster, from the tertiary backups.
 +
 +
Backups are checked for integrity automatically when they are taken, on upload to S3, and at different levels on either a weekly or a monthly basis.
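The document does not say how integrity is verified. One common approach, shown here purely as an assumption, is to record a SHA-256 digest when the backup is taken and compare it again at each later checkpoint (after upload, then weekly or monthly). The function names and file name below are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_is_intact(path, recorded_digest):
    """True if the file still matches the digest recorded at backup time."""
    return sha256_of(path) == recorded_digest


# Demonstration with a throwaway file standing in for a backup archive.
with tempfile.TemporaryDirectory() as workdir:
    backup_path = os.path.join(workdir, "backup.zip")
    with open(backup_path, "wb") as handle:
        handle.write(b"example backup contents")
    recorded = sha256_of(backup_path)  # digest recorded when the backup is taken
    print(backup_is_intact(backup_path, recorded))  # re-checked at a later stage
```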
 +
 +
Note that, for GDPR removal requests, data deleted from an instance is not deleted from existing backups; it is removed through the cycling of the backups (up to 90 days). On any restore, Hornbill will re-remove the requested records as per the Service Manager incident request. The customer can request that all backups containing the removal request data be deleted, on the understanding that no historic backups from before that point will then be restorable.
 +
 +
==Access==
 +
Access to any system is restricted. All default passwords are changed. Every login to a system processing customer data automatically sends a report to the Hornbill Login Workspace (which allows anyone in the company to highlight or question why access was needed, and provides transparency) and raises a request. The login must then be associated with a given Service Manager request or Hornbill workspace post to ensure that a valid reason for the login exists. These logins are then audited by the Security Manager to ensure that no unauthorised access took place.
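The audit step in the paragraph above amounts to checking that every recorded login carries a reference to a known request or workspace post. A minimal sketch of that check follows; the <code>LoginEvent</code> type, its fields and the reference values are hypothetical and do not reflect Hornbill's actual data model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoginEvent:
    user: str
    system: str
    # ID of the request or workspace post that justifies the login, if any.
    reference: Optional[str] = None


def unjustified_logins(events, valid_references):
    """Return logins with no reference, or with a reference that is unknown."""
    return [
        event
        for event in events
        if event.reference is None or event.reference not in valid_references
    ]


# Hypothetical data: two known references and three recorded logins.
known = {"REQ-1001", "POST-77"}
events = [
    LoginEvent("alice", "db-node-1", "REQ-1001"),
    LoginEvent("bob", "db-node-2"),
    LoginEvent("carol", "web-node-1", "REQ-9999"),
]

for event in unjustified_logins(events, known):
    print(event.user, event.system, event.reference)
```

Logins returned by this check are the ones an auditor would need to follow up.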
 +
 +
Passwords on all systems are changed when a member of staff leaves or on a schedule.
  
 
Backups are restored (and the restore process thereby tested) nightly to ZIP before being pushed to an offsite location, and a random backup restore is performed on a scheduled basis to ensure that backups are correct and valid.
 +
 +
==Temporary Files==
 +
All temporary files on systems that process customer data are deleted after 24 hours. All other systems are set to clear temp folders on reboot. These are purged nightly at 01:00 from all nodes.
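A nightly purge of this kind could look like the sketch below. It is an assumption-laden illustration rather than the actual job: the <code>purge_old_files</code> helper is hypothetical, file age is judged by modification time, and the scheduling at 01:00 (for example through cron) is left out.

```python
import os
import tempfile
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # the 24-hour limit stated above


def purge_old_files(temp_dir, max_age=MAX_AGE_SECONDS, now=None):
    """Delete files under temp_dir last modified more than max_age seconds ago.

    Returns the list of paths that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    # Materialise the listing first so deletions do not disturb the walk.
    for path in list(Path(temp_dir).rglob("*")):
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path)
    return removed


# Demonstration in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as workdir:
    stale = Path(workdir) / "stale.tmp"
    fresh = Path(workdir) / "fresh.tmp"
    stale.write_text("old")
    fresh.write_text("new")
    day_and_a_bit_ago = time.time() - 25 * 60 * 60
    os.utime(stale, (day_and_a_bit_ago, day_and_a_bit_ago))
    print([p.name for p in purge_old_files(workdir)])
```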
 +
 +
==Customer Access to Logs==
 +
Customers can access their own logs via the Admin portal. These logs are restricted to their instance, and any shared logs (such as web front end logs, DataService logs, etc.) are not available. This ensures that a cloud service customer can only access records that relate to that customer's activities and cannot access any log records that relate to the activities of other cloud service customers.
 +
Customer-accessible logs are available via the portal for the current day (note that this differs from audit logs, such as security logs, which are available for 6 months) and on request from Hornbill for 2 months. Requests should be submitted to data.processor-hornbill@live.hornbill.com by the nominated contacts for a given instance (Technical or Data Security).
 +
 +
=== Customer Access to Audit and Access Logs ===
 +
The logs above are aimed more at identifying issues or misconfiguration in processes or other aspects of the application than at audit, security or access. The logs used for audit, security or access are typically far larger because of the sheer volume of data, and are kept for 7 days. They should therefore be exported by the customer (via a scheduled report or other integration) to a repository of their choice.
 +
 +
The logs are:
 +
 +
* Primary Security Log - Contains all login requests with the source IP, timestamp, target (portal, live, admin, etc.), result of the action and Unique ID.
 +
* Primary Audit Log - Contains all pages and entities accessed, with the Unique ID of the actor and the timestamp of the action.
 +
* Application Audit Log - Each application has its own log table containing the timestamp, action (insert, update, delete records, etc.), Unique ID (linked to the above), result of the action, and previous and subsequent values. See each application's documentation for a full list of audited actions.
 +
 +
== Software ==
 +
Only approved software may be installed on desktops and servers utilised by Hornbill. The source for this software is stored within a central repository to ensure that, even in a disaster, we can install it as required. All software utilised by Hornbill is reviewed on a scheduled basis.
 +
 +
Software is managed and deployed through central systems (Ansible, Hornbill Tools, Hornbill ITOM) to ensure correct deployment and configuration.
 +
 +
All software is hardened in line with vendor, industry and Hornbill's own policies and standards. This includes only the required software and services per machine, locked-down ports, individual user and service accounts, etc. All hardening is confirmed via monitoring; any unsanctioned change is automatically escalated and automatically reverted within 5 minutes.
 +
 +
== Hardware ==
 +
Only hardware provided by the IT team and obtained via existing approved vendors may be used to access the management or customer networks. All clocks are synchronised with NTP and checked to be within 1 minute of the primary servers. All default passwords are changed. All hardening is in line with vendor, industry and Hornbill's own policies and standards. All hardening is confirmed via monitoring; any unsanctioned change is automatically escalated and automatically reverted within 5 minutes.
 +
 +
-->
 +
<!-- hornbill-cloud/iso/operations -->
 +
[[Category:HDOC]]

Latest revision as of 19:56, 11 April 2024
