Architecture and hardware requirements

This document acts as a reference point for the logical design of Canopy, and also provides guidelines on specific deployment scenarios.

Architecture

Canopy consists of three main components:

  • Application/web service (web tier, including application server and local file storage system)

  • Database layer

  • Document generation service (DOCX and PDF)

It is possible to run all three components on the same hardware, provided the minimum requirements are met and storage is planned. However, Canopy can equally be deployed in other configurations, e.g. placing the web tier and database service on different servers and storage on a NAS/SAN.

Alternative deployment guides will be made available shortly. For any specific questions, please contact support@checksec.com.

Logical design

[Figure: logical design diagram]

Services and Ports

Canopy communicates over the following TCP ports:

| Service group | Service | Port (protocol) | Publicly accessible? | Security |
| --- | --- | --- | --- | --- |
| Canopy application server | Web Server | 443 (https) | Yes | Yes - hardened TLS configuration out of the box |
| Canopy application server | Application Service | 8000 (http) | No | Localhost configuration by default. |
| Canopy application server | RabbitMQ server | 5672 (amqp) | No | Localhost configuration by default; default username/password. |
| Canopy database server | PostgreSQL (default) | 5432 (pgsql) | No | Localhost configuration by default; default username/password accessible through a user-restricted admin script. |
| Canopy report server | Docserver | 8181 (http) | No | Localhost configuration by default; no authentication required. |
| Canopy report server | PDF converter | 9016 (http) | No | Localhost configuration by default; no authentication required. |
| Other | Mail routing | 25 (smtp) | No | Outbound only service, for mail routing. |

Notes for each service:

  • Web Server: This is a standard web service, running on nginx by default (Apache is also supported). It is the only interface that needs to be accessible to users of the application. The Web Service acts as a reverse proxy to the Application Service, for both user access and API (RESTful) access. A self-signed certificate is used by default, which should be replaced with a production-ready certificate.

  • Application Service: The Application Service is built on top of Django and is typically bound to localhost only. If network communication is required (e.g. for large scale deployments), this service is placed behind another Web Server layer acting as a reverse proxy, which would use the default hardened TLS configuration.

  • RabbitMQ server: RabbitMQ is a backend message queue for running asynchronous jobs via Celery. RabbitMQ can be configured to run over TLS, which may be a requirement in larger/enterprise deployments. Specific guidelines are available on the RabbitMQ site: https://www.rabbitmq.com/ssl.html. It is recommended that the default username/password be changed on install, even though the service is restricted to localhost.

  • PostgreSQL (default): PostgreSQL is the primary database supported by Canopy. Oracle may also be used (REF). Additional PostgreSQL hardening guidelines are provided at: REF.

  • Docserver: This service accepts a DOCX document template and an XML data source and returns a generated DOCX file to the Application Service. This service can be run on an alternative server (for load distribution). In such a scenario, we recommend using nginx as a front-end proxy, which can be secured via TLS.

  • PDF converter: This service accepts a DOCX and returns a DOCX and PDF to the calling application. This service can be run on an alternative server (for load distribution). In such a scenario, we recommend using nginx as a front-end proxy, which can be secured via TLS.

  • Mail routing: This is a dependency for sending email-based notifications to users. Outbound access on this port must be allowed through the firewall.
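To check a deployment against the ports listed above, a short probe script can help. The following is a minimal sketch, not part of Canopy: the `CANOPY_PORTS` mapping and the `is_port_open` and `scan` helpers are names chosen here for illustration, and the port numbers are simply the defaults from the table. In a default single-server install, a scan from an external machine would be expected to reach only port 443.

```python
import socket

# Default ports from the table above; the dictionary name is illustrative.
CANOPY_PORTS = {
    "Web Server (https)": 443,
    "Application Service (http)": 8000,
    "RabbitMQ server (amqp)": 5672,
    "PostgreSQL": 5432,
    "Docserver (http)": 8181,
    "PDF converter (http)": 9016,
}


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def scan(host):
    """Map each service name to whether its port accepted a connection."""
    return {name: is_port_open(host, port) for name, port in CANOPY_PORTS.items()}
```

Calling `scan("canopy.example.com")` (a placeholder hostname) from outside the server returns a dictionary of service names to booleans. Any `True` value other than the web server suggests a service is exposed more widely than the default localhost binding intends.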

Deployment scenarios

Single server

By default Canopy will set up a single server instance using the standard service protocols listed above.

Larger and enterprise deployments

Within enterprise environments, service layers may be available for databases. Canopy supports separation of the following modules on separate servers:

  • Web server: The web server can be run on its own instance. It would need to be configured to connect to the application server on the exposed port (default: 8000). Multiple servers can be deployed in high availability environments.

  • Application server: The application server can be configured on a separate server.

  • Database server: Canopy requires a single database server. This database can be hosted on a network connected server. The database URL and port must be configured in the canopy.ini file on the application server. Database replication is not currently supported.

  • Report server: Both the Docserver and the PDF converter can be deployed to a separate server (or servers) in order to offload the resource intensive operation of document generation. Both of these services can be deployed using TLS to encrypt network communications.

For configuration guidelines, see: TBC.

The following hardware requirements are intended for a Canopy server in multi-user environments.

| Item | Minimum requirement | Recommended requirement (small to medium sized consultancies) | Recommended requirement (large consultancies and enterprises) |
| --- | --- | --- | --- |
| Processor | 2 cores | 4+ cores | 16+ cores |
| Memory (RAM) | 4GB | 8GB-16GB | 32GB |
| Storage | 20GB+. Usage based. | 100GB+. Usage based. | 500GB+. Usage based. |

See File locations and storage and Calculating storage space for more detail on disk usage.
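The requirement tiers can also be expressed as a small lookup, which is handy when checking a batch of candidate hosts. This is a sketch only: the `Tier` class and `classify_host` function are illustrative names, and the thresholds are the lower bounds of each tier as listed in the requirements table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    cores: int
    ram_gb: int
    storage_gb: int


# Lower bounds taken from the requirements table, ordered largest first.
TIERS = [
    Tier("recommended (large consultancies and enterprises)", 16, 32, 500),
    Tier("recommended (small to medium sized consultancies)", 4, 8, 100),
    Tier("minimum", 2, 4, 20),
]


def classify_host(cores, ram_gb, storage_gb):
    """Return the name of the highest tier whose lower bounds the host meets."""
    for tier in TIERS:
        if (
            cores >= tier.cores
            and ram_gb >= tier.ram_gb
            and storage_gb >= tier.storage_gb
        ):
            return tier.name
    return "below minimum"


print(classify_host(8, 16, 200))  # recommended (small to medium sized consultancies)
print(classify_host(1, 2, 10))    # below minimum
```

A host must meet all three lower bounds to qualify for a tier, so a 16-core machine with only 50GB of disk is classified as "minimum". Storage is usage based, so the disk figures are a starting point rather than a limit.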

Performance benchmarks

The following performance benchmark data can be used to help determine suitable hardware for a deployment. In summary:

  • Requests Per Second (RPS): on average, 7 requests correspond to 1 action in Canopy, where 1 action is a typical task performed by a single user.

  • Recommended concurrent users (RCU) is the recommended number of concurrent users to avoid performance degradation. This is an estimate based on the test scenarios. If a similar number of users perform heavier operations, resource limits may be reached earlier than suggested.

  • Maximum concurrent users (MCU) is the maximum upper limit of the performance tests, at which point the performance of the server was significantly affected. In all cases, the upper limit was reached due to CPU related bottlenecks.

| Host | vCPUs | RAM (GB) | RCU | MCU | RPS per CPU | Gunicorn workers | Bottleneck | CPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AWS EC2 t2.medium | 2 | 3.5 | 25 | 50 | 35 | 10 | CPU | Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz |
| AWS EC2 t2.2xlarge | 8 | 32 | 100-125 | 250 | 43.75 | 8 | CPU | Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz |
| AWS EC2 c4.4xlarge | 16 | 30 | 200-220 | 357 | 31.25 | 16 | CPU | Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz |

All benchmarking was conducted on AWS instances. Performance results on other virtualised platforms, or on direct hardware, may be different. This data should be used for guidance only.
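As a rough sizing aid, the benchmark figures can be turned into a lookup from an expected number of concurrent users to the smallest benchmarked host that covers it. This is a sketch under stated assumptions: the function names are illustrative, the RCU values use the lower end of each published range, and the results carry the same guidance-only caveat as the benchmarks themselves.

```python
# Figures copied from the benchmark table; RCU uses the lower end of each range.
BENCHMARKS = [
    {"host": "AWS EC2 t2.medium", "vcpus": 2, "rcu": 25, "mcu": 50},
    {"host": "AWS EC2 t2.2xlarge", "vcpus": 8, "rcu": 100, "mcu": 250},
    {"host": "AWS EC2 c4.4xlarge", "vcpus": 16, "rcu": 200, "mcu": 357},
]

REQUESTS_PER_ACTION = 7  # average number of requests per user action


def smallest_benchmarked_host(concurrent_users):
    """Return the first benchmarked host whose RCU covers the user count, or None."""
    for row in BENCHMARKS:  # list is ordered from smallest to largest host
        if concurrent_users <= row["rcu"]:
            return row["host"]
    return None  # beyond the range that was benchmarked


def actions_per_second(requests_per_second):
    """Convert a request rate into approximate user actions per second."""
    return requests_per_second / REQUESTS_PER_ACTION


print(smallest_benchmarked_host(60))  # AWS EC2 t2.2xlarge
print(actions_per_second(70))         # 10.0
```

A return value of `None` means the expected load exceeds every benchmarked configuration, in which case these figures offer no direct guidance.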

File locations and storage

Canopy stores data in the following locations:

| Location | Type | Requires backup | Disk space required |
| --- | --- | --- | --- |
| /etc/canopy/ | Canopy system configuration and license files. | Yes | <1MB |
| /opt/checksec/canopy/ | Canopy binary data that is managed by the Linux distribution’s package manager. | No | ~1GB |
| /var/opt/checksec/canopy/ | User data that is imported into Canopy and generated by Canopy, e.g. imported tool data, uploaded screenshots, report template files, generated reports, custom plugins. | Yes | Depends on the amount of data stored and generated inside of Canopy. This ranges from 1-5GB per year for small installations (< 10 users) to 50GB+ per year for larger installations. |
| /var/.. | Many services that Canopy depends on will store their data here, e.g. local DB (if in use) and RabbitMQ. Other system related daemons will also store their data here, e.g. systemd logging. | Not directly. The DB should be backed up separately, using DB specific procedures. | |
| Database (RDBMS specific) | | Yes, using RDBMS specific backup procedures as recommended by the RDBMS vendor. | As files are NOT stored in the DB, it is small in comparison to the main data directory. Depends on the RDBMS. PostgreSQL would require roughly 1GB per year for small/medium installations. |
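For the file system locations marked as requiring backup, one simple approach is a timestamped archive. The sketch below is an illustration only, not a supported Canopy backup procedure: `archive_paths` and `BACKUP_PATHS` are names invented here, and the destination directory is up to the administrator. The database is deliberately excluded, since it should be backed up with RDBMS specific procedures.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path

# Locations marked "Requires backup: Yes" in the table above.
BACKUP_PATHS = ["/etc/canopy/", "/var/opt/checksec/canopy/"]


def archive_paths(paths, dest_dir):
    """Write a timestamped .tar.gz of the existing paths and return its location."""
    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = dest_dir / f"canopy-files-{stamp}.tar.gz"
    with tarfile.open(str(archive), "w:gz") as tar:
        for path in map(Path, paths):
            if path.exists():  # skip locations absent on this host
                # Store entries relative to the filesystem root.
                tar.add(str(path), arcname=str(path).lstrip("/"))
    return archive
```

Running `archive_paths(BACKUP_PATHS, "/srv/backups")` (an example destination) with sufficient privileges produces a single archive covering the configuration, license files and user data. Restoring it alongside a matching database backup would be needed to recover an installation.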

Calculating storage space

Storage requirements for Canopy vary greatly based on the planned use of the system. If Canopy is going to be used to store data, including state files from proxy tools, code repository dumps, etc., then more disk space will be required. If such information is stored in an external environment, it can instead be referenced from within Canopy via the description fields.

A typical usage scenario calculation might consist of:

  • Average data per project: 1GB

  • Estimated number of projects per year: 250 (distributed across the team)

  • Estimated space requirement (per year): 250 * 1GB = 250GB, plus 2-5GB for the database server

The total estimate for 1 year might be ~255GB, which should arguably be rounded up to 300GB.

It may be appropriate to project for a 3-5 year period, towards an upper limit.
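The calculation above is easy to parameterise for other team sizes and planning horizons. The following is a minimal sketch: the function names, and the choice of rounding up to the next 100GB, are assumptions made for illustration, and the inputs in the example are the figures from the scenario above with the upper database estimate of 5GB.

```python
import math


def estimate_storage_gb(projects_per_year, gb_per_project, db_gb_per_year, years=1):
    """Estimate total disk space in GB over the given number of years."""
    per_year = projects_per_year * gb_per_project + db_gb_per_year
    return per_year * years


def round_up(value_gb, step_gb=100):
    """Round an estimate up to the next multiple of step_gb for headroom."""
    return math.ceil(value_gb / step_gb) * step_gb


one_year = estimate_storage_gb(250, 1, 5)
print(one_year)            # 255
print(round_up(one_year))  # 300

three_years = estimate_storage_gb(250, 1, 5, years=3)
print(round_up(three_years))  # 800
```

The multi-year figure assumes a constant number of projects per year. If the team is expected to grow, the yearly inputs should be adjusted accordingly.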