Architecture and hardware requirements¶
This document acts as a reference point for the logical design of Canopy, and also provides guidelines on specific deployment scenarios.
Architecture¶
Canopy comprises three main components:
Application/web service (web tier, including application server and local file storage system)
Database layer
Document generation service (DOCX and PDF)
It is possible to run all three components on the same hardware, provided the minimum requirements are met and storage is planned. However, Canopy can equally be deployed in other configurations, e.g. placing the web tier and database service on different servers and storage on a NAS/SAN.
Alternative deployment guides will be made available shortly. For any specific questions, please contact support@checksec.com.
Logical design¶
Services and Ports¶
Canopy communicates over the following TCP ports:
Service Group | Service | Port (Protocol) | Publicly Accessible? | Security | Notes |
---|---|---|---|---|---|
Canopy application server | Web Server | 443 (https) | Yes | Yes - hardened TLS configuration out of the box | This is a standard web service, running on nginx by default (Apache also supported). It is the only interface that needs to be accessible to users of the application. The Web Service acts as a reverse proxy to the Application Service, for both user access and API (RESTful) access. A self-signed certificate is used by default, which should be replaced with a production-ready certificate. |
Canopy application server | Application Service | 8000 (http) | No | Localhost configuration by default. | The Application Service is built on top of Django and is typically bound to localhost only. If network communication is required (e.g. for large scale deployments), this service is placed behind another Web Server layer acting as a reverse proxy, which would use the default TLS hardened configuration. |
Canopy application server | RabbitMQ server | 5672 (amqp) | No | Localhost configuration by default; default username/password. | RabbitMQ is a backend message queue for running asynchronous jobs via Celery. RabbitMQ can be configured to run over TLS, which may be a requirement under larger/enterprise deployments. Specific guidelines are available on the RabbitMQ site: https://www.rabbitmq.com/ssl.html. It is recommended that the default username/password be changed on install, even though the service is restricted to localhost. |
Canopy database server | PostgreSQL (default) | 5432 (pgql) | No | Localhost configuration by default; default username/password accessible through a user-restricted admin script. | PostgreSQL is the primary database supported by Canopy. |
Canopy report server | Docserver | 8181 (http) | No | Localhost configuration by default; no authentication required. | This service accepts a DOCX document template and an XML data source and returns a generated DOCX file to the Application Service. This service can be run on an alternative server (for load distribution). In such a scenario, we recommend using nginx as a front-end proxy, which can be secured via TLS. |
Canopy report server | PDF converter | 9016 (http) | No | Localhost configuration by default; no authentication required. | This service accepts a DOCX and returns a DOCX and PDF to the calling application. This service can be run on an alternative server (for load distribution). In such a scenario, we recommend using nginx as a front-end proxy, which can be secured via TLS. |
Other | Mail routing | 25 (smtp) | No | Outbound only service, for mail routing. | This is a dependency for sending email-based notifications to users. Outbound firewall requirement. |
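As a rough illustration of how the default port layout above could be verified after an install, the following Python sketch attempts a TCP connection to each listed port. It is not part of Canopy: the names `DEFAULT_PORTS`, `is_port_open` and `check_services` are hypothetical helpers introduced here, and the sketch assumes a single server deployment where everything listens on the same host.

```python
import socket

# Default ports copied from the table above. The dictionary itself is an
# illustrative helper, not something shipped with Canopy.
DEFAULT_PORTS = {
    "Web Server (https)": 443,
    "Application Service (http)": 8000,
    "RabbitMQ server (amqp)": 5672,
    "PostgreSQL": 5432,
    "Docserver (http)": 8181,
    "PDF converter (http)": 9016,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_services(host="127.0.0.1", ports=None):
    """Map each service name to whether its port accepts TCP connections."""
    if ports is None:
        ports = DEFAULT_PORTS
    return {name: is_port_open(host, port) for name, port in ports.items()}
```

Calling `check_services()` on the Canopy host returns a dictionary of service names to booleans. Running the same check from another machine against the server's address is one way to confirm that only port 443 is reachable externally, under the assumption that the defaults in the table have not been changed.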
Deployment scenarios¶
Single server¶
By default Canopy will set up a single server instance using the standard service protocols listed above.
Larger and enterprise deployments¶
Within enterprise environments, shared service layers (e.g. for databases) may already be available. Canopy supports running the following modules on separate servers:
- Web server
The web server can be run on its own instance. Its configuration must point to the application server on the exposed port (default: 8000). Multiple servers can be deployed in high availability environments.
- Application server
The application server can be configured on a separate server.
- Database server
Canopy requires a single database server. This database can be hosted on a network connected server. The database URL and port must be configured in the canopy.ini file on the application server. Database replication is not currently supported.
- Report server
Both the Docserver and the PDF converter can be deployed to a separate server (or servers) in order to offload the resource intensive operation of document generation. Both of these services can be deployed using TLS to encrypt network communications.
For configuration guidelines, see: TBC.
Recommended hardware configurations¶
The following requirements are intended for Canopy server in multi-user environments.
Item | Minimum requirement | Recommended requirement (small to medium sized consultancies) | Recommended requirement (large consultancies and enterprises) |
---|---|---|---|
Processor | 2 cores | 4+ cores | 16+ cores |
Memory (RAM) | 4GB | 8GB-16GB | 32GB |
Storage | 20GB+. Usage based. | 100GB+. Usage based. | 500GB+. Usage based. |
See File locations and storage and Calculating storage space for more detail on disk usage.
Performance benchmarks¶
The following performance benchmark data can be used to help determine suitable hardware deployments within environments. In summary:
Requests Per Second (RPS): on average, 7 requests correspond to 1 action in Canopy.
1 action corresponded with a typical task performed by a single user; therefore RPS
Recommended concurrent users (RCU) is the recommended number of concurrent users to avoid performance degradation. This is an estimate based on the test scenarios. If a similar number of users perform heavier operations, resource consumption issues could arise earlier than suggested.
Maximum concurrent users (MCU) is the maximum upper limit of the performance tests, at which point the performance of the server was significantly affected. In all cases, the upper limit was reached due to CPU related bottlenecks.
Host | vCPUs | RAM (GB) | RCU | MCU | RPS per CPU | Gunicorn workers | Bottleneck | CPU |
---|---|---|---|---|---|---|---|---|
AWS EC2 t2.medium | 2 | 3.5 | 25 | 50 | 35 | 10 | CPU | Intel(R) Xeon(R) CPU E5-2670 v2 @ 2.50GHz |
AWS EC2 t2.2xlarge | 8 | 32 | 100-125 | 250 | 43.75 | 8 | CPU | Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz |
AWS EC2 c4.4xlarge | 16 | 30 | 200-220 | 357 | 31.25 | 16 | CPU | Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz |
All benchmarking was conducted on AWS instances. Performance results on other virtualised platforms, or on direct hardware, may be different. This data should be used for guidance only.
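One way to read the benchmark table is as a ratio of recommended concurrent users to vCPUs. The Python sketch below computes that ratio from the lower bound of each RCU figure and uses it for a rough sizing estimate. The function names, the linear extrapolation and the floor of 2 cores (taken from the minimum hardware requirement) are assumptions made for illustration; they are not a sizing method published for Canopy, and the same caveats about AWS-only benchmarking apply.

```python
import math

# Rows copied from the benchmark table: (host, vCPUs, RCU lower bound, MCU).
BENCHMARKS = [
    ("AWS EC2 t2.medium", 2, 25, 50),
    ("AWS EC2 t2.2xlarge", 8, 100, 250),
    ("AWS EC2 c4.4xlarge", 16, 200, 357),
]


def rcu_per_vcpu(benchmarks=BENCHMARKS):
    """Return the recommended concurrent users per vCPU for each host."""
    return {host: rcu / vcpus for host, vcpus, rcu, _mcu in benchmarks}


def estimate_vcpus(concurrent_users, users_per_vcpu=12.5, minimum=2):
    """Estimate vCPUs for a target user count, assuming linear scaling.

    users_per_vcpu defaults to the ratio observed in the table; minimum
    reflects the 2 core minimum hardware requirement.
    """
    if concurrent_users <= 0:
        raise ValueError("concurrent_users must be positive")
    return max(minimum, math.ceil(concurrent_users / users_per_vcpu))


if __name__ == "__main__":
    for host, ratio in rcu_per_vcpu().items():
        print(f"{host}: {ratio} recommended users per vCPU")
    print("vCPUs for 100 concurrent users:", estimate_vcpus(100))
```

All three hosts work out to 12.5 recommended users per vCPU at the lower bound, which is why that value is used as the default. Since every test run hit a CPU bottleneck, workloads with heavier operations would likely need a lower ratio.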
File locations and storage¶
Canopy stores data in the following locations:
Location |
Type |
Requires backup |
Disk space required |
---|---|---|---|
|
Canopy system configuration and license files. |
Yes |
<1MB |
|
Canopy binary data that is managed by the Linux distribution’s package manager. |
No |
~1GB |
|
User data that is imported into Canopy and generated by Canopy e.g. imported tool data, uploaded screenshots, report template files, generated reports, custom plugins. |
Yes |
Depends on the amount of data stored and generated inside of Canopy. This ranges from 1-5GB per year for small installations (< 10 users) to 50GB+ per year for larger installations. |
|
Many services that Canopy depends on will store their data here, e.g. local DB (if in use) and RabbitMQ. Other system related daemons will also store their data here, e.g. systemd logging. |
Not directly. DB should be backed up separately and using DB specific procedures. |
|
Database (RDBMS specific) |
Yes, using RDBMS specific backup procedures as recommended by the RDBMS vendor. |
As files are NOT stored in the DB, it is small in comparison to the main data directory. Depends on RDBMS. PostgreSQL would require roughly 1GB per year for small/medium installations. |
Calculating storage space¶
Storage requirements for Canopy vary greatly based on the planned use of the system. If Canopy is going to be used to store data, including state files from proxy tools, code repository dumps, etc., then more disk space will be required. If such information is stored on an external environment, references can be used inside of Canopy instead via the description fields.
A typical usage scenario calculation might consist of:
Average data per project: 1GB
Estimated number of projects per year: 250 (distributed across the team)
Estimated space requirement (per year): 250 * 1GB = 250GB + 2-5GB for the database server
Total estimate for 1 year might be ~255GB, which should arguably be rounded up to 300GB.
It may be appropriate to project for a 3-5 year period, towards an upper limit.
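The calculation above can be expressed as a small Python sketch so that the inputs are easy to vary. The function names and the 100GB rounding step are assumptions chosen to reproduce the worked example; actual figures depend on how Canopy is used.

```python
import math


def yearly_storage_gb(projects_per_year, avg_gb_per_project, database_gb):
    """Estimate one year of storage: project data plus database growth."""
    return projects_per_year * avg_gb_per_project + database_gb


def round_up_gb(value_gb, step_gb=100):
    """Round an estimate up to the next multiple of step_gb (assumed 100GB)."""
    return math.ceil(value_gb / step_gb) * step_gb


def projected_storage_gb(years, projects_per_year, avg_gb_per_project,
                         database_gb, step_gb=100):
    """Project the yearly estimate over several years, then round up."""
    yearly = yearly_storage_gb(projects_per_year, avg_gb_per_project, database_gb)
    return round_up_gb(yearly * years, step_gb)


if __name__ == "__main__":
    # Figures from the example: 250 projects of 1GB each, with the upper
    # database estimate of 5GB.
    yearly = yearly_storage_gb(250, 1.0, 5.0)
    print("Yearly estimate (GB):", yearly)            # 255.0
    print("Rounded up (GB):", round_up_gb(yearly))    # 300
    print("3 year projection (GB):", projected_storage_gb(3, 250, 1.0, 5.0))
    print("5 year projection (GB):", projected_storage_gb(5, 250, 1.0, 5.0))
```

This sketch rounds once, after multiplying by the number of years. Rounding each year up first and then multiplying gives a more generous figure (e.g. 3 * 300GB), which may be preferable when planning towards an upper limit.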