Image Analysis Process

There are two types of image analysis:

Centralized Analysis
Distributed Analysis

Image analysis is performed as a distinct, asynchronous, and scheduled task driven by queues that analyzer workers periodically poll.

Image analysis_status states:

stateDiagram
    [*] --> not_analyzed: analysis queued
    not_analyzed --> analyzing: analyzer starts processing
    analyzing --> analyzed: analysis completed successfully
    analyzing --> analysis_failed: analysis fails
    analyzing --> not_analyzed: re-queue by timeout or analyzer shutdown
    analysis_failed --> not_analyzed: re-queued by user request
    analyzed --> not_analyzed: re-queued for re-processing by user request

Centralized Analysis

The analysis process is composed of several steps and utilizes several system components. The basic flow of that task as shown in the following example:

Centralized analysis high level summary:

sequenceDiagram
    participant A as AnchoreCTL
    participant R as Registry
    participant E as Anchore Deployment
    A->>E: Request Image Analysis
    E->>R: Get Image content
    R-->>E: Image Content
    E->>E: Analyze Image Content (Generate SBOM and secret scans etc) and store results
    E->>E: Scan sbom for vulns and evaluate compliance

Here is an example of initiating this process using anchorectl

anchorectl image add docker.io/library/nginx:latest

This command tells the Anchore deployment to pull the image from Docker Hub, unpack it, generate an SBOM and then conduct a and vulnerability scan and policy evaluation.

The analyzers operate in a task loop for analysis tasks as shown below:

alt text

Adding more detail, the API call trace between services looks similar to the following example flow:

alt text

Distributed Analysis

In distributed analysis, the analysis of image content takes place outside the Anchore deployment and the result is imported into the deployment. The image has the same state machine transitions, but the ‘analyzing’ processing of an imported analysis is the processing of the import data (vuln scanning, policy checks, etc) to prepare the data for internal use, but does not download or touch any image content.

High level example with anchorectl:

sequenceDiagram
    participant A as AnchoreCTL
    participant R as Registry/Docker Daemon
    participant E as Anchore Deployment
    A->>R: Get Image content
    R-->>A: Image Content
    A->>A: Analyze Image Content (Generate SBOM and secret scans etc)
    A->>E: Import SBOM, secret search, fs metadata
    E->>E: Scan sbom for vulns and evaluate compliance

Here is an example anchorectl command initiating distributed analysis:

anchorectl image add docker.io/library/nginx:latest --from registry

This command uses anchorectl to generate an SBOM locally and then uploads it to the Anchore deployment for vulnerability scanning and compliance evaluation.

Next Steps

Now let’s get familiar with Watching Images and Tags with Anchore.

1 - Malware Scanning

Overview

Anchore Enterprise provides malware scanning with the use ClamAV. ClamAV is an open-source antivirus solution designed to detect malicious code embedded in container images.

When enabled, malware scanning occurs with Centralized Analysis, when the image content itself is available. Any findings are available via:

API /images/{image_digest}/content/malware)
AnchoreCTL anchorectl image content <image_digest> -t malware
UI Image SBOM Tab

The Malware Policy Gate also provides compliance rules around any findings.

Please Note: Files in an image which are greater than 2GB will be skipped due to a limitation in ClamAV. Any skipped file will be identified with a Malware Signature as ANCHORE.FILE_SKIPPED.MAX_FILE_SIZE_EXCEEDED.

Signature DB Updates

Each analyzer service will run a malware signature update before analyzing each image. This does add some latency to the overall analysis time but ensures the signatures are as up-to-date as possible for each image analyzed. The update behavior can be disabled if you prefer to manage the freshness of the db via another route, such as a shared filesystem mounted to all analyzer nodes that is updated on a schedule. See the configuration section for details on disabling the db update.

The status of the db update is present in each scan output for each image.

Scan Results

The malware content type is a list of scan results. Each result is the run of a malware scanner, by default clamav.

The list of files found to contain malware signature matches is in the findings property of each scan result. An empty array value indicates no matches found.

The metadata property provides generic metadata specific to the scanner. For the ClamAV implementation, this includes the version data about the signature db used and if the db update was enabled during the scan. If the db update is disabled, then the db_version property of the metadata will not have values since the only way to get the version metadata is during a db update.

{
    "content": [
        {
            "findings": [
                {
                    "path": "/somebadfile",
                    "signature": "Unix.Trojan.MSShellcode-40"
                },
                {
                    "path": "/somedir/somepath/otherbadfile",
                    "signature": "Unix.Trojan.MSShellcode-40"
                }
            ],
            "metadata": {
                "db_update_enabled": true,
                "db_version": {
                    "bytecode": "331",
                    "daily": "25890",
                    "main": "59"
                }
            },
            "scanner": "clamav"
        }
    ],
    "content_type": "malware",
    "imageDigest": "sha256:0eb874fcad5414762a2ca5b2496db5291aad7d3b737700d05e45af43bad3ce4d"
}