Image Analysis Process

There are two types of image analysis:

  1. Centralized Analysis
  2. Distributed Analysis

Image analysis is performed as a distinct, asynchronous, and scheduled task driven by queues that analyzer workers periodically poll.

Image analysis_status states:

stateDiagram
    [*] --> not_analyzed: analysis queued
    not_analyzed --> analyzing: analyzer starts processing
    analyzing --> analyzed: analysis completed successfully
    analyzing --> analysis_failed: analysis fails
    analyzing --> not_analyzed: re-queue by timeout or analyzer shutdown
    analysis_failed --> not_analyzed: re-queued by user request
    analyzed --> not_analyzed: re-queued for re-processing by user request
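
The transitions in the diagram can also be written as a small lookup table, which is handy when a client script needs to reason about an image's status. This is a minimal sketch, not Anchore's implementation: the state names come from the diagram, while the `TRANSITIONS` table and the `can_transition` helper are hypothetical names introduced for illustration.

```python
# Allowed analysis_status transitions, transcribed from the state diagram.
# TRANSITIONS and can_transition are illustrative names, not Anchore APIs.
TRANSITIONS = {
    "not_analyzed": {"analyzing"},
    "analyzing": {"analyzed", "analysis_failed", "not_analyzed"},
    "analysis_failed": {"not_analyzed"},
    "analyzed": {"not_analyzed"},
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the diagram allows moving from current to target."""
    return target in TRANSITIONS.get(current, set())


if __name__ == "__main__":
    # An image being analyzed may finish, fail, or be re-queued.
    print(can_transition("analyzing", "analyzed"))
    # An analyzed image must be re-queued before it can be analyzed again.
    print(can_transition("analyzed", "analyzing"))
```

Note that every path back to `analyzing` passes through `not_analyzed`, so re-processing always starts with a re-queue.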

Centralized Analysis

The analysis process is composed of several steps and uses several system components. The basic flow of the task is shown in the following example:

Centralized analysis high-level summary:

sequenceDiagram
    participant A as AnchoreCTL
    participant R as Registry
    participant E as Anchore Deployment
    A->>E: Request Image Analysis
    E->>R: Get Image content
    R-->>E: Image Content
    E->>E: Analyze Image Content (Generate SBOM, secret scans, etc.) and store results
    E->>E: Scan SBOM for vulnerabilities and evaluate compliance

The analyzers operate in a task loop for analysis tasks as shown below:

[Figure: analyzer task loop for analysis tasks]

Adding more detail, the API call trace between services looks similar to the following example flow:

[Figure: API call trace between services during centralized analysis]

Distributed Analysis

In distributed analysis, the analysis of image content takes place outside the Anchore deployment, and the result is imported into the deployment. The image goes through the same state machine transitions, but for an imported analysis the ‘analyzing’ state covers only the processing of the imported data (vulnerability scanning, policy checks, etc.) to prepare it for internal use. The deployment does not download or touch any image content.

High-level example with AnchoreCTL:

sequenceDiagram
    participant A as AnchoreCTL
    participant R as Registry/Docker Daemon
    participant E as Anchore Deployment
    A->>R: Get Image content
    R-->>A: Image Content
    A->>A: Analyze Image Content (Generate SBOM, secret scans, etc.)
    A->>E: Import SBOM, secret search, fs metadata
    E->>E: Scan SBOM for vulnerabilities and evaluate compliance

Next Steps

Now let’s get familiar with Watching Images and Tags with Anchore.

1 - Malware Scanning

Overview

Anchore Enterprise now supports the use of the open-source ClamAV malware scanner to detect malicious code embedded in container images. The scan occurs only at analysis time, when the image content itself is available. Scan results are available via the API and can be consumed by new policy gates, which allows gating of images with malware findings.

Signature DB Updates

Each analyzer service will run a malware signature update before analyzing each image. This does add some latency to the overall analysis time but ensures the signatures are as up-to-date as possible for each image analyzed. The update behavior can be disabled if you prefer to manage the freshness of the db via another route, such as a shared filesystem mounted to all analyzer nodes that is updated on a schedule. See the configuration section for details on disabling the db update.

The status of the db update is present in each scan output for each image.

Scan Results

The malware content type is a list of scan results. Each result corresponds to one run of a malware scanner, which is clamav by default.

The list of files found to contain malware signature matches is in the findings property of each scan result. An empty array value indicates no matches found.

The metadata property provides scanner-specific metadata. For the ClamAV implementation, this includes version data for the signature db used and whether the db update was enabled during the scan. If the db update is disabled, the db_version property of the metadata will not have values, since the version metadata can only be obtained during a db update.

{
    "content": [
        {
            "findings": [
                {
                    "path": "/somebadfile",
                    "signature": "Unix.Trojan.MSShellcode-40"
                },
                {
                    "path": "/somedir/somepath/otherbadfile",
                    "signature": "Unix.Trojan.MSShellcode-40"
                }
            ],
            "metadata": {
                "db_update_enabled": true,
                "db_version": {
                    "bytecode": "331",
                    "daily": "25890",
                    "main": "59"
                }
            },
            "scanner": "clamav"
        }
    ],
    "content_type": "malware",
    "imageDigest": "sha256:0eb874fcad5414762a2ca5b2496db5291aad7d3b737700d05e45af43bad3ce4d"
}
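
As a minimal sketch of how a client might consume this output, the following Python flattens the findings of every scan result and reports the signature db versions. The field names are taken from the example above; the function names `summarize_malware` and `db_versions` are hypothetical, and the sample document is an abbreviated copy of that example.

```python
import json

# Abbreviated copy of the example malware content document shown above.
SAMPLE = json.loads("""
{
    "content": [
        {
            "findings": [
                {"path": "/somebadfile", "signature": "Unix.Trojan.MSShellcode-40"}
            ],
            "metadata": {
                "db_update_enabled": true,
                "db_version": {"bytecode": "331", "daily": "25890", "main": "59"}
            },
            "scanner": "clamav"
        }
    ],
    "content_type": "malware"
}
""")


def summarize_malware(doc: dict) -> list:
    """Flatten all scan results into (scanner, path, signature) tuples."""
    rows = []
    for scan in doc.get("content", []):
        scanner = scan.get("scanner", "unknown")
        # An empty findings array means the scan ran and matched nothing.
        for finding in scan.get("findings", []):
            rows.append((scanner, finding["path"], finding["signature"]))
    return rows


def db_versions(doc: dict) -> dict:
    """Map each scanner to its db_version data (empty if none was recorded)."""
    return {
        scan.get("scanner", "unknown"): scan.get("metadata", {}).get("db_version") or {}
        for scan in doc.get("content", [])
    }


if __name__ == "__main__":
    for row in summarize_malware(SAMPLE):
        print(row)
    print(db_versions(SAMPLE))
```

Treating a missing or empty `db_version` as an empty mapping keeps the helper usable when the db update is disabled.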

Policy Rules

A policy gate called malware is available with two new triggers:

  • scans trigger will fire for each file and signature combination found in the image so that you can fail an evaluation of an image if malware was detected during the analysis scans
  • scan_not_run trigger will fire if there are no malware scans (even empty) available for the image

See policy checks for more details.
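
The behavior of the two triggers can be illustrated with a short sketch. This is not the policy engine's code: `evaluate_malware_gate` is a hypothetical function that only mirrors the rules described above, taking the list of scan results from the malware content type as input.

```python
def evaluate_malware_gate(scans: list) -> list:
    """Return (trigger, detail) tuples that would fire for the given scan results.

    Illustrative only: mirrors the documented trigger rules, not Anchore's code.
    """
    # scan_not_run fires only when no scan results exist at all.
    if not scans:
        return [("scan_not_run", "no malware scans available for the image")]

    fired = []
    for scan in scans:
        # scans fires once per file and signature combination.
        for finding in scan.get("findings", []):
            fired.append(("scans", f"{finding['signature']} in {finding['path']}"))
    return fired


if __name__ == "__main__":
    clean = [{"scanner": "clamav", "findings": []}]
    infected = [{
        "scanner": "clamav",
        "findings": [{"path": "/somebadfile", "signature": "Unix.Trojan.MSShellcode-40"}],
    }]
    print(evaluate_malware_gate([]))
    print(evaluate_malware_gate(clean))
    print(evaluate_malware_gate(infected))
```

A scan with an empty findings list still counts as a scan that ran, so neither trigger fires for a clean image.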

2 - Content Hints

Anchore Enterprise includes the ability to read a user-supplied ‘hints’ file to allow users to add software artifacts to Anchore’s analysis report. The hints file, if present, contains records that describe a software package’s characteristics explicitly, and are then added to the software bill of materials (SBOM). For example, if the owner of a CI/CD container build process knows that there are some software packages installed explicitly in a container image, but Anchore’s regular analyzers fail to identify them, this mechanism can be used to include that information in the image’s SBOM, exactly as if the packages were discovered normally.

Hints cannot be used to modify the findings of Anchore’s analyzer beyond adding new packages to the report. If a user specifies a package in the hints file that is found by Anchore’s image analyzers, the hint is ignored and a warning message is logged to notify the user of the conflict.

Configuration

See Configuring Content Hints

Once enabled, the analyzer services look for a file with a specific name, location, and format within the container image: /anchore_hints.json.
The format of the file is illustrated with the examples below.

OS Package Records

OS package records represent packages installed using OS/distro-style package managers. Currently supported package types are rpm, dpkg, and apkg for RedHat, Debian, and Alpine flavored package managers, respectively. Note that, for OS packages, the package name is unique per SBOM: only one package named ‘somepackage’ can exist in an image’s SBOM. Specifying a name in the hints file that conflicts with a package of the same name discovered by the Anchore analyzers results in the record from the hints file taking precedence (override).

  • Minimum required values for a package record in anchore_hints.json
	{
	    "name": "musl",
	    "version": "1.1.20-r8",
	    "type": "apkg"
	}
  • Complete record demonstrating all of the available characteristics of a software package that can be specified
	{
	    "name": "musl",
	    "version": "1.1.20",
	    "release": "r8",
	    "origin": "Timo Ter\u00e4s <[email protected]>",
	    "license": "MIT",
	    "size": "61440",
	    "source": "musl",
	    "files": ["/lib/ld-musl-x86_64.so.1", "/lib/libc.musl-x86_64.so.1", "/lib"],
	    "type": "apkg"
	}

Non-OS/Language Package Records

Non-OS / language package records are similar in form to the OS package records, but supply some extra or different characteristics, most notably the location field. Since multiple non-OS packages with the same name can be installed, the location field is particularly important: it distinguishes between package records that might otherwise be identical. Valid types for non-OS packages are currently java, python, gem, npm, nuget, go, and binary.
For the latest available types, see the output of anchorectl image content <someimage>, which lists the types available for any given deployment of Anchore Enterprise.

  • Minimum required values for a package record in anchore_hints.json
	{
	    "name": "wicked",
	    "version": "0.6.1",  
	    "type": "gem"
	}
  • Complete record demonstrating all of the available characteristics of a software package that can be specified
	{
	    "name": "wicked",
	    "version": "0.6.1",
	    "location": "/app/gems/specifications/wicked-0.6.1.gemspec",
	    "origin": "schneems",
	    "license": "MIT",
	    "source": "http://github.com/schneems/wicked",
	    "files": ["README.md"],
	    "type": "gem"	    
	}
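
Before baking a hints file into an image, it can help to check each record against the minimum requirements described above. The following is a minimal sketch under stated assumptions: the type lists are copied from this page and may differ for a given deployment, and `check_hint_record` is a hypothetical helper, not part of any Anchore tool.

```python
# Package types listed on this page; a given deployment may support others.
OS_TYPES = {"rpm", "dpkg", "apkg"}
NON_OS_TYPES = {"java", "python", "gem", "npm", "nuget", "go", "binary"}
REQUIRED_FIELDS = ("name", "version", "type")


def check_hint_record(record: dict) -> list:
    """Return a list of problems found in one hints package record."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            problems.append(f"missing required field: {field}")

    pkg_type = record.get("type")
    if pkg_type and pkg_type not in OS_TYPES | NON_OS_TYPES:
        problems.append(f"unrecognized type: {pkg_type}")
    return problems


if __name__ == "__main__":
    print(check_hint_record({"name": "wicked", "version": "0.6.1", "type": "gem"}))
    print(check_hint_record({"name": "musl", "type": "apk"}))
```

An empty list means the record has the minimum required values and a type named on this page.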

Putting it all together

Using the above examples, a complete anchore_hints.json file, which Anchore Enterprise discovers at /anchore_hints.json inside a container image, looks like this:

{
    "packages": [
	{
	    "name": "musl",
	    "version": "1.1.20-r8",
	    "type": "apkg"
	},
	{
	    "name": "wicked",
	    "version": "0.6.1",  
	    "type": "gem"
	}
    ]
}

With such a hints file in an image based, for example, on alpine:latest, the resulting image content reports these two package/version records as part of the SBOM for the analyzed image. Use anchorectl image content <image> -t os to view the musl record and anchorectl image content <image> -t gem to view the wicked record.
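
In a build pipeline, the hints file would typically be generated rather than written by hand. The sketch below shows one way to do that with Python's standard json module; `build_hints` is a hypothetical helper, and how the resulting file reaches /anchore_hints.json in the image (for instance, a Dockerfile COPY step) is left to the build process.

```python
import json


def build_hints(packages: list) -> str:
    """Serialize package records into the anchore_hints.json layout shown above."""
    for pkg in packages:
        # name, version, and type are the minimum required values per record.
        missing = [key for key in ("name", "version", "type") if key not in pkg]
        if missing:
            raise ValueError(f"record {pkg!r} is missing: {', '.join(missing)}")
    return json.dumps({"packages": packages}, indent=4)


if __name__ == "__main__":
    hints = build_hints([
        {"name": "musl", "version": "1.1.20-r8", "type": "apkg"},
        {"name": "wicked", "version": "0.6.1", "type": "gem"},
    ])
    # Write the file next to the build context so the image build can include it.
    with open("anchore_hints.json", "w", encoding="utf-8") as handle:
        handle.write(hints + "\n")
    print(hints)
```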

Note about using the hints file feature

The hints file feature is disabled by default and is meant for very specific circumstances in which a trusted entity is responsible for creating and installing, or removing, an anchore_hints.json file in all containers being built. It is not meant to be enabled when container image builds are not explicitly controlled, because the entity building the images could override any SBOM entry that Anchore would normally discover, which affects the vulnerability/policy status of an image. For this reason, the feature must be explicitly enabled in configuration, and only if appropriate for your use case.