This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

Monitoring

After you have installed Anchore Enterprise, there are various ways to monitor its operations:

1 - System Health

Overview

Added in Anchore Enterprise 2.2, the Health section within the System tab is an administrator’s new display for investigating the operational status of their system’s various services and feeds. Leverage this view to understand when your system is ready or if it requires intervention.

The following sections in this document describe how to determine system readiness, the state of your services, and the progression of your feed sync.

For more information on the overall architecture of a full Anchore Enterprise deployment, please refer to the Architecture documentation. Or refer to the Feeds Overview if you’re interested in the feeds-side of things.

System Readiness

Ready

(Tentatively) Ready

Not Ready

The indicator for system readiness can be seen from any screen by viewing the System tab header:

The system readiness status relies on the service and feed data which are routinely updated every 5 minutes. Using the example indicator provided above, once all the feed groups are successfully synced, the status icon will turn green.

For up-to-date information outside of the normal update cycle, navigate to the Health section within the System tab and click on Refresh Service Health, Refresh Feed Data, or manually refresh the page.

Services

As shown above and as of 2.2, there are five services required by the system to function (API, Catalog, Policy Engine, SimpleQueue, and Analyzer).

For every service, the Base URL, Host ID, and Version is displayed. As long as one instance of each service is up and available, the main system is regarded as ready. In the example image provided above, we see that we have multiple instances of the Policy Engine and Analyzer services.

For the full, filterable list of instances for that service, click on the numbers provided. In the case of the Policy Engine, that would be the 1/2 Available.

Note that orphaned services are filtered out by default in this view (with a toggle to include it again) but will still impact the availability count on the main page.

In the case of service errors, they are logged within the Events & Notifications tab so we recommend following up there for more information or browse our Troubleshooting documentation for remediation guidance.

Feeds Sync

Listed in this section are the various feed groups your system relies on for vulnerability and package data. This data comes from a variety of upstream sources which is vital for policy engine operations such as evaluating policies or listing vulnerabilities.

As shown, you can keep track of your sync progression using the Last Sync column. To manually update the feed data displayed outside of its normal 5-minute cycle, click the Refresh Feed Data button or refresh the page.

If you’d rather have them grouped by feed rather than listed out individually, you can toggle the layout from list to cards using the buttons in the top-right corner above the table:

Similar to the service cards, if you decide to have them grouped as we show below using the layout buttons, you can click on the number of groups synced to view the full, filterable list within.

When viewing a list of feed groups - whether through the default list or through a specific feed card - you can filter for a specific value using the input provided or click on the button attached to filter by category. In this case, groups can be filtered by whether they are synced or unsynced.

In the case of feed sync errors, they are logged within the Events & Notifications tab so we recommend following up there for more information or browse our Troubleshooting documentation for remediation guidance.

Or if you’re interested in an overview of the various drivers Enterprise Feeds uses, check out our Feeds Overview.

2 - Prometheus

Anchore Enterprise exposes prometheus metrics in the API of each service if the config.yaml that is used by that service has the metrics.enabled key set to true.

Each service exports its own metrics and is typically scraped by a Prometheus installation to gather the metrics. Anchore does not aggregate or distribute metrics between services. You should configure your Prometheus deployment or integration to check each Anchore service’s API using the same port it exports for the /metrics route.

Monitoring in Kubernetes and/or Helm Chart

Prometheus is very commonly used for monitoring Kubernetes clusters. Prometheus is supported by core Kubernetes services. There are many guides on using Prometheus to monitor a cluster and services deployed within, and also many other monitoring systems can consume Prometheus metrics.

The Anchore Helm Chart includes a quick way to enable the Prometheus metrics on each service container:

  • Set: helm install --name myanchore anchore/anchore-engine --set anchoreGlobal.enableMetrics=true

  • Or, set it directly in your customized values.yaml

The specific strategy for monitoring services with prometheus is outside the scope of this document. But, because Anchore exposes metrics on the /metrics route of all service ports, it should be compatible with most monitoring approaches (daemon sets, side-cars, etc).

Metrics of Note

Anchore services export a range of metrics. The following list shows some Anchore services that can help you determine the health and load of an Anchore deployment.

  • anchore_queue_length, specifically for queuename: “images_to_analyze”
    • This is the number of images pending analysis, in the not_analyzed state.
    • As this number grows you can expect longer analysis times.
    • Adding more analyzers to a system can help drain the queue faster and keep wait times to a minimum.
    • Example: anchore_queue_length{instance=“engine-simpleq:8228”,job=“anchore-simplequeue”,queuename=“images_to_analyze”}.
    • This metric is exported from all simplequeue service instances, but is based on the database state, so they should all present a consistent view of the length of the queue.
  • anchore_monitor_runtime_seconds_count
    • These metrics, one for each monitor, record the duration of the async processes as they execute on a duty cycle.
    • As the system grows, these will become longer to account for more tags to check for updates, repos to scan for new tags, and user notifications to process.
  • anchore_tmpspace_available_bytes
    • This metric tracks the available space in the “tmp_dir” location for each container. This is most important for the instances that are analyzers where this can indicate how much disk is being used for analysis and how much overhead there is for analyzing large images.
    • This is expected to be consumed in cycles, with usage growing during analysis and then flushing upon completion. A consistent growth pattern here may indicate left over artifacts from analysis failures or a large layer_cache setting that is not yet full. The layer cache (see Layer Caching) is located in this space and thus will affect the metric.
  • process_resident_memory_bytes
    • This is the memory actually consumed by the instance, where each instance is a service process of Anchore. Anchore is fairly memory intensive for large images and in deployments with lots of analyzed images due to lots of json parsing and marshalling, so monitoring this metric will help inform capacity requirements for different components based on your specific workloads. Lots of variables affect memory usage, so while we give recommendations in the Capacity Planning document, there is no substitute for profiling and monitoring your usage carefully.

3 - Event Log

Introduction

The event log subsystem provides the users with a mechanism to inspect asynchronous events occurring across various Anchore Enterprise services. Anchore events include periodically triggered activities such as vulnerability data feed syncs in the policy-engine service, image analysis failures originating from the analyzer service, and other informational or system fault events. The catalog service may also generate events for any repositories or image tags that are being watched, when the engine encounters connectivity, authentication, authorization or other errors in the process of checking for updates. The event log is aimed at troubleshooting most common failure scenarios (especially those that happen during asynchronous engine operations) and to pinpoint the reasons for failures, that can be used subsequently to help with corrective actions. Events can be cleared from anchore-engine in bulk or individually.

The Anchore events (drawn from the event log) can be accessed through the Anchore Enterprise API and AnchoreCTL, or can be emitted as webhooks if your Anchore Enterprise is configured to send webhook notifications. For API usage refer to the document on using the Anchore Enterprise API.

Accessing Events

The anchorectl command can be used to list events and filter through the results, get the details for a specific event and delete events matching certain criteria.

# anchorectl event --help
Event related operations

Usage:
   event [command]

Available Commands:
  delete      Delete an event by its ID or set of filters
  get         Lookup an event by its event ID
  list        Returns a paginated list of events in the descending order of their occurrence

Flags:
  -h, --help   help for event

Use " event [command] --help" for more information about a command.

For help regarding global flags, run --help on the root command

For a list of the most recent events:


anchorectl event list
 ✔ List events
┌──────────────────────────────────┬──────────────────────────────────────────────┬───────┬─────────────────────────────────────────────────────────────────────────┬─────────────────┬────────────────┬────────────────────┬─────────────────────────────┐
│ UUID                             │ EVENT TYPE                                   │ LEVEL │ RESOURCE ID                                                             │ RESOURCE TYPE   │ SOURCE SERVICE │ SOURCE HOST        │ TIMESTAMP                   │
├──────────────────────────────────┼──────────────────────────────────────────────┼───────┼─────────────────────────────────────────────────────────────────────────┼─────────────────┼────────────────┼────────────────────┼─────────────────────────────┤
│ 8c179a3b27a543fe9285cf4feb65561d │ system.image_analysis.registry_lookup_failed │ error │ docker.io/alpine:3.4                                                    │ image_reference │ catalog        │ anchore-quickstart │ 2022-08-24T23:08:30.54001Z  │
│ 48c18a84575d45efbf5b41e0f3a87177 │ system.image_analysis.registry_lookup_failed │ error │ docker.io/alpine:latest                                                 │ image_reference │ catalog        │ anchore-quickstart │ 2022-08-24T23:08:30.510193Z │
│ f6084efd159c43a1a0518b6df5e58505 │ system.image_analysis.registry_lookup_failed │ error │ docker.io/alpine:3.12                                                   │ image_reference │ catalog        │ anchore-quickstart │ 2022-08-24T23:08:30.480625Z │
│ 4464b8f83df046388152067122c03610 │ system.image_analysis.registry_lookup_failed │ error │ docker.io/alpine:3.8                                                    │ image_reference │ catalog        │ anchore-quickstart │ 2022-08-24T23:08:30.450983Z │
...
│ 60f14821ff1d407199bc0bde62f537df │ system.image_analysis.restored_from_archive  │ info  │ sha256:89020cd33be2767f3f894484b8dd77bc2e5a1ccc864350b92c53262213257dfc │ image_digest    │ catalog        │ anchore-quickstart │ 2022-08-24T22:53:12.662535Z │
│ cd749a99dca8493889391ae549d1bbc7 │ system.analysis_archive.image_archived       │ info  │ sha256:89020cd33be2767f3f894484b8dd77bc2e5a1ccc864350b92c53262213257dfc │ image_digest    │ catalog        │ anchore-quickstart │ 2022-08-24T22:48:45.719941Z │
...
└──────────────────────────────────┴──────────────────────────────────────────────┴───────┴─────────────────────────────────────────────────────────────────────────┴─────────────────┴────────────────┴────────────────────┴─────────────────────────────┘

Note: Events are ordered by the timestamp of their occurrence, the most recent events are at the top of the list and the least recent events at the bottom.

There are a number of ways to filter the event list output (see anchorectl event list --help for filter options):

For troubleshooting events related to a specific event type:

# anchorectl event list --event-type system.analysis_archive.image_archive_failed
 ✔ List events
┌──────────────────────────────────┬──────────────────────────────────────────────┬───────┬──────────────┬───────────────┬────────────────┬────────────────────┬────────────────────────────┐
│ UUID                             │ EVENT TYPE                                   │ LEVEL │ RESOURCE ID  │ RESOURCE TYPE │ SOURCE SERVICE │ SOURCE HOST        │ TIMESTAMP                  │
├──────────────────────────────────┼──────────────────────────────────────────────┼───────┼──────────────┼───────────────┼────────────────┼────────────────────┼────────────────────────────┤
│ 35114639be6c43a6b79d1e0fef71338a │ system.analysis_archive.image_archive_failed │ error │ nginx:latest │ image_digest  │ catalog        │ anchore-quickstart │ 2022-08-24T22:48:23.18113Z │
└──────────────────────────────────┴──────────────────────────────────────────────┴───────┴──────────────┴───────────────┴────────────────┴────────────────────┴────────────────────────────┘

To filter events by level such as ERROR or INFO:

anchorectl event list --level info
 ✔ List events
┌──────────────────────────────────┬─────────────────────────────────────────────┬───────┬─────────────────────────────────────────────────────────────────────────┬───────────────┬────────────────┬────────────────────┬─────────────────────────────┐
│ UUID                             │ EVENT TYPE                                  │ LEVEL │ RESOURCE ID                                                             │ RESOURCE TYPE │ SOURCE SERVICE │ SOURCE HOST        │ TIMESTAMP                   │
├──────────────────────────────────┼─────────────────────────────────────────────┼───────┼─────────────────────────────────────────────────────────────────────────┼───────────────┼────────────────┼────────────────────┼─────────────────────────────┤
│ 60f14821ff1d407199bc0bde62f537df │ system.image_analysis.restored_from_archive │ info  │ sha256:89020cd33be2767f3f894484b8dd77bc2e5a1ccc864350b92c53262213257dfc │ image_digest  │ catalog        │ anchore-quickstart │ 2022-08-24T22:53:12.662535Z │
│ cd749a99dca8493889391ae549d1bbc7 │ system.analysis_archive.image_archived      │ info  │ sha256:89020cd33be2767f3f894484b8dd77bc2e5a1ccc864350b92c53262213257dfc │ image_digest  │ catalog        │ anchore-quickstart │ 2022-08-24T22:48:45.719941Z │
...

Note: Event listing response is paginated, anchorectl displays the first 100 events matching the filters. For all the results use the –all flag.

All available options for listing events:


# anchorectl event list --help
Returns a paginated list of events in the descending order of their occurrence. Optional query parameters may be used for filtering results

Usage:
   event list [flags]

Flags:
      --all                    return all events (env: ANCHORECTL_EVENT_ALL)
      --before string          return events that occurred before the ISO8601 formatted UTC timestamp
                               (env: ANCHORECTL_EVENT_BEFORE)
      --event-type string      filter events by a prefix match on the event type (e.g. "user.image.")
                               (env: ANCHORECTL_EVENT_TYPE)
  -h, --help                   help for list
      --host string            filter events by the originating host ID (env: ANCHORECTL_EVENT_SOURCE_HOST_ID)
      --level string           filter events by the level - INFO or ERROR (env: ANCHORECTL_EVENT_LEVEL)
  -o, --output string          the format to show the results (allowable: [text json json-raw id]; env: ANCHORECTL_FORMAT) (default "text")
      --page int32             return the nth page of results starting from 1. Defaults to first page if left empty
                               (env: ANCHORECTL_PAGE)
      --resource-type string   filter events by the type of resource - tag, imageDigest, repository etc
                               (env: ANCHORECTL_EVENT_RESOURCE_TYPE)
      --service string         filter events by the originating service (env: ANCHORECTL_EVENT_SOURCE_SERVICE_NAME)
      --since string           return events that occurred after the ISO8601 formatted UTC timestamp
                               (env: ANCHORECTL_EVENT_SINCE)

For help regarding global flags, run --help on the root command

Event listing displays a brief summary of the event, to get more detailed information about the event such as the host where the event has occurred or the underlying the error:


# anchorectl event get c31eb023c67a4c9e95278473a026970c
 ✔ Fetched event
UUID: c31eb023c67a4c9e95278473a026970c
Event:
  Event Type: system.image_analysis.registry_lookup_failed
  Level: error
  Message: Referenced image not found in registry
  Resource:
    Resource ID: docker.io/aerospike:latest
    Resource Type: image_reference
    User Id: admin
  Source:
    Source Service: catalog
    Base Url: http://catalog:8228
    Source Host: anchore-quickstart
    Request Id:
  Timestamp: 2022-08-24T22:08:28.811441Z
  Category:
  Details: cannot fetch image digest/manifest from registry
Created At: 2022-08-24T22:08:28.812749Z

Clearing Events

Events can be cleared/deleted from the system in bulk or individually. Bulk deletion allows for specifying filters to clear the events within a certain time window. To delete all events from the system:

# anchorectl event delete --all
Use the arrow keys to navigate: ↓ ↑ → ←
? Are you sure you want to delete all events:
  ▸ Yes
    No
 ⠙ Deleting event
c31eb023c67a4c9e95278473a026970c
329ff24aa77549458e2656f1a6f4c98f
649ba60033284b87b6e3e7ab8de51e48
4010f105cf264be6839c7e8ca1a0c46e
...

Delete events before a specified timestamp (can also use --since instead of --before to delete events that were generated after a specified timestamp):

# anchorectl event delete --before 2022-08-24T22:08:28.629543Z
 ✔ Deleted event
ce26f1fa1baf4adf803d35c86d7040b7
081394b6e62f4708a10e521a960c54d7
d21b587dea5844cc9c330ba2b3d02d2e
7784457e6bf84427a175658f134f3d6a
...

Delete a specific event:

# anchorectl event delete fa110d517d2e43faa8d8e2dfbb0596af
 ✔ Deleted event
fa110d517d2e43faa8d8e2dfbb0596af

Sending Events as Webhook Notifications

In addition to access via API and AnchoreCTL, the Anchore Enterprise may be configured to send notifications for events as they are generated in the system via its webhook subsystem. Webhook notifications for event log records is turned off by default. To turn enable the ’event_update’ webhook, uncomment the ’event_log’ section under ‘services->catalog’ in config.yaml, as in the following example:

services:
  ...
  catalog:
    ...
    event_log:
      notification:    
        enabled: True
        # (optional) notify events that match these levels. If this section is commented, notifications for all events are sent
        level:
        - error

Note: In order for events to be sent via webhook notifications, you’ll need to ensure that the webhook subsystem is configured in config.yaml (if it isn’t already) - refer to the document on subscriptions and notifications for information on how to enable webhooks in Anchore Enterprise. Event notifications will be sent to ’event_update’ webhook endpoint if it is defined, and the ‘general’ webhook endpoint otherwise.

Events via the UI

The Events tab is your gateway to current and historical activity happening in your system. View various events such as policy evaluation and vulnerability updates, system errors, feed syncs, and more.

The following sections in this document describe how to view event details, how to filter for specific events you’re interested in, and how to manage events with bulk deletion.

UI Events View

Viewing Events

In order to view events, navigate to the Events & Notifications > View Events tab. By default, the most recent activity (up to 1000 events) is shown and is automatically updated for you every 5 minutes. Note that if you have applied any filters through the search bar, your results will need to be refreshed manually.

Top-level details such as the event’s level (whether it’s an INFO or ERROR event), type, message, and affected resource is shown. Dig in to a specific event by clicking View Details under its Actions column to expand the row.

UI Event Details

Additional information such as the origininating service and host ID are available in the expanded row. Any details given by the service are also provided in JSON format to view or copy to clipboard.

Filtering Events

UI Events Search

Often, you might want to search for a specific event type or events that happened after a certain time. In this case, use the Search Events bar near the top of the page to select a filter to search on. These include:

Level
Filter events by level - INFO or ERROR
Event Type
Filter events by a match on the event type (e.g. “user.image.*”)
Since
Return events that occurred after the timestamp
Before
Return events that occurred before the timestamp
Source Servicename
Filter events by the originating service
Source Host ID
Filter events by the originating host ID
Resource Type
Filter events by the type of resource - tag, imageDigest, repository, etc.
Resource ID
Filter events by the id of the resource

Once you have selected and populated the filter fields you’re interested in, click Apply Filters to search and show those filtered results.

An alternative way to filter your results is through the in-table filter input. Note that this only applies against any data already fetched. To increase what you’re filtering on, click Fetch More near the top-right of the table for up to an additional 1000 items.

To remove any filters and reset to the default view, click Clear Filters.

Deleting Events

To assist with event management, event deletion has been added in the Enterprise 2.3 release.

Deleting individual events can be done simply through clicking Delete under the Actions column and selecting Yes to confirm. Note that after deletion, events are not recoverable.

Multi-select is available for deleting multiple events at a time. Upon selecting an event using the checkbox in the far-left column, a toolbar-like component will slide in at the bottom of the table. The number of events selected is shown along with the selection type, Clear Selection, and Delete Events options.

Checking the box in the header will select all events within that page.

UI Events Bulk Deletion

By default, it is viewed as a Custom selection. Choosing to select All Retrieved events auto-selects everything already fetched and present in the table (i.e. if a filter is applied, events not matching the filter are not selected but will be upon removal of the filter). In this state, deselecting an item will trigger a custom selection again.

Selecting All events will again auto-select all events already fetched and present in the table but while applying a filter may modify what’s viewable, this option is solely for clearing the entire backlog of events - including those not shown. In this state, deselecting an item will also trigger a custom selection.

Once you have selected the events you wish to remove, click Delete Events to open a modal and review up to 50 items. Any events you don’t wish to delete anymore can be deselected as well. To continue with removal, click Yes to confirm and start the process.

Note that events are account-wide and that any events removed will be mirrored across all users in the account.