Anchore Enterprise exposes Prometheus metrics in the API of each service when the config.yaml used by that service has the metrics.enabled key set to true.

Each service exports its own metrics and is typically scraped by a Prometheus installation. Anchore does not aggregate or distribute metrics between services, so configure your Prometheus deployment or integration to scrape the /metrics route of each Anchore service on the same port that the service uses for its API.
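Because every service has to be scraped individually, it can help to generate the target list rather than maintain it by hand. The following is a minimal sketch, not an official Anchore tool, that renders target groups in the JSON layout used by Prometheus file-based service discovery. The job names and host:port pairs are assumptions for illustration (only engine-simpleq:8228 with job anchore-simplequeue appears in this document); substitute the endpoints of your own deployment. Prometheus scrapes the /metrics path by default, so no path is listed per target.

```python
import json

# Hypothetical endpoints. Only "engine-simpleq:8228" / "anchore-simplequeue"
# comes from this document; the other entries are placeholders.
SERVICES = {
    "anchore-simplequeue": ["engine-simpleq:8228"],
    "anchore-catalog": ["engine-catalog:8228"],
    "anchore-analyzer": ["engine-analyzer:8228"],
}


def build_target_groups(services):
    """Return one target group per Anchore service, in a stable order."""
    return [
        {"targets": sorted(targets), "labels": {"job": job}}
        for job, targets in sorted(services.items())
    ]


def render_file_sd(services):
    """Serialize the target groups as JSON for a file_sd target file."""
    return json.dumps(build_target_groups(services), indent=2)


if __name__ == "__main__":
    print(render_file_sd(SERVICES))
```

The output can be written to a file that a file_sd_configs entry in your Prometheus configuration points at, so each service instance is scraped separately.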

Monitoring in Kubernetes and/or Helm Chart

Prometheus is very commonly used for monitoring Kubernetes clusters and is supported by core Kubernetes services. Many guides cover using Prometheus to monitor a cluster and the services deployed within it, and many other monitoring systems can also consume Prometheus metrics.

The Anchore Helm Chart includes a quick way to enable the Prometheus metrics on each service container:

  • Set: helm install --name myanchore anchore/anchore-engine --set anchoreGlobal.enableMetrics=true

  • Or, set it directly in your customized values.yaml

The specific strategy for monitoring services with Prometheus is outside the scope of this document. However, because Anchore exposes metrics on the /metrics route of all service ports, it should be compatible with most monitoring approaches (daemon sets, sidecars, etc.).
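To confirm that metrics are actually enabled on a service, you can request its /metrics route directly and inspect the result. The sketch below is an illustration only: it fetches the route with the Python standard library and parses the Prometheus text exposition format with a simplified regular expression (it does not handle every escaping rule in the format). The function names and the base URL in the usage note are hypothetical.

```python
import re
import urllib.request

# Simplified patterns for lines such as: name{label="value"} 3.0
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text):
    """Parse exposition text into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if match is None:
            continue  # ignore lines this simplified parser cannot read
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


def fetch_metrics(base_url, timeout=5.0):
    """Fetch and parse the /metrics route of one service."""
    url = base_url.rstrip("/") + "/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return parse_samples(response.read().decode("utf-8"))
```

For example, fetch_metrics("http://engine-simpleq:8228") would return the samples for that instance, assuming the host is reachable from where the script runs and the route does not require authentication in your deployment.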

Metrics of Note

Anchore services export a range of metrics. The following list shows some metrics that can help you determine the health and load of an Anchore deployment.

  • anchore_queue_length, specifically for queuename="images_to_analyze"
    • This is the number of images pending analysis, in the not_analyzed state.
    • As this number grows you can expect longer analysis times.
    • Adding more analyzers to a system can help drain the queue faster and keep wait times to a minimum.
    • Example: anchore_queue_length{instance="engine-simpleq:8228",job="anchore-simplequeue",queuename="images_to_analyze"}
    • This metric is exported from all simplequeue service instances, but is based on the database state, so they should all present a consistent view of the length of the queue.
  • anchore_monitor_runtime_seconds_count
    • These metrics, one for each monitor, record the duration of the async processes as they execute on a duty cycle.
    • As the system grows, these durations will increase because there are more tags to check for updates, more repos to scan for new tags, and more user notifications to process.
  • anchore_tmpspace_available_bytes
    • This metric tracks the available space in the "tmp_dir" location for each container. It is most important for analyzer instances, where it can indicate how much disk is being used for analysis and how much headroom remains for analyzing large images.
    • This space is expected to be consumed in cycles, with usage growing during analysis and then flushing upon completion. A consistent growth pattern here may indicate leftover artifacts from analysis failures or a large layer_cache setting that is not yet full. The layer cache (see Layer Caching) is located in this space and thus will affect the metric.
  • process_resident_memory_bytes
    • This is the memory actually consumed by the instance, where each instance is a service process of Anchore. Anchore is fairly memory intensive for large images and in deployments with many analyzed images, because of the amount of JSON parsing and marshalling involved. Monitoring this metric will help inform capacity requirements for different components based on your specific workloads. Many variables affect memory usage, so while we give recommendations in the Capacity Planning document, there is no substitute for profiling and monitoring your usage carefully.
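
As a rough illustration of how the metrics above might be combined into a basic health check, the following sketch scans a list of samples and reports conditions worth investigating. It assumes each sample is a (name, labels, value) tuple, for instance as returned by a Prometheus query. The function name and all thresholds are hypothetical placeholders, not Anchore recommendations; tune them from your own monitoring data and the Capacity Planning document.

```python
GIB = 1024 ** 3


def find_warnings(samples, queue_warn=50, tmp_min_bytes=5 * GIB,
                  mem_max_bytes=4 * GIB):
    """Return human-readable warnings for samples that cross a threshold.

    All default thresholds are illustrative assumptions.
    """
    warnings = []
    for name, labels, value in samples:
        instance = labels.get("instance", "unknown")
        if name == "anchore_queue_length":
            # Only the analysis queue is checked here.
            if labels.get("queuename") == "images_to_analyze" and value > queue_warn:
                warnings.append(
                    f"{instance}: {int(value)} images waiting for analysis "
                    f"(threshold {queue_warn})"
                )
        elif name == "anchore_tmpspace_available_bytes":
            if value < tmp_min_bytes:
                warnings.append(
                    f"{instance}: only {int(value)} bytes free in tmp space"
                )
        elif name == "process_resident_memory_bytes":
            if value > mem_max_bytes:
                warnings.append(
                    f"{instance}: resident memory at {int(value)} bytes"
                )
    return warnings
```

A single threshold check like this cannot distinguish the normal cyclic use of tmp space from steady growth; for that, compare values over time in your monitoring system rather than at a single point.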
Last modified April 8, 2024