cmd/zed/agents/README.md




  
Fault Management Logic for ZED
The integration of Fault Management Daemon (FMD) logic from illumos
is being deployed in three phases. This logic is encapsulated in several
software modules inside ZED.
ZED+FM Phase 1
All of the phase 1 work is in the current master branch. Phase 1 work
includes:

Add new paths to the persistent VDEV label for device matching.
Add a disk monitor for generating disk-add and disk-change events.
Add support for automated VDEV auto-online, auto-replace and auto-expand.
Expand the statechange event to include all VDEV state transitions.

ZED+FM Phase 2 (WIP)
The phase 2 work primarily entails the Diagnosis Engine and
the Retire Agent modules. It also includes infrastructure to
support a crude FMD environment to host these modules. For additional
information see the FMD Components in ZED and
Implementation Notes sections below.
ZED+FM Phase 3
Future work will extend this functionality and will likely include:

Add FMD module garbage collection (periodically call fmd_module_gc()).
Add real module property retrieval (currently hard-coded in accessors).
Add further diagnosis telemetry (such as latency outliers and SMART data).
Export FMD module statistics.
Add zedlet parallel execution and resiliency (add a watchdog).

ZFS Fault Management
Overview
The primary purpose of ZFS fault management is automated diagnosis
and isolation of VDEV faults. A fault is something we can associate with
an impact (e.g. loss of data redundancy) and a corrective action (e.g.
offline or replace a disk). A typical ZFS fault management stack
consists of error detectors (e.g. zfs_ereport_post()), a disk monitor, a
diagnosis engine and response agents.
After detecting a software error, the ZFS kernel module sends error
events to the ZED user daemon, which in turn routes the events to its
internal FMD modules based on their event subscriptions. Likewise, if a
disk is added or changed in the system, the disk monitor sends disk
events, which are consumed by a response agent.
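The subscription-based routing described above can be pictured with a small C sketch. This is not the ZED dispatch code: the agent enum, the subscription table, the route_event() name and the class prefixes are assumptions chosen for illustration. The EC_DEV_ADD and EC_DEV_STATUS strings stand in for the disk monitor events, and only "sysevent.fs.zfs.vdev_remove" is taken from this document.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical agent identifiers, combined as a bitmask. */
enum agent {
    AGENT_DIAGNOSIS = 1 << 0,
    AGENT_RETIRE    = 1 << 1,
    AGENT_DISK_ADD  = 1 << 2
};

/* One subscription: classes starting with `prefix` go to `agent`. */
struct subscription {
    const char *prefix;
    enum agent agent;
};

/* Illustrative table only; the prefixes are assumptions of this sketch. */
static const struct subscription subscriptions[] = {
    { "ereport.fs.zfs.",             AGENT_DIAGNOSIS },
    { "fault.fs.zfs.",               AGENT_RETIRE },
    { "sysevent.fs.zfs.vdev_remove", AGENT_RETIRE },
    { "EC_DEV_ADD",                  AGENT_DISK_ADD },
    { "EC_DEV_STATUS",               AGENT_DISK_ADD }
};

/*
 * Return the bitmask of agents subscribed to `event_class`,
 * or 0 when no agent is interested in the event.
 */
unsigned
route_event(const char *event_class)
{
    size_t n = sizeof (subscriptions) / sizeof (subscriptions[0]);
    unsigned mask = 0;

    assert(event_class != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *p = subscriptions[i].prefix;

        /* Prefix match: the subscription covers a whole class subtree. */
        if (strncmp(event_class, p, strlen(p)) == 0)
            mask |= (unsigned)subscriptions[i].agent;
    }
    return (mask);
}
```

A caller would deliver the event to every agent whose bit is set in the returned mask and drop the event when the mask is 0.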
FMD Components in ZED
There are three FMD modules (aka agents) that are now built into
ZED.

A Diagnosis Engine module (agents/zfs_diagnosis.c)
A Retire Agent module (agents/zfs_retire.c)
A Disk Add Agent module (agents/zfs_mod.c)

First, the Diagnosis Engine consumes per-VDEV I/O and checksum
ereports and feeds them into a Soft Error Rate Discrimination (SERD)
algorithm, which generates a corresponding fault diagnosis when the
tracked VDEV encounters N events within a time window T. The initial
N and T values for the SERD algorithm are estimates inherited from
illumos (10 errors in 10 minutes).
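The N-events-in-T rule can be illustrated with a minimal C sketch. It is not the code in fmd_serd.c: the struct serd type, the serd_init(), serd_record() and serd_reset() names, the fixed-size ring buffer, the use of seconds as the time unit and the inclusive window boundary are all assumptions made for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SERD_MAX_N 32   /* capacity of this sketch, not a ZED limit */

struct serd {
    unsigned n;                     /* events needed to fire */
    uint64_t t;                     /* window length in seconds */
    uint64_t stamps[SERD_MAX_N];    /* ring of the most recent n times */
    unsigned count;                 /* valid entries, at most n */
    unsigned head;                  /* index of the oldest entry */
};

/* Forget all recorded events, e.g. after a diagnosis has been made. */
void
serd_reset(struct serd *s)
{
    s->count = 0;
    s->head = 0;
}

void
serd_init(struct serd *s, unsigned n, uint64_t t)
{
    assert(n > 0 && n <= SERD_MAX_N);
    s->n = n;
    s->t = t;
    serd_reset(s);
}

/*
 * Record one event at time `now` (timestamps must not go backwards).
 * Returns true when the n most recent events, including this one,
 * all fall within a window of t seconds.
 */
bool
serd_record(struct serd *s, uint64_t now)
{
    if (s->count == s->n) {
        /* Ring is full: overwrite the oldest timestamp. */
        s->stamps[s->head] = now;
        s->head = (s->head + 1) % s->n;
    } else {
        s->stamps[(s->head + s->count) % s->n] = now;
        s->count++;
    }

    if (s->count < s->n)
        return (false);

    /* After the update, head points at the oldest of the last n events. */
    return (now - s->stamps[s->head] <= s->t);
}
```

With the inherited defaults this would be set up as serd_init(&s, 10, 600), and a diagnosis would be raised the first time serd_record() returns true for a given VDEV.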
In turn, the Retire Agent responds to diagnosed faults
by isolating the faulty VDEV. It notifies the ZFS kernel module of
the new VDEV state (degraded or faulted). The retire agent is also
responsible for managing hot spares across all pools. When it encounters
a device fault or a device removal, it replaces the device with an
appropriate spare if one is available.
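The spare handling can be sketched in C as a loop that tries each spare in turn and stops at the first one that attaches. This document does not say how an "appropriate" spare is chosen, so the first-success policy, the attach_fn callback type, and the replace_with_first_spare() and attach_unless() names are all assumptions of this sketch; the real attach operation goes through libzfs and is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical callback: try to attach `spare` in place of `faulty`.
 * Returns true on success.
 */
typedef bool (*attach_fn)(const char *faulty, const char *spare, void *arg);

/*
 * Try each spare in order and stop at the first one that attaches.
 * Returns the index of the spare that was used, or -1 if none worked.
 */
int
replace_with_first_spare(const char *faulty, const char *const *spares,
    size_t nspares, attach_fn attach, void *arg)
{
    assert(faulty != NULL && attach != NULL);
    for (size_t i = 0; i < nspares; i++) {
        if (attach(faulty, spares[i], arg))
            return ((int)i);
    }
    return (-1);
}

/* Demo callback for this sketch: refuses the spare named by `arg`. */
bool
attach_unless(const char *faulty, const char *spare, void *arg)
{
    (void) faulty;
    return (arg == NULL || strcmp(spare, (const char *)arg) != 0);
}
```

Keeping the attach step behind a callback lets the selection loop be exercised without a pool; a spare that is already in use or too small would simply make the callback return false.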
Finally, the Disk Add Agent responds to events from a
libudev disk monitor (EC_DEV_ADD or
EC_DEV_STATUS) and will online, replace or expand the
associated VDEV. This agent is also known as the zfs_mod or
Sysevent Loadable Module (SLM) on the illumos platform. The added disk
is matched to a specific VDEV using its device ID, physical path or VDEV
GUID.
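The three matching keys can be pictured with a short C sketch. It is not the zfs_mod.c logic: the struct dev_ident type, the match_vdev() name and, in particular, the order in which the keys are tried (device id, then physical path, then GUID) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified identity of a leaf VDEV or a new disk. */
struct dev_ident {
    const char *devid;      /* device id, or NULL if unknown */
    const char *physpath;   /* physical path, or NULL if unknown */
    uint64_t guid;          /* VDEV GUID from the label, or 0 if none */
};

/* True only when both strings are present and equal. */
static int
same_str(const char *a, const char *b)
{
    return (a != NULL && b != NULL && strcmp(a, b) == 0);
}

/*
 * Return the index of the VDEV that matches `disk`, or -1.
 * Each key gets a full pass so that a stronger key on a later
 * VDEV wins over a weaker key on an earlier one.
 */
int
match_vdev(const struct dev_ident *vdevs, size_t nvdevs,
    const struct dev_ident *disk)
{
    assert(disk != NULL);
    assert(vdevs != NULL || nvdevs == 0);

    for (size_t i = 0; i < nvdevs; i++)
        if (same_str(vdevs[i].devid, disk->devid))
            return ((int)i);
    for (size_t i = 0; i < nvdevs; i++)
        if (same_str(vdevs[i].physpath, disk->physpath))
            return ((int)i);
    for (size_t i = 0; i < nvdevs; i++)
        if (disk->guid != 0 && vdevs[i].guid == disk->guid)
            return ((int)i);
    return (-1);
}
```

A blank replacement disk would typically carry no label and therefore no GUID, which is why a location-based key such as the physical path is needed for the replace case.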
Note that the auto-replace feature (aka hot plug) is opt-in: you must
set the pool's autoreplace property to on to enable it (for example,
zpool set autoreplace=on <pool>). The new disk is matched to the
corresponding leaf VDEV by physical location and labeled with a GPT
partition before it replaces the original VDEV in the pool.
Implementation Notes

The FMD module API required for logic modules is emulated and
implemented in the fmd_api.c and fmd_serd.c
source files. This support includes module registration, memory
allocation, module property accessors, basic case management, one-shot
timers and SERD engines. For detailed information on the FMD module API,
see the "Fault Management Daemon Programmer's Reference Manual".
The event subscriptions for the modules (located in a module
specific configuration file on illumos) are currently hard-coded into
the ZED zfs_agent_dispatch() function.
The FMD modules are called one at a time from a single thread
that consumes events queued to the modules. These events are sourced
from the normal ZED events and also include events posted from the
diagnosis engine and the libudev disk event monitor.
The FMD code modules have minimal changes and were intentionally
left as similar as possible to their upstream source files.
The sysevent namespace in ZED differs from illumos. For example:

illumos uses "resource.sysevent.EC_zfs.ESC_ZFS_vdev_remove"
Linux uses "sysevent.fs.zfs.vdev_remove"

The FMD Modules port was produced by Intel Federal, LLC under
award number B609815 between the U.S. Department of Energy (DOE) and
Intel Federal, LLC.