Additional capabilities compared to the inotify(7) API include the ability to monitor all of the objects in a mounted filesystem, the ability to make access permission decisions, and the possibility to read or modify files before access by other applications.
The following system calls are used with this API: fanotify_init(2), fanotify_mark(2), read(2), write(2), and close(2).
An fanotify notification group is a kernel-internal object that holds a list of files, directories, and mount points for which events shall be created.
For each entry in an fanotify notification group, two bit masks exist: the mark mask and the ignore mask. The mark mask defines file activities for which an event shall be created. The ignore mask defines activities for which no event shall be generated. Having these two types of masks permits a mount point or directory to be marked for receiving events, while at the same time ignoring events for specific objects under that mount point or directory.
The fanotify_mark(2) system call adds a file, directory, or mount to a notification group and specifies which events shall be reported (or ignored), or removes or modifies such an entry.
A possible usage of the ignore mask is for a file cache. Events of interest for a file cache are modification of a file and closing of the same. Hence, the cached directory or mount point is to be marked to receive these events. After receiving the first event informing that a file has been modified, the corresponding cache entry will be invalidated. No further modification events for this file are of interest until the file is closed. Hence, the modify event can be added to the ignore mask. Upon receiving the close event, the modify event can be removed from the ignore mask and the file cache entry can be updated.
The entries in the fanotify notification groups refer to files and directories via their inode number and to mounts via their mount ID. If files or directories are renamed or moved, the respective entries survive. If files or directories are deleted or mounts are unmounted, the corresponding entries are deleted.
Two types of events are generated: notification events and permission events. Notification events are merely informative and require no action to be taken by the receiving application except for closing the file descriptor passed in the event (see below). Permission events are requests to the receiving application to decide whether permission for a file access shall be granted. For these events, the recipient must write a response which decides whether access is granted or not.
An event is removed from the event queue of the fanotify group when it has been read. Permission events that have been read are kept in an internal list of the fanotify group until either a permission decision has been taken by writing to the fanotify file descriptor or the fanotify file descriptor is closed.
After a successful read(2), the read buffer contains one or more of the following structures:
struct fanotify_event_metadata { __u32 event_len; __u8 vers; __u8 reserved; __u16 metadata_len; __aligned_u64 mask; __s32 fd; __s32 pid; };
For performance reasons, it is recommended to use a large buffer size (for example, 4096 bytes), so that multiple events can be retrieved by a single read(2).
The return value of read(2) is the number of bytes placed in the buffer, or -1 in case of an error (but see BUGS).
The fields of the fanotify_event_metadata structure are as follows:
The bit mask in mask indicates which events have occurred for a single filesystem object. Multiple bits may be set in this mask, if more than one event occurred for the monitored filesystem object. In particular, consecutive events for the same filesystem object and originating from the same process may be merged into a single event, with the exception that two permission events are never merged into one queue entry.
The bits that may appear in mask are as follows:
To check for any close event, the following bit mask may be used:
FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE
The following macros are provided to iterate over a buffer containing fanotify event metadata returned by a read(2) from an fanotify file descriptor:
In addition, there is:
struct fanotify_response { __s32 fd; __u32 response; };
The fields of this structure are as follows:
If access is denied, the requesting application call will receive an EPERM error.
When all file descriptors referring to the fanotify notification group are closed, the fanotify group is released and its resources are freed for reuse by the kernel. Upon close(2), outstanding permission events will be set to allowed.
In addition to the usual errors for write(2), the following errors can occur when writing to the fanotify file descriptor:
The fanotify API does not report file accesses and modifications that may occur because of mmap(2), msync(2), and munmap(2).
Events for directories are created only if the directory itself is opened, read, and closed. Adding, removing, or changing children of a marked directory does not create events for the monitored directory itself.
Fanotify monitoring of directories is not recursive: to monitor subdirectories under a directory, additional marks must be created. (But note that the fanotify API provides no way of detecting when a subdirectory has been created under a marked directory, which makes recursive monitoring difficult.) Monitoring mounts offers the capability to monitor a whole directory tree.
The event queue can overflow. In this case, events are lost.
The following output was recorded while editing the file /home/user/temp/notes. Before the file was opened, a FAN_OPEN_PERM event occurred. After the file was closed, a FAN_CLOSE_WRITE event occurred. Execution of the program ends when the user presses the ENTER key.
# ./fanotify_example /home Press enter key to terminate. Listening for events. FAN_OPEN_PERM: File /home/user/temp/notes FAN_CLOSE_WRITE: File /home/user/temp/notes Listening for events stopped.
#define _GNU_SOURCE /* Needed to get O_LARGEFILE definition */ #include <errno.h> #include <fcntl.h> #include <limits.h> #include <poll.h> #include <stdio.h> #include <stdlib.h> #include <sys/fanotify.h> #include <unistd.h> /* Read all available fanotify events from the file descriptor 'fd' */ static void handle_events(int fd) { const struct fanotify_event_metadata *metadata; struct fanotify_event_metadata buf[200]; ssize_t len; char path[PATH_MAX]; ssize_t path_len; char procfd_path[PATH_MAX]; struct fanotify_response response; /* Loop while events can be read from fanotify file descriptor */ for(;;) { /* Read some events */ len = read(fd, (void *) &buf, sizeof(buf)); if (len == -1 && errno != EAGAIN) { perror("read"); exit(EXIT_FAILURE); } /* Check if end of available data reached */ if (len <= 0) break; /* Point to the first event in the buffer */ metadata = buf; /* Loop over all events in the buffer */ while (FAN_EVENT_OK(metadata, len)) { /* Check that run-time and compile-time structures match */ if (metadata->vers != FANOTIFY_METADATA_VERSION) { fprintf(stderr, "Mismatch of fanotify metadata version.\n"); exit(EXIT_FAILURE); } /* metadata->fd contains either FAN_NOFD, indicating a queue overflow, or a file descriptor (a nonnegative integer). Here, we simply ignore queue overflow. */ if (metadata->fd >= 0) { /* Handle open permission event */ if (metadata->mask & FAN_OPEN_PERM) { printf("FAN_OPEN_PERM: "); /* Allow file to be opened */ response.fd = metadata->fd; response.response = FAN_ALLOW; write(fd, &response, sizeof(struct fanotify_response)); } /* Handle closing of writable file event */ if (metadata->mask & FAN_CLOSE_WRITE) printf("FAN_CLOSE_WRITE: "); /* Retrieve and print pathname of the accessed file */ snprintf(procfd_path, sizeof(procfd_path), "/proc/self/fd/%d", metadata->fd); path_len = readlink(procfd_path, path, sizeof(path) - 1); if (path_len == -1) { perror("readlink"); exit(EXIT_FAILURE); } path[path_len] = '\0'; printf("File %s\n", path); /* Close the file descriptor of the event */ close(metadata->fd); } /* Advance to next event */ metadata = FAN_EVENT_NEXT(metadata, len); } } } int main(int argc, char *argv[]) { char buf; int fd, poll_num; nfds_t nfds; struct pollfd fds[2]; /* Check mount point is supplied */ if (argc != 2) { fprintf(stderr, "Usage: %s MOUNT\n", argv[0]); exit(EXIT_FAILURE); } printf("Press enter key to terminate.\n"); /* Create the file descriptor for accessing the fanotify API */ fd = fanotify_init(FAN_CLOEXEC | FAN_CLASS_CONTENT | FAN_NONBLOCK, O_RDONLY | O_LARGEFILE); if (fd == -1) { perror("fanotify_init"); exit(EXIT_FAILURE); } /* Mark the mount for: - permission events before opening files - notification events after closing a write-enabled file descriptor */ if (fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_OPEN_PERM | FAN_CLOSE_WRITE, -1, argv[1]) == -1) { perror("fanotify_mark"); exit(EXIT_FAILURE); } /* Prepare for polling */ nfds = 2; /* Console input */ fds[0].fd = STDIN_FILENO; fds[0].events = POLLIN; /* Fanotify input */ fds[1].fd = fd; fds[1].events = POLLIN; /* This is the loop to wait for incoming events */ printf("Listening for events.\n"); while (1) { poll_num = poll(fds, nfds, -1); if (poll_num == -1) { if (errno == EINTR) /* Interrupted by a signal */ continue; /* Restart poll() */ perror("poll"); /* Unexpected error */ exit(EXIT_FAILURE); } if (poll_num > 0) { if (fds[0].revents & POLLIN) { /* Console input is available: empty stdin and quit */ while (read(STDIN_FILENO, &buf, 1) > 0 && buf != '\n') continue; break; } if (fds[1].revents & POLLIN) { /* Fanotify events are available */ handle_events(fd); } } } printf("Listening for events stopped.\n"); exit(EXIT_SUCCESS); }