Memory Protection Keys for Userspace (PKU aka PKEYs) is a CPU feature
found on Intel Skylake "Scalable Processor" server CPUs and later.

This feature is available on PowerPC 5 and higher CPUs.

Memory Protection Keys provide a mechanism for enforcing page-based
protections, but without requiring modification of the page tables when an
application changes protection domains.

It works by dedicating bits in each page table entry to a "protection key".
There is also a user-accessible register with two separate bits for each
key.  Being a CPU register, the user-accessible register is inherently
thread-local, potentially giving each thread a different set of protections
from every other thread.

On Intel:

	Four previously ignored bits in the page table entry are used, giving
	16 possible keys.

	The user-accessible register (PKRU) has two bits per key: one to
	disable access and one to disable writes.

	The feature is only available in 64-bit mode, even though there is
	theoretically space in the PAE PTEs.  These permissions are enforced on
	data access only and have no effect on instruction fetches.
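
	The two-bits-per-key layout can be illustrated with a small helper that
	computes a new PKRU value.  This is only a sketch: pkru_set_rights() is
	a hypothetical name, and it assumes the architectural layout in which
	key N owns bit 2*N (access-disable) and bit 2*N + 1 (write-disable).
	It does the arithmetic only; it does not execute RDPKRU or WRPKRU.

```c
#include <stdint.h>

#define PKEY_DISABLE_ACCESS	0x1	/* same values as the kernel UAPI */
#define PKEY_DISABLE_WRITE	0x2

/*
 * Hypothetical helper: return 'pkru' with the two bits that belong to
 * 'pkey' replaced by 'rights'.  All other keys are left untouched.
 */
static uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
	int shift = pkey * 2;	/* AD is bit 2*pkey, WD is bit 2*pkey + 1 */
	uint32_t mask = PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE;

	pkru &= ~(mask << shift);		/* clear the old rights */
	pkru |= (rights & mask) << shift;	/* install the new ones */
	return pkru;
}
```

	A wrapper would read PKRU, pass the value through a helper like this,
	and write the result back.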

On PowerPC:

	Five bits in the page table entry are used, giving 32 possible keys.
	This support is currently for Hash Page Table mode only.

	The user-accessible register (AMR) has two bits per key: one to
	disable reads and one to disable writes.  Access-disable is achieved
	by disabling both read and write.

	'mfspr mem, 0xd' reads the AMR register
	'mtspr 0xd, mem' writes into the AMR register.

	Execution can be disabled by allocating a key with execute-disabled
	permission.  The execute permission on the key, however, cannot be
	changed through a user-accessible register.  The CPU will not allow
	execution of instructions in pages that are associated with an
	execute-disabled key.
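
	The mapping from pkey rights to AMR bits can be sketched in the same
	way.  The layout below is an assumption to be checked against the
	Power ISA: the 64-bit AMR holds 32 two-bit fields, key 0 sits in the
	two most significant bits, and within a field 0x1 is read-disable and
	0x2 is write-disable.  amr_set_rights() is a hypothetical name; the
	function only computes a value and does not touch the register.

```c
#include <stdint.h>

#define PKEY_DISABLE_ACCESS	0x1	/* same values as the kernel UAPI */
#define PKEY_DISABLE_WRITE	0x2

#define AMR_BITS_PER_PKEY	2
#define AMR_RD_BIT		0x1ULL	/* read-disable (assumed position) */
#define AMR_WR_BIT		0x2ULL	/* write-disable (assumed position) */

/* Hypothetical helper: return 'amr' with the field for 'pkey' updated. */
static uint64_t amr_set_rights(uint64_t amr, int pkey, unsigned int rights)
{
	/* Key 0 occupies the top two bits, key 31 the bottom two. */
	int shift = 64 - (pkey + 1) * AMR_BITS_PER_PKEY;
	uint64_t bits = 0;

	if (rights & PKEY_DISABLE_ACCESS)
		bits |= AMR_RD_BIT | AMR_WR_BIT;  /* no read and no write */
	else if (rights & PKEY_DISABLE_WRITE)
		bits |= AMR_WR_BIT;

	amr &= ~((AMR_RD_BIT | AMR_WR_BIT) << shift);	/* clear old field */
	return amr | (bits << shift);
}
```

	Note that PKEY_DISABLE_ACCESS has no bit of its own here; it is
	expressed by setting both the read-disable and write-disable bits.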

=========================== Syscalls ===========================

There are 3 system calls which directly interact with pkeys:

	int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
	int pkey_free(int pkey);
	int pkey_mprotect(unsigned long start, size_t len,
			  unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with
pkey_alloc().  An application then changes access permissions to memory
covered by a key by updating the key register directly: with the WRPKRU
instruction on x86, or by writing the AMR register on PowerPC.  In this
example that register update is wrapped by a C function called pkey_set().

	int real_prot = PROT_READ|PROT_WRITE;
	pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
	ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
	ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
	... application runs here

Now, if the application needs to update the data at 'ptr', it can
gain access, do the update, then remove its write access:

	pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
	*ptr = foo; // assign something
	pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it
is no longer in use:

	munmap(ptr, PAGE_SIZE);
	pkey_free(pkey);

(Note: pkey_set() is a wrapper around the RDPKRU and WRPKRU instructions
 on x86, or around reads and writes of the AMR register on PowerPC.
 An example implementation can be found in
 tools/testing/selftests/vm/protection_keys.c)

=========================== Behavior =================================

The kernel attempts to make protection keys consistent with the
behavior of a plain mprotect().  For instance if you do this:

	mprotect(ptr, size, PROT_NONE);
	something(ptr);

you can expect the same effects with protection keys when doing this:

	pkey = pkey_alloc(0, PKEY_DISABLE_ACCESS);
	pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
	something(ptr);

That should be true whether something() is a direct access to 'ptr'
like:

	*ptr = foo;

or when the kernel does the access on the application's behalf like
with a read():

	read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set
to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when
the plain mprotect() permissions are violated.
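
A handler can tell the two cases apart by looking at si_code.  The sketch
below is illustrative only: segv_reason() and probe_prot_none_fault() are
hypothetical names, and the probe deliberately touches a PROT_NONE page, so
it exercises the plain mprotect() path (SEGV_ACCERR) and works without
pkey hardware.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* MAP_ANONYMOUS, sigaction under strict -std= */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * The kernel UAPI calls this SEGV_PKERR (value 4).  Some libc headers
 * spell it differently, so define it here if it is missing.
 */
#ifndef SEGV_PKERR
#define SEGV_PKERR 4
#endif

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
	(void)sig;
	(void)ctx;
	fault_code = info->si_code;	/* remember why we faulted */
	siglongjmp(fault_env, 1);	/* skip the faulting store */
}

static const char *segv_reason(int si_code)
{
	switch (si_code) {
	case SEGV_PKERR:  return "protection key violation";
	case SEGV_ACCERR: return "mprotect() permission violation";
	default:          return "other";
	}
}

/* Touch a PROT_NONE page and return the si_code that was delivered. */
static int probe_prot_none_fault(void)
{
	struct sigaction sa, old;
	long page = sysconf(_SC_PAGESIZE);
	volatile char *ptr = mmap(NULL, page, PROT_NONE,
				  MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

	if (ptr == MAP_FAILED)
		return -1;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = on_segv;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	fault_code = 0;
	if (sigsetjmp(fault_env, 1) == 0)
		*ptr = 1;		/* faults: the page is PROT_NONE */

	sigaction(SIGSEGV, &old, NULL);	/* restore the previous handler */
	munmap((void *)ptr, page);
	return fault_code;
}
```

If the same store were instead blocked by a key allocated with
PKEY_DISABLE_ACCESS, the handler would see SEGV_PKERR rather than
SEGV_ACCERR.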

=========================== Differences ============================

The following differences exist between x86 and power.

a) powerpc (PowerPC8 onwards) *also* allows creation of a key with
   execute-disabled permission.  The following is allowed:

	pkey = pkey_alloc(0, PKEY_DISABLE_EXECUTE);

b) On powerpc, the access/write permission on a key can be modified by
   programming the AMR register from the signal handler.  The changes
   persist across signal boundaries.  On x86, the PKRU-specific fpregs
   entry must be modified instead to change the access/write permission
   on a key.
=====================================================================
