From: Peter Xu Subject: mm: check against orig_pte for finish_fault() This patch allows do_fault() to trigger on !pte_none() cases too. This prepares for the pte markers to be handled by do_fault() just like none pte. To achieve this, instead of unconditionally check against pte_none() in finish_fault(), we may hit the case that the orig_pte was some pte marker so what we want to do is to replace the pte marker with some valid pte entry. Then if orig_pte was set we'd want to check the current *pte (under pgtable lock) against orig_pte rather than none pte. Right now there's no solid way to safely reference orig_pte because when pmd is not allocated handle_pte_fault() will not initialize orig_pte, so it's not safe to reference it. There's another solution proposed before this patch to do pte_clear() for vmf->orig_pte for pmd==NULL case, however it turns out it'll break arm32 because arm32 could have assumption that pte_t* pointer will always reside on a real ram32 pgtable, not any kernel stack variable. To solve this, we add a new flag FAULT_FLAG_ORIG_PTE_VALID, and it'll be set along with orig_pte when there is valid orig_pte, or it'll be cleared when orig_pte was not initialized. It'll be updated every time we call handle_pte_fault(), so e.g. if a page fault retry happened it'll be properly updated along with orig_pte. [1] https://lore.kernel.org/lkml/710c48c9-406d-e4c5-a394-10501b951316@samsung.com/ Link: https://lkml.kernel.org/r/20220405014836.14077-1-peterx@redhat.com Signed-off-by: Peter Xu Reviewed-by: Alistair Popple Cc: Andrea Arcangeli Cc: Axel Rasmussen Cc: David Hildenbrand Cc: Hugh Dickins Cc: Jerome Glisse Cc: "Kirill A . Shutemov" Cc: Matthew Wilcox Cc: Mike Kravetz Cc: Mike Rapoport Cc: Nadav Amit Cc: Marek Szyprowski Signed-off-by: Andrew Morton --- mm/memory.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) --- a/mm/memory.c~mm-check-against-orig_pte-for-finish_fault +++ a/mm/memory.c @@ -4237,7 +4237,7 @@ vm_fault_t finish_fault(struct vm_fault vmf->address, &vmf->ptl); ret = 0; /* Re-check under ptl */ - if (likely(pte_none(*vmf->pte))) + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) do_set_pte(vmf, page, vmf->address); else ret = VM_FAULT_NOPAGE; @@ -4705,6 +4705,13 @@ static vm_fault_t handle_pte_fault(struc * concurrent faults and from rmap lookups. */ vmf->pte = NULL; + /* + * Always initialize orig_pte. This matches with below + * code to have orig_pte to be the none pte if pte==NULL. + * This makes the rest code to be always safe to reference + * it, e.g. in finish_fault() we'll detect pte changes. + */ + pte_clear(vmf->vma->vm_mm, vmf->address, &vmf->orig_pte); } else { /* * If a huge pmd materialized under us just retry later. Use _