Message ID | 20140424133055.GA13269@redhat.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers |
Received: from mail.tu-berlin.de ([130.149.7.33]) by www.linuxtv.org with esmtp (Exim 4.72) (envelope-from <linux-media-owner@vger.kernel.org>) id 1WdJlD-00037E-1R; Thu, 24 Apr 2014 15:32:19 +0200 X-tubIT-Incoming-IP: 209.132.180.67 Received: from vger.kernel.org ([209.132.180.67]) by mail.tu-berlin.de (exim-4.72/mailfrontend-8) with esmtp id 1WdJlA-0003XJ-mM; Thu, 24 Apr 2014 15:32:18 +0200 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757537AbaDXNbX (ORCPT <rfc822;mkrufky@linuxtv.org> + 1 other); Thu, 24 Apr 2014 09:31:23 -0400 Received: from mx1.redhat.com ([209.132.183.28]:44177 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756848AbaDXNbT (ORCPT <rfc822;linux-media@vger.kernel.org>); Thu, 24 Apr 2014 09:31:19 -0400 Received: from int-mx01.intmail.prod.int.phx2.redhat.com (int-mx01.intmail.prod.int.phx2.redhat.com [10.5.11.11]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s3ODV0rs025250 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Thu, 24 Apr 2014 09:31:01 -0400 Received: from tranklukator.brq.redhat.com (dhcp-1-104.brq.redhat.com [10.34.1.104]) by int-mx01.intmail.prod.int.phx2.redhat.com (8.13.8/8.13.8) with SMTP id s3ODUunT003050; Thu, 24 Apr 2014 09:30:56 -0400 Received: by tranklukator.brq.redhat.com (nbSMTP-1.00) for uid 500 oleg@redhat.com; Thu, 24 Apr 2014 15:30:59 +0200 (CEST) Date: Thu, 24 Apr 2014 15:30:55 +0200 From: Oleg Nesterov <oleg@redhat.com> To: Hugh Dickins <hughd@google.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>, Andrew Morton <akpm@linux-foundation.org>, Jan Kara <jack@suse.cz>, Roland Dreier <roland@kernel.org>, Konstantin Khlebnikov <koct9i@gmail.com>, "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>, Mauro Carvalho Chehab <m.chehab@samsung.com>, Omar Ramirez Luna <omar.ramirez@copitl.com>, Inki Dae <inki.dae@samsung.com>, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-rdma@vger.kernel.org, linux-media@vger.kernel.org Subject: Re: [PATCH] mm: get_user_pages(write,force) refuse to COW in shared areas Message-ID: <20140424133055.GA13269@redhat.com> References: <alpine.LSU.2.11.1404040120110.6880@eggly.anvils> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <alpine.LSU.2.11.1404040120110.6880@eggly.anvils> User-Agent: Mutt/1.5.18 (2008-05-17) X-Scanned-By: MIMEDefang 2.67 on 10.5.11.11 Sender: linux-media-owner@vger.kernel.org Precedence: bulk List-ID: <linux-media.vger.kernel.org> X-Mailing-List: linux-media@vger.kernel.org X-PMX-Version: 6.0.0.2142326, Antispam-Engine: 2.7.2.2107409, Antispam-Data: 2014.4.24.131518 X-PMX-Spam: Gauge=IIIIIIIII, Probability=9%, Report=' MULTIPLE_RCPTS 0.1, HTML_00_01 0.05, HTML_00_10 0.05, MSGID_ADDED_BY_MTA 0.05, BODYTEXTP_SIZE_3000_LESS 0, BODY_SIZE_2000_2999 0, BODY_SIZE_5000_LESS 0, BODY_SIZE_7000_LESS 0, URI_ENDS_IN_HTML 0, __ANY_URI 0, __BOUNCE_CHALLENGE_SUBJ 0, __BOUNCE_NDR_SUBJ_EXEMPT 0, __CD 0, __CP_URI_IN_BODY 0, __CT 0, __CT_TEXT_PLAIN 0, __FORWARDED_MSG 0, __HAS_FROM 0, __HAS_MSGID 0, __HAS_X_MAILING_LIST 0, __IN_REP_TO 0, __MIME_TEXT_ONLY 0, __MIME_VERSION 0, __MULTIPLE_RCPTS_CC_X2 0, __SANE_MSGID 0, __SUBJ_ALPHA_END 0, __SUBJ_ALPHA_NEGATE 0, __TO_MALFORMED_2 0, __URI_NO_WWW 0, __URI_NS , __USER_AGENT 0' |
Commit Message
Oleg Nesterov
April 24, 2014, 1:30 p.m. UTC
Hi Hugh, Sorry for late reply. First of all, to avoid the confusion, I think the patch is fine. When I saw this patch I decided that uprobes should be updated accordingly, but I just realized that I do not understand what should I write in the changelog. On 04/04, Hugh Dickins wrote: > > + if (gup_flags & FOLL_WRITE) { > + if (!(vm_flags & VM_WRITE)) { > + if (!(gup_flags & FOLL_FORCE)) > + goto efault; > + /* > + * We used to let the write,force case do COW > + * in a VM_MAYWRITE VM_SHARED !VM_WRITE vma, so > + * ptrace could set a breakpoint in a read-only > + * mapping of an executable, without corrupting > + * the file (yet only when that file had been > + * opened for writing!). Anon pages in shared > + * mappings are surprising: now just reject it. > + */ > + if (!is_cow_mapping(vm_flags)) { > + WARN_ON_ONCE(vm_flags & VM_MAYWRITE); > + goto efault; > + } OK. But could you please clarify "Anon pages in shared mappings are surprising" ? I mean, does this only apply to "VM_MAYWRITE VM_SHARED !VM_WRITE vma" mentioned above or this is bad even if a !FMODE_WRITE file was mmaped as MAP_SHARED ? Yes, in this case this vma is not VM_SHARED and it is not VM_MAYWRITE, it is only VM_MAYSHARE. This is in fact private mapping except mprotect(PROT_WRITE) will not work. But with or without this patch gup(FOLL_WRITE | FOLL_FORCE) won't work in this case, (although perhaps it could ?), is_cow_mapping() == F because of !VM_MAYWRITE. However, currently uprobes assumes that a cowed anon page is fine in this case, and this differs from gup(). So, what do you think about the patch below? It is probably fine in any case, but is there any "strong" reason to follow the gup's behaviour and forbid the anon page in VM_MAYSHARE && !VM_MAYWRITE vma? Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Comments
On Thu, 24 Apr 2014, Oleg Nesterov wrote: > Hi Hugh, > > Sorry for late reply. First of all, to avoid the confusion, I think the > patch is fine. > > When I saw this patch I decided that uprobes should be updated accordingly, > but I just realized that I do not understand what should I write in the > changelog. Thanks a lot for considering similar issues in uprobes, Oleg: I merely checked that its uses of get_user_pages() would not be problematic, and didn't look around to rediscover the worrying mm business that goes on down there in kernel/events. > > On 04/04, Hugh Dickins wrote: > > > > + if (gup_flags & FOLL_WRITE) { > > + if (!(vm_flags & VM_WRITE)) { > > + if (!(gup_flags & FOLL_FORCE)) > > + goto efault; > > + /* > > + * We used to let the write,force case do COW > > + * in a VM_MAYWRITE VM_SHARED !VM_WRITE vma, so > > + * ptrace could set a breakpoint in a read-only > > + * mapping of an executable, without corrupting > > + * the file (yet only when that file had been > > + * opened for writing!). Anon pages in shared > > + * mappings are surprising: now just reject it. > > + */ > > + if (!is_cow_mapping(vm_flags)) { > > + WARN_ON_ONCE(vm_flags & VM_MAYWRITE); > > + goto efault; > > + } > > OK. But could you please clarify "Anon pages in shared mappings are surprising" ? > I mean, does this only apply to "VM_MAYWRITE VM_SHARED !VM_WRITE vma" mentioned > above or this is bad even if a !FMODE_WRITE file was mmaped as MAP_SHARED ? Good question. I simply didn't consider that - and (as you have realized) didn't need to consider it, because I was just stopping the problematic behaviour in gup(), and didn't need to consider whether other behaviour prohibited by gup() was actually unproblematic. > > Yes, in this case this vma is not VM_SHARED and it is not VM_MAYWRITE, it is only > VM_MAYSHARE. This is in fact private mapping except mprotect(PROT_WRITE) will not > work. > > But with or without this patch gup(FOLL_WRITE | FOLL_FORCE) won't work in this case, "this" meaning my patch rather than yours below > (although perhaps it could ?), is_cow_mapping() == F because of !VM_MAYWRITE. > > However, currently uprobes assumes that a cowed anon page is fine in this case, and > this differs from gup(). > > So, what do you think about the patch below? It is probably fine in any case, > but is there any "strong" reason to follow the gup's behaviour and forbid the > anon page in VM_MAYSHARE && !VM_MAYWRITE vma? I don't think there is a "strong" reason to forbid it. The strongest reason is simply that it's much safer if uprobes follows the same conventions as mm, and get_user_pages() happens to have forbidden that all along. The philosophical reason to forbid it is that the user mmapped with MAP_SHARED, and it's merely a kernel-internal detail that we flip off VM_SHARED and treat these read-only shared mappings very much like private mappings. The user asked for MAP_SHARED, and we prefer to respect that by not letting private COWs creep in. We could treat those mappings even more like private mappings, and allow the COWs; but better to be strict about it, so long as doing so doesn't give you regressions. > > Oleg. > > --- x/kernel/events/uprobes.c > +++ x/kernel/events/uprobes.c > @@ -127,12 +127,13 @@ struct xol_area { > */ > static bool valid_vma(struct vm_area_struct *vma, bool is_register) > { > - vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED; > + vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC; I think a one-line patch changing VM_SHARED to VM_MAYSHARE would do it, wouldn't it? And save you from having to export is_cow_mapping() from mm/memory.c. (I used is_cow_mapping() because I had to make the test more complex anyway, just to exclude the case which had been oddly handled before.) Hugh > > if (is_register) > flags |= VM_WRITE; > > - return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC; > + return vma->vm_file && is_cow_mapping(vma->vm_flags) && > + (vma->vm_flags & flags) == VM_MAYEXEC; > } > > static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 04/24, Hugh Dickins wrote: > > On Thu, 24 Apr 2014, Oleg Nesterov wrote: > > > So, what do you think about the patch below? It is probably fine in any case, > > but is there any "strong" reason to follow the gup's behaviour and forbid the > > anon page in VM_MAYSHARE && !VM_MAYWRITE vma? > > I don't think there is a "strong" reason to forbid it. > > The strongest reason is simply that it's much safer if uprobes follows > the same conventions as mm, and get_user_pages() happens to have > forbidden that all along. > > The philosophical reason to forbid it is that the user mmapped with > MAP_SHARED, and it's merely a kernel-internal detail that we flip off > VM_SHARED and treat these read-only shared mappings very much like > private mappings. The user asked for MAP_SHARED, and we prefer to > respect that by not letting private COWs creep in. > > We could treat those mappings even more like private mappings, and > allow the COWs; but better to be strict about it, so long as doing > so doesn't give you regressions. Great, thanks a lot! I was worried I missed something subtle. And I forgot to mention, there is another reason why I would like to change uprobes to follow the same convention. I still think it would be better to kill __replace_page() and use gup(FOLL_WRITE | FORCE) in uprobe_write_opcode(). > > --- x/kernel/events/uprobes.c > > +++ x/kernel/events/uprobes.c > > @@ -127,12 +127,13 @@ struct xol_area { > > */ > > static bool valid_vma(struct vm_area_struct *vma, bool is_register) > > { > > - vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED; > > + vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC; > > I think a one-line patch changing VM_SHARED to VM_MAYSHARE would do it, > wouldn't it? And save you from having to export is_cow_mapping() > from mm/memory.c. (I used is_cow_mapping() because I had to make the > test more complex anyway, just to exclude the case which had been > oddly handled before.) Indeed, thanks! Oleg. -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 25 Apr 2014, Oleg Nesterov wrote: > > And I forgot to mention, there is another reason why I would like to > change uprobes to follow the same convention. I still think it would > be better to kill __replace_page() and use gup(FOLL_WRITE | FORCE) > in uprobe_write_opcode(). Oh, please please do! I assumed there was a good atomicity reason for doing it that way, but if you can do it with gup() please do so. I went off into a little rant about __replace_page() in my reply to you; but then felt that you had approached with an honest enquiry, and didn't deserve a rant in response, so I deleted it. And, of course, I'm conscious that I have from start to finish withheld my involvement from uprobes, and refused to review that __replace_page() (beyond insisting that it not be put in a common place for sharing with KSM, because I just couldn't guarantee it for uprobes). I'm afraid that it's much like HWPoison to me, a complexity (nastiness?) way beyond what I have time to support myself. Hugh -- To unsubscribe from this list: send the line "unsubscribe linux-media" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
--- x/kernel/events/uprobes.c +++ x/kernel/events/uprobes.c @@ -127,12 +127,13 @@ struct xol_area { */ static bool valid_vma(struct vm_area_struct *vma, bool is_register) { - vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC | VM_SHARED; + vm_flags_t flags = VM_HUGETLB | VM_MAYEXEC; if (is_register) flags |= VM_WRITE; - return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC; + return vma->vm_file && is_cow_mapping(vma->vm_flags) && + (vma->vm_flags & flags) == VM_MAYEXEC; } static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)