From a4965520f67e0a3084dd71d4380e9e6899aca48b Mon Sep 17 00:00:00 2001 From: Bruce Momjian Date: Tue, 27 Aug 2002 04:09:01 +0000 Subject: [PATCH] Add to mmap discussion. --- doc/TODO.detail/mmap | 1188 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 1188 insertions(+) diff --git a/doc/TODO.detail/mmap b/doc/TODO.detail/mmap index 77dc3993af..aafba644ad 100644 --- a/doc/TODO.detail/mmap +++ b/doc/TODO.detail/mmap @@ -575,3 +575,1191 @@ shm_open() TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) +From pgsql-hackers-owner+M24146@postgresql.org Tue Jun 25 02:27:29 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5P6RSF12626 + for ; Tue, 25 Jun 2002 02:27:28 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 2C72F475EF6; Tue, 25 Jun 2002 02:27:28 -0400 (EDT) +Mailbox-Line: From cjs@cynic.net Tue Jun 25 02:27:28 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 42AAB475B26; Tue, 25 Jun 2002 02:07:04 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id A8D13475A06 + for ; Tue, 25 Jun 2002 02:07:01 -0400 (EDT) +Mailbox-Line: From cjs@cynic.net Tue Jun 25 02:07:01 2002 +Received: from academic.cynic.net (academic.cynic.net [63.144.177.3]) + by postgresql.org (Postfix) with ESMTP id F3C264760A1 + for ; Tue, 25 Jun 2002 01:05:49 -0400 (EDT) +Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224]) + by academic.cynic.net (Postfix) with ESMTP + id 5F61CF820; Tue, 25 Jun 2002 05:05:47 +0000 (UTC) +Date: Tue, 25 Jun 2002 14:05:45 +0900 (JST) +From: Curt Sampson +To: "J. R. 
Nield" +cc: Bruce Momjian , Tom Lane , + PostgreSQL Hacker +Subject: [HACKERS] Buffer Management +In-Reply-To: <1024951786.1793.865.camel@localhost.localdomain> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-5.3 required=5.0 + tests=IN_REP_TO,X_NOT_PRESENT + version=2.30 +Status: OR + +I'm splitting off this buffer management stuff into a separate thread. + +On 24 Jun 2002, J. R. Nield wrote: + +> I'll back off on that. I don't know if we want to use the OS buffer +> manager, but shouldn't we try to have our buffer manager group writes +> together by files, and pro-actively get them out to disk? + +The only way the postgres buffer manager can "get [data] out to disk" +is to do an fsync(). For data files (as opposed to log files), this can +only slow down overall system throughput, as this would only disrupt the +OS's write management. + +> Right now, it +> looks like all our write requests are delayed as long as possible and +> the order in which they are written is pretty-much random, as is the +> backend that writes the block, so there is no locality of reference even +> when the blocks are adjacent on disk, and the write calls are spread-out +> over all the backends. + +It doesn't matter. The OS will introduce locality of reference with its +write algorithms. Take a look at + + http://www.cs.wisc.edu/~solomon/cs537/disksched.html + +for an example. Most OSes use the elevator or one-way elevator +algorithm. So it doesn't matter whether it's one back-end or many +writing, and it doesn't matter in what order they do the write. + +> Would it not be the case that things like read-ahead, grouping writes, +> and caching written data are probably best done by PostgreSQL, because +> only our buffer manager can understand when they will be useful or when +> they will thrash the cache? 
+ +Operating systems these days are not too bad at guessing what +you're doing. Pretty much every OS I've seen will do read-ahead when +it detects you're doing sequential reads, at least in the forward +direction. And Solaris is even smart enough to mark the pages you've +read as "not needed" so that they quickly get flushed from the cache, +rather than blowing out your entire cache if you go through a large +file. + +> Would O_DSYNC|O_RSYNC turn off the cache? + +No. I suppose there's nothing to stop it doing so, in some +implementations, but the interface is not designed for direct I/O. + +> Since you know a lot about NetBSD internals, I'd be interested in +> hearing about what postgresql looks like to the NetBSD buffer manager. + +Well, looks like pretty much any program, or group of programs, +doing a lot of I/O. :-) + +> Am I right that strings of successive writes get randomized? + +No; as I pointed out, they in fact get de-randomized as much as +possible. The more processes you have throwing out requests, the better +the throughput will be in fact. + +> What do our cache-hit percentages look like? I'm going to do some +> experimenting with this. + +Well, that depends on how much memory you have and what your working +set is. :-) + +cjs +-- +Curt Sampson +81 90 7737 2974 http://www.netbsd.org + Don't you know, in this new Dark Age, we're all light. --XTC + + + + +---------------------------(end of broadcast)--------------------------- +TIP 6: Have you searched our list archives? 
+ +http://archives.postgresql.org + + + +From cjs@cynic.net Tue Jun 25 09:52:23 2002 +Return-path: +Received: from academic.cynic.net (academic.cynic.net [63.144.177.3]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PDqKF07478 + for ; Tue, 25 Jun 2002 09:52:22 -0400 (EDT) +Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224]) + by academic.cynic.net (Postfix) with ESMTP + id D9242F820; Tue, 25 Jun 2002 13:52:18 +0000 (UTC) +Date: Tue, 25 Jun 2002 22:52:14 +0900 (JST) +From: Curt Sampson +To: "J. R. Nield" +cc: Bruce Momjian , Tom Lane , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Status: OR + + +So, while we're at it, what's the current state of people's thinking +on using mmap rather than shared memory for data file buffers? I +see some pretty powerful advantages to this approach, and I'm not +(yet :-)) convinced that the disadvantages are as bad as people think. +I think I can address most of the concerns in doc/TODO.detail/mmap. + +Is this worth pursuing a bit? (I.e., should I spend an hour or two +writing up the advantages and thoughts on how to get around the +problems?) Anybody got objections that aren't in doc/TODO.detail/mmap? + +cjs +-- +Curt Sampson +81 90 7737 2974 http://www.netbsd.org + Don't you know, in this new Dark Age, we're all light. --XTC + + +From tgl@sss.pgh.pa.us Tue Jun 25 10:09:07 2002 +Return-path: +Received: from sss.pgh.pa.us (root@[192.204.191.242]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PE96F08922 + for ; Tue, 25 Jun 2002 10:09:06 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PE92107301; + Tue, 25 Jun 2002 10:09:02 -0400 (EDT) +To: Curt Sampson +cc: "J. R. 
Nield" , Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: +References: +Comments: In-reply-to Curt Sampson + message dated "Tue, 25 Jun 2002 22:52:14 +0900" +Date: Tue, 25 Jun 2002 10:09:02 -0400 +Message-ID: <7298.1025014142@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Curt Sampson writes: +> So, while we're at it, what's the current state of people's thinking +> on using mmap rather than shared memory for data file buffers? + +There seem to be a couple of different threads in doc/TODO.detail/mmap. + +One envisions mmap as a one-for-one replacement for our current use of +SysV shared memory, the main selling point being to get out from under +kernels that don't have SysV support or have it configured too small. +This might be worth doing, and I think it'd be relatively easy to do +now that the shared memory support is isolated in one file and there's +provisions for selecting a shmem implementation at configure time. +The only thing you'd really have to think about is how to replace the +current behavior that uses shmem attach counts to discover whether any +old backends are left over from a previous crashed postmaster. I dunno +if mmap offers any comparable facility. + +The other discussion seemed to be considering how to mmap individual +data files right into backends' address space. I do not believe this +can possibly work, because of loss of control over visibility of data +changes to other backends, timing of write-backs, etc. + +But as long as you stay away from interpretation #2 and go with +mmap-as-a-shmget-substitute, it might be worthwhile. + +(Hey Marc, can one do mmap in a BSD jail?) 
+ + regards, tom lane + +From pgsql-hackers-owner+M24158@postgresql.org Tue Jun 25 10:20:42 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEKgF10228 + for ; Tue, 25 Jun 2002 10:20:42 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 7259547609E; Tue, 25 Jun 2002 10:20:35 -0400 (EDT) +Mailbox-Line: From cjs@cynic.net Tue Jun 25 10:20:35 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 8E79647604C; Tue, 25 Jun 2002 10:20:33 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id C3EB1476002 + for ; Tue, 25 Jun 2002 10:20:30 -0400 (EDT) +Mailbox-Line: From cjs@cynic.net Tue Jun 25 10:20:30 2002 +Received: from academic.cynic.net (academic.cynic.net [63.144.177.3]) + by postgresql.org (Postfix) with ESMTP id 887F9475B2F + for ; Tue, 25 Jun 2002 10:20:16 -0400 (EDT) +Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224]) + by academic.cynic.net (Postfix) with ESMTP + id 16CCDF820; Tue, 25 Jun 2002 14:20:19 +0000 (UTC) +Date: Tue, 25 Jun 2002 23:20:15 +0900 (JST) +From: Curt Sampson +To: Tom Lane +cc: "J. R. Nield" , Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7298.1025014142@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-5.3 required=5.0 + tests=IN_REP_TO,X_NOT_PRESENT + version=2.30 +Status: OR + +On Tue, 25 Jun 2002, Tom Lane wrote: + +> The only thing you'd really have to think about is how to replace the +> current behavior that uses shmem attach counts to discover whether any +> old backends are left over from a previous crashed postmaster. 
I dunno +> if mmap offers any comparable facility. + +Sure. Just mmap a file, and it will be persistent. + +> The other discussion seemed to be considering how to mmap individual +> data files right into backends' address space. I do not believe this +> can possibly work, because of loss of control over visibility of data +> changes to other backends, timing of write-backs, etc. + +I don't understand why there would be any loss of visibility of changes. +If two backends mmap the same block of a file, and it's shared, that's +the same block of physical memory that they're accessing. Changes don't +even need to "propagate," because the memory is truly shared. You'd keep +your locks in the page itself as well, of course. + +Can you describe the problem in more detail? + +> But as long as you stay away from interpretation #2 and go with +> mmap-as-a-shmget-substitute, it might be worthwhile. + +It's #2 that I was really looking at. :-) + +cjs +-- +Curt Sampson +81 90 7737 2974 http://www.netbsd.org + Don't you know, in this new Dark Age, we're all light. 
--XTC + + + + +---------------------------(end of broadcast)--------------------------- +TIP 2: you can get off all lists at once with the unregister command + (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) + + + +From pgsql-hackers-owner+M24159@postgresql.org Tue Jun 25 10:25:21 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEPKF10831 + for ; Tue, 25 Jun 2002 10:25:20 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id AA2EF475C46; Tue, 25 Jun 2002 10:25:13 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 10:25:13 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 9657447603B; Tue, 25 Jun 2002 10:23:23 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id 364D0475FC2 + for ; Tue, 25 Jun 2002 10:23:18 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 10:23:18 2002 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (Postfix) with ESMTP id C063F47594B + for ; Tue, 25 Jun 2002 10:20:35 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g5PEKT310222; + Tue, 25 Jun 2002 10:20:29 -0400 (EDT) +From: Bruce Momjian +Message-ID: <200206251420.g5PEKT310222@candle.pha.pa.us> +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7298.1025014142@sss.pgh.pa.us> +To: Tom Lane +Date: Tue, 25 Jun 2002 10:20:29 -0400 (EDT) +cc: Curt Sampson , "J. R. 
Nield" , + PostgreSQL Hacker +X-Mailer: ELM [version 2.4ME+ PL97 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-3.4 required=5.0 + tests=IN_REP_TO + version=2.30 +Status: OR + +Tom Lane wrote: +> Curt Sampson writes: +> > So, while we're at it, what's the current state of people's thinking +> > on using mmap rather than shared memory for data file buffers? +> +> There seem to be a couple of different threads in doc/TODO.detail/mmap. +> +> One envisions mmap as a one-for-one replacement for our current use of +> SysV shared memory, the main selling point being to get out from under +> kernels that don't have SysV support or have it configured too small. +> This might be worth doing, and I think it'd be relatively easy to do +> now that the shared memory support is isolated in one file and there's +> provisions for selecting a shmem implementation at configure time. +> The only thing you'd really have to think about is how to replace the +> current behavior that uses shmem attach counts to discover whether any +> old backends are left over from a previous crashed postmaster. I dunno +> if mmap offers any comparable facility. +> +> The other discussion seemed to be considering how to mmap individual +> data files right into backends' address space. I do not believe this +> can possibly work, because of loss of control over visibility of data +> changes to other backends, timing of write-backs, etc. + +Agreed. Also, there was an interesting thread that mmap'ing /dev/zero is +the same as anonmap for OS's that don't have anonmap. That should cover +most of them. The only downside I can see is that SysV shared memory is +locked into RAM on some/most OS's while mmap anon probably isn't. +Locking in RAM is good in most cases, bad in others. 
+ +This will also work well when we have non-SysV semaphore support, like +Posix semaphores, so we would be able to run with no SysV stuff. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + + + +---------------------------(end of broadcast)--------------------------- +TIP 4: Don't 'kill -9' the postmaster + + + +From pgsql-hackers-owner+M24160@postgresql.org Tue Jun 25 10:27:40 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEReF11147 + for ; Tue, 25 Jun 2002 10:27:40 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id B33CD476047; Tue, 25 Jun 2002 10:27:16 -0400 (EDT) +Mailbox-Line: From lkindness@csl.co.uk Tue Jun 25 10:27:16 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 3091247606D; Tue, 25 Jun 2002 10:23:24 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id 6C39D476002 + for ; Tue, 25 Jun 2002 10:23:19 -0400 (EDT) +Mailbox-Line: From lkindness@csl.co.uk Tue Jun 25 10:23:19 2002 +Received: from internet.csl.co.uk (internet.csl.co.uk [194.130.52.3]) + by postgresql.org (Postfix) with ESMTP id AC203475C46 + for ; Tue, 25 Jun 2002 10:20:49 -0400 (EDT) +Received: from euphrates.csl.co.uk (host-194-67.csl.co.uk [194.130.52.67]) + by internet.csl.co.uk (8.12.1/8.12.1) with ESMTP id g5PEKonH023514; + Tue, 25 Jun 2002 15:20:50 +0100 +Received: from kelvin.csl.co.uk by euphrates.csl.co.uk (8.9.3/ConceptI 2.4) + id PAA08847; Tue, 25 Jun 2002 15:20:52 +0100 (BST) +Received: by kelvin.csl.co.uk (8.11.6) id g5PEKoT28846; Tue, 25 Jun 2002 15:20:50 +0100 +From: Lee Kindness +MIME-Version: 1.0 +Content-Type: text/plain; 
charset=us-ascii +Content-Transfer-Encoding: 7bit +Message-ID: <15640.31809.970880.320561@kelvin.csl.co.uk> +Date: Tue, 25 Jun 2002 15:20:49 +0100 +To: Tom Lane +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7298.1025014142@sss.pgh.pa.us> +References: + <7298.1025014142@sss.pgh.pa.us> +X-Mailer: VM 7.00 under 21.4 (patch 6) "Common Lisp" XEmacs Lucid +cc: Lee Kindness , pgsql-hackers@postgresql.org +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-3.4 required=5.0 + tests=IN_REP_TO + version=2.30 +Status: OR + +Tom Lane writes: + > There seem to be a couple of different threads in + > doc/TODO.detail/mmap. + > [ snip ] + +A place where mmap could be easily used and would offer a good +performance increase is for COPY FROM. + +Lee. + + + +---------------------------(end of broadcast)--------------------------- +TIP 5: Have you checked our extensive FAQ? + +http://www.postgresql.org/users-lounge/docs/faq.html + + + +From cjs@cynic.net Tue Jun 25 10:24:49 2002 +Return-path: +Received: from academic.cynic.net (academic.cynic.net [63.144.177.3]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEOmF10749 + for ; Tue, 25 Jun 2002 10:24:49 -0400 (EDT) +Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224]) + by academic.cynic.net (Postfix) with ESMTP + id F2629F820; Tue, 25 Jun 2002 14:24:47 +0000 (UTC) +Date: Tue, 25 Jun 2002 23:24:44 +0900 (JST) +From: Curt Sampson +To: Bruce Momjian +cc: Tom Lane , "J. R. Nield" , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <200206251420.g5PEKT310222@candle.pha.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Status: OR + +On Tue, 25 Jun 2002, Bruce Momjian wrote: + +> The only downside I can see is that SysV shared memory is +> locked into RAM on some/most OS's while mmap anon probably isn't. + +It is if you mlock() it. 
:-) + +cjs +-- +Curt Sampson +81 90 7737 2974 http://www.netbsd.org + Don't you know, in this new Dark Age, we're all light. --XTC + + +From tgl@sss.pgh.pa.us Tue Jun 25 10:29:53 2002 +Return-path: +Received: from sss.pgh.pa.us (root@[192.204.191.242]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PETpF11341 + for ; Tue, 25 Jun 2002 10:29:52 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PETn107501; + Tue, 25 Jun 2002 10:29:49 -0400 (EDT) +To: Curt Sampson +cc: "J. R. Nield" , Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: +References: +Comments: In-reply-to Curt Sampson + message dated "Tue, 25 Jun 2002 23:20:15 +0900" +Date: Tue, 25 Jun 2002 10:29:49 -0400 +Message-ID: <7498.1025015389@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Curt Sampson writes: +> On Tue, 25 Jun 2002, Tom Lane wrote: +>> The other discussion seemed to be considering how to mmap individual +>> data files right into backends' address space. I do not believe this +>> can possibly work, because of loss of control over visibility of data +>> changes to other backends, timing of write-backs, etc. + +> I don't understand why there would be any loss of visibility of changes. +> If two backends mmap the same block of a file, and it's shared, that's +> the same block of physical memory that they're accessing. + +Is it? You have a mighty narrow conception of the range of +implementations that's possible for mmap. + +But the main problem is that mmap doesn't let us control when changes to +the memory buffer will get reflected back to disk --- AFAICT, the OS is +free to do the write-back at any instant after you dirty the page, and +that completely breaks the WAL algorithm. (WAL = write AHEAD log; +the log entry describing a change must hit disk before the data page +change itself does.) 
+ + regards, tom lane + +From pgsql-hackers-owner+M24164@postgresql.org Tue Jun 25 10:44:39 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PEicF14506 + for ; Tue, 25 Jun 2002 10:44:38 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id E20F8476322; Tue, 25 Jun 2002 10:44:27 -0400 (EDT) +Mailbox-Line: From tgl@sss.pgh.pa.us Tue Jun 25 10:44:27 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 47B4847609E; Tue, 25 Jun 2002 10:34:29 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id 52A5F475E5F + for ; Tue, 25 Jun 2002 10:34:25 -0400 (EDT) +Mailbox-Line: From tgl@sss.pgh.pa.us Tue Jun 25 10:34:25 2002 +Received: from sss.pgh.pa.us (unknown [192.204.191.242]) + by postgresql.org (Postfix) with ESMTP id 458BB476239 + for ; Tue, 25 Jun 2002 10:32:12 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PEWA107527; + Tue, 25 Jun 2002 10:32:10 -0400 (EDT) +To: Bruce Momjian +cc: Curt Sampson , "J. R. Nield" , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <200206251420.g5PEKT310222@candle.pha.pa.us> +References: <200206251420.g5PEKT310222@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 25 Jun 2002 10:20:29 -0400" +Date: Tue, 25 Jun 2002 10:32:10 -0400 +Message-ID: <7524.1025015530@sss.pgh.pa.us> +From: Tom Lane +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-5.3 required=5.0 + tests=IN_REP_TO,X_NOT_PRESENT + version=2.30 +Status: ORr + +Bruce Momjian writes: +> This will also work well when we have non-SysV semaphore support, like +> Posix semaphores, so we would be able to run with no SysV stuff. 
+ +You do realize that we can use Posix semaphores today? The Darwin (OS X) +port uses 'em now. That's one reason I am more interested in mmap as +a shmget substitute than I used to be. + + regards, tom lane + + + +---------------------------(end of broadcast)--------------------------- +TIP 5: Have you checked our extensive FAQ? + +http://www.postgresql.org/users-lounge/docs/faq.html + + + +From pgsql-hackers-owner+M24167@postgresql.org Tue Jun 25 11:02:20 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PF2JF16153 + for ; Tue, 25 Jun 2002 11:02:20 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 7FB0F47630C; Tue, 25 Jun 2002 11:02:11 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 11:02:11 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id B755E475C22; Tue, 25 Jun 2002 10:59:45 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id 7D058476387 + for ; Tue, 25 Jun 2002 10:59:38 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 10:59:38 2002 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (Postfix) with ESMTP id 49F8C475DC6 + for ; Tue, 25 Jun 2002 10:56:00 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g5PEtst15464; + Tue, 25 Jun 2002 10:55:54 -0400 (EDT) +From: Bruce Momjian +Message-ID: <200206251455.g5PEtst15464@candle.pha.pa.us> +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7524.1025015530@sss.pgh.pa.us> +To: Tom Lane +Date: Tue, 25 Jun 2002 10:55:54 -0400 (EDT) +cc: Curt Sampson , "J. R. 
Nield" , + PostgreSQL Hacker +X-Mailer: ELM [version 2.4ME+ PL97 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-3.4 required=5.0 + tests=IN_REP_TO + version=2.30 +Status: OR + +Tom Lane wrote: +> Bruce Momjian writes: +> > This will also work well when we have non-SysV semaphore support, like +> > Posix semaphores, so we would be able to run with no SysV stuff. +> +> You do realize that we can use Posix semaphores today? The Darwin (OS X) +> port uses 'em now. That's one reason I am more interested in mmap as + +No, I didn't realize we had gotten that far. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + + + +---------------------------(end of broadcast)--------------------------- +TIP 2: you can get off all lists at once with the unregister command + (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) + + + +From pgsql-hackers-owner+M24168@postgresql.org Tue Jun 25 11:05:13 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PF5CF16398 + for ; Tue, 25 Jun 2002 11:05:13 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 30D2847634D; Tue, 25 Jun 2002 11:05:04 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 11:05:04 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id B49B5475EFA; Tue, 25 Jun 2002 10:59:47 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id A0F20475978 + for ; Tue, 25 Jun 2002 10:59:43 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us 
Tue Jun 25 10:59:43 2002 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (Postfix) with ESMTP id 8160E4762F0 + for ; Tue, 25 Jun 2002 10:57:03 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g5PEuwO15564; + Tue, 25 Jun 2002 10:56:58 -0400 (EDT) +From: Bruce Momjian +Message-ID: <200206251456.g5PEuwO15564@candle.pha.pa.us> +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7498.1025015389@sss.pgh.pa.us> +To: Tom Lane +Date: Tue, 25 Jun 2002 10:56:58 -0400 (EDT) +cc: Curt Sampson , "J. R. Nield" , + PostgreSQL Hacker +X-Mailer: ELM [version 2.4ME+ PL97 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-2.3 required=5.0 + tests=IN_REP_TO,DOUBLE_CAPSWORD + version=2.30 +Status: OR + +Tom Lane wrote: +> Curt Sampson writes: +> > On Tue, 25 Jun 2002, Tom Lane wrote: +> >> The other discussion seemed to be considering how to mmap individual +> >> data files right into backends' address space. I do not believe this +> >> can possibly work, because of loss of control over visibility of data +> >> changes to other backends, timing of write-backs, etc. +> +> > I don't understand why there would be any loss of visibility of changes. +> > If two backends mmap the same block of a file, and it's shared, that's +> > the same block of physical memory that they're accessing. +> +> Is it? You have a mighty narrow conception of the range of +> implementations that's possible for mmap. +> +> But the main problem is that mmap doesn't let us control when changes to +> the memory buffer will get reflected back to disk --- AFAICT, the OS is +> free to do the write-back at any instant after you dirty the page, and +> that completely breaks the WAL algorithm. 
(WAL = write AHEAD log; +> the log entry describing a change must hit disk before the data page +> change itself does.) + +Can we mmap WAL without problems? Not sure if there is any gain to it +because we just write it and rarely read from it. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + + + +---------------------------(end of broadcast)--------------------------- +TIP 2: you can get off all lists at once with the unregister command + (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) + + + +From tgl@sss.pgh.pa.us Tue Jun 25 11:00:20 2002 +Return-path: +Received: from sss.pgh.pa.us (root@[192.204.191.242]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PF0JF15955 + for ; Tue, 25 Jun 2002 11:00:19 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5PF0J107808; + Tue, 25 Jun 2002 11:00:19 -0400 (EDT) +To: Bruce Momjian +cc: Curt Sampson , "J. R. Nield" , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <200206251456.g5PEuwO15564@candle.pha.pa.us> +References: <200206251456.g5PEuwO15564@candle.pha.pa.us> +Comments: In-reply-to Bruce Momjian + message dated "Tue, 25 Jun 2002 10:56:58 -0400" +Date: Tue, 25 Jun 2002 11:00:19 -0400 +Message-ID: <7805.1025017219@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Bruce Momjian writes: +> Can we mmap WAL without problems? Not sure if there is any gain to it +> because we just write it and rarely read from it. + +Perhaps, but I don't see any point to it. 
+ + regards, tom lane + +From pgsql-hackers-owner+M24171@postgresql.org Tue Jun 25 11:14:23 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PFENF17356 + for ; Tue, 25 Jun 2002 11:14:23 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 8EAA3476244; Tue, 25 Jun 2002 11:14:09 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 11:14:09 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id C32024762B0; Tue, 25 Jun 2002 11:10:33 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id 1F81C4762A2 + for ; Tue, 25 Jun 2002 11:10:31 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Tue Jun 25 11:10:31 2002 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (Postfix) with ESMTP id CE09D475B33 + for ; Tue, 25 Jun 2002 11:02:10 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g5PF25r16113; + Tue, 25 Jun 2002 11:02:05 -0400 (EDT) +From: Bruce Momjian +Message-ID: <200206251502.g5PF25r16113@candle.pha.pa.us> +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7805.1025017219@sss.pgh.pa.us> +To: Tom Lane +Date: Tue, 25 Jun 2002 11:02:05 -0400 (EDT) +cc: Curt Sampson , "J. R. Nield" , + PostgreSQL Hacker +X-Mailer: ELM [version 2.4ME+ PL97 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-3.4 required=5.0 + tests=IN_REP_TO + version=2.30 +Status: OR + +Tom Lane wrote: +> Bruce Momjian writes: +> > Can we mmap WAL without problems? Not sure if there is any gain to it +> > because we just write it and rarely read from it. 
+> +> Perhaps, but I don't see any point to it. + +Agreed. I have been poking around google looking for an article I read +months ago saying that mmap of files is slightly faster in low memory +usage situations, but much slower in high memory usage situations +because the kernel doesn't know as much about the file access in mmap as +it does with stdio. I will find it. :-) + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + + + +---------------------------(end of broadcast)--------------------------- +TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org + + + +From pgsql-hackers-owner+M24179@postgresql.org Tue Jun 25 12:13:40 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5PGDdF22106 + for ; Tue, 25 Jun 2002 12:13:39 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 962BD4762AF; Tue, 25 Jun 2002 12:13:32 -0400 (EDT) +Mailbox-Line: From brad@bradm.net Tue Jun 25 12:13:32 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 06727476181; Tue, 25 Jun 2002 12:13:31 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id AB1CB4760F7 + for ; Tue, 25 Jun 2002 12:13:28 -0400 (EDT) +Mailbox-Line: From brad@bradm.net Tue Jun 25 12:13:28 2002 +Received: from bradm.net (208-59-250-198.c3-0.smr-ubr1.sbo-smr.ma.cable.rcn.com [208.59.250.198]) + by postgresql.org (Postfix) with ESMTP id 594BD476083 + for ; Tue, 25 Jun 2002 12:13:27 -0400 (EDT) +Received: (from brad@localhost) + by bradm.net (8.11.6/8.11.6) id g5PGCjA14829; + Tue, 25 Jun 2002 12:12:45 -0400 +Date: Tue, 25 Jun 2002 12:12:45 -0400 +From: Bradley McLean +To: Tom Lane 
+cc: Mario Weilguni , + Curt Sampson , "J. R. Nield" , + Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +Message-ID: <20020625121245.A14762@nia.bradm.net> +References: <4D618F6493CE064A844A5D496733D667038E68@freedom.icomedias.com> <7703.1025016772@sss.pgh.pa.us> +MIME-Version: 1.0 +Content-Type: text/plain; charset=us-ascii +Content-Disposition: inline +User-Agent: Mutt/1.2.5.1i +In-Reply-To: <7703.1025016772@sss.pgh.pa.us>; from tgl@sss.pgh.pa.us on Tue, Jun 25, 2002 at 10:52:52AM -0400 +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-4.2 required=5.0 + tests=IN_REP_TO,X_NOT_PRESENT,DOUBLE_CAPSWORD + version=2.30 +Status: OR + +* Tom Lane (tgl@sss.pgh.pa.us) [020625 11:00]: +> +> msync can force not-yet-written changes down to disk. It does not +> prevent the OS from choosing to write changes *before* you invoke msync. +> +> Our problem is that we want to enforce the write ordering "WAL before +> data file". To do that, we write and fsync (or DSYNC, or something) +> a WAL entry before we issue the write() against the data file. We +> don't really care if the kernel delays the data file write beyond that +> point, but we can be certain that the data file write did not occur +> too early. +> +> msync is designed to ensure exactly the opposite constraint: it can +> guarantee that no changes remain unwritten after time T, but it can't +> guarantee that changes aren't written before time T. + +Okay, so instead of looking for constraints from the OS on the data file, +use the constraints on the WAL file. It would work at the cost of a buffer +copy? Er, maybe two: + +mmap the data file and WAL separately. +Copy the data file page to the WAL mmap area. +Modify the page. +msync() the WAL. +Copy the page to the data file mmap area. +msync() or not the data file. + +(This is half baked, just thought I'd see if it stirred further thought). 
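A minimal sketch of the sequence listed above, assuming a POSIX system and two scratch files standing in for the WAL and the data file. The names `map_scratch_file`, `update_page`, `wal_first_demo` and the 8192-byte `PAGE_SZ` are hypothetical, chosen for illustration; nothing here is PostgreSQL code. The point it tries to show is that the data-file mapping is not dirtied until msync() on the WAL mapping has returned, so the kernel has nothing to write back early for the data file.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE_SZ 8192 /* assumed block size for this sketch */

/* Hypothetical helper: create a PAGE_SZ scratch file and map it MAP_SHARED. */
static char *map_scratch_file(void)
{
    char path[] = "/tmp/mmapsketchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path); /* the file persists until the mapping is removed */
    if (ftruncate(fd, PAGE_SZ) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, PAGE_SZ, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after close() */
    return p == MAP_FAILED ? NULL : (char *) p;
}

/* Apply a change to one page, touching the data mapping only after the
 * WAL copy has been forced to disk. */
static int update_page(char *wal, char *data, size_t off,
                       const char *bytes, size_t len)
{
    if (off > PAGE_SZ || len > PAGE_SZ - off)
        return -1;
    memcpy(wal, data, PAGE_SZ);        /* copy the data page to the WAL area */
    memcpy(wal + off, bytes, len);     /* modify the copy, not the data page */
    if (msync(wal, PAGE_SZ, MS_SYNC) != 0)
        return -1;                     /* WAL must be on disk first */
    memcpy(data, wal, PAGE_SZ);        /* only now dirty the data mapping */
    return 0;                          /* msync() of data is optional here */
}

/* Returns 0 if the change reached the data mapping via the WAL copy. */
int wal_first_demo(void)
{
    char *wal = map_scratch_file();
    char *data = map_scratch_file();
    int rc = -1;

    if (wal && data &&
        update_page(wal, data, 16, "new", 3) == 0 &&
        memcmp(data + 16, "new", 3) == 0)
        rc = 0;
    if (wal)
        munmap(wal, PAGE_SZ);
    if (data)
        munmap(data, PAGE_SZ);
    return rc;
}
```

Calling `wal_first_demo()` from a `main` exercises the path once. The sketch costs the two page copies mentioned above, and it makes the new page contents visible to other processes only after the second copy.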
+ +As another approach, how expensive is re-MMAPing portions of the files +compared to the copies? + +-Brad + +> +> regards, tom lane +> +> +> +> ---------------------------(end of broadcast)--------------------------- +> TIP 3: if posting/reading through Usenet, please send an appropriate +> subscribe-nomail command to majordomo@postgresql.org so that your +> message can get through to the mailing list cleanly +> + + + +---------------------------(end of broadcast)--------------------------- +TIP 4: Don't 'kill -9' the postmaster + + + +From cjs@cynic.net Wed Jun 26 00:13:45 2002 +Return-path: +Received: from academic.cynic.net (academic.cynic.net [63.144.177.3]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5Q4Dig27201 + for ; Wed, 26 Jun 2002 00:13:45 -0400 (EDT) +Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224]) + by academic.cynic.net (Postfix) with ESMTP + id B95E5F820; Wed, 26 Jun 2002 04:13:45 +0000 (UTC) +Date: Wed, 26 Jun 2002 13:13:42 +0900 (JST) +From: Curt Sampson +To: Tom Lane +cc: "J. R. Nield" , Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <7498.1025015389@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Status: OR + +On Tue, 25 Jun 2002, Tom Lane wrote: + +> Curt Sampson writes: +> +> > I don't understand why there would be any loss of visibility of changes. +> > If two backends mmap the same block of a file, and it's shared, that's +> > the same block of physical memory that they're accessing. +> +> Is it? You have a mighty narrow conception of the range of +> implementations that's possible for mmap. + +It's certainly possible to implement something that you call mmap +that is not. But if you are using the POSIX-defined MAP_SHARED flag, +the behaviour above is what you see. It might be implemented slightly +differently internally, but that's no concern of postgres. 
And I find +it pretty unlikely that it would be implemented otherwise without good +reason. + +Note that your proposal of using mmap to replace sysv shared memory +relies on the behaviour I've described too. As well, if you're replacing +sysv shared memory with an mmap'd file, you may end up doing excessive +disk I/O on systems without the MAP_NOSYNC option. (Without this option, +the update thread/daemon may ensure that every buffer is flushed to the +backing store on disk every 30 seconds or so. You might be able to get +around this by using a small file-backed area for things that need to +persist after a crash, and a larger anonymous area for things that don't +need to persist after a crash.) + +> But the main problem is that mmap doesn't let us control when changes to +> the memory buffer will get reflected back to disk --- AFAICT, the OS is +> free to do the write-back at any instant after you dirty the page, and +> that completely breaks the WAL algorithm. (WAL = write AHEAD log; +> the log entry describing a change must hit disk before the data page +> change itself does.) + +Hm. Well, we could try not to write the data to the page until +after we receive notification that our WAL data is committed to +stable storage. However, the new data has to be available to all of +the backends at the exact time that the commit happens. Perhaps a +shared list of pending writes? + +Another option would be to just let it write, but on startup, scan +all of the data blocks in the database for tuples that have a +transaction ID later than the last one we updated to, and remove +them. That could be pretty darn expensive on a large database, though. + +cjs +-- +Curt Sampson +81 90 7737 2974 http://www.netbsd.org + Don't you know, in this new Dark Age, we're all light. 
--XTC + + +From tgl@sss.pgh.pa.us Wed Jun 26 09:22:05 2002 +Return-path: +Received: from sss.pgh.pa.us (root@[192.204.191.242]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5QDM3g26028 + for ; Wed, 26 Jun 2002 09:22:04 -0400 (EDT) +Received: from sss2.sss.pgh.pa.us (tgl@localhost [127.0.0.1]) + by sss.pgh.pa.us (8.11.4/8.11.4) with ESMTP id g5QDLxv01699; + Wed, 26 Jun 2002 09:21:59 -0400 (EDT) +To: Curt Sampson +cc: "J. R. Nield" , Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: +References: +Comments: In-reply-to Curt Sampson + message dated "Wed, 26 Jun 2002 13:13:42 +0900" +Date: Wed, 26 Jun 2002 09:21:59 -0400 +Message-ID: <1696.1025097719@sss.pgh.pa.us> +From: Tom Lane +Status: ORr + +Curt Sampson writes: +> Note that your proposal of using mmap to replace sysv shared memory +> relies on the behaviour I've described too. + +True, but I was not envisioning mapping an actual file --- at least +on HPUX, the only way to generate an arbitrary-sized shared memory +region is to use MAP_ANONYMOUS and not have the mmap'd area connected +to any file at all. It's not farfetched to think that this aspect +of mmap might work differently from mapping pieces of actual files. + +In practice of course we'd have to restrict use of any such +implementation to platforms where mmap behaves reasonably ... according +to our definition of "reasonably". 
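A minimal sketch of the kind of region described above, assuming a system that provides MAP_ANONYMOUS (or the older MAP_ANON spelling): one shared mapping with no file behind it, created before fork() so that parent and child both see it. The function name `anon_shared_demo` is hypothetical. Whether every platform behaves this way is exactly the open question in this thread; the sketch only shows what the POSIX-style MAP_SHARED semantics would give.

```c
#define _DEFAULT_SOURCE /* asks glibc to expose MAP_ANONYMOUS under strict -std modes */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON /* older BSD-style name */
#endif

/* Returns the value the parent reads back after the child's store,
 * or -1 on any failure. */
int anon_shared_demo(void)
{
    int *shared = mmap(NULL, sizeof *shared, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (shared == MAP_FAILED)
        return -1;
    *shared = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(shared, sizeof *shared);
        return -1;
    }
    if (pid == 0) {
        *shared = 42; /* child writes through its view of the region */
        _exit(0);
    }

    int status;
    int seen = -1;
    if (waitpid(pid, &status, 0) == pid)
        seen = *shared; /* parent reads after the child has exited */
    munmap(shared, sizeof *shared);
    return seen;
}
```

With MAP_PRIVATE in place of MAP_SHARED the child's store would stay in the child's own copy and the parent would still read 0, which is the distinction the size of the region does not affect.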
+ + regards, tom lane + +From pgsql-hackers-owner+M24252@postgresql.org Wed Jun 26 16:14:36 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5QKEag03467 + for ; Wed, 26 Jun 2002 16:14:36 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id B10E9476B4D; Wed, 26 Jun 2002 15:16:32 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Wed Jun 26 15:16:32 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 6635E476DC0; Wed, 26 Jun 2002 14:31:10 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id 13F884765BD + for ; Wed, 26 Jun 2002 14:22:36 -0400 (EDT) +Mailbox-Line: From pgman@candle.pha.pa.us Wed Jun 26 14:22:36 2002 +Received: from candle.pha.pa.us (216-55-132-35.dsl.san-diego.abac.net [216.55.132.35]) + by postgresql.org (Postfix) with ESMTP id 3F02D476EB3 + for ; Wed, 26 Jun 2002 13:11:37 -0400 (EDT) +Received: (from pgman@localhost) + by candle.pha.pa.us (8.11.6/8.10.1) id g5QHBJM15565; + Wed, 26 Jun 2002 13:11:19 -0400 (EDT) +From: Bruce Momjian +Message-ID: <200206261711.g5QHBJM15565@candle.pha.pa.us> +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <1696.1025097719@sss.pgh.pa.us> +To: Tom Lane +Date: Wed, 26 Jun 2002 13:11:19 -0400 (EDT) +cc: Curt Sampson , "J. R. Nield" , + PostgreSQL Hacker +X-Mailer: ELM [version 2.4ME+ PL97 (25)] +MIME-Version: 1.0 +Content-Transfer-Encoding: 7bit +Content-Type: text/plain; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-3.4 required=5.0 + tests=IN_REP_TO + version=2.30 +Status: OR + +Tom Lane wrote: +> Curt Sampson writes: +> > Note that your proposal of using mmap to replace sysv shared memory +> > relies on the behaviour I've described too. 
+> +> True, but I was not envisioning mapping an actual file --- at least +> on HPUX, the only way to generate an arbitrary-sized shared memory +> region is to use MAP_ANONYMOUS and not have the mmap'd area connected +> to any file at all. It's not farfetched to think that this aspect +> of mmap might work differently from mapping pieces of actual files. +> +> In practice of course we'd have to restrict use of any such +> implementation to platforms where mmap behaves reasonably ... according +> to our definition of "reasonably". + +Yes, I am told mapping /dev/zero is the same as the anon map. + +-- + Bruce Momjian | http://candle.pha.pa.us + pgman@candle.pha.pa.us | (610) 853-3000 + + If your life is a hard drive, | 830 Blythe Avenue + + Christ can be your backup. | Drexel Hill, Pennsylvania 19026 + + + +---------------------------(end of broadcast)--------------------------- +TIP 6: Have you searched our list archives? + +http://archives.postgresql.org + + + +From pgsql-hackers-owner+M24292@postgresql.org Wed Jun 26 23:39:10 2002 +Return-path: +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by candle.pha.pa.us (8.11.6/8.10.1) with ESMTP id g5R3d9g02161 + for ; Wed, 26 Jun 2002 23:39:09 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP + id 88BF4476287; Wed, 26 Jun 2002 23:38:56 -0400 (EDT) +Mailbox-Line: From cjs@cynic.net Wed Jun 26 23:38:56 2002 +Received: from postgresql.org (postgresql.org [64.49.215.8]) + by postgresql.org (Postfix) with SMTP + id 3C069476954; Wed, 26 Jun 2002 23:38:17 -0400 (EDT) +Received: from localhost.localdomain (postgresql.org [64.49.215.8]) + by localhost (Postfix) with ESMTP id A0397476941 + for ; Wed, 26 Jun 2002 23:38:12 -0400 (EDT) +Mailbox-Line: From cjs@cynic.net Wed Jun 26 23:38:12 2002 +Received: from academic.cynic.net (academic.cynic.net [63.144.177.3]) + by postgresql.org (Postfix) with ESMTP id 2AA24475C40 + for ; Wed, 26 Jun 2002 23:37:18 
-0400 (EDT) +Received: from angelic-academic.cvpn.cynic.net (angelic-academic.cvpn.cynic.net [198.73.220.224]) + by academic.cynic.net (Postfix) with ESMTP + id 179D5F822; Thu, 27 Jun 2002 03:37:20 +0000 (UTC) +Date: Thu, 27 Jun 2002 12:37:18 +0900 (JST) +From: Curt Sampson +To: Tom Lane +cc: "J. R. Nield" , Bruce Momjian , + PostgreSQL Hacker +Subject: Re: [HACKERS] Buffer Management +In-Reply-To: <1696.1025097719@sss.pgh.pa.us> +Message-ID: +MIME-Version: 1.0 +Content-Type: TEXT/PLAIN; charset=US-ASCII +Precedence: bulk +Sender: pgsql-hackers-owner@postgresql.org +X-Spam-Status: No, hits=-5.3 required=5.0 + tests=IN_REP_TO,X_NOT_PRESENT + version=2.30 +Status: OR + +On Wed, 26 Jun 2002, Tom Lane wrote: + +> Curt Sampson writes: +> > Note that your proposal of using mmap to replace sysv shared memory +> > relies on the behaviour I've described too. +> +> True, but I was not envisioning mapping an actual file --- at least +> on HPUX, the only way to generate an arbitrary-sized shared memory +> region is to use MAP_ANONYMOUS and not have the mmap'd area connected +> to any file at all. It's not farfetched to think that this aspect +> of mmap might work differently from mapping pieces of actual files. + +I find it somewhat farfetched, for a couple of reasons: + + 1. Memory mapped with the MAP_SHARED flag is shared memory, + anonymous or not. POSIX is pretty explicit about how this works, + and the "standard" for mmap that predates POSIX is the same. + Anonymous memory does not behave differently. + + You could just as well say that some systems might exist such + that one process can write() a block to a file, and then another + might read() it afterwards but not see the changes. Postgres + should not try to deal with hypothetical systems that are so + completely broken. + + 2. Mmap is implemented as part of a unified buffer cache system + on all of today's operating systems that I know of. 
The memory + is backed by swap space when anonymous, and by a specified file + when not anonymous; but the way these two are handled is + *exactly* the same internally. + + Even on older systems without unified buffer cache, the behaviour + is the same between anonymous and file-backed mmap'd memory. + And there would be no point in making it otherwise. Mmap is + designed to let you share memory; why make a broken implementation + under certain circumstances? + +> In practice of course we'd have to restrict use of any such +> implementation to platforms where mmap behaves reasonably ... according +> to our definition of "reasonably". + +Of course. As we do already with regular I/O. + +cjs +-- +Curt Sampson +81 90 7737 2974 http://www.netbsd.org + Don't you know, in this new Dark Age, we're all light. --XTC + + + + +---------------------------(end of broadcast)--------------------------- +TIP 3: if posting/reading through Usenet, please send an appropriate +subscribe-nomail command to majordomo@postgresql.org so that your +message can get through to the mailing list cleanly + + +