From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Mon, 22 Mar 1999 14:58:09 -0500 (EST) From: Steven Knight To: Bob Sidebotham Subject: initial success Bob-- I have the -j portion of your cons.multi changes ported to the current code base, and my initial test shows that things seem to work as expected. All cons-test scripts pass, except for one, which I discuss below. I wanted to get some feedback/input from you on a few issues before asking you to take a look at it. To wit: -- In past mail, you mentioned that there seemed to be one small bug, but that you could no longer remember what it was. Was the bug, perhaps, a failure to stop upon error when using neither -j nor -k? As currently implemented, when execution is single-threaded (no -j specified), *all* of the dependencies get started (and finish) before we check any status. This means that if one compilation fails, the rest get started and execute anyway, as if -k were permanently enabled. I think we can agree that this is not what we want, if only because people will find it very counter-intuitive. Unfortunately, it means passing back a return code from the fstart method that kicks off the commands. This isn't as elegant as how you have it coded currently, because an error condition could then come from *either* fstart or fwait, depending on whether the build is parallel or not, but I think it's the only way to preserve the expected behavior. -- The current -j test I'm using is extremely simple. It has three Command lists that sleep for X seconds and echo some output to a common file. As I indicated above, this works, and produces the expected interleaved sequences of output with and without -j. I'd be glad, however, if you have any suggestions for other test scenarios, especially those involving interaction with other features. (Derived-file caching, perhaps? Maybe this is the right motivating factor to get me to write some cons-test scripts for that...) 
-- All of the -d, -p, -pa, -pw, and -r flags now work. Their code hadn't been brought up to match the way all the dependency targets are now enumerated before anything else happens. -- I'm completely ignoring the RMX stuff for now. Based on some of the past mailing list discussion, my feeling is that people want the -j functionality sooner, and that if we want to re-open the idea of distributed execution, we can do that with some other Perl rexec package instead of re-inventing the wheel for Cons. -- I punted and left the WIN32 side single-threaded, as I don't know NT. I do have access to an NT system, though, so if you know the right magic for doing the NT equivalent of fork()/exec(), let me know. Let me know if anything above seems out of kilter. Meantime, I'm going to continue polishing and try to have stuff ready for you to look at later this week. --SK From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Tue, 23 Mar 1999 13:57:49 -0500 From: Bob Sidebotham To: Steven Knight Subject: Re: initial success Hi Steve, Wow, that's excellent news. As with the last change, one concern I have with all of this is making sure that the -j stuff doesn't overly complexify or slow down cons for the normal case. >I have the -j portion of your cons.multi changes ported to the >current code base, and my initial test shows that things seem to >work as expected. All cons-test scripts pass, except for one, >which I discuss below. > >I wanted to get some feedback/input from you on a few issues before >asking you to take a look at it. To wit: > > -- In past mail, you mentioned that there seemed to be one > small bug, but that you could no longer remember what it > was. Was the bug, perhaps, a failure to stop upon error > when using neither -j nor -k? > I don't really remember, but that might just be it. In order to get this to work, whoever ported it would have had to actually understand the code, which, I gather, you have done. 
> As currently implemented, when execution is single-threaded > (no -j specified), *all* of the dependencies get started > (and finish) before we check any status. This means that > if one compilation fails, the rest get started and execute > anyway, as if -k were permanently enabled. I think we can > agree that this is not what we want, if only because people > will find it very counter-intuitive. > > Unfortunately, it means passing back a return code from > the fstart method that kicks off the commands. This isn't > as elegant as how you have it coded currently, because an > error condition could then come from *either* fstart or > fwait, depending on whether the build is parallel or not, > but I think it's the only way to preserve the expected > behavior. > > -- The current -j test I'm using is extremely simple. It > has three Command lists that sleep for X seconds and echo > some output to a common file. > > As I indicated above, this works, and produces the expected > interleaved sequences of output with and without -j. I'd > be glad, however, if you have any suggestions for other > test scenarios, especially those involving interaction with > other features. (Derived-file caching, perhaps? Maybe > this is the right motivating factor to get me to write some > cons-test scripts for that...) > I don't have any suggestions. I'm hoping these things are separable enough that there shouldn't be any interaction... > -- All of the -d, -p, -pa, -pw, and -r flags now work. Their > code hadn't been brought up to match the way all the > dependency targets are now enumerated before anything else > happens. > > -- I'm completely ignoring the RMX stuff for now. Based on > some of the past mailing list discussion, my feeling is > that people want the -j functionality sooner, and that if > we want to re-open the idea of distributed execution, we > can do that with some other Perl rexec package instead of > re-inventing the wheel for Cons. > Sounds good. 
At the time, I hadn't found one that I thought had the right performance characteristics. But I absolutely agree that -j is more important than rmx. BTW, -j 0 is effectively the same as not using -j. I believe my original code made them different. I don't know if this justifies the extra complexity of having two different models. Perhaps it would be justified if -j slows things down. > -- I punted and left the WIN32 side single-threaded, as I > don't know NT. I do have access to an NT system, though, > so if you know the right magic for doing the NT equivalent > of fork()/exec(), let me know. > That's what I would have done, too. >Let me know if anything above seems out of kilter. Meantime, I'm >going to continue polishing and try to have stuff ready for you >to look at later this week. > > --SK Thanks very much, Steve, for your efforts. I really appreciate it. I'm currently totally burned out, for various reasons. Bob From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Wed, 24 Mar 1999 09:29:11 -0500 (EST) From: Steven Knight To: Bob Sidebotham Subject: Re: initial success Bob-- Thanks for the reply; good to hear from you, as always. > Wow, that's excellent news. As with the last change, one concern I > have with all of this is making sure that the -j stuff doesn't overly > complexify or slow down cons for the normal case. Understood. I don't think it slows down the normal case--your code is pretty tight and the interfaces are well thought out--but I'll run some before-and-after benchmarks to make sure. > > -- In past mail, you mentioned that there seemed to be one > > small bug, but that you could no longer remember what it > > was. Was the bug, perhaps, a failure to stop upon error > > when using neither -j nor -k? > > I don't really remember, but that might just be it. In order to get > this to work, whoever ported it would have had to actually understand > the code, which, I gather, you have done. I think so. 
I'm going to try to add one additional enhancement: Right now, if any given construction rule needs to handle stdin (requiring the job::serial class for execution), it essentially means that -j is illegal to use on that Construct file. I don't think it would be that hard to add an override to the construction environment that would flag the construction of the derived file(s) as requiring job::serial, or its equivalent. Upon building such a file, we'd wait for the outstanding job::async processes to complete, execute the job::serial construction that requires stdin, and then carry on again with job::async. I think this would be a good bit of flexibility that would allow people to build Construct files that take stdin input and still allow the build to be parallelized as much as possible. > > -- The current -j test I'm using is extremely simple. It > > has three Command lists that sleep for X seconds and echo > > some output to a common file. > > > > As I indicated above, this works, and produces the expected > > interleaved sequences of output with and without -j. I'd > > be glad, however, if you have any suggestions for other > > test scenarios, especially those involving interaction with > > other features. (Derived-file caching, perhaps? Maybe > > this is the right motivating factor to get me to write some > > cons-test scripts for that...) > > I don't have any suggestions. I'm hoping these things are separable enough > that there shouldn't be any interaction... It looks like there's no run-time interaction, but the structural change did mean re-coding the non-executable cases of -d, -p, etc. Fortunately, those are already covered by existing cons-test scripts. > > -- I'm completely ignoring the RMX stuff for now. 
Based on > > some of the past mailing list discussion, my feeling is > > that people want the -j functionality sooner, and that if > > we want to re-open the idea of distributed execution, we > > can do that with some other Perl rexec package instead of > > re-inventing the wheel for Cons. > > Sounds good. At the time, I hadn't found one that I thought had the > right performance characteristics. But I absolutely agree that -j is > more important than rmx. BTW, -j 0 is effectively the same as not > using -j. I believe my original code made them different. I don't know > if this justifies the extra complexity of having two different > models. Perhaps it would be justified if -j slows things down. I wondered about this when I was putting in some command-line parameter checking. Originally, I thought I might as well only make the build parallel if they specified -j > 1. But then I thought there might be debugging value, if nothing else, in allowing -j 1, which would still build things serially but exercise the job::async class. I currently have it coded so that -j <= 0 is illegal, but I like the idea of -j 0 using the normal job::serial class. That might make coding scripts to call Cons a little cleaner. > >Let me know if anything above seems out of kilter. Meantime, I'm > >going to continue polishing and try to have stuff ready for you > >to look at later this week. > > Thanks very much, Steve, for your efforts. I really appreciate it. I'm > currently totally burned out, for various reasons. Sorry to hear it. I had my own burn-out phase after we had an eighty-hour-a-week death march two summers ago trying to win a big contract. We didn't get it, and it took many of us a good year or more to really get our hearts back into things. If my soliciting your input on these issues just adds to the burden, and it would be better if I just worked this stuff out on my own, let me know. 
As it stands, I'm planning to ask on the mailing list for a few beta-testers for the -j change, just to try to get some more test coverage. (I also want to see if any WinNT perl gurus out there know of a way to parallelize the build under NT.) Thanks, as always, for the feedback. Take care. --SK From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Wed, 24 Mar 1999 17:08:42 -0500 From: Bob Sidebotham To: Steven Knight Subject: Re: initial success You said: >I currently have it coded so that -j <= 0 is illegal, but I like >the idea of -j 0 using the normal job::serial class. That might >make coding scripts to call Cons a little cleaner. Actually, what I really meant was that the job::serial class might not be necessary, and that the default could be -j 1. But you mention issues with reading from STDIN, so I'll defer to whatever you decide. [DELETED] Bob From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Wed, 24 Mar 1999 18:34:21 -0500 (EST) From: Steven Knight To: Bob Sidebotham Subject: Re: initial success On Wed, 24 Mar 1999, Bob Sidebotham wrote: > >I currently have it coded so that -j <= 0 is illegal, but I like > >the idea of -j 0 using the normal job::serial class. That might > >make coding scripts to call Cons a little cleaner. > > Actually, what I really meant was that the job::serial class might not > be necessary, and that the default could be -j 1. But you mention > issues with reading from STDIN, so I'll defer to whatever you decide. I see. I did think of trying to do this in one class, but initially discarded the idea because: 1) you had coded it as two classes initially (and I'm a sucker for deferring to authority!); and 2) the job::async class forks an additional process over job::serial, to control the sequential execution of multiple commands necessary to build a target. That's extra process overhead in the single-threaded cases. But you just spurred me to take a look at it again, and I now think we can do this with a single job class. 
For efficiency, after we've fork()'ed, we should check how many commands there are to execute to build the target in question. If there's only one command, we go ahead and exec() that process directly. If there's more than one command, we let the child process control the subsequent sequential fork()+exec() for each command in the list. I think this will be orthogonal to the STDIN issue, which I'm planning to address by letting appropriately-flagged jobs single-thread things by waiting for all the outstanding jobs to finish, executing sequentially (able to read from STDIN), and then carrying on. I'll code it up and see if it looks tolerable. This might also help with process clean-up on exit, which isn't really there right now. [DELETED] --SK From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Thu, 25 Mar 1999 01:33:50 -0500 From: Bob Sidebotham To: Steven Knight Subject: Re: initial success You said: >On Wed, 24 Mar 1999, Bob Sidebotham wrote: >> >I currently have it coded so that -j <= 0 is illegal, but I like >> >the idea of -j 0 using the normal job::serial class. That might >> >make coding scripts to call Cons a little cleaner. >> >> Actually, what I really meant was that the job::serial class might not >> be necessary, and that the default could be -j 1. But you mention >> issues with reading from STDIN, so I'll defer to whatever you decide. > >I see. I did think of trying to do this in one class, but initially >discarded the idea because: 1) you had coded it as two classes >initially (and I'm a sucker for deferring to authority!); and 2) >the job::async class forks an additional process over job::serial, >to control the sequential execution of multiple commands necessary >to build a target. That's extra process overhead in the single-threaded cases. > >But you just spurred me to take a look at it again, and I now think >we can do this with a single job class. 
For efficiency, after >we've fork()'ed, we should check how many commands there are to >execute to build the target in question. If there's only one >command, we go ahead and exec() that process directly. If there's >more than one command, we let the child process control the subsequent >sequential fork()+exec() for each command in the list. Sounds reasonable. >I think this will be orthogonal to the STDIN issue, which I'm >planning to address by letting appropriately-flagged jobs single-thread >things by waiting for all the outstanding jobs to finish, executing >sequentially (able to read from STDIN), and then carrying on. I'll >code it up and see if it looks tolerable. > >This might also help with process clean-up on exit, which isn't >really there right now. [DELETED] Thanks, Bob From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Thu, 25 Mar 1999 17:30:30 -0500 From: Bob Sidebotham To: Steven Knight Subject: -j? Hi Steve, If you have a version of Cons with -j in it that works at all, I'd be happy to try it out. I'm doing a rather long compilation right now, that could benefit, I think... Bob From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Thu, 25 Mar 1999 18:04:13 -0500 (EST) From: Steven Knight To: Bob Sidebotham Subject: Re: -j? > If you have a version of Cons with -j in it that works at all, I'd be > happy to try it out. I'm doing a rather long compilation right now, > that could benefit, I think... You're sure welcome to give it a shot... http://www.baldmt.com/cons-test/con-p0.pl Where -p0 is for "parallel version 0". This version is failing one of my tests (when -k is on, it's not exiting with the proper status to indicate that errors actually occurred), but otherwise seems to work fine. This has the single-class job control we talked about (which seemed to clean up some things nicely), although I split out the WIN32 stuff into a separate job::win32 sub-class so that we don't take an if-test hit on every execution. 
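The single-job-class dispatch discussed in this exchange -- fork once per target, exec() the command directly when there is only one, and otherwise let the child run the command list sequentially -- might look roughly like the following sketch. This is an editor's illustration, not the actual Cons code; the name `start_job` and the command-list representation are hypothetical.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sketch: fork once per target.  If the target needs only
# one command, exec() it directly in the child (no extra process); if it
# needs several, the child runs them in order, exiting non-zero at the
# first failure so the parent can detect the error via waitpid()/$?.
sub start_job {
    my @cmds = @_;               # each element is an array ref, e.g. ['cc', '-c', 'foo.c']
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;         # parent: remember the pid, reap it later
    if (@cmds == 1) {
        exec @{$cmds[0]};        # one command: replace the child process directly
        die "exec failed: $!";
    }
    for my $cmd (@cmds) {        # several commands: run them sequentially,
        system(@$cmd) == 0       # stopping at the first failure
            or exit 1;
    }
    exit 0;                      # all commands succeeded
}
```

The parent would keep up to -j such pids outstanding, reaping each with waitpid() and inspecting $? for the child's exit status.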
I haven't done any benchmarking yet, so I don't know if I slowed down the normal case. Let me know what you find, or if you have other feedback. --SK From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Fri, 26 Mar 1999 00:31:43 -0500 (EST) From: Steven Knight To: Bob Sidebotham Subject: cons-p1.pl Bob-- Ignore the previous version I sent. I think I grabbed a snapshot at the wrong time and -j doesn't work worth beans on -p0. There's now a new version: http://www.baldmt.com/cons-test/cons-p1.pl This still has the following problems: -- One of my -k tests (cons-test t0030a.sh) still fails with an improper exit code of 0 when there is an error in the build. I can *not* for the life of me figure out how this is happening. Stepping through the debugger and adding debug print statements both show that $errors is non-zero, which means that the exit line: exit 0 + ($errors != 0); should turn it into an exit status of 1. Doesn't happen on my system. -- I'm noticing that I'm now getting messages about sig::hash::END not being defined, despite the fact that it's clearly there: Undefined subroutine &sig::hash::END called at /home/knight/cons.2.2.C107/cons line 292. I don't know when this was introduced. Again, I'm mystified. I'm beginning to wonder if my local perl build is bogus (it's 5.005_54). I also haven't gotten to the per-environment serial build feature. Let me know how it turns out. --SK From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Fri, 26 Mar 1999 13:22:40 -0500 From: Bob Sidebotham To: Steven Knight Subject: Re: cons-p1.pl I've had problems like that in the past. Never did figure out what they were due to, but usually managed to work around them somehow. There's also been a problem for a while with ^C not working correctly, that reports something like "sig::hash::END not found". I also vaguely remember having the exit status problem before, but I don't know when I had that, or what the resolution was. 
It's possible that was the bug I referred to earlier, I don't know... I'll grab your latest version at some point and play with it. Bob You said: >Bob-- > >Ignore the previous version I sent. I think I grabbed a snapshot >at the wrong time and -j doesn't work worth beans on -p0. There's >now a new version: > > http://www.baldmt.com/cons-test/cons-p1.pl > >This still has the following problems: > > -- One of my -k tests (cons-test t0030a.sh) still fails with > an improper exit code of 0 when there is an error in the > build. I can *not* for the life of me figure out how this > is happening. Stepping through the debugger and adding > debug print statements both show that $errors is non-zero, > which means that the exit line: > > exit 0 + ($errors != 0); > > should turn it into an exit status of 1. Doesn't happen > on my system. > > -- I'm noticing that I'm now getting messages about sig::hash::END > not being defined, despite the fact that it's clearly there: > > Undefined subroutine &sig::hash::END called at /home/knight/cons.2.2.C107/cons line 292. > > I don't know when this was introduced. Again, I'm mystified. > I'm beginning to wonder if my local perl build is bogus > (it's 5.005_54). > >I also haven't gotten to the per-environment serial build feature. > >Let me know how it turns out. > > --SK From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Fri, 26 Mar 1999 13:26:58 -0500 From: Bob Sidebotham To: Steven Knight Subject: Re: cons-p1.pl I get: Forbidden You don't have permission to access /cons-test/cons-p1.pl on this server. You said: >Bob-- > >Ignore the previous version I sent. I think I grabbed a snapshot >at the wrong time and -j doesn't work worth beans on -p0. 
There's >now a new version: > > http://www.baldmt.com/cons-test/cons-p1.pl > From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Fri, 26 Mar 1999 15:00:02 -0500 (EST) From: Steven Knight To: Bob Sidebotham Subject: Re: cons-p1.pl > Forbidden > You don't have permission to access /cons-test/cons-p1.pl on this server. Hmm. I just tested and got the same thing. I think the .pl extension confuses the server into thinking it should execute something. I renamed it: http://www.baldmt.com/cons-test/cons-p1 And just got it successfully here at work. Let me know if you still run into a problem. --SK From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Mon, 29 Mar 1999 23:10:12 -0500 (EST) From: Steven Knight To: Rajesh Vaidheeswarran Cc: Bob Sidebotham Subject: prototype -j code Rajesh-- After the last round of discussion about adding -j support to Cons, I went ahead and asked Bob for his prototype -j code from a few years back, and I've now ported it to the 1.5 base. There's one additional feature I'd like to add, but I want to get some feedback on what I currently have before going any further. The new -j code works for my test cases. Because the code relies on fork()/exec(), which don't exist for Perl on Win32 systems, -j is *not* supported on NT. I'd welcome advice from Perl/Win32 gurus on how (or whether) -j support can be extended to those systems. There are two problems I can't eradicate, one of which probably already existed in certain circumstances (complaints about sig::hash::END not being defined), and one of which may be new but may not (improper exit value of 0 even when both the debugger and print show that $errors is non-zero). Neither of these problems seems to affect how Cons actually works--that is, everything seems to get built properly. 
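A guard of the kind implied here -- falling back to the serial build path where Perl's fork() is unavailable, as on the Win32 perls of the time -- could be as small as the following sketch. This is an editor's illustration; the function name and the way the -j value reaches it are hypothetical, not the actual Cons code.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sketch: clamp the -j value to 1 on platforms whose perl
# lacks a working fork(), so the build quietly takes the single-threaded
# path instead of failing outright.
sub effective_jobs {
    my ($jobs, $osname) = @_;    # $osname would normally be $^O
    if ($osname eq 'MSWin32') {
        warn "cons: -j is not supported on this platform; building serially\n"
            if $jobs > 1;
        return 1;                # force the serial path
    }
    return $jobs;                # fork()-capable system: keep -j as given
}
```

A call like effective_jobs(10, $^O) would then yield 10 on a fork()-capable system and 1 under a Win32 perl.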
I was planning to go to the mailing list to seek beta testers who are willing to: -- Test out the new code, with and without -j; -- Report results to me; -- Especially let me know if the -j code slows down single-threaded builds (without -j); -- Give me feedback on the draft cons.pod text covering -j and parallel builds. Before doing so, though, I wanted to sync up with you and get your input. The current version of the parallelizing version of cons is available to you at: http://www.baldmt.com/cons-test/cons-p1 I'd appreciate it if you'd be able to give it at least a quick sanity check at Fore, and give me any other feedback you may have. I've appended a diff with my draft cons.pod text, below. Oh, yeah, as you might gather from the text below, the feature I'm planning to add is the ability to let people specify in the environment that a given target should be built in the foreground (single-threaded). The intent is to allow people who have constructions that need to read something from STDIN when building a target to still use -j to build the rest of their tree in parallel. Barring any indication to the contrary from you, I'm planning to go out to the list for beta testers sometime later this week. Thanks! --SK *** /usr/local/src/cons-1.5/cons.pod Tue Nov 17 14:22:53 1998 --- cons.pod Mon Mar 29 15:41:24 1999 *************** *** 66,71 **** --- 66,93 ---- into the makefiles. + =item B + + One generally-accepted technique for speeding up the software development + process is to build software components in parallel--that is, start + multiple compilations simultaneously. Although the additional context + switches usually increase the CPU time used, parallelizing a build greatly + decreases the amount of "wall clock" time that the software developer + spends waiting for the build to complete. Most modern versions of make + support a -j option which is used to specify the number of tasks that + make will execute in parallel. 
Unfortunately, the make -j option is of + limited usefulness for builds in large directory trees, where it would be + most helpful. The recursive use of make means that even small -j values + can threaten to swamp a system, as each recursive make invocation spawns N + separate processes for subdirectories which, in turn, execute N separate + make processes for their subdirectories... Worse still, a make -j value + that works for the directory tree today may still swamp a system tomorrow + when someone adds directories to the tree structure. Lacking any way + to coordinate the total number of processes used by the entire build, + make's parallel build support doesn't adapt well to changes either in the + build process itself or in the availability of system resources. + + =head1 B A few of the difficulties with make have been cited above. In this and subsequent sections, we shall introduce Cons and show how these issues are addressed. *************** *** 916,921 **** --- 938,968 ---- Cons will search for derived files in the appropriate build subdirectories under the repository tree. + + + =head1 B + + Like make, Cons provides a -j flag that takes an argument to specify + how many targets can be built in parallel. For example: + + % cons -j 10 . + + Will build the entire directory tree, keeping (up to) ten targets building + simultaneously in the background. + + The big difference between parallel builds with Cons and with make is + that Cons coordinates, across the entire directory tree, the number of + simultaneous targets being built in the background. When you use -j + to specify that 10 targets can be built in parallel, Cons will keep 10 + targets building in the background, regardless of the subdirectory in + which the target resides. This allows for a much greater control over + parallel builds and their impact on a system's load than is possible + with recursive use of make. + + + =item B + + T.B.S. 
=head1 B From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Tue, 30 Mar 1999 10:44:20 -0500 From: Rajesh Vaidheeswarran To: Steven Knight Cc: Bob Sidebotham Subject: Re: prototype -j code Steven, Thanks. I just browsed through the code and didn't see anything questionable. 1) I did some experiments ... here are some interesting results.

ForeWorks tree: (108 sources, 147 targets)
==================================================
Uniprocessor machine          2 Processor Machine
---------------------         -------------------
time cons-p1 -j 10 -cd        time cons-p1 -j 10 -cd
43.48u 8.22s 1:05.05 79.4%    24.66u 4.80s 0:37.80 77.9%

time cons-p1 -cd              time cons-p1 -cd
42.10u 8.34s 1:14.69 67.5%    23.61u 4.62s 0:52.44 53.8%

time cons -cd                 time cons -cd
41.80u 8.29s 1:16.51 65.4%    23.60u 4.25s 1:11.98 38.6%

ForeThought subtree (92 sources, 35 targets)
==================================================
Uniprocessor machine          2 Processor Machine
---------------------         -------------------
time cons-p1 -j 10 -cd        time cons-p1 -j 10 -cd
75.48u 10.03s 2:05.99 67.8%   42.21u 5.77s 1:30.89 52.7%

time cons-p1 -cd              time cons-p1 -cd
73.23u 9.61s 2:03.21 67.2%    41.05u 4.78s 1:17.36 59.2%

time cons -cd                 time cons -cd
72.69u 9.51s 1:58.58 69.3%    41.50u 4.95s 1:26.23 53.8%

Note that this subtree is part of a much larger (~3000 files) tree, and so the initial Conscript scan time accounts for some of the build time. I will try running it over the entire tree, and let you know the results. Both machines are Sparc Ultras. Unip has 128 Meg, MP has 256 Meg. 2) The make syntax of -jn is broken. (IMHO, if we support `-j' as an option, then we should support it at least with the same usability.) cons-p1 -j8 cons-p1: unrecognized option "-j8". Use -x for a usage message. I have also made a couple of changes: 1. whitespace cleanup, 2. correct web site. This should be available for you at http://www.dsmit.com/cons/cons-p1 The documentation looks fine. 
I'll make a patch, and a new beta release, and then let you know so that you can send out a mail seeking beta testers. Thanks for the work! rv -- using MH template repl.format -- In a previous message, Steven Knight writes: > Rajesh-- > > After the last round of discussion about adding -j support to Cons, > I went ahead and asked Bob for his prototype -j code from a few > years back, and I've now ported it to the 1.5 base. There's one > additional feature I'd like to add, but I want to get some feedback > on what I currently have before going any further. > > The new -j code works for my test cases. Because the code relies > on fork()/exec(), which don't exist for Perl on Win32 systems, -j > is *not* supported on NT. I'd welcome advice from Perl/Win32 gurus > on how (or whether) -j support can be extended to those systems. > > There are two problems I can't eradicate, one of which probably already > existed in certain circumstances (complaints about sig::hash::END > not being defined), and one of which may be new but may not (improper > exit value of 0 even when both the debugger and print show that > $errors is non-zero). Neither of these problems seems to affect > how Cons actually works--that is, everything seems to get built > properly. > > I was planning to go to the mailing list to seek beta testers > who are willing to: > > -- Test out the new code, with and without -j; > > -- Report results to me; > > -- Especially let me know if the -j code slows down single-threaded builds (without -j); > > -- Give me feedback on the draft cons.pod text covering > -j and parallel builds. > > Before doing so, though, I wanted to sync up with you and get your > input. The current version of the parallelizing version of cons > is available to you at: > > http://www.baldmt.com/cons-test/cons-p1 > > I'd appreciate it if you'd be able to give it at least a quick > sanity check at Fore, and give me any other feedback you may have. 
> I've appended a diff with my draft cons.pod text, below. > > Oh, yeah, as you might gather from the text below, the feature I'm > planning to add is the ability to let people specify in the > environment that a given target should be built in the foreground > (single-threaded). The intent is to allow people who have > constructions that need to read something from STDIN when building > a target to still use -j to build the rest of their tree in parallel. > > Barring any indication to the contrary from you, I'm planning to > go out to the list for beta testers sometime later this week. > > Thanks! > > --SK > > > > *** /usr/local/src/cons-1.5/cons.pod Tue Nov 17 14:22:53 1998 > --- cons.pod Mon Mar 29 15:41:24 1999 > *************** > *** 66,71 **** > --- 66,93 ---- > into the makefiles. > > > + =item B > + > + One generally-accepted technique for speeding up the software development > + process is to build software components in parallel--that is, start > + multiple compilations simultaneously. Although the additional context > + switches usually increase the CPU time used, parallelizing a build greatly > + decreases the amount of "wall clock" time that the software developer > + spends waiting for the build to complete. Most modern versions of make > + support a -j option which is used to specify the number of tasks that > + make will execute in parallel. Unfortunately, the make -j option is of > + limited usefulness for builds in large directory trees, where it would be > + most helpful. The recursive use of make means that even small -j values > + can threaten to swamp a system, as each recursive make invocation spawns N > + separate processes for subdirectories which, in turn, execute N separate > + make processes for their subdirectories... Worse still, a make -j value > + that works for the directory tree today may still swamp a system tomorrow > + when someone adds directories to the tree structure. 
Lacking any way > + to coordinate the total number of processes used by the entire build, > + make's parallel build support doesn't adapt well to changes either in the > + build process itself or in the availability of system resources. > + > + > =head1 B > > A few of the difficulties with make have been cited above. In this and subsequent sections, we shall introduce Cons and show how these issues are addressed. > *************** > *** 916,921 **** > --- 938,968 ---- > Cons will search for derived files > in the appropriate build subdirectories > under the repository tree. > + > + > + =head1 B > + > + Like make, Cons provides a -j flag that takes an argument to specify > + how many targets can be built in parallel. For example: > + > + % cons -j 10 . > + > + Will build the entire directory tree, keeping (up to) ten targets building > + simultaneously in the background. > + > + The big difference between parallel builds with Cons and with make is > + that Cons coordinates, across the entire directory tree, the number of > + simultaneous targets being built in the background. When you use -j > + to specify that 10 targets can be built in parallel, Cons will keep 10 > + targets building in the background, regardless of the subdirectory in > + which the target resides. This allows much greater control over > + parallel builds and their impact on a system's load than is possible > + with recursive use of make. > + > + > + =item B > + > + T.B.S. > > > =head1 B > From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Tue, 30 Mar 1999 12:02:09 -0500 (EST) From: Steven Knight To: Rajesh Vaidheeswarran Cc: Bob Sidebotham Subject: Re: prototype -j code Hi Rajesh-- > Thanks. I just browsed through the code and didn't see anything questionable. Since sending the mail, I've been thinking a bit on the other problems I mentioned, especially the sig::hash::END problem. I think the signal handling needs to be a little more sophisticated now that we're doing more with IPC.
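Whether %SIG registrations stay in force across fork() can be checked empirically with a short script. This sketch (handler_survives_fork is a made-up helper, not Cons code) shows that on POSIX systems the child does inherit the parent's handlers, because fork() duplicates the entire process, Perl interpreter state included:

```perl
# Sketch: check whether a %SIG handler installed in the parent is
# still registered in a child created by fork().  On POSIX systems
# it is -- fork() duplicates the whole process, Perl's %SIG and the
# underlying sigaction dispositions included.
# (handler_survives_fork is a hypothetical helper, not Cons code.)
use strict;

sub handler_survives_fork {
    $SIG{USR1} = sub { };                 # register a handler in the parent
    pipe(my $rd, my $wr) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                      # child: report what it sees
        close($rd);
        print {$wr} ref($SIG{USR1}) eq 'CODE' ? "inherited" : "lost";
        close($wr);
        exit(0);
    }
    close($wr);                           # parent: read the child's answer
    my $answer = <$rd>;
    close($rd);
    waitpid($pid, 0);
    return $answer;
}
```

The inherited handlers stay registered in the child only until it calls exec(), at which point handled signals revert to their default dispositions.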
One big question I've not been able to answer: Does perl define whether signal handler registration stays in force in a child spawned by fork(), or is it system-dependent? Does either of you know?

> 1) I did some experiments ... here are some interesting results.
>
> ForeWorks tree: (108 sources, 147 targets)
> ==================================================
>
> Uniprocessor machine            2 Processor Machine
> ---------------------           -------------------
> time cons-p1 -j 10 -cd          time cons-p1 -j 10 -cd
> 43.48u 8.22s 1:05.05 79.4%      24.66u 4.80s 0:37.80 77.9%
>
> time cons-p1 -cd                time cons-p1 -cd
> 42.10u 8.34s 1:14.69 67.5%      23.61u 4.62s 0:52.44 53.8%
>
> time cons -cd                   time cons -cd
> 41.80u 8.29s 1:16.51 65.4%      23.60u 4.25s 1:11.98 38.6%
>
> ForeThought subtree (92 sources, 35 targets)
> ==================================================
>
> Uniprocessor machine            2 Processor Machine
> ---------------------           -------------------
> time cons-p1 -j 10 -cd          time cons-p1 -j 10 -cd
> 75.48u 10.03s 2:05.99 67.8%     42.21u 5.77s 1:30.89 52.7%
>
> time cons-p1 -cd                time cons-p1 -cd
> 73.23u 9.61s 2:03.21 67.2%      41.05u 4.78s 1:17.36 59.2%
>
> time cons -cd                   time cons -cd
> 72.69u 9.51s 1:58.58 69.3%      41.50u 4.95s 1:26.23 53.8%

Damn. I was hoping that the job class used by -j wouldn't have introduced that much overhead, especially for the derived-file caching. I'm going to create some derived-file-caching tests and pore over the introduced changes to see if I can cut that down.

> Note that this subtree is part of a much larger (~3000 files) tree, and so
> the initial Conscript scan time accounts for some of the build time.
>
> I will try running it over the entire tree, and let you know the results.
>
> Both machines are Sparc Ultras. Unip has 128 Meg, MP has 256 Meg.
>
> 2) The make syntax of -jn is broken. (IMHO, if we support `-j' as an option,
> then we should support it at least with the same usability.)
>
> cons-p1 -j8
> cons-p1: unrecognized option "-j8". Use -x for a usage message.
Good point. I agree we should make the syntax of -j as make-like as possible. Of course, the other options that take arguments (-o, -f) can't concatenate their arguments, either, so I'll generalize the option processing a bit so that -ffile (e.g.) will work as well. > I have also made a couple of changes.. > > 1. whitespace cleanup, > 2. correct web site. > This should be available for you at http://www.dsmit.com/cons/cons-p1 Got it, many thanks. > The documentation looks fine. I'll make a patch, and a new beta release, and > then let you know so that you can send out a mail seeking beta testers. Why don't you hold off for a bit and let me take a crack at the issues above. I'll get a cons-p2 to you, and we can use that for the beta. > Thanks for the work! And thank you for the feedback and help. I'll let you know as soon as I have another version available. --SK From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Tue, 30 Mar 1999 13:04:05 -0500 From: Rajesh Vaidheeswarran To: Steven Knight Cc: Bob Sidebotham Subject: Re: prototype -j code Steven, I don't know for sure the answer to your question. But I know that perl used to reap the child when a signal was sent in. I'm not sure what is up with the newer version of perl. (i.e., whenever we started seeing this sig::hash::END problem.) rv > Since sending the mail, I've been thinking a bit on the other > problems I mentioned, especially the sig::hash::END problem. I > think the signal handling needs to be a little more sophisticated > now that we're doing more with IPC. One big question I've not been > able to answer: Does perl define whether signal handler registration > stays in force in a child spawned by fork(), or is it system-dependent? > Does either of you know? From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Tue, 30 Mar 1999 18:05:20 -0500 From: Bob Sidebotham To: rv@fore.com Cc: Steven Knight , Bob Sidebotham Subject: Re: prototype -j code I wouldn't expect -j 10 (uniprocessor, or not), to be effective.
On a uniprocessor, I found previously (with the old version of cons and a very slow uniprocessor) that -j 3 was optimal. This is confirmed by gmake users, who have, in the past, indicated that anything beyond -j 3 tends to have a negative effect. Then again, -j 10 might be a win on a loaded system, since it would cause your jobs to be noticed more often (since they'd represent a higher percentage of the overall load). This would be a win only for you, of course, not for overall system throughput... I notice that you do a -cd every time. Do you also remove the files? The -j will not improve performance at all unless there is something to do. It doesn't distribute cons think time. Bob You said:

>Steven,
>
>Thanks. I just browsed through the code and didn't see anything questionable.
>
>1) I did some experiments ... here are some interesting results.
>
>ForeWorks tree: (108 sources, 147 targets)
>==================================================
>
>Uniprocessor machine            2 Processor Machine
>---------------------           -------------------
>time cons-p1 -j 10 -cd          time cons-p1 -j 10 -cd
>43.48u 8.22s 1:05.05 79.4%      24.66u 4.80s 0:37.80 77.9%
>
>time cons-p1 -cd                time cons-p1 -cd
>42.10u 8.34s 1:14.69 67.5%      23.61u 4.62s 0:52.44 53.8%
>
>time cons -cd                   time cons -cd
>41.80u 8.29s 1:16.51 65.4%      23.60u 4.25s 1:11.98 38.6%
>
>ForeThought subtree (92 sources, 35 targets)
>==================================================
>
>Uniprocessor machine            2 Processor Machine
>---------------------           -------------------
>time cons-p1 -j 10 -cd          time cons-p1 -j 10 -cd
>75.48u 10.03s 2:05.99 67.8%     42.21u 5.77s 1:30.89 52.7%
>
>time cons-p1 -cd                time cons-p1 -cd
>73.23u 9.61s 2:03.21 67.2%      41.05u 4.78s 1:17.36 59.2%
>
>time cons -cd                   time cons -cd
>72.69u 9.51s 1:58.58 69.3%      41.50u 4.95s 1:26.23 53.8%
>
>Note that this subtree is part of a much larger (~3000 files) tree, and so
>the initial Conscript scan time accounts for some of the build time.
> >I will try running it over the entire tree, and let you know the results. > >Both machines are Sparc Ultras. Unip has 128 Meg, MP has 256 Meg. > > >2) The make syntax of -jn is broken. (IMHO, if we support `-j' as an option, >then we should support it atleast with the same usability.) > >cons-p1 -j8 >cons-p1: unrecognized option "-j8". Use -x for a usage message. > >I have also made a couple of changes.. > >1. whitespace cleanup, >2. correct web site. > >This should be available for you at http://www.dsmit.com/cons/cons-p1 > >The documentation looks fine. I'll make a patch, and a new beta release, and >then let you know so that you can send out a mail seeking beta testers. > >Thanks for the work! > >rv > > -- using MH template repl.format -- >In a previous message, Steven Knight writes: > >> Rajesh-- >> >> After the last round of discussion about adding -j support to Cons, >> I went ahead and asked Bob for his prototype -j code from a few >> years back, and I've now ported it to the 1.5 base. There's one >> additional feature I'd like to add, but I want to get some feedback >> on what I currently have before going any further. >> >> The new -j code works for my test cases. Because the code relies >> on fork()/exec(), which don't exist for Perl on Win32 systems, -j >> is *not* supported on NT. I'd welcome advice from Perl/Win32 gurus >> on how (or whether) -j support can be extended to those systems. >> >> There are two problems I can't eradicate, one which probably already >> existed in certain circumstances (complaints about sig::hash::END >> not being defined), and one of which may be new but may not (improper >> exit value of 0 even when both the debugger and print show that >> $errors is non-zero). Neither of these problems seems to affect >> how Cons actually works--that is, everything seems to get built >> properly. 
>> >> I was planning to go to the mailing list to seek beta testers >> who are willing to: >> >> -- Test out the new code, with and without -j; >> >> -- Report results to me; >> >> -- Especially let me know if the -j code slows down single- >> threaded builds (without -j); >> >> -- Give me feedback on the draft cons.pod text covering >> -j and parallel builds. >> >> Before doing so, though, I wanted to sync up with you and get your >> input. The current version of the parallelizing version of cons >> is available to you at: >> >> http://www.baldmt.com/cons-test/cons-p1 >> >> I'd appreciate it if you'd be able to give it at least a quick >> sanity check at Fore, and give me any other feedback you may have. >> I've appended a diff with my draft cons.pod text, below. >> >> Oh, yeah, as you might gather from the text below, the feature I'm >> planning to add is the ability to let people specify in the >> environment that a given target should be built in the foreground >> (single-threaded). The intent is to allow people who have >> constructions that need to read something from STDIN when building >> a target to still use -j to build the rest of their tree in parallel. >> >> Barring any indication to the contrary from you, I'm planning to >> go out to the list for beta testers sometime later this week. >> >> Thanks! >> >> --SK >> >> >> >> *** /usr/local/src/cons-1.5/cons.pod Tue Nov 17 14:22:53 1998 >> --- cons.pod Mon Mar 29 15:41:24 1999 >> *************** >> *** 66,71 **** >> --- 66,93 ---- >> into the makefiles. >> >> >> + =item B >> + >> + One generally-accepted technique for speeding up the software development >> + process is to build software components in parallel--that is, start >> + multiple compilations simultaneously. Although the additional context >> + switches usually increase the CPU time used, parallelizing a build greatly >> + decreases the amount of "wall clock" time that the software developer >> + spends waiting for the build to complete. 
Most modern versions of make >> + support a -j option which is used to specify the number of tasks that >> + make will execute in parallel. Unfortunately, the make -j option is of >> + limited usefulness for builds in large directory trees, where it would be >> + most helpful. The recursive use of make means that even small -j values >> + can threaten to swamp a system, as each recursive make invocation spawns N >> + separate processes for subdirectories which, in turn, execute N separate >> + make processes for their subdirectories... Worse still, a make -j value >> + that works for the directory tree today may still swamp a system tomorrow >> + when someone adds directories to the tree structure. Lacking any way >> + to coordinate the total number of processes used by the entire build, >> + make's parall build support doesn't adapt well to changes either in the >> + build process itself or in the availability of system resources. >> + >> + >> =head1 B >> >> A few of the difficulties with make have been cited above. In this and sub >s >> equent sections, we shall introduce Cons and show how these issues are addre >s >> sed. >> *************** >> *** 916,921 **** >> --- 938,968 ---- >> Cons will search for derived files >> in the appropriate build subdirectories >> under the repository tree. >> + >> + >> + =head1 B >> + >> + Like make, Cons provides a -j flag that takes an argument to specify >> + how many targets can be built in parallel. For example: >> + >> + % cons -j 10 . >> + >> + Will build the entire directory tree, keeping (up to) ten targets building >> + simultaneously in the background. >> + >> + The big difference between parallel builds with Cons and with make is >> + that Cons coordinates, across the entire directory tree, the number of >> + simultaneous targets being built in the background. 
When you use -j >> + to specify that 10 targets can be built in parallel, Cons will keep 10 >> + targets building in the background, regardless of the subdirectory in >> + which the target resides. This allows for a much greater control over >> + parallel builds and their impact on a system's load than is possible >> + with recursive use of make. >> + >> + >> + =item B >> + >> + T.B.S. >> >> >> =head1 B >> From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Tue, 30 Mar 1999 18:21:00 -0500 From: Rajesh Vaidheeswarran To: Bob Sidebotham Cc: Steven Knight Subject: Re: prototype -j code -- using MH template repl.format -- In a previous message, Bob Sidebotham writes: > I wouldn't expect -j 10 (uniprocessor, or not), to be effective. On > a uniprocessor, I found previously (with the old version of cons > and a very slow uniprocessor) that -j 3 was optimal. This is confirmed > by gmake users, who have, in the past, indicated that anything beyond > -j 3 tends to have a negative effect. > All right, I'll do some testing with -j 3, -j 4 etc.. to get to optimal.. > > I notice that you do a -cd every time. Do you also remove the files? Yes I remove my build tree, and start afresh always. rv From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Tue, 30 Mar 1999 19:05:01 -0500 (EST) From: Steven Knight To: Bob Sidebotham Cc: rv@fore.com Subject: Re: prototype -j code > I wouldn't expect -j 10 (uniprocessor, or not), to be effective. On > a uniprocessor, I found previously (with the old version of cons > and a very slow uniprocessor) that -j 3 was optimal. This is confirmed > by gmake users, who have, in the past, indicated that anything beyond > -j 3 tends to have a negative effect. Is that true even for single-directory gmake builds? We had the same experience trying to use gmake -j at my previous job, but always attributed it to the combination of -j and recursive make leading to an exponential explosion of compilation processes. 
My memory is that individuals building things in single directories on their own workstations got much improved wall-clock time using higher -j values. But maybe I'm making that up due to wishful thinking... Either way, some more hard data here would be helpful. I'll work on creating additional test cases... --SK From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 14:27:00 -0500 From: Rajesh Vaidheeswarran To: Steven Knight Cc: Bob Sidebotham Subject: Re: prototype -j code Something is very broken about this cons. It seems like .consigns in various target directories are not written at all, and when cons dies due to an error in compiling one of the targets, it starts building everything all over again. This is also a general problem in cons, that it tries to overwrite a .consign instead of writing a temporary file and moving it to .consign when it is able to *successfully* close the file. When disk space runs out, cons is unable to write the .consign, and thereby causes a complete build all over again. rv From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 15:06:22 -0500 (EST) From: Steven Knight To: Rajesh Vaidheeswarran Cc: Bob Sidebotham Subject: Re: prototype -j code > Something is very broken about this cons. It seems like .consigns in various > target directories are not written at all, and when cons dies due to an > error in compiling one of the targets, it starts building everything all > over again. Good catch. I obviously need to add tests for error conditions during compilation. As it turns out, I think I know why this is happening. I was just trying to track down my exit value and sig::hash::END problems, and found the following on the perlmod manpage: An END subroutine is executed as late as possible, that is, when the interpreter is being exited, even if it is exiting as a result of a die() function. 
(But not if it's polymorphing into another program via exec, or being blown out of the water by a signal--you have to trap that yourself (if you can).) You may have multiple END blocks within a file--they will execute in reverse order of definition; that is: last in, first out (LIFO). Inside an END subroutine, $? contains the value that the script is going to pass to exit(). You can modify $? to change the exit value of the script. Beware of changing $? by accident (e.g. by running something via system). The second paragraph explains the exit value problem. The job::END routine was doing a wait(), which changes the value of $?. I think the first paragraph points to the .consign problem, and possibly also the sig::hash::END problem. The cons-p1 code added a second END routine, which introduced an ordering problem: we want to wait for the jobs to finish and collect their status before writing out the .consign files, but Perl's LIFO rule for executing END routines did exactly the opposite. I plan to solve this by using a single main::END routine that explicitly orders the calls to the cleanup routines (not named END) in the other modules. > This is also a general problem in cons, that it tries to overwrite a > .consign instead of writing a temporary file and moving it to .consign when > it is able to *successfully* close the file. > > When disk space runs out, cons is unable to write the .consign, and thereby > causes a complete build all over again. Another good point. As long as I'm in here mucking with things, I'll go ahead and add this. The list is getting long enough now that I may not be able to get another version to you as soon as planned, but I'll keep you posted. One other item: In fixing the -j8 problem (concatenating the argument to the option), I ended up re-coding the option processing to get rid of the long if-elsif chain. I'll append a diff below. If either of you has a reason not to turn it into tables like I did, give a yell.
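Rajesh's temp-file-and-rename suggestion might look something like this sketch (the function name and the "name:signature" line format are made up for illustration; this is not the actual Cons code):

```perl
# Sketch: write a .consign-style signature file atomically.  Writing
# to a temporary file and rename()ing it into place only after a
# successful close() means a full disk can never leave a truncated
# .consign behind -- the old file survives instead.
# (write_consign and the "name:signature" format are hypothetical.)
use strict;

sub write_consign {
    my($dir, %sig) = @_;
    my $file = "$dir/.consign";
    my $tmp  = "$file.$$";                 # per-process temporary name
    open(my $fh, '>', $tmp) or return 0;
    for my $name (sort keys %sig) {
        unless (print $fh "$name:$sig{$name}\n") {
            close($fh); unlink($tmp); return 0;
        }
    }
    unless (close($fh)) {                  # close() reports buffered write
        unlink($tmp); return 0;            # errors such as ENOSPC
    }
    # rename() is atomic on the same filesystem: readers see either
    # the complete old .consign or the complete new one, never a
    # partially written file.
    return rename($tmp, $file);
}
```

A caller that gets back 0 knows the old signatures are still intact on disk and can warn about the failure instead of triggering a complete rebuild on the next run.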
--SK *** /home/software/cons/branch.2/branch.2/baseline/cons.pl Wed Mar 24 04:23:36 1999 --- /home/knight/cons.2.2.C110/cons.pl Tue Mar 30 15:30:18 1999 *************** *** 211,278 **** push (@targets, $_), next; } sub option { ! if ($_ eq 'm') { ! print($version, $cons_history), exit(0); ! } elsif ($_ eq 'v') { ! print($version); ! } elsif ($_ eq 'V') { ! print($version), exit(0); ! } elsif ($_ eq 'o') { ! $param::overfile = shift(@ARGV); ! die("$0: -o option requires a filename argument.\n") if !$param::overfile; ! } elsif ($_ eq 'f') { ! $param::topfile = shift(@ARGV); ! die("$0: -f option requires a filename argument.\n") if !$param::topfile; ! } elsif ($_ eq 'wf') { ! $param::depfile = shift(@ARGV); ! die("$0: -wf option requires a filename argument.\n") if !$param::depfile; ! } elsif ($_ eq 'k') { ! $param::kflag = 1; ! } elsif ($_ eq 'p') { ! $param::pflag = 1; ! $param::build = 0; ! } elsif ($_ eq 'pa') { ! $param::pflag = $param::aflag = 1; ! $param::build = 0; ! $indent = "... "; ! } elsif ($_ eq 'pw') { ! $param::pflag = $param::wflag = 1; ! $param::build = 0; ! } elsif ($_ eq 'r') { ! $param::rflag = 1; ! $param::build = 0; ! } elsif ($_ eq 'h') { ! $param::localhelp =1; ! } elsif ($_ eq 'x') { ! print($usage); ! exit 0; ! } elsif ($_ eq 'd') { ! $param::depends = 1; ! } elsif ($_ eq 'cc') { ! $param::cachecom = 1; ! } elsif ($_ eq 'cd') { ! $param::cachedisable = 1; ! } elsif ($_ eq 'cr') { ! $param::random = 1; ! } elsif ($_ eq 'cs') { ! $param::cachesync = 1; ! } elsif ($_ eq 'R') { ! my($repository) = shift(@ARGV); ! die("$0: -R option requires a repository argument.\n") if !$repository; ! script::Repository($repository); ! } elsif ($_ eq 'j') { ! die("$0: -j not supported (yet?) on WIN32 systems.\n") if $main::_WIN32; ! $param::maxjobs = shift(@ARGV); ! die("$0: -j option requires an argument specifying the maximum number of jobs in parallel.\n") if !$param::maxjobs; ! # We might want to only set jobclass to async if maxjobs > 1. ! 
# On the other hand, specifying -j 1 would be a good way to ! # check that the async class is working. ! $param::jobclass = 'job::async'; } else { ! die qq($0: unrecognized option "-$_". Use -x for a usage message.\n) if $_; } ! } # Process an equate argument (var=val). sub equate { --- 211,269 ---- push (@targets, $_), next; } + my(%opt_tab) = ( + 'cc' => sub { $param::cachecom = 1; }, + 'cd' => sub { $param::cachedisable = 1; }, + 'cr' => sub { $param::random = 1; }, + 'cs' => sub { $param::cachesync = 1; }, + 'd' => sub { $param::depends = 1; }, + 'h' => sub { $param::localhelp =1; }, + 'k' => sub { $param::kflag = 1; }, + 'm' => sub { print($version, $cons_history), exit(0); }, + 'p' => sub { $param::pflag = 1; + $param::build = 0; }, + 'pa' => sub { $param::pflag = $param::aflag = 1; + $indent = "... "; + $param::build = 0; }, + 'pw' => sub { $param::pflag = $param::wflag = 1; + $param::build = 0; }, + 'r' => sub { $param::rflag = 1; + $param::build = 0; }, + 'v' => sub { print($version); }, + 'V' => sub { print($version), exit(0); }, + 'x' => sub { print($usage), exit 0; }, + ); + + my(%opt_arg) = ( + 'f' => sub { $param::topfile = $1; }, + 'j' => sub { die("$0: -j not supported (yet?) on WIN32 systems.\n") + if $main::_WIN32; + $param::maxjobs = $1; }, + 'o' => sub { $param::overfile = $1; }, + 'R' => sub { script::Repository($1); }, + ); + sub option { ! if (defined $opt_tab{$_}) { ! &{$opt_tab{$_}}; } else { ! $_ =~ m/(.)(.*)/; ! if (defined $opt_arg{$1}) { ! if ($2) { ! $_ = $2; ! } else { ! $_ = shift(@ARGV); ! die("$0: -$1 option requires an argument.\n") if ! $_; ! } ! &{$opt_arg{$1}}($_); ! } elsif ($_ eq 'wf') { ! $param::depfile = shift(@ARGV); ! die("$0: -wf option requires a filename argument.\n") if !$param::depfile; ! } else { ! die qq($0: unrecognized option "-$_". Use -x for a usage message.\n) if $_; ! } } ! } # Process an equate argument (var=val). 
sub equate { From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 16:00:37 -0500 From: Rajesh Vaidheeswarran To: Steven Knight Cc: Bob Sidebotham Subject: Re: prototype -j code With the new option processing, rv>src/cons-p1 -j 2 cons-p1: unrecognized option "-j". Use -x for a usage message. rv>src/cons-p1 -x cons-p1: unrecognized option "-x". Use -x for a usage message. rv>src/cons-p1 -j2 cons-p1: unrecognized option "-j2". Use -x for a usage message. I see what the problem is ... (defined $opt_tab{$_}) and (defined $opt_arg{$_}) return false, maybe because %opt_tab and %opt_arg are not defined within the correct scope!? When I moved them into the sub option, they worked fine. rv -- using MH template repl.format -- In a previous message, Steven Knight writes: > > Something is very broken about this cons. It seems like .consigns in variou > s > > target directories are not written at all, and when cons dies due to an > > error in compiling one of the targets, it starts building everything all > > over again. > > Good catch. I obviously need to add tests for error conditions > during compilation. > > As it turns out, I think I know why this is happening. I was just > trying to track down my exit value and sig::hash::END problems, > and found the following on the perlmod manpage: > > An END subroutine is executed as late as possible, that > is, when the interpreter is being exited, even if it is > exiting as a result of a die() function. (But not if it's > polymorphing into another program via exec, or being blown > out of the water by a signal--you have to trap that yourself > (if you can).) You may have multiple END blocks within a > file--they will execute in reverse order of definition; > that is: last in, first out (LIFO). > > Inside an END subroutine, $? contains the value that the > script is going to pass to exit(). You can modify $? to > change the exit value of the script. Beware of changing $? > by accident (e.g. by running something via system). 
> > The second paragraph explains the exit value problem. The job::END > routine was doing a wait(), which changes the value of $?. > > I think the first paragraph points to the .consign problem, and > possibly also the sig::hash::END problem. The cons-p1 code added > a second END routine, which introduced an ordering problem: we > want to wait for the jobs to finish and collect their status before > writing out the .consign files, but Perl's LIFO rule for executing > END routines did exactly the opposite. > > I plan to solve this by using a single main::END routine that > explicitly orders the calls to the cleanup routines (not named END) > in the other modules. > > > This is also a general problem in cons, that it tries to overwrite a > > .consign instead of writing a temporary file and moving it to .consign when > > it is able to *successfully* close the file. > > > > When disk space runs out, cons is unable to write the .consign, and thereby > > causes a complete build all over again. > > Another good point. As long as I'm in here mucking with things, > I'll go ahead and add this. > > The list is getting long enough now that I may not be able to get > another version to you as soon as planned, but I'll keep you posted. > > One other item: In fixing the -j8 problem (concatenate the argument > to the option, I ended up re-coding the option processing to get > rid of the long if-elsif chain. I'll append a diff below. If > either of you has a reason not to turn it into tables like I did, > give a yell. > > --SK > > *** /home/software/cons/branch.2/branch.2/baseline/cons.pl Wed Mar 24 04:2 > 3:36 1999 > --- /home/knight/cons.2.2.C110/cons.pl Tue Mar 30 15:30:18 1999 > *************** > *** 211,278 **** > push (@targets, $_), next; > } > > sub option { > ! if ($_ eq 'm') { > ! print($version, $cons_history), exit(0); > ! } elsif ($_ eq 'v') { > ! print($version); > ! } elsif ($_ eq 'V') { > ! print($version), exit(0); > ! } elsif ($_ eq 'o') { > ! 
$param::overfile = shift(@ARGV); > ! die("$0: -o option requires a filename argument.\n") if !$param::overfi > le; > ! } elsif ($_ eq 'f') { > ! $param::topfile = shift(@ARGV); > ! die("$0: -f option requires a filename argument.\n") if !$param::topfil > e; > ! } elsif ($_ eq 'wf') { > ! $param::depfile = shift(@ARGV); > ! die("$0: -wf option requires a filename argument.\n") if !$param::de > pfile; > ! } elsif ($_ eq 'k') { > ! $param::kflag = 1; > ! } elsif ($_ eq 'p') { > ! $param::pflag = 1; > ! $param::build = 0; > ! } elsif ($_ eq 'pa') { > ! $param::pflag = $param::aflag = 1; > ! $param::build = 0; > ! $indent = "... "; > ! } elsif ($_ eq 'pw') { > ! $param::pflag = $param::wflag = 1; > ! $param::build = 0; > ! } elsif ($_ eq 'r') { > ! $param::rflag = 1; > ! $param::build = 0; > ! } elsif ($_ eq 'h') { > ! $param::localhelp =1; > ! } elsif ($_ eq 'x') { > ! print($usage); > ! exit 0; > ! } elsif ($_ eq 'd') { > ! $param::depends = 1; > ! } elsif ($_ eq 'cc') { > ! $param::cachecom = 1; > ! } elsif ($_ eq 'cd') { > ! $param::cachedisable = 1; > ! } elsif ($_ eq 'cr') { > ! $param::random = 1; > ! } elsif ($_ eq 'cs') { > ! $param::cachesync = 1; > ! } elsif ($_ eq 'R') { > ! my($repository) = shift(@ARGV); > ! die("$0: -R option requires a repository argument.\n") if !$reposit > ory; > ! script::Repository($repository); > ! } elsif ($_ eq 'j') { > ! die("$0: -j not supported (yet?) on WIN32 systems.\n") if $main > ::_WIN32; > ! $param::maxjobs = shift(@ARGV); > ! die("$0: -j option requires an argument specifying the maximum number o > f jobs in parallel.\n") if !$param::maxjobs; > ! # We might want to only set jobclass to async if maxjobs > 1. > ! # On the other hand, specifying -j 1 would be a good way to > ! # check that the async class is working. > ! $param::jobclass = 'job::async'; > } else { > ! die qq($0: unrecognized option "-$_". Use -x for a usage message.\n) if > $_; > } > ! } > > # Process an equate argument (var=val). 
> sub equate { > --- 211,269 ---- > push (@targets, $_), next; > } > > + my(%opt_tab) = ( > + 'cc' => sub { $param::cachecom = 1; }, > + 'cd' => sub { $param::cachedisable = 1; }, > + 'cr' => sub { $param::random = 1; }, > + 'cs' => sub { $param::cachesync = 1; }, > + 'd' => sub { $param::depends = 1; }, > + 'h' => sub { $param::localhelp =1; }, > + 'k' => sub { $param::kflag = 1; }, > + 'm' => sub { print($version, $cons_history), exit(0); }, > + 'p' => sub { $param::pflag = 1; > + $param::build = 0; }, > + 'pa' => sub { $param::pflag = $param::aflag = 1; > + $indent = "... "; > + $param::build = 0; }, > + 'pw' => sub { $param::pflag = $param::wflag = 1; > + $param::build = 0; }, > + 'r' => sub { $param::rflag = 1; > + $param::build = 0; }, > + 'v' => sub { print($version); }, > + 'V' => sub { print($version), exit(0); }, > + 'x' => sub { print($usage), exit 0; }, > + ); > + > + my(%opt_arg) = ( > + 'f' => sub { $param::topfile = $1; }, > + 'j' => sub { die("$0: -j not supported (yet?) on WIN32 systems.\n") > + if $main::_WIN32; > + $param::maxjobs = $1; }, > + 'o' => sub { $param::overfile = $1; }, > + 'R' => sub { script::Repository($1); }, > + ); > + > sub option { > ! if (defined $opt_tab{$_}) { > ! &{$opt_tab{$_}}; > } else { > ! $_ =~ m/(.)(.*)/; > ! if (defined $opt_arg{$1}) { > ! if ($2) { > ! $_ = $2; > ! } else { > ! $_ = shift(@ARGV); > ! die("$0: -$1 option requires an argument.\n") if ! $_; > ! } > ! &{$opt_arg{$1}}($_); > ! } elsif ($_ eq 'wf') { > ! $param::depfile = shift(@ARGV); > ! die("$0: -wf option requires a filename argument.\n") if !$param::d > epfile; > ! } else { > ! die qq($0: unrecognized option "-$_". Use -x for a usage message.\n > ) if $_; > ! } > } > ! } > > # Process an equate argument (var=val). 
> sub equate { > From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 16:13:06 -0500 (EST) From: Steven Knight To: Rajesh Vaidheeswarran Cc: Bob Sidebotham Subject: Re: prototype -j code > With the new option processing, > > rv>src/cons-p1 -j 2 > cons-p1: unrecognized option "-j". Use -x for a usage message. > rv>src/cons-p1 -x > cons-p1: unrecognized option "-x". Use -x for a usage message. > rv>src/cons-p1 -j2 > cons-p1: unrecognized option "-j2". Use -x for a usage message. > > I see what the problem is ... (defined $opt_tab{$_}) and (defined > $opt_arg{$_}) return false, maybe because %opt_tab and %opt_arg are not > defined within the correct scope!? > > When I moved them into the sub option, they worked fine. Except that there was another problem in what I sent you: In the anonymous subroutines in the %opt_arg hash, the $1 arguments should be @_[0]. (That'll teach me to send out draft code without at least running it once first... :-) --SK From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 16:15:42 -0500 From: Rajesh Vaidheeswarran To: Steven Knight Cc: Bob Sidebotham Subject: Re: prototype -j code I just found that out myself ;) rv -- using MH template repl.format -- In a previous message, Steven Knight writes: > > With the new option processing, > > > > rv>src/cons-p1 -j 2 > > cons-p1: unrecognized option "-j". Use -x for a usage message. > > rv>src/cons-p1 -x > > cons-p1: unrecognized option "-x". Use -x for a usage message. > > rv>src/cons-p1 -j2 > > cons-p1: unrecognized option "-j2". Use -x for a usage message. > > > > I see what the problem is ... (defined $opt_tab{$_}) and (defined > > $opt_arg{$_}) return false, maybe because %opt_tab and %opt_arg are not > > defined within the correct scope!? > > > > When I moved them into the sub option, they worked fine. > > Except that there was another problem in what I sent you: In the > anonymous subroutines in the %opt_arg hash, the $1 arguments should > be @_[0]. 
(That'll teach me to send out draft code without at least > running it once first... :-) > > --SK > From rv@fore.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 16:43:31 -0500 From: Rajesh Vaidheeswarran To: Steven Knight Cc: Bob Sidebotham Subject: Re: prototype -j code Here is what I modified. sub option { my %opt_tab = ( 'cc' => sub { $param::cachecom = 1; }, 'cd' => sub { $param::cachedisable = 1; }, 'cr' => sub { $param::random = 1; }, 'cs' => sub { $param::cachesync = 1; }, 'd' => sub { $param::depends = 1; }, 'h' => sub { $param::localhelp =1; }, 'k' => sub { $param::kflag = 1; }, 'm' => sub { print($version, $cons_history), exit(0); }, 'p' => sub { $param::pflag = 1; $param::build = 0; }, 'pa' => sub { $param::pflag = $param::aflag = 1; $indent = "... "; $param::build = 0; }, 'pw' => sub { $param::pflag = $param::wflag = 1; $param::build = 0; }, 'r' => sub { $param::rflag = 1; $param::build = 0; }, 'v' => sub { print($version); }, 'V' => sub { print($version), exit(0); }, 'x' => sub { print($usage), exit 0; }, ); my %opt_arg = ( 'f' => sub { $param::topfile = shift; }, 'j' => sub { die("$0: -j not supported (yet?) ". "on WIN32 systems.\n") if $main::_WIN32; $param::maxjobs = shift; }, 'o' => sub { $param::overfile = shift; }, 'R' => sub { script::Repository(shift); }, ); if (defined $opt_tab{$_}) { &{$opt_tab{$_}}; } else { $_ =~ m/(.)(.*)/; if (defined $opt_arg{$1}) { if ($2) { $_ = $2; } else { $_ = shift @ARGV; die("$0: -$1 option requires an argument.\n") if ! $_; } &{$opt_arg{$1}}($_); } elsif ($_ eq 'wf') { $param::depfile = shift(@ARGV); die "$0: -wf option requires a filename argument.\n" if !$param::depfile; } else { die "$0: unrecognized option \"-$_\". " . "Use -x for a usage message.\n" if $_; } } } -- using MH template repl.format -- In a previous message, Steven Knight writes: > > Something is very broken about this cons. 
It seems like .consigns in various > > target directories are not written at all, and when cons dies due to an > > error in compiling one of the targets, it starts building everything all > > over again. > > Good catch. I obviously need to add tests for error conditions > during compilation. > > As it turns out, I think I know why this is happening. I was just > trying to track down my exit value and sig::hash::END problems, > and found the following on the perlmod manpage: > > An END subroutine is executed as late as possible, that > is, when the interpreter is being exited, even if it is > exiting as a result of a die() function. (But not if it's > polymorphing into another program via exec, or being blown > out of the water by a signal--you have to trap that yourself > (if you can).) You may have multiple END blocks within a > file--they will execute in reverse order of definition; > that is: last in, first out (LIFO). > > Inside an END subroutine, $? contains the value that the > script is going to pass to exit(). You can modify $? to > change the exit value of the script. Beware of changing $? > by accident (e.g. by running something via system). > > The second paragraph explains the exit value problem. The job::END > routine was doing a wait(), which changes the value of $?. > > I think the first paragraph points to the .consign problem, and > possibly also the sig::hash::END problem. The cons-p1 code added > a second END routine, which introduced an ordering problem: we > want to wait for the jobs to finish and collect their status before > writing out the .consign files, but Perl's LIFO rule for executing > END routines did exactly the opposite. > > I plan to solve this by using a single main::END routine that > explicitly orders the calls to the cleanup routines (not named END) > in the other modules. 
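[Editor's note: the single-main::END fix described above can be sketched in a few lines. This is an illustrative stand-in, not the actual cons code; the sub names reap_jobs and write_consign are made up for the example.]

```perl
#!/usr/bin/perl
# Perl runs END blocks last-in/first-out, so two independent END
# blocks in different packages can run in the wrong order.  A single
# END block that calls explicitly ordered cleanup subs avoids that,
# and saving/restoring $? keeps wait() from clobbering the exit value.
use strict;

my @order;

sub reap_jobs     { push @order, 'reap'; }     # would wait() for children
sub write_consign { push @order, 'consign'; }  # would flush .consign files

END {
    my $status = $?;      # save: wait()/system() would overwrite $?
    reap_jobs();          # collect job status first...
    write_consign();      # ...then write the signature files
    print "order: @order\n";
    $? = $status;         # restore the real exit value
}
```

Run standalone, this prints "order: reap consign" at interpreter exit, no matter which package happened to be loaded last.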
> > > This is also a general problem in cons, that it tries to overwrite a > > .consign instead of writing a temporary file and moving it to .consign when > > it is able to *successfully* close the file. > > > > When disk space runs out, cons is unable to write the .consign, and thereby > > causes a complete build all over again. > > Another good point. As long as I'm in here mucking with things, > I'll go ahead and add this. > > The list is getting long enough now that I may not be able to get > another version to you as soon as planned, but I'll keep you posted. > > One other item: In fixing the -j8 problem (concatenating the argument > to the option), I ended up re-coding the option processing to get > rid of the long if-elsif chain. I'll append a diff below. If > either of you has a reason not to turn it into tables like I did, > give a yell. > > --SK > > *** /home/software/cons/branch.2/branch.2/baseline/cons.pl Wed Mar 24 04:23:36 1999 > --- /home/knight/cons.2.2.C110/cons.pl Tue Mar 30 15:30:18 1999 > *************** > *** 211,278 **** > push (@targets, $_), next; > } > > sub option { > ! if ($_ eq 'm') { > ! print($version, $cons_history), exit(0); > ! } elsif ($_ eq 'v') { > ! print($version); > ! } elsif ($_ eq 'V') { > ! print($version), exit(0); > ! } elsif ($_ eq 'o') { > ! $param::overfile = shift(@ARGV); > ! die("$0: -o option requires a filename argument.\n") if !$param::overfile; > ! } elsif ($_ eq 'f') { > ! $param::topfile = shift(@ARGV); > ! die("$0: -f option requires a filename argument.\n") if !$param::topfile; > ! } elsif ($_ eq 'wf') { > ! $param::depfile = shift(@ARGV); > ! die("$0: -wf option requires a filename argument.\n") if !$param::depfile; > ! } elsif ($_ eq 'k') { > ! $param::kflag = 1; > ! } elsif ($_ eq 'p') { > ! $param::pflag = 1; > ! $param::build = 0; > ! } elsif ($_ eq 'pa') { > ! $param::pflag = $param::aflag = 1; > ! $param::build = 0; > ! $indent = "... "; > ! } elsif ($_ eq 'pw') { > ! 
$param::pflag = $param::wflag = 1; > ! $param::build = 0; > ! } elsif ($_ eq 'r') { > ! $param::rflag = 1; > ! $param::build = 0; > ! } elsif ($_ eq 'h') { > ! $param::localhelp = 1; > ! } elsif ($_ eq 'x') { > ! print($usage); > ! exit 0; > ! } elsif ($_ eq 'd') { > ! $param::depends = 1; > ! } elsif ($_ eq 'cc') { > ! $param::cachecom = 1; > ! } elsif ($_ eq 'cd') { > ! $param::cachedisable = 1; > ! } elsif ($_ eq 'cr') { > ! $param::random = 1; > ! } elsif ($_ eq 'cs') { > ! $param::cachesync = 1; > ! } elsif ($_ eq 'R') { > ! my($repository) = shift(@ARGV); > ! die("$0: -R option requires a repository argument.\n") if !$repository; > ! script::Repository($repository); > ! } elsif ($_ eq 'j') { > ! die("$0: -j not supported (yet?) on WIN32 systems.\n") if $main::_WIN32; > ! $param::maxjobs = shift(@ARGV); > ! die("$0: -j option requires an argument specifying the maximum number of jobs in parallel.\n") if !$param::maxjobs; > ! # We might want to only set jobclass to async if maxjobs > 1. > ! # On the other hand, specifying -j 1 would be a good way to > ! # check that the async class is working. > ! $param::jobclass = 'job::async'; > } else { > ! die qq($0: unrecognized option "-$_". Use -x for a usage message.\n) if $_; > } > ! } > > # Process an equate argument (var=val). > sub equate { > --- 211,269 ---- > push (@targets, $_), next; > } > > + my(%opt_tab) = ( > + 'cc' => sub { $param::cachecom = 1; }, > + 'cd' => sub { $param::cachedisable = 1; }, > + 'cr' => sub { $param::random = 1; }, > + 'cs' => sub { $param::cachesync = 1; }, > + 'd' => sub { $param::depends = 1; }, > + 'h' => sub { $param::localhelp = 1; }, > + 'k' => sub { $param::kflag = 1; }, > + 'm' => sub { print($version, $cons_history), exit(0); }, > + 'p' => sub { $param::pflag = 1; > + $param::build = 0; }, > + 'pa' => sub { $param::pflag = $param::aflag = 1; > + $indent = "... 
"; > + $param::build = 0; }, > + 'pw' => sub { $param::pflag = $param::wflag = 1; > + $param::build = 0; }, > + 'r' => sub { $param::rflag = 1; > + $param::build = 0; }, > + 'v' => sub { print($version); }, > + 'V' => sub { print($version), exit(0); }, > + 'x' => sub { print($usage), exit 0; }, > + ); > + > + my(%opt_arg) = ( > + 'f' => sub { $param::topfile = $1; }, > + 'j' => sub { die("$0: -j not supported (yet?) on WIN32 systems.\n") > + if $main::_WIN32; > + $param::maxjobs = $1; }, > + 'o' => sub { $param::overfile = $1; }, > + 'R' => sub { script::Repository($1); }, > + ); > + > sub option { > ! if (defined $opt_tab{$_}) { > ! &{$opt_tab{$_}}; > } else { > ! $_ =~ m/(.)(.*)/; > ! if (defined $opt_arg{$1}) { > ! if ($2) { > ! $_ = $2; > ! } else { > ! $_ = shift(@ARGV); > ! die("$0: -$1 option requires an argument.\n") if ! $_; > ! } > ! &{$opt_arg{$1}}($_); > ! } elsif ($_ eq 'wf') { > ! $param::depfile = shift(@ARGV); > ! die("$0: -wf option requires a filename argument.\n") if !$param::d > epfile; > ! } else { > ! die qq($0: unrecognized option "-$_". Use -x for a usage message.\n > ) if $_; > ! } > } > ! } > > # Process an equate argument (var=val). > sub equate { > From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Wed, 31 Mar 1999 23:05:33 -0500 From: Bob Sidebotham To: Steven Knight Cc: Rajesh Vaidheeswarran , Bob Sidebotham Subject: Re: prototype -j code I've been playing a little with the -j version of cons (which I picked up from rj's home directory; it's a little inconvenient to pick anything up from a web site, since it goes to my home machine, rather than to work). Anyway, some observations: 1. On a three processor machine, the best I've been able to get on a large build is about the equivalent of 2.5 processors. 2. The time command in tcsh is broken--it seems to multiply real time by the number of processors, or something approximating that, and then always displays 100% (rather than 200% or 250%). 3. 
The -j architecture is a little broken, in that it's not truly parallel--it will only build immediate dependencies in parallel. In particular, if you have a bunch of .c files that are generated from, say, .foo files, then you get very little parallelization. A bunch of .c files, by themselves, on the other hand, will build in parallel to create a library. A *good* aspect of this, which is more-or-less accidental, is that you typically end up with only a single ld command executing at once. This is "good" because ld can use so much memory (but then again, so perhaps can C++). Now I'd *rather* be able to control this by specifically declaring that I don't want to run lots of ld commands at once... 4. After building with -j, then rebuilding without -j, in one build, everything rebuilt, in another very little rebuilt (these were just different variants of each other; I have no idea what happened). I recall someone saying that .consign files were broken, in some way. Is that the problem? 5. For some future version of Cons, I think it would be good to use Perl threads to build a truly parallelized cons. What would this look like? I had figured it out a year or so ago, but I'd have to reconstruct it... (there's a way to do this without sacrificing the recursive descent nature of the build process, as I recall). Bob >Rajesh-- > >After the last round of discussion about adding -j support to Cons, >I went ahead and asked Bob for his prototype -j code from a few >years back, and I've now ported it to the 1.5 base. There's one >additional feature I'd like to add, but I want to get some feedback >on what I currently have before going any further. > >The new -j code works for my test cases. Because the code relies >on fork()/exec(), which don't exist for Perl on Win32 systems, -j >is *not* supported on NT. I'd welcome advice from Perl/Win32 gurus >on how (or whether) -j support can be extended to those systems. 
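[Editor's note: for readers unfamiliar with the UNIX job-control pattern the -j code relies on, here is a minimal fork()/exec()/wait() sketch. The names start_job and reap_one are invented for this example; this is not the cons-p1 code.]

```perl
#!/usr/bin/perl
# Minimal sketch of keeping up to $maxjobs child commands in flight:
# fork() a child, exec() the command in it, and wait() in the parent
# to reap finished children and recover their exit status.
use strict;

my $maxjobs = 2;
my %running;                                 # pid => command string

sub start_job {
    my (@cmd) = @_;
    defined(my $pid = fork()) or die "fork: $!";
    if ($pid == 0) {
        exec(@cmd) or die "exec @cmd: $!";   # child: become the command
    }
    $running{$pid} = "@cmd";                 # parent: remember the child
    return $pid;
}

sub reap_one {
    my $pid = wait();                        # block until any child exits
    return if $pid < 0;                      # no children left
    my $status = $? >> 8;                    # recover the exit status
    delete $running{$pid};
    return $status;
}

# Start three trivial commands, never more than $maxjobs at once.
for my $cmd (['sh', '-c', 'exit 0'], ['sh', '-c', 'exit 0'], ['sh', '-c', 'exit 0']) {
    reap_one() while keys(%running) >= $maxjobs;
    start_job(@$cmd);
}
reap_one() while keys %running;              # drain remaining jobs
print "all jobs done\n";
```

The same wait()-based reaping is what makes the END-block ordering matter: wait() overwrites $?, so reaping must happen before the exit status is finalized.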
> >There are two problems I can't eradicate, one which probably already >existed in certain circumstances (complaints about sig::hash::END >not being defined), and one of which may be new but may not (improper >exit value of 0 even when both the debugger and print show that >$errors is non-zero). Neither of these problems seems to affect >how Cons actually works--that is, everything seems to get built >properly. > >I was planning to go to the mailing list to seek beta testers >who are willing to: > > -- Test out the new code, with and without -j; > > -- Report results to me; > > -- Especially let me know if the -j code slows down single- > threaded builds (without -j); > > -- Give me feedback on the draft cons.pod text covering > -j and parallel builds. > >Before doing so, though, I wanted to sync up with you and get your >input. The current version of the parallelizing version of cons >is available to you at: > > http://www.baldmt.com/cons-test/cons-p1 > >I'd appreciate it if you'd be able to give it at least a quick >sanity check at Fore, and give me any other feedback you may have. >I've appended a diff with my draft cons.pod text, below. > >Oh, yeah, as you might gather from the text below, the feature I'm >planning to add is the ability to let people specify in the >environment that a given target should be built in the foreground >(single-threaded). The intent is to allow people who have >constructions that need to read something from STDIN when building >a target to still use -j to build the rest of their tree in parallel. > >Barring any indication to the contrary from you, I'm planning to >go out to the list for beta testers sometime later this week. > >Thanks! > > --SK > > > >*** /usr/local/src/cons-1.5/cons.pod Tue Nov 17 14:22:53 1998 >--- cons.pod Mon Mar 29 15:41:24 1999 >*************** >*** 66,71 **** >--- 66,93 ---- > into the makefiles. 
> > >+ =item B >+ >+ One generally-accepted technique for speeding up the software development >+ process is to build software components in parallel--that is, start >+ multiple compilations simultaneously. Although the additional context >+ switches usually increase the CPU time used, parallelizing a build greatly >+ decreases the amount of "wall clock" time that the software developer >+ spends waiting for the build to complete. Most modern versions of make >+ support a -j option which is used to specify the number of tasks that >+ make will execute in parallel. Unfortunately, the make -j option is of >+ limited usefulness for builds in large directory trees, where it would be >+ most helpful. The recursive use of make means that even small -j values >+ can threaten to swamp a system, as each recursive make invocation spawns N >+ separate processes for subdirectories which, in turn, execute N separate >+ make processes for their subdirectories... Worse still, a make -j value >+ that works for the directory tree today may still swamp a system tomorrow >+ when someone adds directories to the tree structure. Lacking any way >+ to coordinate the total number of processes used by the entire build, >+ make's parallel build support doesn't adapt well to changes either in the >+ build process itself or in the availability of system resources. >+ >+ > =head1 B > > A few of the difficulties with make have been cited above. In this and subsequent >sections, we shall introduce Cons and show how these issues are >addressed. >*************** >*** 916,921 **** >--- 938,968 ---- > Cons will search for derived files > in the appropriate build subdirectories > under the repository tree. >+ >+ >+ =head1 B >+ >+ Like make, Cons provides a -j flag that takes an argument to specify >+ how many targets can be built in parallel. For example: >+ >+ % cons -j 10 . >+ >+ Will build the entire directory tree, keeping (up to) ten targets building >+ simultaneously in the background. 
>+ >+ The big difference between parallel builds with Cons and with make is >+ that Cons coordinates, across the entire directory tree, the number of >+ simultaneous targets being built in the background. When you use -j >+ to specify that 10 targets can be built in parallel, Cons will keep 10 >+ targets building in the background, regardless of the subdirectory in >+ which the target resides. This allows for a much greater control over >+ parallel builds and their impact on a system's load than is possible >+ with recursive use of make. >+ >+ >+ =item B >+ >+ T.B.S. > > > =head1 B From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Thu, 1 Apr 1999 00:20:58 -0500 (EST) From: Steven Knight To: Bob Sidebotham Cc: Rajesh Vaidheeswarran Subject: Re: prototype -j code Hi Bob-- > I've been playing a little with the -j version of cons (which I picked > up from rj's home directory; it's a little inconvenient to pick > anything up from a web site, since it goes to my home machine, rather > than to work). I'll be sure to attach copies directly to your email in the future. > 3. The -j architecture is a little broken, in that it's not truly > parallel--it will only build immediate dependencies in parallel. In > particular, if you have a bunch of .c files that are generated from, > say, .foo files, then you get very little parallelization. Aha! Yes. The top-level main::enumerate() method successfully descends the dependency tree and compiles a list of all targets that need building, but then it still starts generating each target in-order. Any chains of dependencies necessary to create one target then get descended serially by file::fstart() (formerly file::_build()), because you have to wait until your dependencies are built before you build yourself. Duh. The proper architecture would be to have main::enumerate() create its list already in proper dependency-order. (Not terribly difficult.) 
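[Editor's note: the dependency-ordered enumeration Steve proposes here is a postorder walk of the dependency graph. A minimal sketch, with a made-up %deps graph and sub name, not the actual main::enumerate:]

```perl
#!/usr/bin/perl
# Postorder depth-first walk: each target is pushed onto the list only
# after all of its dependencies, so a scheduler can run straight down
# @order starting jobs and every dependency precedes its dependent.
use strict;

my %deps = (
    'prog' => ['a.o', 'b.o'],   # hypothetical program and objects
    'a.o'  => ['a.c'],
    'b.o'  => ['b.c'],
    'a.c'  => [],
    'b.c'  => [],
);

my (@order, %seen);
sub enumerate {
    my ($t) = @_;
    return if $seen{$t}++;                     # visit each node once
    enumerate($_) for @{ $deps{$t} || [] };    # dependencies first
    push @order, $t;                           # then the target itself
}
enumerate('prog');
print "@order\n";   # prints "a.c a.o b.c b.o prog"
```

With the list in this order, the only per-target question left at job-start time is which of the three states each dependency is in: already built, already failed, or still building.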
Then, as we run through the list starting jobs, anything on which a given target depends should be in one of three states: 1) already finished building; 2) already failed its build (in which case we got to the current target because -k was in force); 3) in the process of being built at the moment. Case 1 is trivial: since the dependency is built, we go ahead and build the target. Case 2 is easy: propagate the error status back up the chain. Case 3 is the interesting one: What we want to do then is queue the blocked target until its dependency finishes, but in the meantime continue to look for other targets to build that aren't waiting on dependencies. This really would mean pulling apart the logic in file::fstart() so that it doesn't do the descent, because that's the source of the serial execution of the dependencies. This will take a little thought. > A bunch of > .c files, by themselves, on the other hand, will build in parallel to > create a library. A *good* aspect of this, which is more-or-less > accidental, is that you typically end up with only a single ld command > executing at once. This is "good" because ld can use so much memory > (but then again, so perhaps can C++). Now I'd *rather* be able to > control this by specifically declaring that I don't want to run lots > of ld commands at once... This is where I think we need a flag in a build object (environment) that specifies "build this target in the foreground." This would give the Cons user explicit control. Actually, what I just mentioned isn't quite the same as what you describe. Maybe we need two flags, one that says, "stop everything else and build this target in the foreground," and one that says, "only one target for this build object can be built at any one time." By creating an ld-specific build environment with the latter flag set, you could still allow other non-ld tasks to take place in parallel while any one ld is running, for example. > 4. 
After building with -j, then rebuilding without -j, in one build, > everything rebuilt, in another very little rebuilt (these were just > different variants of each other; I have no idea what happened). I > recall someone saying that .consign files were broken, in some way. Is > that the problem? Yes, that would be the one. Actually, I think it's partially the .consign issue I mentioned earlier (out-of-order END routines), and partially because of certain signature values getting lost completely at the ends of compilations. The job-reaping code must be called down through file::fwait() so that the build status (and therefore the signature) can get propagated up to the file object. This wouldn't happen for any tasks that were still going on in the background when Cons processed its last target. The cleanup code was leaping right into job::END(), which happily reaped children but could do nothing useful with the build status returned from the processes. > 5. For some future version of Cons, I think it would be good to use > Perl threads to build a truly parallelized cons. What would this look > like? I had figured it out a year or so ago, but I'd have to > reconstruct it... (there's a way to do this without sacrificing the > recursive descent nature of the build process, as I recall). I haven't used Perl threads yet, so any info you can supply (pointer to documentation or tutorials?) will be news to me. --SK From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Thu, 8 Apr 1999 08:11:13 -0400 (EDT) From: Steven Knight To: Bob Sidebotham , Rajesh Vaidheeswarran Subject: -j update, sig::hash::END, and directories as targets Bob, Rajesh-- First: I'm still thinking through how to rework the -j architecture to accommodate what we found last week. Unfortunately, I don't have as much time this week or next, so it's on the back burner, but rest assured that it's still simmering and I should be able to get back to it in earnest before *too* long. 
Second: In the meantime, I wrote a more robust sig::hash::END that should be more graceful in writing .consign files in the face of full file systems (or other errors). Code is appended below. Tests pass. Let me know if your mileage varies. Third: You can specify a directory name as a "target" on the command line and have Cons build everything under that subdirectory. You cannot, however, specify a subdirectory as a dependency and have Cons build all the targets therein. What I have in mind is being able to do something like the following for packaging software: $package = 'software-1.5'; Command $env "$package.tar", "$package", qq( tar -cf %> $package ); Install $env "$package", 'program'; This isn't something recently introduced, as trying it under both 1.5 and 1.3.1 generates errors: $ cons-1.5 Can't locate object method "source_exists" via package "dir" at /usr/local/bin/cons-1.5 line 1922. $ cons-1.3.1 Can't locate object method "srcpath" via package "dir" at /usr/local/bin/cons-1.3.1 line 1675, chunk 82. $ A quick look at the code indicates that some of the top-level logic that handles subdirectory descent (by recursively enumerating targets) would have to move, probably into the "dir" package. Thoughts? --SK *** cons-1.5 Thu Apr 8 00:44:01 1999 --- cons.N Wed Apr 7 00:04:34 1999 *************** *** 2149,2162 **** return if $called++; # May be called twice. close(CONSIGN); # in case this came in via ^C. for $dir (values %sig::hash::dirty) { ! $consign = $dir->prefix . ".consign"; ! unlink($consign); ! open(CONSIGN, ">$consign") || die("$0: can't create $consign ($!)\n"); ! my($entry, $sig); ! while(($entry,$sig) = each %{$dir->{consign}}) { ! print CONSIGN ("$entry:$sig\n"); ! } ! close(CONSIGN); } } --- 2149,2173 ---- return if $called++; # May be called twice. close(CONSIGN); # in case this came in via ^C. for $dir (values %sig::hash::dirty) { ! my($consign) = $dir->prefix . ".consign"; ! my($constemp) = $consign . ".$$"; ! if (! 
open(CONSIGN, ">$constemp")) { ! die("$0: can't create $constemp ($!)\n"); ! } ! my($entry, $sig); ! while (($entry, $sig) = each %{$dir->{consign}}) { ! if (! print CONSIGN "$entry:$sig\n") { ! die("$0: error writing to $constemp ($!)\n"); ! } ! } ! close(CONSIGN); ! if (! rename($constemp, $consign)) { ! if (futil::copy($constemp, $consign)) { ! unlink($constemp); ! } else { ! die("$0: could not rename or copy $constemp to $consign ($!)\n"); ! } ! } } } From knight@baldmt.com Fri Feb 4 06:55:17 2000 Date: Fri, 23 Apr 1999 12:45:01 -0400 (EDT) From: Steven Knight To: Bob Sidebotham , Rajesh Vaidheeswarran Subject: cons-p2 Bob, Rajesh-- Attached is cons-p2. In case the attachment doesn't come through, it's also available at: http://www.baldmt.com/cons-test/cons-p2 I spent most of my Cons time during the past few weeks working on expanding the test cases to cover the stuff you both found in cons-p1, and then fixing cons-p2 so it would pass those new tests. Most of this was really straight cons stuff that cons-test wasn't already covering: handling build errors properly, UseCache and its options, Salt... Despite our last discussion, it turns out that I don't think we need to change the -j architecture. I have a test that builds two libraries from two .c files each, and all four compiles get kicked off in parallel. If you have a counter-example, I'd like to see it so I can add the configuration to my tests. (I would, of course, be glad to make the new tests available to you, if it would help.) cons-p2 does not have any of the synchronization/foreground build stuff that we discussed. I'd like to get a sanity check before tackling that. I also haven't tested it on Windows NT. An overview of the changes is below. Please give this a quick (or not-so-quick) test and let me know if you spot any problems. --SK Summary of the changes beween cons-p1 and cons-p2: -- Bumped version to p2 from p1, of course. 
-- New Cons home page URL (courtesy Rajesh) -- Rajesh's white space cleanup. -- Table-driven argument parsing that allows "-j8" concatenation of arguments (previously sent to you). -- Changed exit methodology: sig::hash::END and job::END have been changed to sig::hash::on_exit and job::on_exit, and they're now called in appropriate order by a single main::END routine. -- Passed the proper signature ($self->{started}, not $self->{sig}) to build::command::cachout. This was what busted derived-file caching, and part of what caused stuff to be rebuilt unnecessarily. -- Biggest change: Made bwait/jwait routines responsible for setting {status} by calling a new file::set_status routine. This fixes the problem with lost error status causing unnecessary rebuilds (as well as being slightly more efficient) because the reaping routines now set status more-or-less directly, instead of passing it back up the calling chain for the benefit of file::fwait. -- Slight cleanup and optimization to the interface between build::command::bstart and job::new/job::jstart. (Got rid of repeated calls to job->command to populate the command array by just passing the results from build::command::getcoms to job::new.) -- More robust flush-to-.consign logic, previously sent. -- When using UseCache, now does a mkdir of the single-letter cache subdirectory if it doesn't already exist. [ Part 2, "" Text/PLAIN (Name: "cons-p2") 1,882 lines. ] [ Unable to print this part. ] From rns@fore.com Fri Feb 4 06:55:17 2000 Date: Fri, 23 Apr 1999 12:59:09 -0400 From: Bob Sidebotham To: Steven Knight Cc: Bob Sidebotham , Rajesh Vaidheeswarran Subject: Re: cons-p2 I'm afraid I won't be able to test this. I'm leaving, and will be on the road for a while (driving to Vancouver from Pittsburgh). Please send future mail to: bob_sidebotham@yahoo.com Anyway, the changes sound great. Thanks to both of you for all your great contributions. Bob You said: > This message is in MIME format. 
The first part should be readable text, > while the remaining parts are likely unreadable without MIME-aware tools. > Send mail to mime@docserver.cac.washington.edu for more info. > >--0-2002623338-924885901=:4037 >Content-Type: TEXT/PLAIN; charset=US-ASCII > >Bob, Rajesh-- > >Attached is cons-p2. In case the attachment doesn't come through, >it's also available at: > > http://www.baldmt.com/cons-test/cons-p2 > >I spent most of my Cons time during the past few weeks working on >expanding the test cases to cover the stuff you both found in >cons-p1, and then fixing cons-p2 so it would pass those new tests. >Most of this was really straight cons stuff that cons-test wasn't >already covering: handling build errors properly, UseCache and >its options, Salt... > >Despite our last discussion, it turns out that I don't think we >need to change the -j architecture. I have a test that builds two >libraries from two .c files each, and all four compiles get kicked >off in parallel. If you have a counter-example, I'd like to see >it so I can add the configuration to my tests. (I would, of course, >be glad to make the new tests available to you, if it would help.) > >cons-p2 does not have any of the synchronization/foreground build >stuff that we discussed. I'd like to get a sanity check before >tackling that. I also haven't tested it on Windows NT. > >An overview of the changes is below. > >Please give this a quick (or not-so-quick) test and let me know >if you spot any problems. > > --SK > > >Summary of the changes beween cons-p1 and cons-p2: > > -- Bumped version to p2 from p1, of course. > > -- New Cons home page URL (courtesy Rajesh) > > -- Rajesh's white space cleanup. > > -- Table-driven argument parsing that allows "-j8" concatenation > of arguments (previously sent to you). 
> > -- Changed exit methodology: sig::hash::END and job::END have > been changed to sig::hash::on_exit and job::on_exit, and > they're now called in appropriate order by a single main::END > routine. > > -- Passed the proper signature ($self->{started}, not > $self->{sig}) to build::command::cachout. This was what > busted derived-file caching, and part of what caused stuff > to be rebuilt unnecessarily. > > -- Biggest change: Made bwait/jwait routines responsible for > setting {status} by calling a new file::set_status routine. > This fixes the problem with lost error status causing > unnecessary rebuilds (as well as being slightly more > efficient) because the reaping routines now set status > more-or-less directly, instead of passing it back up the > calling chain for the benefit of file::fwait. > > -- Slight cleanup and optimization to the interface between > build::command::bstart and job::new/job::jstart. (Got rid > of repeated calls to job->command to populate the command > array by just passing the results from build::command::getcoms > to job::new.) > > -- More robust flush-to-.consign logic, previously sent. > > -- When using UseCache, now does a mkdir of the single-letter > cache subdirectory if it doesn't already exist. > [ATTACHMENT DELETED] I'm afraid I won't be able to test this. I'm leaving, and will be on the road for a while (driving to Vancouver from Pittsburgh). Please send future mail to: bob_sidebotham@yahoo.com Anyway, the changes sound great. Thanks to both of you for all your great contributions. Bob You said: > This message is in MIME format. The first part should be readable text, > while the remaining parts are likely unreadable without MIME-aware tools. > Send mail to mime@docserver.cac.washington.edu for more info. > >--0-2002623338-924885901=:4037 >Content-Type: TEXT/PLAIN; charset=US-ASCII > >Bob, Rajesh-- > >Attached is cons-p2. 
In case the attachment doesn't come through, >it's also available at: > > http://www.baldmt.com/cons-test/cons-p2 > >I spent most of my Cons time during the past few weeks working on >expanding the test cases to cover the stuff you both found in >cons-p1, and then fixing cons-p2 so it would pass those new tests. >Most of this was really straight cons stuff that cons-test wasn't >already covering: handling build errors properly, UseCache and >its options, Salt... > >Despite our last discussion, it turns out that I don't think we >need to change the -j architecture. I have a test that builds two >libraries from two .c files each, and all four compiles get kicked >off in parallel. If you have a counter-example, I'd like to see >it so I can add the configuration to my tests. (I would, of course, >be glad to make the new tests available to you, if it would help.) > >cons-p2 does not have any of the synchronization/foreground build >stuff that we discussed. I'd like to get a sanity check before >tackling that. I also haven't tested it on Windows NT. > >An overview of the changes is below. > >Please give this a quick (or not-so-quick) test and let me know >if you spot any problems. > > --SK > > >Summary of the changes beween cons-p1 and cons-p2: > > -- Bumped version to p2 from p1, of course. > > -- New Cons home page URL (courtesy Rajesh) > > -- Rajesh's white space cleanup. > > -- Table-driven argument parsing that allows "-j8" concatenation > of arguments (previously sent to you). > > -- Changed exit methodology: sig::hash::END and job::END have > been changed to sig::hash::on_exit and job::on_exit, and > they're now called in appropriate order by a single main::END > routine. > > -- Passed the proper signature ($self->{started}, not > $self->{sig}) to build::command::cachout. This was what > busted derived-file caching, and part of what caused stuff > to be rebuilt unnecessarily. 
>
>  -- Biggest change:  Made bwait/jwait routines responsible for
>     setting {status} by calling a new file::set_status routine.
>     This fixes the problem with lost error status causing
>     unnecessary rebuilds (as well as being slightly more
>     efficient) because the reaping routines now set status
>     more-or-less directly, instead of passing it back up the
>     calling chain for the benefit of file::fwait.
>
>  -- Slight cleanup and optimization to the interface between
>     build::command::bstart and job::new/job::jstart.  (Got rid
>     of repeated calls to job->command to populate the command
>     array by just passing the results from build::command::getcoms
>     to job::new.)
>
>  -- More robust flush-to-.consign logic, previously sent.
>
>  -- When using UseCache, now does a mkdir of the single-letter
>     cache subdirectory if it doesn't already exist.

[ATTACHMENT DELETED]

From rv@fore.com Fri Feb  4 06:55:17 2000
Date: Fri, 23 Apr 1999 13:29:05 -0400
From: Rajesh Vaidheeswarran
To: Steven Knight
Cc: bob_sidebotham@yahoo.com, Rajesh Vaidheeswarran
Subject: Re: cons-p2

Steve,

Could you base this off of 1.6a1 on the cons site?  I have not sent out a
notice regarding the new alpha release, but it would be nice to get new
versions/diffs off of 1.6a1.

Thanks

rv

-- using MH template repl.format --
In a previous message, Steven Knight writes:

> Bob, Rajesh--
>
> Attached is cons-p2.  In case the attachment doesn't come through,
> it's also available at:
>
> 	http://www.baldmt.com/cons-test/cons-p2
>
> I spent most of my Cons time during the past few weeks working on
> expanding the test cases to cover the stuff you both found in
> cons-p1, and then fixing cons-p2 so it would pass those new tests.
> Most of this was really straight cons stuff that cons-test wasn't
> already covering:  handling build errors properly, UseCache and
> its options, Salt...
>
> Despite our last discussion, it turns out that I don't think we
> need to change the -j architecture.  I have a test that builds two
> libraries from two .c files each, and all four compiles get kicked
> off in parallel.  If you have a counter-example, I'd like to see
> it so I can add the configuration to my tests.  (I would, of course,
> be glad to make the new tests available to you, if it would help.)
>
> cons-p2 does not have any of the synchronization/foreground build
> stuff that we discussed.  I'd like to get a sanity check before
> tackling that.  I also haven't tested it on Windows NT.
>
> An overview of the changes is below.
>
> Please give this a quick (or not-so-quick) test and let me know
> if you spot any problems.
>
> 	--SK
>
>
> Summary of the changes between cons-p1 and cons-p2:
>
>  -- Bumped version to p2 from p1, of course.
>
>  -- New Cons home page URL (courtesy Rajesh)
>
>  -- Rajesh's white space cleanup.
>
>  -- Table-driven argument parsing that allows "-j8" concatenation
>     of arguments (previously sent to you).
>
>  -- Changed exit methodology:  sig::hash::END and job::END have
>     been changed to sig::hash::on_exit and job::on_exit, and
>     they're now called in appropriate order by a single main::END
>     routine.
>
>  -- Passed the proper signature ($self->{started}, not
>     $self->{sig}) to build::command::cachout.  This was what
>     busted derived-file caching, and part of what caused stuff
>     to be rebuilt unnecessarily.
>
>  -- Biggest change:  Made bwait/jwait routines responsible for
>     setting {status} by calling a new file::set_status routine.
>     This fixes the problem with lost error status causing
>     unnecessary rebuilds (as well as being slightly more
>     efficient) because the reaping routines now set status
>     more-or-less directly, instead of passing it back up the
>     calling chain for the benefit of file::fwait.
>
>  -- Slight cleanup and optimization to the interface between
>     build::command::bstart and job::new/job::jstart.  (Got rid
>     of repeated calls to job->command to populate the command
>     array by just passing the results from build::command::getcoms
>     to job::new.)
>
>  -- More robust flush-to-.consign logic, previously sent.
>
>  -- When using UseCache, now does a mkdir of the single-letter
>     cache subdirectory if it doesn't already exist.

From knight@baldmt.com Fri Feb  4 06:55:17 2000
Date: Fri, 23 Apr 1999 14:41:23 -0400 (EDT)
From: Steven Knight
To: Rajesh Vaidheeswarran
Cc: bob_sidebotham@yahoo.com
Subject: Re: cons-p2

> Could you base this off of 1.6a1 on the cons site?  I have not sent out a
> notice regarding the new alpha release, but it would be nice to get new
> versions/diffs off of 1.6a1.

Will do.  I didn't know 1.6a1 was out there.  I'll integrate my
changes based on 1.6a1 and let you know when cons-p3 is available.

It turns out I want to take another look at performance issues,
anyway, so don't expect cons-p3 until sometime next week.

Thanks.

	--SK
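The exit-methodology item in the change summaries above — replacing the independent sig::hash::END and job::END blocks with on_exit routines invoked in a fixed order by a single main::END — is an instance of a general cleanup pattern: one coordinator calls each module's cleanup hook in an explicit sequence, instead of relying on the language's implicit ordering of separate exit blocks. Cons itself is Perl; the following is only a Python sketch of the pattern, with all class and method names hypothetical stand-ins rather than actual Cons code.

```python
# Sketch of the "single exit routine" pattern: each module exposes an
# on_exit() cleanup hook, and one coordinator calls the hooks in a fixed,
# explicit order instead of relying on implicit exit-block ordering.

class SigHash:
    """Hypothetical stand-in for Cons's sig::hash module."""
    def __init__(self):
        self.flushed = False

    def on_exit(self):
        # In Cons this step would flush signature data to .consign files.
        self.flushed = True

class JobPool:
    """Hypothetical stand-in for Cons's job module."""
    def __init__(self):
        self.reaped = False

    def on_exit(self):
        # In Cons this step would reap any outstanding child processes.
        self.reaped = True

def main_end(modules_in_order):
    """Single exit routine: run each module's cleanup in explicit order."""
    for module in modules_in_order:
        module.on_exit()

jobs = JobPool()
sigs = SigHash()
# Jobs are reaped before signatures are flushed; the ordering lives in
# one place rather than being split across independent exit blocks.
main_end([jobs, sigs])
```

The benefit the summary alludes to is exactly this: when each module registers its own exit block, the relative order of cleanup is easy to get wrong; a single coordinator makes the required sequence visible and enforceable.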
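The "table-driven argument parsing" item in the change summaries above lets a value be concatenated to its flag ("-j8") as well as given separately ("-j 8"). Cons is Perl; the sketch below is a Python illustration of one way such a table-driven parser can work, with a hypothetical option table rather than the actual Cons flags and code.

```python
# Sketch of table-driven option parsing that accepts both "-j 8" and the
# concatenated "-j8" form. The option table (names hypothetical) declares
# which flags take a value, so the parsing loop itself stays generic.

OPTIONS = {
    "-j": {"takes_value": True},   # number of parallel jobs
    "-k": {"takes_value": False},  # keep going after errors
}

def parse_args(argv):
    """Return (options, remaining_args) parsed per the OPTIONS table."""
    opts, rest = {}, []
    i = 0
    while i < len(argv):
        arg = argv[i]
        matched = False
        for flag, spec in OPTIONS.items():
            if arg == flag or (spec["takes_value"] and arg.startswith(flag)):
                if spec["takes_value"]:
                    if arg == flag:
                        # Separated form, "-j 8": value is the next argument.
                        i += 1
                        opts[flag] = argv[i]
                    else:
                        # Concatenated form, "-j8": value follows the flag.
                        opts[flag] = arg[len(flag):]
                else:
                    opts[flag] = True
                matched = True
                break
        if not matched:
            rest.append(arg)
        i += 1
    return opts, rest
```

For example, `parse_args(["-j8", "libfoo"])` and `parse_args(["-j", "8", "libfoo"])` both yield `-j` set to `"8"` with `["libfoo"]` left over; driving the loop from a table means adding a new flag is a one-line table entry rather than new parsing code.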